We solve Yandex.Interview tasks in a functional style

A few months ago, an article appeared in the blog of Yandex that discussed the passage of the algorithmic section of the interview. Among other things, this article referred to a special context containing tasks similar to those offered by Yandex to its candidates.







Having registered in the system, my attention was immediately attracted by the ability to solve problems on Haskell. The fact is that although I am fond of programming in this language, I have not progressed further than the implementation of tasks from various courses of educational on-line platforms. Having decided that their solution can be an interesting challenge and will increase my level as a developer, I proceeded to solve them.







Who cares what eventually came of it, welcome to cat.







A. Stones and jewelry



Two lines of lowercase Latin characters are given: string J and string S. The characters included in string J are “jewels” and included in string S are “stones”. It is necessary to determine how many characters from S are simultaneously “jewels”. Simply put, you need to check how many characters from S are in J.

The first task is a warm-up, we will solve it “on the forehead”. We define the function jeweleryCount :: String -> String -> Int , which, using the convolution of the list passed by the second argument, sums up all cases of the processed item in the first list. For these purposes, we define the elemInt function based on the elem function, which, unlike the latter, will return not True or False, but the number 0 or 1. In the main function, you only need to read two lines, pass them to the corresponding function and print the result. The verdict of the testing system is OK, we pass to the second task.







jeweleryCount :: String -> String -> Int jeweleryCount j = foldr ((+).(elemInt j)) 0 where elemInt sx = fromEnum $ elem xs main :: IO () main = do j <- getLine s <- getLine print $ jeweleryCount js
      
      





The source code for solving this and other tasks is also available in the github repository.







B. Consecutive units



It is required to find the longest sequence of units in the binary vector and print its length.

To solve this problem, we implement a recursive function that will go through the transferred list and calculate the length of the required sequence. With the arguments of the function, in addition to the list itself, we will pass the current maximum length and the number of consecutive units on the current call. First, we define the recursion base on the empty list, and then the recursion step itself.







To read the input, we define the function getUserInputs :: IO [Char] , in which we first read the number n - the size of the list, and then use the combinator replicateM to get a function that will call the function <<get> getLine n times and merge the results into a list .







 import Control.Monad (replicateM) onesCount :: [Char] -> Int onesCount xs = onesCount' xs 0 0 where onesCount' "" max curr | max > curr = max | otherwise = curr onesCount' (x:xs) max curr | x == '1' = onesCount' xs max $ curr + 1 | curr > max = onesCount' xs curr 0 | otherwise = onesCount' xs max 0 getUserInputs :: IO [Char] getUserInputs = do n <- read <$> getLine :: IO Int replicateM n $ head <$> getLine main :: IO () main = do xs <- getUserInputs print $ onesCount xs
      
      





We send the decision, the verdict is OK. We move on.







C. Duplicate Removal



An array of 32-bit integers ordered in non-decreasing order is given. It is required to remove all repetitions from it.

Let's start with a simple implementation. We define an initial function that reads a number, prints it, and returns it wrapped in the IO monad. We also define the deleteDoubles :: Int -> Int -> IO () function, which reads a number and prints it only if it is not equal to the second argument (we will pass the number read in the previous step there). After that, the function recursively calls itself and thus proceeds to the next number in the input stream. The recursion base is the number of numbers to be read, we will pass it the first argument.







 import Control.Monad initial :: IO Int initial = do a <- read <$> getLine print a return a deleteDoubles :: Int -> Int -> IO() deleteDoubles 0 _ = return () deleteDoubles ta = do b <- read <$> getLine unless (a == b) $ print b deleteDoubles (t-1) b main :: IO () main = do t <- read <$> getLine unless (t < 1) $ initial >>= deleteDoubles (t-1)
      
      





We send the solution, it passes all the tests, and it would seem that we can move on to the next task, but in my opinion the recursive call of the function working in the IO monad is more confusing than concise. Let's try to improve it.







Note that, generally speaking, you can first read the entire list of numbers (we will use the replicateM combinator already familiar with the second task), then pass it to a pure function that filters out all the repetitions, and finally print the result.







 import Control.Monad deleteDoubles' _ [] = [] deleteDoubles' prev (x:xs) | prev /= x = x:(deleteDoubles' x xs) | otherwise = deleteDoubles' x xs deleteDoubles (x:xs) = x:deleteDoubles' x xs getUserInputs :: Int -> IO [Int] getUserInputs t = replicateM t $ read <$> getLine main :: IO () main = do t <- read <$> getLine unless (t < 1) $ (deleteDoubles <$> getUserInputs t) >>= mapM_ print
      
      





I am sending a solution, and the first disappointment is that the program does not pass the 193 test due to exceeding the limit of used memory. The main mistake is to read the entire list into memory entirely. We will try to avoid this and will implement a certain hybrid of the first and second versions.







Note that the task of removing duplicates is somewhat reminiscent of a left-associative convolution: at each step we calculate a function that, depending on the current item read and some of its result, at the previous step decides to print, and then proceeds to the next pair of values.







A function that prints or does not print the result depending on its arguments, after which it returns its second argument, wrapped in the IO monad, is quite simple, let's call it step:







 step :: Int -> Int -> IO Int step fst snd = unless (fst == snd) (print snd) >> return snd
      
      





We figured out whether or not printing, depending on the values ​​passed, but how to organize the reading? To do this, we use the monadic convolution function foldM :: (Foldable t, Monad m) => (b -> a -> mb) -> b -> ta -> mb , which is applicable to the list of reading functions.

By the type of function foldM, we note that at each step the “unpacking” of the result of the previous application of the function occurs under the hood of foldM itself. Thus, at each step we only need to start a monadic calculation of the current list item (in fact, read the next number) using the bind operator ( >> = ) and pass it along with the previous number to step. As a result, we get the following program







 step :: Int -> Int -> IO Int step fst snd = unless (fst == snd) (print snd) >> return snd initial :: IO Int initial = do a <- read <$> getLine print a return a getUserInputs t = replicate t $ read <$> getLine main :: IO () main = do t <- read <$> getLine unless (t < 1) $ do init <- initial foldM_ ((=<<) . step) init $ getUserInputs (t-1)
      
      





D. Generation of bracket sequences



Given an integer n. It is required to derive all the correct bracket sequences of length 2 ⋅ n, ordered lexicographically (see https://ru.wikipedia.org/wiki/Lexographic_order ).

Only parentheses are used in the task.

It is advisable to obtain a solution that works in a time proportional to the total number of correct bracket sequences in the response, and at the same time uses a memory capacity proportional to n.

This task, like many others, in which it is necessary to derive sequences satisfying certain conditions (for example, the task of exchanging coins, arranging eight queens and others, can be read in more detail here ), is succinctly solved using the list monad. In short, this approach is based on monadic linking for lists, the meaning of which is to join together a set of operations performed on each element of the list.







Define the recursive function generate ':: Int -> Int -> [[Char]] , which takes the number of brackets to be put as the second argument, and the number of open parentheses that are already set to be set as the first argument. For the recursion step, we need two auxiliary functions: possible - returns a list of brackets that can be placed in the next step, and step - makes a recursive call to the generate 'function with the necessary parameters.







 import Control.Monad(mapM_) generate :: Int -> [String] generate = generate' 0 where generate' _ 0 = [[]] generate' an = [x:xs | x <- possible, xs <- step x] where step '(' = generate' (a + 1) (n - 1) step ')' = generate' (a - 1) (n - 1) possible | n == a = ")" | a == 0 = "(" | otherwise = "()" main :: IO () main = do n <- read <$> getLine let result = generate $ n * 2 mapM_ putStrLn result
      
      





We send the solution, and we understand that we did not take into account the restriction that was imposed on the amount of memory used by the program - the solution does not pass the 14th test due to exceeding the limit of used memory.







We modify the generate 'function so that instead of constructing the entire list of correct bracket sequences, it immediately displays them. To do this, we will have to add the third argument to the function - a fragment of the sequence constructed for the current step. I note that in this implementation we will construct the sequence in the reverse order - this will allow us to use the list constructor ( :) instead of the more expensive concatenation operator ( ++ ).







 import Control.Monad(mapM_) generate :: Int -> IO() generate = generate' "" 0 where generate' xs _ 0 = putStrLn $ reverse xs generate' xs an | n == a = step ')' | a == 0 = step '(' | otherwise = step '(' >> step ')' where step '(' = generate' ('(':xs) (a + 1) (n - 1) step ')' = generate' (')':xs) (a - 1) (n - 1) main :: IO () main = do n <- read <$> getLine generate $ n * 2
      
      





E. Anagrams



Two lines are given, consisting of lowercase Latin letters. It is required to determine whether these lines are anagrams, i.e., do they differ only in the sequence of characters.

To solve this problem, we will count how many times a letter occurs in each row and compare the results. We immediately understand that standard lists are not suitable for us, and it is necessary to use a data structure that would allow us to effectively access the element by its index. There are several types of data that would satisfy our conditions, we will use the standard immutable array Data.Array (there are still at least various mutable arrays, as well as Data.Vector ).







To construct the necessary arrays, we use the function hist :: (Ix a, Num b) => (a, a) -> [a] -> Array ab , which, according to the transferred list of elements and the range to which these elements should belong, forms an array, which stores the number of repetitions of elements from the list. This function, although not included in the Data.Array module, is often given as an example of using another, already library function, accumArray. We just have to copy its implementation and write main - the benefit of the comparison for equality for Array Char Int is already defined. I draw your attention to one nice feature - as an index we can use not only integers, but any representative of class Ix . In our case, Char is a natural fit for this role.







 import Data.Array hist :: (Ix a, Num b) => (a,a) -> [a] -> Array ab hist bnds is = accumArray (+) 0 bnds [(i, 1) | i<-is, inRange bnds i] main = do arr1 <- hist ('a','z') <$> getLine arr2 <- hist ('a','z') <$> getLine if (arr1 == arr2) then print 1 else print 0
      
      





F. Merge k sorted lists



Given are arrays of non-negative integers sorted in non-decreasing order, each of which does not exceed 100. It is required to construct the result of their merger: an array sorted in non-decreasing order containing all elements of the original k arrays.

The length of each array does not exceed 10 ⋅ k.

Try to make the solution work for the time k ⋅ log (k) ⋅ n, if we assume that the input arrays are of length n.

Merging two sorted lists is a classic list task and is covered in many Haskell programming courses. For example, it can be solved as follows.







 merge :: [Int] -> [Int] -> [Int] merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) | x < y = x:merge xs (y:ys) | otherwise = y:merge (x:xs) ys
      
      





Well, we can merge two lists. But what should we do with the list of lists? Convolve it with this function! Thus, we will combine all the lists into one, and we will only have to print it.







Decision
 import Control.Monad merge :: [Int] -> [Int] -> [Int] merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) | x < y = x:merge xs (y:ys) | otherwise = y:merge (x:xs) ys mergeLists :: [[Int]] -> [Int] mergeLists = foldl merge [] getUserInputs :: Int -> IO [[Int]] getUserInputs t = replicateM t $ do n <- getLine return $ tail $ read <$> words n main :: IO () main = do k <- read <$> getLine lists <- getUserInputs k let res = mergeLists lists mapM_ (putStrLn . show) res
      
      





However, this solution has two serious problems - the computational complexity is higher than the required one - O (k ^ 2 ⋅ n) instead of O (k ⋅ log (k) ⋅ n) , plus it uses a lot of additional memory. As a result, this solution fails test number 17 due to exceeding the limit of used memory - 17.27Mb instead of the allowed 10Mb.







While we will not pay attention to the fact that the numbers supplied to the input belong to a limited range of values, and we continue to search for solutions for a more general case.







The next step is to try to implement the approach that was proposed in the original article with analysis of these tasks. Let me remind you that it is based on the use of a data structure that provides an efficient way to extract the minimum element. As such a structure, select Data.Set . We initialize Set with the list of first elements, then at each step we will extract and print the minimum element, and then add the next element from the corresponding list. In addition, we will need a Data.Sequence structure to store the lists themselves. It was chosen for reasons that at each step it is necessary both to have quick access to the list by its index (which the list cannot provide), and to change the element of this element without the need to copy the entire structure (which in general cannot provide immutable Data. Array ).







Thus, we have the following program:







Decision
 import qualified Data.Sequence as Seq import qualified Data.Set as Set import Control.Monad import Data.Foldable mergeLists :: Set.Set (Int, Int) -> Seq.Seq [Int] -> IO () mergeLists set seq | Set.null set = return () | otherwise = do let ((val, idx), set') = Set.deleteFindMin set print val if null (Seq.index seq idx) then mergeLists set' seq else mergeLists (Set.insert (head (Seq.index seq idx), idx) set') (Seq.adjust tail idx seq) getUserInputs :: Int -> IO [[Int]] getUserInputs t = replicateM t $ do n <- getLine return $ tail $ read <$> words n main :: IO () main = do t <- read <$> getLine lists <- getUserInputs t let init_seq = Seq.fromList (filter (not . null) lists) let init_heap = Set.fromList (zipWith (,) (toList (fmap head init_seq)) [0..]) mergeLists init_heap $ tail <$> init_seq
      
      





We send the solution and find out that even though the program began to consume much less memory (10.26Mb instead of 17.27Mb on the 17th test), it still did not meet the limit. The reason for this lies in the fact that with this decision, one way or another, we have to read the entire input data into the memory. Let's try to avoid this using the third solution to this problem - sorting by counting.







We have already performed the count of the number of incoming characters when solving the previous anagram problem. Also, as in solving it, we will use Data.Array . First, we implement the addToArray :: Array Int Int -> [Int] -> Array Int Int function, which forms an array based on the existing one by increasing the values ​​at the indices that correspond to the values ​​from the list.







 addToArray :: Array Int Int -> [Int] -> Array Int Int addToArray acc elems = accum (+) acc [(i, 1) | i<-elems]
      
      





Then, we will use the approach known to us in the problem of removing repeats - using the monadic convolution, sequentially applying the addToArray function to k source arrays. And ... we get the same result of 10.26Mb on the 17th test. And then it's time to remember that foldl (whose analogue is foldM ) according to the accepted reduction order will first expand the entire chain of nested expressions and only then proceed to their active calculation. As you know, to combat this fact, the Data.List module implements the foldl ' function, which uses the function seq :: a -> b -> b , which first casts the first argument to its weak head normal form, that is, reduces it to the outer part - the value of the function or constructor, and then returns the second ( https://www.ibm.com/developerworks/ru/library/l-haskell4/index.html ). We have no choice but to implement the foldM ' function independently .







 foldM' :: (Monad m) => (a -> b -> ma) -> a -> [b] -> ma foldM' _ z [] = return z foldM' fz (x:xs) = do z' <- fzx z' `seq` foldM' fz' xs
      
      





As a result, the amount of memory used in the 17th test almost halved and amounted to 5.64Mb! Although the 17th and 18th tests were successfully passed, this implementation did not pass the 19th test for the same reason that the memory limit was exceeded - 10.25Mb.







Ok, let's move on - we have not tried Data.Array.Unboxed yet. This type of arrays is noteworthy in that, unlike the standard one, it can store values ​​themselves, rather than pointers to them, with their elements ( https://wiki.haskell.org/Arrays#Unboxed_arrays ). Due to this, such arrays take up less memory space and are more efficient. In order to use them, we only need to change the import and function types, since Data.Array and Data.Array.Unboxed implement one interface of immutable IArray arrays.







We are sending a solution - memory consumption has decreased 4.5 times to 2.26 MB, but it has not passed the time limit - the execution time was 1.09 seconds. With what it can be connected? Judging by the fact that the execution time of the remaining tests remains the same, I think that the reason is not that the unboxed array turned out to be slower than boxed, but in particular the testing system. It seems that the task is interrupted as soon as one of the restrictions is violated. However, in very rare cases, this implementation still passes the 19th test with a result of 0.98 seconds, but fails test number 20 also due to exceeding the time limit.







After that I tried to use the unsafe analogue of the accum function, which in theory should be faster, various buffering methods ( hSetBuffering :: Handle -> BufferMode -> IO () function), mutable IOArray arrays, but none of these methods yielded any results .







I am not inclined to believe that the limits for Haskell are set too tightly, and I hope that there is still a solution that will pass all the tests. In the project repository, I posted several different versions of the code for solving this problem (with Array and IOArray), perhaps this will be the starting point for a solution that will pass all the tests.







Conclusion



Even despite the fact that only five out of six tasks succumbed to me, I fulfilled my main task - to practice functional programming. Not the least role was played by severe restrictions on the resources consumed by the program, which forced us to look for more and more new approaches to solving problems. I hope their description will be useful for those who are just starting their way in functional programming







Was the functional approach convenient for solving such problems? Honestly, I have a double impression. On the one hand, the solutions to most problems turned out to be very concise, and the expressive tools of Haskell himself, as well as his rich standard library, played a significant role in this. On the other hand, one cannot but admit that in most cases the management of consumed resources can be a certain problem, which will not allow solving the problem in the given restrictions.








All Articles