How machine learning sections are run at Yandex interviews

Every Yandex service relies heavily on data analysis and machine learning. These methods are needed to rank web search results, to search for images, and to build recommendation blocks. Machine learning lets us create self-driving cars and voice assistants, cut idle time for taxi drivers, and shorten waiting times for their customers. There are too many applications to list!







That is why we always need specialists in data analysis and machine learning. For them, one of the most important stages of a Yandex interview is the general machine learning section, which I will describe in this article. I worked through a sample task for this section, and the possible content of an answer to it, in a video that recently became available on YouTube. Here I will say more about what we expect from a strong candidate in this section and why we chose exactly these criteria.







1. Yandex Machine Learning Interviews



Senior and lead employees at Yandex can independently turn tasks formulated in business terms into correctly posed machine learning problems; choose appropriate solution methods; design feature descriptions; set up the process for updating models and properly monitoring their quality; and, finally, verify that the resulting solutions meet the original business requirements.







To a large extent, these people also shape the business requirements themselves: those who work directly with data can know better than anyone what characteristics of a service affect its popularity and usefulness, what problems users need solved, and which metrics this will affect.







As a rule, our best employees also have expert knowledge in specific areas, for example computer vision, language modeling, or models for recommender or search services.







We value our employees and their expertise highly, and we want external candidates to meet the same level. To check this, one or more sections may be devoted to specialized topics, such as computer vision or learning-to-rank methods. One section is always devoted to “general” issues: stating the problem, designing the objective function and the training set, and accepting models. That section is what this article is about.







Of course, the full set of requirements applies only to candidates for senior or lead positions. Candidates applying for middle or junior positions do not have to be able to do all of the above, but they should know that these skills are extremely useful for career growth, both at Yandex and at other companies.







Depending on the requirements of the team, an algorithms section with coding, or even an architecture section, may also be required.







2. Statement of the problem



So, the main goal of the section is to check how well the candidate can handle a task independently and in its entirety, from its formulation through to acceptance questions in user experiments.







The section begins with a problem stated in business terms. For example, you may need to create a service that suggests nearby venues, recommends certain products, or ranks movies or music that may interest users.







You can start by identifying possible applications of the task. How many users will the resulting solution have, who are they, why do they need this functionality, and how will they find out about it? The candidate can ask all these questions or offer their own vision of the answers (the latter is, of course, preferable).







Based on the reasons the task arose, business metrics are formulated first, followed by metrics suitable for optimization during training or model selection. It is a sign of an extremely strong candidate when the choice of optimization metrics is dictated by the actual meaning of the problem being solved. The simplest examples are DCG-like metrics for ranking tasks or AUC-like metrics for certain classification problems.
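
To make the ranking example concrete, here is a minimal sketch of DCG and its normalized form. It assumes the common variant with a linear gain and a log2 position discount; other variants (for example, an exponential gain) exist, and the function names are illustrative rather than taken from any particular library.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    top = relevances if k is None else relevances[:k]
    # Position i (0-based) is discounted by log2(i + 2),
    # so the first position has a discount of exactly 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(top))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

The discount reflects the intuition that users look at top positions far more often than at lower ones, which is why such metrics fit ranking tasks.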







Here you should also address how the training set is formed. What data is needed to build it, and how can that data be obtained? What counts as a single event (training example)? Is sampling required? If so, how should it be done?
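
As one possible answer to the sampling question, the sketch below downsamples negative examples and compensates with weights. This is only an illustration under assumed conditions (binary labels, many more negatives than positives); the function name and the data layout are hypothetical.

```python
import random


def downsample_negatives(examples, keep_rate, seed=0):
    """Keep every positive; keep each negative with probability keep_rate.

    `examples` is an iterable of (features, label) pairs with label 0 or 1.
    Returns (features, label, weight) triples. Kept negatives get weight
    1 / keep_rate, so weighted class counts match the original in expectation.
    """
    if not 0 < keep_rate <= 1:
        raise ValueError("keep_rate must be in (0, 1]")
    rng = random.Random(seed)
    sampled = []
    for features, label in examples:
        if label == 1:
            sampled.append((features, label, 1.0))
        elif rng.random() < keep_rate:
            sampled.append((features, label, 1.0 / keep_rate))
    return sampled
```

The weights matter if the model's predicted probabilities are used downstream; without them the model would overestimate the share of positives.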







3. Machine learning methods



After the task is fully formulated, you can begin to discuss methods for solving it.







Here you need to choose the model that will produce the solution and justify your choice. It is worth discussing which loss function is optimized when the model is built and why it is a good choice for optimizing the metrics discussed in the previous section. It is also useful to consider the optimization method used.
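
As an illustration of the link between a loss function and an optimization method, here is a minimal sketch of logistic loss for a linear model with one stochastic gradient descent step. It is an assumed example rather than a recommended model: the function names are hypothetical, and the code ignores numerical overflow for very large scores.

```python
import math


def log_loss(weights, features, label):
    """Logistic loss of a linear model on one example; label is 0 or 1."""
    score = sum(w * x for w, x in zip(weights, features))
    # The margin is positive when the score agrees with the label.
    margin = score if label == 1 else -score
    return math.log1p(math.exp(-margin))


def sgd_step(weights, features, label, learning_rate=0.1):
    """One gradient step: d(loss)/d(w_j) = (sigmoid(score) - label) * x_j."""
    score = sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-score))
    return [w - learning_rate * (prob - label) * x
            for w, x in zip(weights, features)]
```

Logistic loss is smooth and convex in the weights, which makes it a convenient stand-in for metrics that cannot be optimized directly by gradient methods.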







The next item to discuss is the feature space. A top-class specialist can immediately come up with several dozen or even hundreds of features for a new task, first dividing them into groups according to the type of data used (for example, features can depend only on the user, or they can depend on the “user-object” pair).
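
The grouping of features by data type can be sketched as follows. All field and feature names here are hypothetical and chosen only for illustration; a real task would have its own data sources and far more features in each group.

```python
def build_features(user, item):
    """Assemble one feature dict for a user-object pair (hypothetical fields)."""
    # Features that depend only on the user.
    user_features = {
        "user_account_age_days": user["account_age_days"],
        "user_total_orders": user["total_orders"],
    }
    # Features that depend only on the object.
    item_features = {
        "item_avg_rating": item["avg_rating"],
        "item_num_ratings": item["num_ratings"],
    }
    # Features that depend on the user-object pair.
    pair_features = {
        "pair_same_city": float(user["city"] == item["city"]),
        "pair_category_views": user["category_views"].get(item["category"], 0),
    }
    return {**user_features, **item_features, **pair_features}
```

Keeping the groups separate makes it easier to see which features can be precomputed per user or per object and which must be computed for every pair at request time.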







An additional plus is considering the cold start problem. Once the Yandex.Taxi service exists, we can use information about real trips to optimize routing in the city; once the Yandex search engine exists, we can use user actions as signals about which documents are relevant to which queries. But what if the service has not yet been created and the problem being solved is critical to its functioning? The candidate needs to propose some way to build a reasonably good solution in this case.







4. Quality control



Finally, when the solution is ready, you need to make sure that it is good enough. If an earlier solution already exists, you need to understand whether the new one is better.







At this point, the candidate needs to demonstrate the ability to design experiments that test the relevant hypotheses. You need to choose an experiment design and a way to test the statistical significance of the changes. For example, it could be a regular A/B experiment on users of the service, or an expert evaluation of the results. Which indicators should be monitored, and how do you ensure the experiment is correct?
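
One simple way to test significance in an A/B experiment is a two-proportion z-test on a conversion-like metric. The sketch below assumes independent users, a binary outcome per user, and samples large enough for the normal approximation; it is one option among many, and the function name is illustrative.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference of two conversion rates.

    Returns (z, p_value); z > 0 means group B has the higher rate.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% and a 15% conversion rate on 1000 users each. Running several such tests on many metrics at once requires a correction for multiple comparisons, which this sketch does not include.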







5. How to succeed in the section



The candidate's level is determined entirely by how independently and deeply they managed to present a solution to the task. A well-run machine learning section is indistinguishable from the section our Western colleagues call ML System Design, while a weak section may look like a discussion of a narrow question, for example log-likelihood optimization with linear decision rules.







At the same time, we understand well that in machine learning problems it is often unclear which method will work, or whether any will work at all. So feel free to discuss the problem with the interviewer as if they were a colleague with whom you are discussing possible solutions to a problem that came up in everyday work. We do not require that the solution described in the section be guaranteed to be good; we only want it to be well enough justified that we can believe you would cope with a similar task in real conditions.







For example, we do not require thorough knowledge of various statistical tests, but we expect you to recognize the importance of testing hypotheses correctly and to be able to apply these methods when necessary. Similarly, this section does not require a detailed description of how machine learning methods work, but we expect you to be able to give a reasoned choice of models for your tasks.







At the same time, you are free to go deeper into the areas you know well. Spend more of your answer on them, and say less about areas where your knowledge is not as deep. We understand that it is impossible to be an expert in every area, and we value intellectual honesty. If a candidate understands their strengths and weaknesses well and can speak openly about them, that is a very good sign. It also lets the section time be used effectively: more of it goes to identifying the candidate's strengths.










Finally, I will list several sources worth studying, both for working successfully on machine learning projects and for preparing for the machine learning section.









And, of course, our YouTube video:









