The Difference Between a Data Scientist and a Teenager in a Sports Car





Recently, many courses have appeared, both academic and private, that aim to teach data analysis and train specialists who can solve business problems with machine learning. Look closely at their programs and they are all roughly the same; the only differences are the format (online or offline) and the teachers.



The School of Data began running such courses back in 2015, and we started out following the same scenario. We reviewed a large number of academic machine learning programs, used our own experience to select only what is really needed to solve practical problems, and prepared a large number of Jupyter notebooks in which we tried to explain the mathematics and machine learning in plain terms.



We focused primarily on machine learning techniques, text processing methods, neural networks, network structure analysis, recommender systems and other areas of data analysis. The students' reviews seemed good, but something was still missing.



Since our main activity is solving real-world problems at Data Studio, we were training students first of all for ourselves. We quickly realized that in practice, knowledge of data analysis and machine learning methods is, as mathematicians say, “a necessary but not sufficient condition.” That is why we promptly updated our program to reflect real needs.



Briefly, the conclusions we came to (and on the basis of which we are now building our training):





The following paragraphs will discuss all of these issues.



Most of the problems that large companies are now trying to solve with modern data analysis methods and neural networks were solved long ago. In banking, the most successful cases are in risk management. In telecoms, it is CRM / CBM, where the entire business model is tied to increasing subscriber LTV. Retail works similarly: a handful of tasks (forecasting retail turnover, or RTO; inventory management; promotions) support the core business.



There are also manufacturing companies, where the main tasks are stabilizing operating modes, reducing losses and predictive maintenance on the one hand, and managing inventory and marketing on the other.



These tasks are not new; analysts have been solving them for a long time, and these are analysts who understand the subject area. In most cases there are also a considerable number of vendors whose products are de facto standards for individual tasks, such as pricing management in retail or APC systems in manufacturing. As a rule, such systems already include optimization algorithms, machine learning among them.



Doing something fundamentally new here and making money on it is extremely difficult. As the saying goes, the low-hanging fruit has already been picked. What remains is to look for new business cases in which analytics produces an economic effect. Such cases really do exist, and there are more and more of them.



However, finding such cases and seeing where analytics can pay off is not easy. You need to understand the subject area of a particular process in depth (and a description of that process often simply does not exist). You need to understand what data is needed at all and what exactly the business makes its money on; whether analytics is needed here at all; whether predictive algorithms are needed (more often not); whether the business process needs to change (more often yes); and whether there are operational levers (what is the point of predicting an equipment shutdown if there is still no way to prevent it?).



So implementing such a digital product raises many questions that require an analytical approach, a certain culture of working with data, and the ability to form hypotheses, ask yourself questions and think like a business owner. This is not taught at Data Analysis Schools, and it is not taught on Coursera. Modern courses probably do train good engineers and mathematicians, but they do not train analysts.



Moreover, knowledge of machine learning methods and neural networks is more likely to kill the culture of analytical thinking. Most modern Data Scientists are like children behind the wheel of a sports car: they consider themselves unique (they know a lot of clever words about xgboost, neural networks, etc.), they do not know how to drive (why bother, if the car does everything for you), and they go fast only because there is a lot of horsepower (powerful hardware, although here it more often leads to overfitting).



As a result, we get roughly the following picture: smart, expensive people arrive, ask almost no questions, and say that the data will tell us everything. They take some data, then come back, say they have built some kind of model, quote an accuracy in percent, and that is it. As soon as you start challenging them, they answer in strange words and overwhelm you with their intellect, but nothing useful comes of it.



This explains why the contractors for digital transformation and data analysis are now dominated by management consulting companies rather than IT companies. They have a culture of analytics and of business thinking; they always take away the client's headache and offer solutions. They do not stop at building a machine learning model; they do real analytics that helps make a decision.



Another trend in the world right now is that even a more or less successful Data Scientist cannot be a generalist. In many companies, the initially centralized data analysis structure has become distributed. The central office is left only with the role of providing infrastructure, while the entire product side, the real digital products, is built directly in the business units. In this structure, the Data Scientist (provided he is the “right” kind) becomes a subject-area expert: he takes over the functions previously handled by the “old” analysts who worked before him. If he succeeds, he is also given the operational levers.



As a result, successful analysts are increasingly handed operational levers, and their responsibility grows, but only within one subject area. We predict (and large companies in the market confirm this) that there will be no more universal analysts: the hype is over, and it is time to be responsible for the result. Those who can solve business problems with the help of analytics will move to the product side, and those who can only train xgboost will go back to academia or give lectures on machine learning.



That is why we have completely revised our courses (not least because we hire many of our graduates into our Data Studio), and now:



0. To begin with, at admission we see each student as a future employee who will be in the same boat with us and take part in large projects. We are therefore interested in preparing the student as effectively as possible in these 3.5 months. You can always find time to take another course on Coursera if you need to understand the details of a particular algorithm. Getting experience with real cases is much harder. And that is why:



1. The training is based on the case method. We take a real task and first analyze the business model and the unit economics, working out from real numbers what quality we must achieve in this task. We evaluate the potential economic effect. Only after that do we turn to the technical part, gradually diving into analytical methods, machine learning and neural networks. And, importantly, we do so only if the task really requires it.



2. We work with each student individually. Although we try to recruit a homogeneous group, we understand that people are different, so each student has an individual training plan and individual homework. In our opinion, it is nonsense for a couple of dozen people to solve the same problem; it is ineffective even from a common-sense point of view. All students get answers from the teacher in the chat, and a student is never left alone with a task.
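
The calculation behind item 1 (working out from real numbers what quality a model must reach before it pays off) can be sketched roughly as follows. This is an illustration under assumed numbers, not our actual course material: the function names, the retention-campaign scenario and all costs and gains are hypothetical.

```python
# Hypothetical illustration: every name and number here is an assumption.

def break_even_precision(cost_per_action: float, gain_per_hit: float) -> float:
    """Minimum share of correct model alerts at which acting on them pays off."""
    if gain_per_hit <= 0:
        raise ValueError("gain_per_hit must be positive")
    return cost_per_action / gain_per_hit


def expected_effect(n_actions: int, precision: float,
                    cost_per_action: float, gain_per_hit: float) -> float:
    """Expected net economic effect of acting on n_actions model alerts."""
    return n_actions * (precision * gain_per_hit - cost_per_action)


if __name__ == "__main__":
    # Assumed retention campaign: each call costs 2 units,
    # each subscriber actually retained is worth 10 units.
    cost, gain = 2.0, 10.0
    print("break-even precision:", break_even_precision(cost, gain))
    for precision in (0.1, 0.5):
        effect = expected_effect(1000, precision, cost, gain)
        print(f"precision={precision}: expected effect={effect}")
```

With these assumed numbers, a model whose alerts are right less often than the break-even share loses money regardless of its headline accuracy, which is why the economics comes before the modeling.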



The one thing we warn everyone about at admission is that the training takes a significant amount of time: you will constantly need to do homework, dive into details, and often spend weekends studying.



We understand that this is not for the masses. Data Studio has been operating successfully for several years, partly because it is difficult to get into. We are well aware that in the current reality it is easier to grow analysts ourselves than to hire people after Coursera courses. That is why it is the most motivated students who come to the Data School in the first place. The group usually does not exceed 15-20 people, which makes the training virtually individual.



Not to mention that we have thought through the whole technical side: pre-prepared Jupyter notebooks, an effective communication system for remote participants and online broadcasts all help even remote participants communicate directly with the other students in class.



We do not train Data Scientists; we train well-rounded specialists who can solve business problems with the help of analytics.



The new course begins on September 23rd. For project questions, please contact us at Data Studio.


