Computer vision sees emotions, pulse, breathing, and lies - but how do you build a startup on that? A conversation with Neurodata Lab





Our relationship with computer vision was fairly quiet until it learned to perform miracles with human faces. Algorithms swap people in photos and videos and change their age, race, and gender. This has become the main online entertainment of recent years, and a source of anxiety. Today apps storm the charts; tomorrow protesters saw down poles carrying face recognition cameras. And it seems we are only at the beginning of the journey: computers will be able to read more and more from our faces.



Earlier this month, we visited the office of Neurodata Lab. The company's main focus is recognizing human emotions. We tried to find out how this is done and why.



On My Circle, Neurodata Lab has an average rating of 4.6, and 95% of its employees recommend it, rating highest such criteria as professional growth, interesting tasks, good relationships with colleagues, and the feeling that the company makes the world a better place.





In 2016, ten actors - five men and five women - took part in an unusual shoot. They entered an empty room in tight-fitting black suits and, in front of cameras placed in different corners of the room, against a green wall, acted out "nothing" - just their neutral state.



Then the actors played out short scripts. The scripts contained no lines, only descriptions of situations, so the actors improvised. In each scene they had to experience one of six emotions: anger, sadness, disgust, joy, fear, or surprise. The facial expressions and gestures of experienced actors often become stereotyped, better suited to the stage than to real life, so all the actors here were students.







They were supervised by a teacher from a film school, but not only. The person really in charge was a researcher, Olga Perepelkina. In addition to video and sound, galvanic skin response and other physiological signals were recorded on set. Each scene was shot several times with different casts, and in the end about seven hours of material was collected.



When the actors finished, they described which emotions they had actually experienced during the performance, and when. Then another 21 people watched the videos and marked, in each one, which emotion the actor seemed to be experiencing, at what point it began, and when it ended.







Thus began work on the first Russian-language multimodal dataset for emotion recognition - RAMAS.



But the material obtained was suitable only for scientific research and experimentation - not for training industrial-scale algorithms.



- (Olga Perepelkina) We needed to collect a giant dataset. Not 7 hours, but 107 and more. We built the Emotion Miner web platform, uploaded lots of publicly available videos from the Internet, brought in tens of thousands of people from all over the world, and they began to label the data. In the end we annotated 140 hours of video on 20 scales (not only emotions, but also various cognitive and social characteristics) and collected the world's largest emotional dataset.



- And how did you manage to find so many people to do the labeling?

- (O.P.) It's simple - we paid them for the work. We ran promotions and invested a small budget in marketing. It wasn't particularly hard. Almost 70 thousand people are now registered on the platform, though in practice the dataset was labeled by about two thousand of them.






Products



The startup Neurodata Lab was created by entrepreneurs Georgy Pliev and Maxim Ryabov. They funded the research not out of scientific curiosity, but to find commercial applications for the technology. Affective computing, or "emotional computing", is not the most popular area in the neural network and computer vision market right now. Competition in face recognition is high, and entertainment apps grab the spotlight one after another, while systems that work with emotions have been stuck in "promising" status for several years. However, Gartner and other analysts forecast rapid growth for the field.



Neurodata Lab has spent about three years doing research, collecting data, and developing algorithms. Now it turns research results into commercial products. For example, Neurodata Lab developed an emotional AI for Promobot robots: the robot uses the emotion recognition system to respond appropriately to the people who address it. The demo was shown at CES this year.



The algorithm is used in call centers to monitor calls and evaluate employees. Today this is done manually: managers selectively listen to call recordings and check whether the employee was rude to the client or stayed within the bounds of decency. The system can do this automatically and in real time. Along the way it also assesses the client's emotional state - whether they were satisfied with the interaction or not. Neurodata Lab launched a pilot of such a product at Rosbank, where the algorithm analyzes calls to measure customer satisfaction.



The second product branch is somewhat more global. The company offers its own API - a whole set of tools for third-party developers. It currently includes emotion analysis, a face tracker, and audio analyzers that can split a recording with several voices into separate audio tracks and remove noise. A body tracker, a heart rate detector, a respiration tracker that works from video of a person, and other technologies will be added soon.






How it works



A person learns to recognize emotions unconsciously: from childhood we begin to associate certain patterns of behavior with the emotions that people around us experience. Having learned this, we can analyze which cues give an emotion away. The most obvious are the expressions the mouth and eyes take on. But the face has many muscles that create an incredible number of expressive nuances. We perceive them automatically, although we can consciously fix our attention on particular details.



A neural network likewise analyzes hundreds of hours of video labeled by people. And the features by which the system classifies emotions are not always the obvious ones.



- (Andrey Belyaev) There are common patterns for some classes. For example, the classes "anger" and "surprise" are characterized by strong facial expression: raised eyebrows, widened eyes, smoke from the ears. The network certainly responds to those, but not only. For example, with eyebrows that only look slightly raised, it will still determine the correct class, because it also responds to the dynamics of change. One of the interesting classes in this respect is "sadness". Most often, when a person is sad, their face doesn't change for quite a long time. The network notices zero dynamics in expression, hypothesizes that the state is either "neutral" or "sad", and only then checks the remaining features and settles on the right class.
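
For intuition, here is a minimal sketch of a temporal emotion classifier in PyTorch. This is not Neurodata Lab's actual architecture - the feature dimension, class count, and names are assumptions - but it shows how a model can respond to the dynamics of expression rather than to a single frame:

```python
import torch
import torch.nn as nn

class TemporalEmotionClassifier(nn.Module):
    """Classifies an emotion from a sequence of per-frame face features.

    The recurrent layer lets the model react to the *dynamics* of
    expression (e.g. near-zero change over time hints at "neutral"
    or "sad"), not just to any single frame.
    """

    def __init__(self, feature_dim=256, hidden_dim=128, num_classes=7):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_features):
        # frame_features: (batch, time, feature_dim), one vector per video frame
        _, last_hidden = self.rnn(frame_features)
        return self.head(last_hidden[-1])  # logits over emotion classes

# Example: a 2-second clip at 15 fps -> 30 frames of 256-d face features
model = TemporalEmotionClassifier()
logits = model(torch.randn(1, 30, 256))
probs = logits.softmax(dim=-1)  # anger, sadness, disgust, joy, fear, surprise, neutral
```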



- What about sound? Specific frequencies, ranges, tones?

- (A.B.) Sound is more complicated. Every person has their own normal volume, so you can't anchor to loudness. Someone may speak quietly and evenly while actually being terribly angry. And even if we visualize the sound and see what the system pays attention to, we can't explain it as neatly as with the face. A face has clear landmarks: eyebrows, eyes, ears, and so on. Sound has none. Sound is fed into the network as a spectrogram, and which parts of it are responsible for what, and at which moment, is much harder to understand. So there is no standard answer about what the network attends to when working with sound.
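
As a sketch of that input representation (standard practice rather than the company's exact pipeline; the file name and parameters are illustrative), this is how a log-mel spectrogram is typically computed before being fed to a network:

```python
import librosa
import numpy as np

# Load an utterance and compute a log-mel spectrogram - a typical
# input representation for speech-emotion networks.
waveform, sample_rate = librosa.load("utterance.wav", sr=16000)

mel = librosa.feature.melspectrogram(
    y=waveform,
    sr=sample_rate,
    n_fft=1024,       # ~64 ms analysis window
    hop_length=256,   # ~16 ms step between frames
    n_mels=64,        # 64 mel-frequency bands
)
log_mel = librosa.power_to_db(mel, ref=np.max)

# log_mel has shape (64, n_frames): a 2-D "image" of the utterance
# that a convolutional or recurrent network can consume.
print(log_mel.shape)
```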



- How do you measure the pulse?

- (O.P.) We track micro-changes in skin color. When the heart beats, blood oxygenation changes, and with it the skin color. The naked eye can't see it, but an algorithm can.



- But that must depend heavily on the quality of the video.

- (O.P.) We've been refining this algorithm for quite a while, and it works not only with a high-end camera but also with an ordinary webcam. It works even when the screen flickers - for example, when a person is watching a film and the lighting on their face constantly changes. It works when the person is moving and talking.



The pulse is a periodic signal, so it can be tracked reliably, while the lighting from a film does not change periodically. That lets us separate the useful signal from the noise. We have even compared this technology with fitness trackers: our algorithm does just as well, and better than some of them.
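
For intuition, here is a minimal sketch of the basic principle behind pulse-from-video (remote photoplethysmography), not Neurodata Lab's production algorithm. It assumes you already track the face region per frame; the function name and filter settings are illustrative:

```python
import numpy as np
from scipy import signal

def estimate_bpm(green_means, fps):
    """Rough remote-pulse estimate from a video's face region.

    green_means: mean green-channel value of the face ROI in each frame.
    The heartbeat is periodic, so after band-passing to plausible
    heart-rate frequencies, the strongest spectral peak gives the pulse.
    """
    x = signal.detrend(green_means)            # drop slow lighting drift
    b, a = signal.butter(3, [0.7, 4.0], btype="bandpass", fs=fps)
    x = signal.filtfilt(b, a, x)               # keep the 42-240 bpm band
    freqs, power = signal.periodogram(x, fs=fps)
    return freqs[np.argmax(power)] * 60.0      # Hz -> beats per minute

# Example: 10 s of video at 30 fps with a synthetic 1.2 Hz (72 bpm) pulse
t = np.arange(300) / 30.0
fake_signal = 0.05 * np.sin(2 * np.pi * 1.2 * t) + np.random.randn(300) * 0.02
print(round(estimate_bpm(fake_signal, fps=30)))  # ~72
```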



- The system can see what a person cannot, but a person still recognizes emotions better. Why?

- (O.P.) A person does better because they take contextual information into account. That's exactly why we need a multimodal system: accuracy improves when you analyze the face together with the voice, gestures, pulse, breathing, and the semantics of what is said.



That's how human perception works. You see a man from the back, watch how he sits, and think: "he seems sad." Our goal is to create an algorithm that perceives emotions the way a person does - in any conditions, from any scrap of information.



For now, the system's advantage over humans is that it can analyze large amounts of data automatically. A person may sometimes do better, but you can't make someone sit around the clock listening to call center recordings, for example.



- If I experience an emotion but try to hide it, will the system notice?

- (O.P.) It might.






How does development work?



Neurodata Lab is a small company that until recently existed only as a laboratory. It has a science department, a Data Science team, and a development department that packages new findings and discoveries into products. Each department has 5-6 people; in total the team has about 30 employees.



Research scientists



The science department employs psychologists, physiologists, and biologists. There are only four people on staff plus three interns, but they have built a whole international network of collaborations. In Russia there are joint projects with Moscow State University, the Higher School of Economics, and RANEPA; abroad, with the University of Glasgow, the Paris University of Technology, the University of Geneva, and an engineering laboratory in Genoa that specializes in movement analysis.



Scientists who work on affective computing form a whole community. They regularly gather for joint workshops at universities around the world, and every two years a major conference is held that is devoted exclusively to emotion technology. This year Neurodata Lab will run its own workshop at that conference.







- I wonder, what does a researcher's daily work look like?

- (O.P.) First, they read papers. For example, we wanted to learn to recognize lies, not just emotions. So we need to figure out what a lie is, how a lie detector works, what has already been done in the area, what the problems of the classical polygraph are and how it can be fooled, which algorithms are the best, how the human psyche works, what psychological features show up when a person lies, what the physiology is, why (and whether) a person's nose gets colder and ears turn red when they cheat, and so on.



Then we run a huge number of experiments. To create a system that recognizes pulse and respiration rate from video, we had to collect a lot of data. Subjects come to us constantly; we have equipment and all sorts of devices that measure a person's pulse by contact. We record ECG, photoplethysmography, and galvanic skin response. We had a fun experiment when we wanted to understand how blood flow moves across the face, and glued electrodes directly onto it.







Finally, we show people all kinds of videos, trying to scare them or, on the contrary, cheer them up. The researchers analyze the data, run the statistics, and write papers and patents based on them. Then they come to Andrey in the technical department and say: "We read a cool paper and ran an experiment; you could try to build an algorithm that works like this." Or Andrey comes to us and says: "We want to detect falls; figure out how to collect the data." And the science department sits down and thinks how to do that simply and quickly.



- Dream job.

- (A.B.) Some people think, others do.






Data Scientists and Developers



The Data Science department works in parallel with product development. The data scientists train neural networks in Torch when research calls for room to maneuver, and in MXNet when a fast-running solution is needed. Once all the hypotheses about a network's applicability are confirmed, they port it to TensorRT to speed up inference and hand it to the development team for production.
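
A common route for that hand-off (an assumption here, not necessarily the team's exact tooling) is exporting the trained network to ONNX, which TensorRT can parse and optimize:

```python
import torch
import torchvision

# Take a trained network (a stock ResNet stands in for a real model here)
model = torchvision.models.resnet18(weights=None).eval()

# Export to ONNX - an interchange format TensorRT can build an engine from.
dummy_input = torch.randn(1, 3, 224, 224)  # one RGB frame
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["frame"],
    output_names=["logits"],
)
# A TensorRT engine can then be built from model.onnx,
# e.g. with the trtexec tool: trtexec --onnx=model.onnx
```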



Neurodata Lab has created its own cloud service that other developers can access for research or commercial projects.



- (A.B.) The software kernel that distributes tasks between neural networks is written in Python. We needed to write it quickly, but it turned out pretty well. It runs in Docker containers, communicates through RabbitMQ, stores state in Postgres, and a gRPC layer sits on top, providing a secure connection to the outside world and giving other programmers and researchers access to our technologies.
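
To make the shape of such a kernel concrete, here is a minimal sketch of a RabbitMQ worker along those lines, using the pika client. The queue name, message format, and stand-in models are assumptions, not the actual code:

```python
import json
import pika

# A minimal worker in the spirit of that kernel: it takes task messages
# from a RabbitMQ queue and dispatches each one to the right model.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="inference_tasks", durable=True)

MODELS = {
    "emotions": lambda task: {"class": "joy"},    # stand-ins for real networks
    "heart_rate": lambda task: {"bpm": 72},
}

def on_task(ch, method, properties, body):
    task = json.loads(body)                # e.g. {"model": "emotions", "video_id": 42}
    result = MODELS[task["model"]](task)   # route to the requested network
    print(f"{task['model']} -> {result}")
    ch.basic_ack(delivery_tag=method.delivery_tag)  # confirm only after success

channel.basic_qos(prefetch_count=1)        # one task per worker at a time
channel.basic_consume(queue="inference_tasks", on_message_callback=on_task)
channel.start_consuming()
```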



The web part is written in Symfony. The API is implemented with gRPC - a neat Google technology that lets you establish a secure channel and exchange keys with the system, so that a key grants access only to certain internal functions. For example, you can issue a key only for the tools that detect faces and recognize emotions.
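
One way such per-key restrictions can be enforced on a Python gRPC server is with a server interceptor. A minimal sketch, with the metadata key and method names invented for illustration:

```python
import grpc

# Which fully-qualified RPC methods each API key may call (illustrative).
KEY_PERMISSIONS = {
    "client-key-123": {"/neuro.Api/DetectFaces", "/neuro.Api/RecognizeEmotions"},
}

class ApiKeyInterceptor(grpc.ServerInterceptor):
    def __init__(self):
        def deny(request, context):
            context.abort(grpc.StatusCode.PERMISSION_DENIED,
                          "key lacks access to this method")
        # Rejection handler (covers unary-unary calls, enough for a sketch)
        self._deny = grpc.unary_unary_rpc_method_handler(deny)

    def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata)
        allowed = KEY_PERMISSIONS.get(metadata.get("x-api-key", ""), set())
        if handler_call_details.method in allowed:
            return continuation(handler_call_details)  # pass through to servicer
        return self._deny                              # reject everything else

# Attach with: grpc.server(thread_pool, interceptors=[ApiKeyInterceptor()])
```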



I'm working on an idea: I want to build our own small data center where inference will run, based on the Jetson Nano. It's a small single-board computer costing about ten thousand rubles - like a Raspberry Pi, only with a GPU. Counting the processor, RAM, and everything else, it costs about 6 times less than a 1080 Ti (not counting the rest of the computer around that card), but it also runs about 6 times slower.



- And what does that give you?

- (A.B.) First, it's cheaper, and it performs roughly the same. Second, it does less harm to the environment. Third, they don't need much electricity: six Jetson Nanos, which together are almost as powerful as a 1080 Ti, use six times less energy and take up much less space.



- Why haven't miners gotten their hands on them yet?

- (A.B.) Miners need a video card that can do a lot of things at once. For us that's less important. We have lightweight tasks that need to be done quickly on modest hardware, with the result returned. When you have six such tasks, it makes more sense to spread them across six small cards than to cram them all into one big powerful one, where they would jostle each other for resources.






How the team is recruited



In the spring, product managers joined the team, and now the startup needs developers. Backend developers who will support the web in PHP and Symfony, or convince everyone to move, say, to Python or Go. Frontend developers to build pages for new web services, extend functionality, and improve the usability of existing ones. A kernel developer who, beyond high-level Python, understands Data Science and the specifics of working with hardware. Plus testers, C++ developers to work on the SDK, and many others.







- How does your hiring process work?

- (A.B.) For data scientists, I send over a not very difficult but quite revealing task, from which we can judge the ability to think and to program. I can do it myself in forty minutes; a junior manages in 4-6 hours. After that we get on a call and discuss technical questions, and I suggest brainstorming a new task together. We form hypotheses together and test them together. I just watch how the person handles unfamiliar territory: do they understand how model development proceeds, what you can run into along the way, and what there's no need to fear.



After these stages, about 10% of candidates remain. Around 50 people usually respond to a junior opening. We invite the remaining five for a final interview at our office and simply talk, almost fully ready to take them onto the team.



- And with the developers?

- (A.B.) With the developers everything is a bit worse. We give them this test: deploy a small service, on any framework you like, inside Docker. The service must communicate with other containers running Postgres and RabbitMQ. The task is to read a RabbitMQ queue, take database-insertion jobs from it, and write everything to the database. It seems very simple - about an hour of work. But everything falls apart when we say that we'll be sending images to write to the database.
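
For reference, a minimal sketch of one way the test could be approached (hostnames, queue name, and schema are invented for illustration). The images arrive as raw bytes, which is exactly where the "simple" task starts to bite:

```python
import pika
import psycopg2

# Consume messages from RabbitMQ and store the image bytes in Postgres.
db = psycopg2.connect(host="postgres", dbname="test", user="test", password="test")
with db, db.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS images (id SERIAL PRIMARY KEY, data BYTEA)")

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="images", durable=True)

def on_message(ch, method, properties, body):
    # The message body is raw image bytes; BYTEA needs them wrapped as Binary,
    # which is the detail that trips up many candidates.
    with db, db.cursor() as cur:
        cur.execute("INSERT INTO images (data) VALUES (%s)", (psycopg2.Binary(body),))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="images", on_message_callback=on_message)
channel.start_consuming()
```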



It constantly turns out that everyone solves this problem in a completely different way. Almost every candidate brings some new idea I'd never even seen or imagined. But at the same time, everyone overlooks something. The test cuts off about half of the candidates. Then we invite the developers to the office too, talk on general topics, find out what they want next, and so on. And after that, unfortunately, we end up with almost 0% yield.



- By what criteria do you decide that a person lacks the soft skills or won't be able to work in a startup?

- (A.B.) In simple conversations along the lines of: "Listen, imagine that...". The person starts developing a thought, and you casually add that we're behind schedule and there are two weeks left for a project that was supposed to take two months. Some say: "That cannot be allowed." Okay. Others say: "That's very bad, but we'll squeeze out the maximum. Of course we won't do everything - maybe half, but that's better than a quarter. Overall it'll be fine, because the worst thing is an unfinished project." People like that get a yes right away. What matters is the attitude to the task.






Ethical standards and moral dilemmas



Face recognition, affective computing - all of this is research and technology built on data. Questions like "who should own the data" and "who should control its collection, and how" are today's frontier territory.



One compromise that everyone now more or less agrees on is anonymized collection. For example, under the GDPR, the European personal data protection law, you can hang a camera on the street and collect aggregated data about people's emotions in general. But you cannot analyze the emotions of a specific person without their consent.



In Russia, the legislation is softer. In China, it seems, nobody cares about the ethics of collecting data about people at all, so, unconstrained, the technology is developing at a frantic pace.



And this creates new dilemmas. If progress cannot be stopped, what makes us put sticks in its wheels?



- (O.P.) Of course, there is a line somewhere. Technology should benefit humanity, and we cannot ignore ethical questions. But what are ethical norms? Where do they come from? Some of it is simply the rigidity of human thinking: "we are used to thinking this way, we are used to this being impossible, this being bad." Society often greets any innovation with hostility.



When cars appeared, people were afraid and said: "My God, what are these crazy carts, we'll all die, bring back the horses!" But time passed, society got used to them and now can't imagine life without cars. The same goes for transhumanism and modification of the human body. "What do you mean we'll ever agree to modify our children and grow them third arms - what a nightmare, what horror, ethics commissions must ban this." Fifty years will pass, and we'll all have 5-6 arms and two heads and will use neural interfaces to read the thoughts of people around us. Of course, society will argue and try to protect the human species from being changed. But at some point ethics will give way to technology. It has always been that way in history, and it will be so in the future.


