Internship at Mars Digital Technologies. How we applied deep learning to M&M's





Hello!



Every year, about 200 students and graduates join Mars in Russia as interns or participants in the leadership program. Dmitry Korzhimanov completed an internship at the Mars IT hub this summer. In this post, Dmitry talks about a project he worked on during the internship.





Many companies offer internships to students starting from their third year, so I, like many of my classmates finishing the third year, set out to find an internship for the summer. At the Faculty of Mechanics and Mathematics of Moscow State University, students tend to go into IT or banking, which is not surprising given the number of mathematical problems in both fields. I considered internships at different companies and chose Mars, because I was interested in the opportunity to work on IT projects in real production. Besides, Mars is an international company, which meant I would be working in a team with people from all over the world.



After passing technical and verbal tests, as well as an interview with my future mentor, I received an offer and, lo and behold, became an intern at Next Generation Technologies. This department consists of five people from around the world and is mainly engaged in introducing machine learning into the company's various production and sales cycles.



Honestly, I am no master of machine learning algorithms, and at the start of the internship I did not have a full picture of what I would be doing or how I would implement the projects. Yes, I had a good mathematical background and even knew what Random Forest was, but that was where my knowledge ended. Among other things, Mars has a large number of tasks related to computer vision, and our team successfully solves them. So for the first two weeks of the internship I figured out how neural networks work, what OpenCV is, what metrics exist, how to work with PyTorch and TensorFlow, and many, many other things. My head was spinning at the end of every day, because the information had to be absorbed in huge quantities, and the ideas behind some algorithms turned out to be quite non-trivial. Fortunately, there are now plenty of genuinely useful resources such as Neurohive, Medium and, of course, Habr (my deep gratitude to everyone who writes articles about machine learning here; without you, diving into the topic would have been a much harder task). Thanks to these resources, you can get up to speed on a particular topic rather quickly. Coursera, where I still watch lectures, also deserves a mention.



Having dug into some of the fundamentals of deep learning and gotten a bit of a grip on them, I was offered a part in a project. We had to build a program that measures objects in a photograph. Of course, the question arises: why would an FMCG company need this? The answer is quite simple: to keep statistics on the size of raw materials and products at the factory. For example, Mars produces M&M's, and I think many of you have tried M&M's with peanuts. For each candy to be neither too big nor too small, the peanuts have to be a certain size as well.



As you know, nothing in the world is perfect, and the batches that arrive at the factory vary wildly in quality. So a dedicated person has to manually (!) take a sample from a batch of nuts and compile statistics on it. Naturally, at some point the company realized that the process was rather inefficient and that it would be nice to automate it. The task itself is quite simple: teach a model to find nuts or other objects laid out, say, on an A4 sheet, along with a landmark of known size printed on it, which lets you convert pixels into real units. It sounds pretty trivial, but in practice this is where the difficulties usually begin. I was assigned the part related to recognizing this landmark and converting pixels into millimeters.
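To make the idea concrete, here is a minimal sketch of the pixel-to-millimeter conversion; the landmark diameter and the pixel values below are made up for illustration, not numbers from the real project.

```python
LANDMARK_DIAMETER_MM = 30.0  # known physical size of the printed landmark (assumed value)

def px_to_mm(length_px: float, landmark_diameter_px: float) -> float:
    """Convert a length in pixels to millimeters, using the landmark as the scale reference."""
    mm_per_px = LANDMARK_DIAMETER_MM / landmark_diameter_px
    return length_px * mm_per_px

# e.g. a peanut spanning 120 px on a photo where the landmark spans 400 px
print(px_to_mm(120, 400))  # -> 9.0 (mm)
```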



Large companies pay serious attention to maintaining their corporate identity, and this played a role in my project too. As the landmark to be printed on paper at factories around the world, a circle with the company's characteristic letter "M" was chosen.



To solve my problem, I immediately started looking for good implementations of convolutional neural networks that can quickly and reliably find objects in a photo. After reviewing various options, I picked one of the Faster R-CNN implementations and studied its capabilities and architecture. The next step was training the model: for this, about a hundred images were generated with the landmark placed at random locations and, accordingly, the bounding box describing it. The reportlab package helped me a lot here. It is a toolkit designed specifically for generating documents with a custom arrangement of objects on them.
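Here is a rough sketch of how such training pages could be generated with reportlab; the file names, sizes and label format are my assumptions rather than the actual code from the project.

```python
# Generate PDF pages with a circular "M" landmark at a random position,
# plus a CSV with its bounding box for training.
import csv
import random
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

PAGE_W, PAGE_H = A4          # page size in points (1 pt = 1/72 inch)
RADIUS = 40                  # landmark radius in points (assumed)

def generate_page(pdf_path: str, label_path: str) -> None:
    c = canvas.Canvas(pdf_path, pagesize=A4)
    # pick a random centre that keeps the circle fully on the page
    x = random.uniform(RADIUS, PAGE_W - RADIUS)
    y = random.uniform(RADIUS, PAGE_H - RADIUS)
    c.circle(x, y, RADIUS, stroke=1, fill=0)        # the circular landmark
    c.setFont("Helvetica-Bold", RADIUS)
    c.drawCentredString(x, y - RADIUS / 3, "M")     # the letter "M" inside it
    c.save()
    # store the landmark's bounding box (x_min, y_min, x_max, y_max)
    with open(label_path, "w", newline="") as f:
        csv.writer(f).writerow([x - RADIUS, y - RADIUS, x + RADIUS, y + RADIUS])

for i in range(100):
    generate_page(f"page_{i}.pdf", f"page_{i}.csv")
```

The generated pages would then be rendered or printed and photographed to obtain the training images themselves.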



For augmentation, I chose the albumentations package, whose capabilities are quite impressive. You can read more about it on GitHub, including a comparison with alternative libraries. I tried to make the transformations as diverse as possible so that no so-called "blind spots" were left: rotations, zooming, contrast changes, added noise and other standard techniques. Unfortunately, there was a problem with the Rotate transform, which I really needed, because no one guaranteed the correct orientation of the letter M in the photo, and the model had to recognize it in any position.
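A rough sketch of what such an augmentation pipeline might look like in albumentations; the specific transforms and probabilities are assumptions, not the project's actual settings.

```python
import albumentations as A

transform = A.Compose(
    [
        A.Rotate(limit=180, p=0.7),                 # arbitrary orientation of the letter "M"
        A.RandomScale(scale_limit=0.3, p=0.5),      # zoom in / out
        A.RandomBrightnessContrast(p=0.5),          # lighting and contrast changes
        A.GaussNoise(p=0.3),                        # camera noise
    ],
    # bounding boxes must be transformed together with the image
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
)

# usage: augmented = transform(image=img, bboxes=[[x_min, y_min, x_max, y_max]],
#                              class_labels=["landmark"])
```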



No matter how hard I tried to find the error, the function from the albumentations package stubbornly kept transforming the bounding box incorrectly, shifting it the wrong way. In the end, I decided to use the classic imgaug package, which gave me no trouble.
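For comparison, a minimal sketch of the same rotation step with imgaug, which handles the bounding box together with the image; the angles and the example box are purely illustrative.

```python
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

seq = iaa.Affine(rotate=(-180, 180))   # random rotation, any orientation of the letter "M"

image = np.zeros((600, 800, 3), dtype=np.uint8)        # placeholder image
bbs = BoundingBoxesOnImage(
    [BoundingBox(x1=100, y1=150, x2=260, y2=310)],     # landmark box in pixels
    shape=image.shape,
)

# imgaug rotates the box corners together with the image and returns
# the new axis-aligned box enclosing them
image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)
```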



Having trained the network for several epochs, I got curious what results the model would produce on real examples. I printed a sheet of paper with the landmark on it, took some photos and began testing the neural network. And yes, it really had learned to find what it was supposed to, but it did so far from perfectly. The bounding boxes the network produced were too inaccurate for high-quality measurement of objects, and here began the rather unpleasant process of tuning parameters in search of the ideal configuration. I changed the learning rate, tried changing the network architecture, fine-tuning it, altering the augmentation settings, but the average IoU never rose above 0.85.
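For reference, IoU (intersection over union) measures how well a predicted box overlaps the ground-truth one; here is a minimal sketch of the computation, with made-up example boxes.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 100, 100), (10, 10, 110, 110)))  # ~0.68
```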



The result of the Faster R-CNN







Alas, my inexperience played a trick on me. Not having bothered to look for simpler tools, I got too carried away experimenting with the neural network, but no matter what I tried, it would not give an acceptable result. Along the way it also turned out that the factories simply did not use color printers, and all my efforts seemed to have been in vain. But then a colleague told me that you can use an ordinary black circle and detect it with the Hough transform from the OpenCV package. I could have used this simple solution from the very start, but I had not paid due attention to the classical tools of computer vision. After playing around with the parameters a little, I finally got what I wanted, and the problem was solved. The next step is to create an API with the Flask package and deploy the program on an Azure server, but that is a completely different story.
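A hedged sketch of what the circle detection with OpenCV's Hough transform and the subsequent pixel-to-millimeter conversion might look like; the landmark diameter, thresholds and file name are my assumptions, not the project's actual parameters.

```python
import cv2
import numpy as np

LANDMARK_DIAMETER_MM = 30.0          # known physical diameter of the printed circle (assumed)

img = cv2.imread("sheet.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)       # smooth out print and sensor noise

circles = cv2.HoughCircles(
    gray,
    cv2.HOUGH_GRADIENT,
    dp=1.2,          # inverse ratio of the accumulator resolution
    minDist=200,     # only one landmark is expected per sheet
    param1=100,      # Canny edge threshold
    param2=60,       # accumulator threshold: higher means fewer false circles
    minRadius=20,
    maxRadius=300,
)

if circles is not None:
    x, y, r = np.round(circles[0, 0]).astype(int)
    mm_per_px = LANDMARK_DIAMETER_MM / (2 * r)       # scale factor for the whole photo
    print(f"landmark at ({x}, {y}), radius {r}px, scale {mm_per_px:.4f} mm/px")
```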



Final result with the Hough transform







Besides computer vision, the company has tasks from other areas of machine learning: from processing tweets to monitor how a particular product is perceived, to developing programs that help veterinarians treat animals more effectively. What came as a pleasant surprise to me was the research side of the work. The tasks often sound very simple, yet it is not obvious how to find a high-quality solution, so you have to study papers and current methods in machine learning. For example, I had to get to grips with Bayesian methods of machine learning, which are very popular for medical problems. Thanks to that, I was able to read medical papers describing the application of these methods and help the team choose the right direction of development.



At the moment I am finishing my internship. Since the new academic year has begun, in September I switched to part-time work, which was not a problem. The company also allows working from home and, as it turned out, this practice is quite common here. Right now our team is wrapping up the project I described in this article and moving on to a new one, also related to computer vision. There are really a lot of tasks and they are all completely different, so you won't get bored.



Applications for an internship at Mars are now open; you can learn more and apply at: website.


