The book "Architects of Intelligence"

Artificial intelligence (AI) is rapidly moving from science fiction into everyday life. Modern devices recognize human speech, answer questions, and perform machine translation. AI-based object recognition algorithms, whose capabilities now surpass human ones, are used in fields ranging from self-driving cars to cancer diagnosis. Large media companies use robotic journalism to turn collected data into articles that read as if written by a human. AI is clearly poised to become a truly general-purpose technology, like electricity.



What approaches and technologies are considered the most promising? What major breakthroughs are possible in the coming years? Is it possible to build a genuinely thinking machine, or AI comparable to a human, and how soon? What risks and threats are associated with AI, and how can they be avoided? Will AI wreak havoc on the economy and the labor market? Could superintelligent machines escape human control and become a real threat?



Of course, it is impossible to predict the future. Nevertheless, experts know more about the current state of the technology, and about what is coming in the near term, than anyone else. Ahead of you are fascinating conversations with such recognized figures as R. Kurzweil, D. Hassabis, G. Hinton, R. Brooks, and many others.



Yann LeCun



VICE PRESIDENT AND FOUNDING DIRECTOR OF THE FACEBOOK AI RESEARCH LAB (FAIR), PROFESSOR OF COMPUTER SCIENCE AT NEW YORK UNIVERSITY



Together with Geoffrey Hinton and Yoshua Bengio, Yann LeCun is part of the group of researchers whose effort and persistence led to the current revolution in neural networks and deep learning. While at Bell Labs, he invented convolutional neural networks. He received an engineering degree from ESIEE in Paris and a doctorate in computer science from Pierre and Marie Curie University. After graduate school, he worked in Geoffrey Hinton's lab at the University of Toronto.



Martin Ford: Is the explosion of interest in deep learning over the past ten years a consequence of the simultaneous improvement of neural networks, the growth in computing power, and the amount of available data?



Yann LeCun: Yes, but the process was more deliberate than that. The backpropagation algorithm, which appeared in 1986–87, made it possible to train multilayer neural networks. That set off a wave of interest which lasted until about 1995. Then, in 2003, Geoffrey Hinton, Yoshua Bengio, and I came up with a plan to revive the community's interest in these methods, because we were confident of their eventual victory. So you could say there was a deliberate conspiracy.



MF: Did you already understand the full potential back then? AI and deep learning are now considered almost synonymous.



YL: Yes and no. We knew these methods would form the basis of computer vision, speech recognition, and maybe a couple of other things, but no one expected them to extend to natural language understanding, robotics, and medical image analysis, or even to help bring about self-driving cars. In the early 1990s I thought the path toward these things would be smoother and that they would arrive a bit earlier. The revolution we were waiting for happened around 2013.



MF: How did your interest in AI and machine learning arise?



YL: Since childhood I have been interested in science, technology, and the big questions about the origin of life, intelligence, and humankind. The idea of AI fascinated me. But in the 1960s and '70s nobody in France was working on it, so after school I went to study engineering.



In 1980 I greatly enjoyed a book on the philosophy of language, Language and Learning: The Debate Between Jean Piaget and Noam Chomsky, in which the creator of the theory of cognitive development and the famous linguist debated nature versus nurture and the emergence of language and intelligence.



Taking Piaget's side was MIT professor Seymour Papert, who had stood at the origins of machine learning and who, in the late 1960s, had actually contributed to halting work on neural networks. And here, ten years later, he was extolling the so-called perceptron, a very simple machine learning model that appeared in the 1950s and that he had worked on in the 1960s. That was my first encounter with the concept of machine learning, and I was absolutely fascinated by it. I considered the ability to learn an integral part of intelligence.



As a student, I read everything I could find on machine learning and did several projects on the topic. It turned out that in the West no one was working on neural networks; only a few Japanese researchers were working on what later came to be known by that term. In France the topic interested no one, partly because of the book by Papert and Minsky that had appeared in the late 1960s.



I began independent research and in 1987 defended my doctoral dissertation, Modeles connexionnistes de l'apprentissage ("Connectionist learning models"). My advisor, Maurice Milgram, did not work on this topic and told me directly that he could formally serve as my advisor but could not help me technically.



In the early 1980s I discovered a community of people working on neural networks and got in touch with them. As a result, in parallel with David Rumelhart and Geoffrey Hinton, I discovered the method of backpropagation of error.



MF: So in the early 1980s there was already extensive research in this area in Canada?



YL: No, it all happened in the USA; no such research was being done in Canada yet. In the early 1980s Geoffrey Hinton was at the University of California, San Diego, where he worked with cognitive psychologists such as David Rumelhart and James McClelland. The result was a book explaining psychology with the help of simple neural networks and computer models. Geoffrey then became an assistant professor at Carnegie Mellon University and moved to Toronto only in 1987. I moved to Toronto after him and worked in his lab for a year.



MF: In the early 1980s I was a computer science student, and I don't remember neural networks being used anywhere. The situation has now changed dramatically.



YL: Neural networks were not just on the sidelines of science. In the 1970s and early 1980s they were effectively anathema: papers were rejected for the mere mention of neural networks.



A well-known example is the article Optimal Perceptual Inference, published in 1983 by Geoffrey Hinton and Terry Sejnowski. To describe one of the first deep learning and neural network models in it, they had to use code words, even in the title.



MF: You are known as the inventor of the convolutional neural network. Could you explain what that is?



YL: Initially, this kind of neural network was optimized for recognizing objects in images, but it turned out to be applicable to a much wider range of tasks, such as speech recognition and machine translation. The inspiration came from the architecture of the visual cortex of animals and humans, studied in the 1950s and '60s by David Hubel and Torsten Wiesel, who later received the Nobel Prize in Physiology or Medicine for that work.



A convolutional network is a special way of connecting neurons, which are not exact copies of biological neurons. In the first layer, the convolution layer, each neuron is connected to a small number of image pixels and computes a weighted sum of its inputs. The weights change during training. Groups of neurons see small areas of the image: if one neuron detects a particular feature in one area, another neuron will detect exactly the same feature in the adjacent area, and other neurons will do so in the remaining areas of the image. The mathematical operation these neurons perform together is called a discrete convolution, hence the name.



Then comes a nonlinear layer, where each neuron turns on or off depending on whether the weighted sum computed by the convolution layer is above or below a given threshold. Finally, a third layer performs a downsampling operation, so that a slight shift or deformation of the input image does not greatly change the output. This provides robustness to distortions of the input image.



In essence, a convolutional network is a stack of convolution, nonlinearity, and subsampling layers. When these are stacked, you get neurons that recognize objects: for example, one neuron that turns on when there is a horse in the image, another for cars, a third for people, and so on for every category you need.
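The three layer types LeCun describes can be sketched in a few lines of NumPy. This is a toy illustration, not code from the interview: ReLU stands in for the on/off thresholding nonlinearity, 2×2 max-pooling for the subsampling step, and the image and kernel are invented for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Convolution layer: each output is a weighted sum over a small patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinear layer: a neuron is active only above the threshold (here 0)."""
    return np.maximum(x, 0.0)

def maxpool(x, size=2):
    """Subsampling layer: keep only the strongest response in each small area."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A tiny 6x6 "image" whose brightness increases from left to right
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])  # responds to left-to-right brightness steps

# The conv -> nonlinearity -> pool stack described in the interview
features = maxpool(relu(conv2d(image, kernel)))
```

Stacking several such blocks, with learned rather than hand-picked kernels, is what eventually produces the higher-layer neurons that respond to whole objects.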



What the neural network does is determined by the strength of the connections between neurons, that is, by the weights. And these weights are not programmed; they are the result of training.



The network is shown an image of a horse, and if it does not answer "horse", it is told that this is wrong and given the correct answer. Then, using the error backpropagation algorithm, the network adjusts the weights of all its connections so that the next time the same image is shown, the output is closer to the desired one. You have to show it thousands of images this way.
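That loop — show an example, compare the answer, nudge the weights toward the correct one — can be illustrated with a single sigmoid "neuron" trained by gradient descent. This is a stand-in, not a real image network; the two-"pixel" task and learning rate are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up task: two-"pixel" inputs, label 1 when the first pixel is brighter
X = rng.uniform(size=(1000, 2))
y = (X[:, 0] > X[:, 1]).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(5000):
    i = rng.integers(len(X))                    # show the network one example
    p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))   # the network's current answer
    err = p - y[i]                              # how wrong the answer was
    w -= lr * err * X[i]                        # adjust weights toward the
    b -= lr * err                               # desired answer

p_all = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = np.mean((p_all > 0.5) == (y == 1))
```

For a one-neuron "network" the backpropagation step collapses to a single gradient update; in a deep network the same error signal is propagated back through every layer.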



MF: Is that supervised learning? As I understand it, that is the dominant approach today.



YL: Exactly. Almost all modern deep learning applications use supervised learning. The magic is that the trained network mostly gives the right answers even for images it has never been shown before. But it needs a huge number of examples.



MF: And what can we expect in the future? Will it be possible to teach a machine the way we teach a child, who only needs to be shown a cat once and told its name?



YL: Actually, you are not quite right. The initial training of a convolutional network really does take place on millions of images across many categories. But then, if you need to add a new category — for example, to teach the computer to recognize cats — a few samples are enough, because the network is already trained to recognize objects of almost any type. The additional training affects only the top couple of layers.
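A minimal sketch of that idea, with a fixed random projection standing in for the pretrained lower layers — everything here (the feature extractor, the data, the learning rate) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pretrained network body: a frozen feature extractor.
# (In a real system these lower layers were learned on millions of images;
# here a fixed random projection plus ReLU merely plays that role.)
W_frozen = rng.normal(size=(64, 16))

def features(x):
    f = np.maximum(x @ W_frozen, 0.0)           # frozen layers, never updated
    return f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-9)

# A handful of examples of the new category ("cat") vs. other objects
cats = rng.normal(loc=1.0, size=(5, 64))
rest = rng.normal(loc=-1.0, size=(5, 64))
X = features(np.vstack([cats, rest]))
y = np.array([1.0] * 5 + [0.0] * 5)

# Train only the new top layer (a logistic-regression head)
w, b = np.zeros(16), 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # head's current answers
    g = p - y                                   # error signal
    w -= 1.0 * X.T @ g / len(y)                 # only w and b change
    b -= 1.0 * g.mean()

pred = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
```

Only `w` and `b` of the new top layer are updated while `W_frozen` stays fixed, which is why a handful of examples of the new category can suffice.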



MF: That already sounds like the way children learn.



YL: No, unfortunately, it is not like that at all. Children get most of their information before anyone tells them, "This is a cat." In the first few months of life, children learn without any notion of language: they discover the structure of the world simply by observing it and interacting with it a little. This way of accumulating knowledge is not yet available to machines. What to call it is unclear. Some use the provocative term "unsupervised learning"; it is sometimes called predictive learning. I call it self-supervised learning. In this kind of learning there is no preparation for a specific task, just observation of the world and how it works.



MF: Does reinforcement learning fall into this category?



YL: No, that is a completely different category. There are really three main categories: reinforcement learning, supervised learning, and self-supervised learning.



Reinforcement learning works by trial and error and does well in games, where you can make as many attempts as you like. AlphaGo's strong performance was achieved after the machine had played more games than all of humanity over the last three thousand years. For real-world problems such an approach is impractical.



A person can learn to drive a car in about 15 hours of practice without crashing into anything. With existing reinforcement learning methods, a car learning to drive itself would have to fall off a cliff 10,000 times before it figured out how to avoid doing so.



MF: That sounds to me like an argument in favor of simulation.



YL: Rather, it confirms that the kind of learning people use is very different from reinforcement learning. It is more like model-based reinforcement learning: a person driving a car for the first time already has a model of the world and can predict the consequences of his actions. How to get machines to learn such predictive models on their own is the main unsolved problem.



MF: Is this what your work at Facebook is about?



YL: Yes, this is one of the things we are working on. We also train machines by letting them observe different data sources, building a model of the world in the hope that it will capture common sense, so that it can later serve as a predictive model.



MF: Some people think that deep learning alone is not enough, and that networks should have built-in structure responsible for intelligence. You, on the other hand, seem convinced that intelligence can emerge organically from relatively general-purpose neural networks.



YL: You exaggerate. Everyone agrees that some structure is necessary; the question is what it should look like. And speaking of people who believe there should be structures providing logical reasoning and the ability to argue, you probably mean Gary Marcus and perhaps Oren Etzioni. Gary and I debated this very topic this morning. His opinion is not well received in the community because, without having made the slightest contribution to deep learning, he has written critically about it. Oren worked in the field for some time and speaks much more mildly.



In fact, the idea of convolutional networks itself arose as an attempt to add structure to neural networks. The question is what kind of structure would allow a machine to manipulate symbols or, say, would correspond to the hierarchical structure of language.



Many of my colleagues, including Geoffrey Hinton and Yoshua Bengio, agree that sooner or later we will be able to do without such structures. They may be useful in the short term, because we have not yet invented a way for them to be learned; that gap can be worked around by building everything into the architecture. But the microstructure of the cortex, both visual and prefrontal, looks remarkably uniform.



MF: Does the brain use something similar to backpropagation?



YL: That is unknown. It may turn out to be not backpropagation in the form we know it, but some similar form of approximate gradient estimation. Yoshua Bengio has been working on biologically plausible forms of gradient estimation. There is a chance that the brain estimates the gradient of some objective function.



MF: What other important things are being worked on at Facebook?



YL: We do a lot of basic research in machine learning, so we deal largely with applied mathematics and optimization. Work is under way on reinforcement learning and on so-called generative models, which are a form of self-supervised, or predictive, learning.



MF: Is Facebook developing systems that can hold a conversation?



YL: I have listed the fundamental research topics above, but there are also many application areas. Facebook is very active in computer vision, and it is fair to say we have the best research group in the world. We also do a lot of work on natural language processing: translation, summarization, categorization (figuring out what topic a text is about), and dialogue systems for virtual assistants, question answering systems, and so on.



MF: Do you think an AI capable of passing the Turing test will one day appear?



YL: At some point that will happen, but I do not consider the Turing test a good criterion: it is easy to fool, and it is somewhat outdated. Many people forget, or refuse to believe, that language is a secondary phenomenon relative to intelligence.



» More details about the book are available on the publisher's website

» Contents

» Excerpt



25% discount coupon for Habr readers — Architects of Intelligence



When the paper version of the book is paid for, an electronic copy is sent by e-mail.


