Digital twins of famous politicians and actors are under the full control of a "puppeteer". Illustration: University of Washington, 2015
3D graphics software combined with neural networks has reached a level of quality at which fake video is almost indistinguishable from the real thing. Soon it will be impossible to say with certainty whether the person on the TV screen is a real politician or a computer simulation.
In December 2015, scientists from the University of Washington presented "digital twin" technology: the creation of "living" 3D models from hundreds of photographs of a single person. Huge photo archives of celebrities and politicians have accumulated on the Internet. The program builds a model that behaves like a puppet on a string: it can be controlled at will, given different facial expressions, and made to mouth any speech.
Now, on the eve of the computer graphics conference SIGGRAPH 2017, the same group of researchers has published a new scientific paper with an improved version of the "digital twins".
The program is now trained not only on photos but also on videos, which makes the training much more effective. To demonstrate the technology, the scientists chose a well-known figure: former US President Barack Obama. He is a good choice because a huge amount of HD footage of him is available on the Internet, which provides millions of video frames for training neural networks.
The neural network studied Obama's facial movements in detail: the lip movements for each sound, the appearance of wrinkles near the eyes, changes in the shape of the eyebrows, and the tilt of the head. These facial movements were linked to the sounds he utters: the neural network processed not only the frames of the video clips but also their audio tracks.
Thus, a weak AI learned to synchronize facial expressions and lip movements with any arbitrary speech that the researchers feed to the input of the neural network.
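The idea of learning a mapping from sound to mouth shape can be illustrated with a deliberately simplified sketch. It is not the researchers' neural network: it swaps in an ordinary least-squares fit, and the function names, feature sizes, and synthetic data are all assumptions made for illustration. Each video frame is assumed to come with a vector of audio features and a vector of mouth-shape parameters, and the fit predicts the mouth shape from the current audio frame plus a few preceding ones.

```python
import numpy as np

def make_windows(audio_feats, context):
    """Stack each audio frame with its `context` preceding frames (zero-padded at the start)."""
    n, d = audio_feats.shape
    padded = np.vstack([np.zeros((context, d)), audio_feats])
    return np.hstack([padded[i:i + n] for i in range(context + 1)])

def add_bias(X):
    """Append a constant column so the fit can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_audio_to_mouth(audio_feats, mouth_shapes, context=2):
    """Least-squares stand-in for the learned audio-to-mouth mapping (assumption, not the paper's model)."""
    X = add_bias(make_windows(audio_feats, context))
    W, *_ = np.linalg.lstsq(X, mouth_shapes, rcond=None)
    return W

def predict_mouth(audio_feats, W, context=2):
    """Predict one mouth-shape vector per audio frame."""
    return add_bias(make_windows(audio_feats, context)) @ W

# Synthetic stand-in data (hypothetical): 200 frames, 4 audio features, 3 mouth parameters.
rng = np.random.default_rng(0)
audio = rng.normal(size=(200, 4))
mouth = audio @ rng.normal(size=(4, 3)) + 0.5

W = fit_audio_to_mouth(audio, mouth)

# Arbitrary new "speech": the fitted mapping yields a mouth shape for every frame.
new_audio = rng.normal(size=(50, 4))
predicted = predict_mouth(new_audio, W)
```

A real system would replace the linear fit with a trained neural network and would still have to render the predicted mouth shapes into video frames, which this sketch does not attempt.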
In the teaser video for the paper, real footage of Obama's speeches is compared with the result synthesized by the neural network.
The synthesized result differs noticeably from the original, but it still looks very realistic.
The researchers emphasize that in the past, to obtain "digital twins", people had to repeat the same phrases in front of cameras many times so that all combinations of phonemes and facial expressions could be recorded. Now the same can be done with publicly available video. Admittedly, not everyone has enough video online for their identity to be faked, but over time users are solving this problem themselves by uploading gigabytes of their photos and videos to social networks.
The technology also has practical applications. For example, one of the co-authors of the paper, Ira Kemelmacher-Shlizerman, says that it will improve the quality of video conferencing by synthesizing missing frames when they drop out of the video stream. If the audio comes through smoothly while the video lags, such synthesis can fill in the picture or increase its resolution.
The technology can of course also be used in computer games and virtual reality, wherever the player talks to a virtual character. The speech of such a character will become more realistic, and the character can be a digital copy of a real person. For example, any historical figure from the recent past can be "revived" from audio recordings alone.
Creating fakes for political purposes will, of course, also become easier. Today fakes are cobbled together in Photoshop and posted on social networks; in the future, fake videos will be shown on TV.
The authors acknowledge that the technology is still imperfect. For example, if Obama turns his face slightly away from the camera, parts of his mouth may separate from his face and overlap the background. But these are minor errors that can be corrected by additional training of the neural network.
Another drawback of the model is that it does not model emotions. Facial expressions are absolutely neutral and almost always the same. As a result, the digital twin sometimes loses its realism: its facial expression seems too serious for the frivolous words it utters or, conversely, too frivolous for a very serious speech. Then again, such things happen to real politicians in real life.
The technology is similar in principle to Face2Face, a program for creating digital twins in which the facial expressions and speech of one person are transferred to the face of another. In their paper, the authors from Washington compare the results of their neural network with Face2Face. They explain that Face2Face always requires a video stream to imitate, whereas their model works from an audio recording alone.