By some measures, machine vision is superior to human vision. By others, it may never catch up with us.
When engineers first set out to teach computers to see, they took it for granted that computers would see the way people do. The earliest proposals for computer vision, in the 1960s, were "clearly motivated by the characteristics of human vision," said John Tsotsos, a computer scientist at York University.
Since then, much has changed.
Computer vision has moved past the stage of castles in the air and become an actively developing field. Today, computers outperform people at some pattern-recognition tasks, such as classifying images ("dog or wolf?") or spotting anomalies in medical scans. And the way neural networks process visual data increasingly diverges from the way people do it.
Computers are beating us at our own game by playing it under different rules.
The neural networks underlying these systems are fairly simple. They take an input image and process it in several stages: first they pick out pixels, then edges and contours, then whole objects, and finally they produce a guess about what they were shown. These are called feed-forward neural networks because information moves through them in one direction, like items on a conveyor belt.
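To make the conveyor metaphor concrete, here is a minimal sketch of a feed-forward image classifier in PyTorch. It is not any particular published model; the layer sizes and the 32x32 input are arbitrary choices made purely for illustration.

```python
# A minimal sketch of the feed-forward pipeline described above:
# pixels -> edges and contours -> object-level features -> a guess.
import torch
import torch.nn as nn

feedforward_net = nn.Sequential(
    # early layers respond to local patterns such as edges and contours
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    # deeper layers combine those patterns into object-level features
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    # the final layer turns the features into a guess, e.g. "dog" vs. "wolf"
    nn.Linear(32 * 8 * 8, 2),
)

image = torch.randn(1, 3, 32, 32)   # a dummy 32x32 RGB image
logits = feedforward_net(image)     # data flows one way, like a conveyor
print(logits.argmax(dim=1))         # the network's single-pass guess
```

Once such a network is trained, every image gets exactly one pass through the pipeline; there is no mechanism for going back and looking again.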
We don't know much about human vision, but we do know that it doesn't work like that. In our recent story, "A Mathematical Model Reveals the Secrets of Vision," we described a new mathematical model that tries to explain the central mystery of human vision: how the brain's visual cortex creates vivid, accurate representations of the world from the meager information it receives from the retina.
The model suggests that the visual cortex works through a series of neural feedback loops that refine small changes in the data arriving from the outside world into the diverse range of images that appear in our inner perception. This feedback process is very different from the feed-forward methods that power computer vision.
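As a rough intuition for what feedback adds, here is a toy sketch (not the model from the paper): an internal estimate of a scene is repeatedly corrected against the sparse signal that actually arrives, rather than being produced in a single forward pass. The 20 percent sampling rate and the 0.5 correction step are arbitrary assumptions.

```python
# A toy feedback loop: keep an internal estimate of the scene and
# repeatedly correct it using the mismatch between what the estimate
# predicts and the meager signal actually received.
import numpy as np

rng = np.random.default_rng(0)
scene = rng.normal(size=100)            # the "true" scene in the outside world
mask = rng.random(100) < 0.2            # the retina passes along only ~20% of it
measurement = scene * mask              # the sparse input reaching the cortex

estimate = np.zeros(100)                # the internal representation
for step in range(50):
    predicted = estimate * mask         # what the current estimate implies we should see
    error = measurement - predicted     # the mismatch is fed back into the loop
    estimate = estimate + 0.5 * error   # a small correction on every cycle

# after many cycles, the observed parts of the scene are recovered
print(np.abs((estimate - scene)[mask]).max())
```

The point of the sketch is only the loop itself: the output is not computed once but settled into, correction after correction.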
"This work demonstrates how complex the visual cortex is, and in some ways how different it is" from computer vision, said Jonathan Victor, a neuroscientist at Cornell University.
And yet in some tasks, computer vision outperforms human vision. That raises a question: does computer vision even need to be modeled on the human kind?
In some ways, the answer is no. The information reaching the visual cortex is constrained by anatomy: a relatively small number of nerve fibers connects it to the outside world, which limits how much visual data the cortex has to work with. Computers have no such bandwidth problem, so there is no reason for them to make do with scarce information.
"If I had infinite computing power and infinite memory, would I need to limit the flow of information? Probably not," said Tsotsos. Still, he thinks it would be unwise to ignore human vision.
The classification tasks at which computers now excel are the easy problems of computer vision, he says. Solving them requires nothing more than finding correlations in massive data sets. For harder tasks, such as recognizing an object by examining it from different angles (the way a person gets to know a statue by walking around it), correlations alone may not be enough. To handle those well, computers may have to take lessons from humans.
Last year, in an interview with our magazine, the artificial intelligence pioneer Judea Pearl made the same point in a broader context, arguing that training on correlations will not be enough to carry AI systems forward in the long run.
One key feature of human vision, for example, is the ability to take a second look. We process visual information and reach a conclusion about what we are seeing. When that conclusion doesn't hold up, we look again, and the second look often gives us a more accurate picture of what is going on. Computer vision systems built on the feed-forward scheme have no such option, which is why they often fail miserably at even the simplest pattern-recognition tasks.
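In code, the "second look" might be sketched like this. The `recrop` step and the 0.9 confidence threshold are hypothetical placeholders, not part of any real system's API; the sketch only shows the control flow of deciding to look again.

```python
# A sketch of the "second look" idea: if the first, feed-forward guess is
# not confident enough, re-examine the image before committing to an answer.
import torch
import torch.nn.functional as F

def classify_with_second_look(model, image, recrop, threshold=0.9):
    """Return (label, confidence), looking again if the first guess is weak."""
    probs = F.softmax(model(image), dim=1)
    confidence, label = probs.max(dim=1)
    if confidence.item() < threshold:
        # the first conclusion "does not suit us": look at the scene again
        probs = F.softmax(model(recrop(image)), dim=1)
        confidence, label = probs.max(dim=1)
    return label.item(), confidence.item()
```

A pure feed-forward classifier, by contrast, must commit to whatever its single pass produces, confident or not.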
Human vision has another, less obvious but more important feature that computer vision lacks.
The human visual system takes years to mature. A 2019 paper by Tsotsos and his colleagues found that the ability to suppress clutter in a scene crowded with detail and focus on what matters does not fully develop in people until around age 17. Other researchers have found that the ability to recognize faces keeps improving until about age 20.
Computer vision systems, by contrast, mature by digesting huge amounts of data. Their underlying architecture is fixed and does not change over time the way the brain's does. If the underlying learning mechanisms are so different, will the results differ too? Tsotsos believes a reckoning awaits computer vision systems.
"Learning with these deep-learning methods is about as far from human learning as it could be," he said. "So it seems to me that a dead end awaits them. They will reach a limit of development beyond which they can go no further."