PartNet: a new semantic database of everyday objects that takes robots' understanding of the world around them to a new level
The database contains 26,671 3D models in 24 object categories, each annotated with detailed three-dimensional part information.
One of the human abilities that lets us adapt so well to the world around us is the ability to understand things as whole categories, and then use this generalized understanding to deal with specific things we have not encountered before. Take a lamp, for example. No one has seen every lamp in the world. But in most cases, when we walk into a new house for the first time, we can easily find all the lamps and figure out how they work. Of course, sometimes we come across something very strange that makes us ask: “Wow, is that a lamp? And how do you turn it on?” But in most cases, our generalized mental model of a lamp saves us.
It helps that lamps, like other categories of objects, by definition share many components. Lamps usually have bulbs. They usually have a lampshade. They probably also have a base to keep them from falling over, a body to raise them above the floor, and a power cord. If you see an object with all of these features, it is probably a lamp, and once you understand that, you can make an educated guess about how to use it.
Robots are often especially bad at this level of understanding, which is unfortunate, because it is a very useful thing. You could even say that we can trust robots to work autonomously in an unstructured environment only when they can understand objects at a level close to the one described. At CVPR 2019, the Conference on Computer Vision and Pattern Recognition, a team of researchers from Stanford, the University of California San Diego, Simon Fraser University, and Intel announced PartNet, a huge database of everyday three-dimensional objects, broken down into parts and annotated in enough detail that, its creators hope, it will help robots understand what a lamp is.
Examples of shapes with fine-grained part annotations for objects from the 24 categories
PartNet is a subset of ShapeNet, an even larger 3D database of about 50,000 everyday objects. PartNet contains 26,671 objects from 24 categories (for example, doors, tables, chairs, lamps, microwaves, clocks), and each object is divided into labeled parts. Here's what that looks like for two completely different lamps:
The parts of objects in PartNet are organized by experts into a hierarchical template for each category, shown here for lamps. The template covers objects of various types, such as a table lamp (left) and a ceiling lamp (right). It was designed to be deep and comprehensive, covering structurally different types of lamps; at the same time, components that are conceptually the same, such as a light bulb or lampshade, appear across the different types.
What makes PartNet stand out is its annotation of all the fine-grained parts. Databases like ShapeNet usually just contain statements such as “this whole bunch of things are lamps,” and the usefulness of such databases is limited. PartNet, by contrast, offers a way to understand lamps at a fundamental level: what parts they consist of, what controls they have, and so on. This not only makes it much easier to generalize to lamps the computer has not seen before, but also lets an autonomous system guess how to interact productively with new lamps.
As you can imagine, creating PartNet was a very time-consuming task. Almost 70 “professional annotators” spent an average of 8 minutes on each of these 26,671 3D shapes, labeling a total of 573,585 parts, and each annotation was then checked by at least one other annotator. To keep the annotations consistent, templates were created for each class of objects. The templates were meant to minimize the set of parts while still ensuring that the database comprehensively covers everything needed to describe the entire class. The parts are also organized hierarchically, with smaller parts belonging to larger ones. Here's how that looks:
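To make the idea of a hierarchical part template more concrete, here is a minimal sketch in Python. The `Part` class and the part names are assumptions chosen for illustration; they are not PartNet's actual schema or file format. The sketch only shows how two structurally different lamps can share conceptually identical fine-grained parts.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Part:
    """One node in a hypothetical part hierarchy (not PartNet's real schema)."""
    name: str
    children: List["Part"] = field(default_factory=list)

    def leaves(self) -> Iterator["Part"]:
        """Yield the finest-grained parts below this node."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

    def render(self, depth: int = 0) -> str:
        """Return an indented text view of the hierarchy."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


def leaf_names(root: Part) -> Set[str]:
    """Collect the names of all finest-grained parts of an object."""
    return {part.name for part in root.leaves()}


# Illustrative part names only; a real template is far more detailed.
table_lamp = Part("lamp", [
    Part("base"),
    Part("body"),
    Part("lamp unit", [Part("light bulb"), Part("lampshade")]),
    Part("power cord"),
])
ceiling_lamp = Part("lamp", [
    Part("chain"),
    Part("lamp unit", [Part("light bulb"), Part("lampshade")]),
])

# Parts that are conceptually the same appear in both lamp types.
shared = sorted(leaf_names(table_lamp) & leaf_names(ceiling_lamp))

print(table_lamp.render())
print(shared)  # ['lampshade', 'light bulb']
```

Keeping smaller parts as children of larger ones means the same object can be read at a coarse level (a lamp has a lamp unit) or at a fine level (the lamp unit has a bulb and a shade).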
For this data to be useful outside PartNet, robots need to learn to perform three-dimensional segmentation on their own: taking a three-dimensional model of an object (created by the robot itself) and breaking it into parts that can be identified and matched against existing object models. This is difficult for many reasons. For example, the robot has to pick out individual parts from point clouds, and those parts can be small but important (such as drawer handles); in addition, many parts that look alike can be semantically different.
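As a toy illustration of what breaking a point cloud into labeled parts means, the sketch below transfers part labels from an annotated reference shape to a new scan by nearest-neighbor lookup. This is a deliberately simple baseline chosen for clarity, not the approach used by the PartNet researchers; the function names, coordinates, and labels are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def transfer_labels(
    reference: Sequence[Tuple[Point, str]],
    cloud: Sequence[Point],
) -> List[str]:
    """Give each scanned point the label of its nearest annotated point."""
    labels = []
    for point in cloud:
        _, label = min(reference, key=lambda item: math.dist(item[0], point))
        labels.append(label)
    return labels


def group_by_part(
    cloud: Sequence[Point],
    labels: Sequence[str],
) -> Dict[str, List[Point]]:
    """Split the cloud into one list of points per part label."""
    parts: Dict[str, List[Point]] = defaultdict(list)
    for point, label in zip(cloud, labels):
        parts[label].append(point)
    return dict(parts)


# Hypothetical annotated drawer: a large front panel and a small handle.
reference = [
    ((0.0, 0.0, 0.0), "drawer front"),
    ((1.0, 0.0, 0.0), "drawer front"),
    ((0.5, 0.0, 0.1), "handle"),
]

# Hypothetical noisy scan of a similar drawer made by the robot.
scan = [(0.1, 0.0, 0.0), (0.9, 0.05, 0.0), (0.5, 0.02, 0.12)]

labels = transfer_labels(reference, scan)
parts = group_by_part(scan, labels)

print(labels)  # ['drawer front', 'drawer front', 'handle']
```

The handle ends up with a single point here, which hints at why small but important parts are hard: a few noisy points can easily be absorbed into a larger neighboring part.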
Researchers have made some progress in this area, but these problems need further work. PartNet will help here too, by providing a dataset that can be used to develop better algorithms. At some point, PartNet may become part of the foundation for systems that can build similar 3D models entirely on their own, much as datasets for self-driving cars are moving from being assembled by humans to being assembled by computers under human supervision. Reaching this level of semantic understanding of unfamiliar, unstructured environments will be key to creating the robots we have been waiting for so long: ones that can adapt to the real world.