At the end of September we held Yandex.Zhelezo, our first meetup for hardware developers. For us it is an important step into the new market of device manufacturers. About 150 participants listened to the talks, chatted, and spent a lot of time at the stands, where they could look inside a self-driving car, defuse a “bomb” by cutting the right wires, disassemble a Yandex.Station (the record was 6 minutes 23 seconds), and try out the Yandex.Auto on-board computer and the smart home.
Today we will talk about the smart home platform. We opened it to all developers in the spring, and at Yandex.Zhelezo Marat Mavlyutov, the platform's development manager, summed up the first results and showed how to set up device control. From the talk you can learn about the terminology of the voice API and the ways to describe a user's device and interact with it.
- Let's talk about the smart home. How do you make your house a little smarter, so that no cat goes hungry and every gate in Yekaterinburg opens?
Let's start from the very beginning and work out what a smart home actually is. As we see it, it is a house where you do not need to hunt for a switch or for the outlet hidden behind the curtain. It is a house where you do not need to look for the TV remote, or for the phone with the app that controls your kettle or light bulb. It is a house that understands you, where a person uses the most natural interface there is: the voice.
Why did we get into this, and what do we want out of it? We really want our voice assistant to be a true assistant, one that can not only turn on music or video but also help with the most natural, everyday things.
We also understand that we cannot write all the code in the world and integrate with every device ourselves. That is why we want to offer this voice interface to developers, companies, and people who already know how to make devices and make them well. Ericsson says that by 2021 there will be 28 billion connected smart home devices worldwide. This means, by the way, that if you imagine the whole world being online, each person will have about four devices on average.
Just before the launch we ran a study to understand how people use smart homes, what they want to see, and what they want to control. We picked the three most popular areas:
- control of TVs, AV-receivers, media devices, etc.,
- control of light and lighting devices,
- temperature control: air conditioners, thermostats, radiators, boilers, etc.
The next slide has some statistics. We launched just four months ago, in May, and we now see an average of 3.8 devices per user on our platform. When I checked yesterday it was 3.93, and two months ago it was 3.2. This means that people do not just use the smart home, they keep buying devices they like. We are proud of the next figure: 96% of users control their smart home by voice, even though all of them have an app that can control the same devices.
We do understand the limitations of the current API: there is still quite little that can be connected or described. Even so, manufacturers, enthusiasts, and developers have integrated with our platform to the point where we now see more than 800 different device models in it. These are distinct device models: all kinds of kettles, air conditioners, TVs, and so on.
I repeat, we have been in production for only four months, yet large, distributed companies like these have already managed to integrate with our platform, and I think the team deserves a lot of credit for that. It suggests that our API is simple enough for people to integrate with Yandex within four months.
Among indie developers and enthusiasts, we see people writing smart home skills of their own. In this way they connect our platform to other systems such as openHAB, Homebridge, and Home Assistant, so that, for example, devices built for the Apple ecosystem can also work with Alice. There are also a few cases from our partners. We expected the Yandex smart home to appeal to the enthusiasts who are only beginning to move this market forward. But people from completely different industries came to us, almost without any effort on Yandex's part, and said that they wanted to build installations with a smart home.
For example, there is the well-known case of the developer PIK and Rubetek. As one of their top apartment finishing options, they offer a smart home on the Yandex platform. In such apartments, in showrooms that already exist, a visitor can ask Alice to make coffee, open the curtains, or control the lights. We are also working with office developers right now. They want, for example, to build Alice into their meeting rooms so that people can book a room, call another city, or, again, control the light fixtures. We are also starting some experiments with hotels: you can order breakfast to your room, swap your pillow for a warmer one, or turn on a paid channel.
Now let's dive a little into the technical details. The way a smart home works is quite simple. There are many manufacturers of smart devices, and all of these devices can be controlled from a mobile phone. This means that every such manufacturer already has some kind of API: the user taps in the mobile app, the phone sends requests to that manufacturer's cloud, and the device turns on, turns off, changes brightness, or changes some other setting.
That is exactly where we want to integrate. Instead of tapping on the phone, the user speaks, and we send exactly the same request to the manufacturer's cloud. This is called cloud-to-cloud interaction. The next slide shows it in detail.
That is, a person controls the device either from a mobile phone or by voice. Then Yandex servers go to the cloud of the corresponding manufacturer, and the device turns on.
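The cloud-to-cloud idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ManufacturerCloud`, the request fields, and the naive phrase parsing are all hypothetical and are not part of any real API. The point is that the tap in the app and the voice command end up as the same request to the manufacturer's cloud.

```python
from dataclasses import dataclass, field


@dataclass
class ManufacturerCloud:
    """Stand-in for a manufacturer's cloud API (hypothetical, in-memory)."""
    devices: dict = field(default_factory=lambda: {"lamp-1": False})

    def handle(self, request: dict) -> dict:
        # Apply the requested on/off state to the addressed device.
        self.devices[request["device_id"]] = request["on"]
        return {"device_id": request["device_id"], "on": request["on"]}


def tap_in_mobile_app(cloud: ManufacturerCloud, device_id: str, on: bool) -> dict:
    # The manufacturer's own app: phone -> manufacturer cloud.
    return cloud.handle({"device_id": device_id, "on": on})


def voice_command(cloud: ManufacturerCloud, phrase: str, device_id: str) -> dict:
    # The assistant's cloud turns the phrase into the same request and
    # sends it cloud-to-cloud. The phrase parsing is deliberately naive.
    on = "turn on" in phrase.lower()
    return cloud.handle({"device_id": device_id, "on": on})


cloud = ManufacturerCloud()
from_phone = tap_in_mobile_app(cloud, "lamp-1", True)
from_voice = voice_command(cloud, "Alice, turn on the light", "lamp-1")
```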
How does Yandex find out which devices the user has? For this we use a standard procedure called OAuth2 account linking. The user just needs to open the Yandex application and link the accounts. In simple terms, it works like this.
When we want to connect our account with Philips, the user enters their username and password, or gives us a special token, and with this token we go to Philips on that user's behalf.
The second major part of the smart home protocol is voice intents. The first and most important is Discovery: we go to the cloud of the corresponding manufacturer with the user's token, and the manufacturer tells us which devices the user has. After that everything is simple: Query and Action. With Query, Yandex asks what state a device is in right now: whether the iron is on or off, or what temperature the air conditioner is set to. Action means that the current state needs to be changed. Unlink happens when the user decides to break the link between accounts, so that Yandex completely forgets about all of that user's devices.
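To make the four intents concrete, here is a minimal sketch of how a manufacturer's side might dispatch them. Only the intent names (Discovery, Query, Action, Unlink) and the DONE status come from the talk; the function names, payload fields, and the in-memory `USER_DEVICES` store are assumptions made for illustration.

```python
# In-memory stand-in for a manufacturer's data; all names are hypothetical.
USER_DEVICES = {"user-1": {"iron-1": {"on": False}}}


def discovery(user_id, payload):
    # Discovery: list the devices this user owns.
    return {"devices": sorted(USER_DEVICES.get(user_id, {}))}


def query(user_id, payload):
    # Query: report the current state of one device.
    return dict(USER_DEVICES[user_id][payload["device_id"]])


def action(user_id, payload):
    # Action: change the current state of one device.
    USER_DEVICES[user_id][payload["device_id"]]["on"] = payload["on"]
    return {"status": "DONE"}


def unlink(user_id, payload):
    # Unlink: forget everything about this user's devices.
    USER_DEVICES.pop(user_id, None)
    return {"status": "DONE"}


INTENTS = {"discovery": discovery, "query": query, "action": action, "unlink": unlink}


def handle_intent(name, user_id, payload=None):
    # Route an incoming intent to its handler.
    return INTENTS[name](user_id, payload or {})
```

After `handle_intent("unlink", "user-1")`, a later discovery for the same user returns an empty device list, which mirrors the "completely forgets" behavior described above.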
Let's go even deeper and see how all these intents work. First and foremost there are device types. A device type only affects how the device is represented in the interface: these are special layouts in the mobile application. Most importantly, perhaps, a device type also generalizes the voice representation, that is, a set of voice commands. It does not matter what the user calls his lamp, it should still respond to the word “light”. Likewise, it should not matter whether the user says “turn on the air conditioner” or “turn on the AC”, and the air conditioner itself can be named anything at all.
Secondly, to understand how to control a device, we need to know what the device can do. We call these things capabilities. They are like building blocks that describe what the device is able to do.
A little more about device types. At the very start we had six, as far as I remember. Now we have grown to this many. For example, two weeks ago we released openable, and people were able to open their gates. They can now say “Alice, open the gate” rather than “Alice, turn on the gate”.
Now let's talk about capabilities: which ones are available and how to make Alice understand how to control your device.
The first, most important, and simplest one is on_off. Almost every device has this capability. To tell Alice that the device can be turned on and off, just add these couple of lines of JSON and set the retrievable flag. This flag means that the device can be asked for its current state, that is, whether it is on or off.
A simple example with a TV. You all probably have a TV at home, and by looking at the remote it is impossible to tell whether the TV is on or off, at least if the remote is infrared. The state of such a device is not retrievable.
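As a rough sketch, an on_off description with its retrievable flag could be modeled like this. The names on_off and retrievable come from the talk; the exact `type` string and the helper functions are assumptions for illustration only.

```python
def on_off_capability(retrievable: bool) -> dict:
    """Describe an on/off capability as a plain dict.

    The "type" string is an assumed naming; only on_off and the
    retrievable flag are taken from the talk.
    """
    return {"type": "devices.capabilities.on_off", "retrievable": retrievable}


def can_answer_is_it_on(capability: dict) -> bool:
    # Only a retrievable on_off capability can answer "is it on?".
    return capability["type"].endswith("on_off") and capability["retrievable"]


wifi_lamp = on_off_capability(retrievable=True)
ir_tv = on_off_capability(retrievable=False)  # infrared remote: state unknown
```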
The next capability, which describes light fixtures, is color_setting. It also has a retrievable flag, but the most important part is its two parameters. The first is color_model: with it the manufacturer tells us how it can control color, in hsv or in rgb mode.
The second is the shade of white. The manufacturer can say that the light bulb can be cold white, warm yellow, and so on, so that the user can ask: “Please make the light warmer.”
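A hedged sketch of a color_setting description follows. The names color_setting, color_model, hsv, rgb, and retrievable are from the talk; the `temperature_k` field (a range of white shades in kelvin), the 500 K step, and both helper functions are assumptions chosen for illustration.

```python
def color_setting_capability(color_model=None, temperature_k=None, retrievable=True):
    """Build a color_setting description.

    color_model must be "hsv" or "rgb"; temperature_k is a hypothetical
    (min, max) pair in kelvin describing the available shades of white.
    """
    if color_model not in (None, "hsv", "rgb"):
        raise ValueError(f"unsupported color_model: {color_model!r}")
    parameters = {}
    if color_model is not None:
        parameters["color_model"] = color_model
    if temperature_k is not None:
        low, high = temperature_k
        if low > high:
            raise ValueError("temperature_k must be (min, max)")
        parameters["temperature_k"] = {"min": low, "max": high}
    if not parameters:
        raise ValueError("describe a color model, a white range, or both")
    return {"type": "color_setting", "retrievable": retrievable, "parameters": parameters}


def warmer(current_k: int, capability: dict, step_k: int = 500) -> int:
    # "Make the light warmer" means a lower color temperature,
    # clamped to the minimum the bulb declared.
    limits = capability["parameters"]["temperature_k"]
    return max(limits["min"], current_k - step_k)


bulb = color_setting_capability(color_model="hsv", temperature_k=(2700, 6500))
```

Validating `color_model` up front catches a misspelling such as "rbg" before the description is ever sent.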
Next come capabilities that generalize certain modes. A very good interface analogy is radio buttons, where you choose one of several modes. Or think of the dial on a washing machine, which is sure to have “cotton”, “delicate wash”, and so on.
Here it is important for us to know which instance the mode applies to. There are currently six instances. The ones we implemented at the very first stage are the air conditioner's operating mode (automatic, cooling, and so on) and, for example, the fan's operating mode: slowest, medium, or, again, automatic.
With a mode you can also say: “Please turn on the next mode.” This is very convenient for air conditioners or for that same washing machine.
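"Turn on the next mode" amounts to stepping through a fixed list and wrapping around at the end. A minimal sketch, in which the instance name and the mode names are hypothetical examples rather than the platform's actual vocabulary:

```python
def next_mode(modes: list, current: str) -> str:
    """Return the mode after `current`, wrapping around at the end."""
    index = modes.index(current)  # raises ValueError for an unknown mode
    return modes[(index + 1) % len(modes)]


# Hypothetical instance and mode names for an air conditioner.
ac_mode = {
    "type": "mode",
    "instance": "thermostat",
    "modes": ["auto", "cool", "heat", "fan_only"],
}
```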
Another capability is range. In interface terms it is a slider that adjusts something between a minimum and a maximum value. This slider also has an instance: temperature, volume, brightness, and so on, almost any range that can be described. It also has a unit, because some people, as it turned out, state the temperature in Fahrenheit to test their air conditioners, and the same number then means a completely different temperature. So we also need to understand whether a person is asking in Fahrenheit or in Celsius.
The random access flag is probably familiar to you: it means that we can set an exact value within the range. A fairly simple example, again, is televisions, where the volume can only be moved up and down, whereas the temperature on an air conditioner can be set exactly.
Finally there is the description of the range itself: a minimum value, a maximum value, and the step by which the value can change. On air conditioners, again, the step can be a whole degree or a tenth of a degree.
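The pieces of a range (instance, unit, random access, minimum, maximum, step) fit naturally into one small structure. The sketch below is an illustration only: the field names, the concrete limits, and the snapping rule are assumptions, not the platform's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeCapability:
    instance: str        # e.g. "temperature", "volume", "brightness"
    unit: str            # e.g. "celsius"; field names are hypothetical
    minimum: float
    maximum: float
    step: float
    random_access: bool  # True if an exact value can be set

    def clamp(self, value: float) -> float:
        return min(self.maximum, max(self.minimum, value))

    def set_exact(self, value: float) -> float:
        if not self.random_access:
            raise ValueError(f"{self.instance} only supports up/down steps")
        # Snap to the nearest step counted from the minimum, then clamp.
        steps = round((value - self.minimum) / self.step)
        return self.clamp(self.minimum + steps * self.step)

    def nudge(self, current: float, up: bool) -> float:
        # Relative change by one step, kept inside the range.
        return self.clamp(current + (self.step if up else -self.step))


def fahrenheit_to_celsius(value: float) -> float:
    return (value - 32) * 5 / 9


ac_temperature = RangeCapability("temperature", "celsius", 16, 30, 1, True)
tv_volume = RangeCapability("volume", "percent", 0, 100, 1, False)
```

A request in Fahrenheit would be converted first, for example `ac_temperature.set_exact(fahrenheit_to_celsius(72))`, while the TV volume can only be changed through `nudge`.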
The last capability, in interface terms, is a kind of checkbox. Remember the turbo mode on old computers, where you pressed a button and the computer ran faster? Here you say “Alice, turn off the sound”, we press the mute button, and the sound disappears. You could call it a kind of binary mode.
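A binary switch like mute is the simplest case to model. In this sketch the function name and the `mute` key are assumptions for illustration; the talk only describes the behavior.

```python
def apply_toggle(state: dict, instance: str, value: bool) -> dict:
    """Return a copy of `state` with one binary switch set."""
    if not isinstance(value, bool):
        raise TypeError("a toggle only accepts True or False")
    return {**state, instance: value}


tv_state = {"mute": False}
muted = apply_toggle(tv_state, "mute", True)  # "Alice, turn off the sound"
```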
By combining all these capabilities, all these skills, we can describe every kind of device currently on the market.
Take a smart light bulb. It can turn on and off, it can adjust its color, and it can adjust its brightness. But if everything is simple with a light bulb, let's try together to describe some other device. For example, a kettle.
What do you think a smart kettle should be able to do? I gave you a hint: a screenshot from the Yandex application interface. How would you describe the kettle? Temperature. What else? Yes, turning on and off. The kettle is a fairly simple device. It can turn on and off, and, for example, I want to brew my favorite green tea at 85 degrees. Is there water in it? Yes, a good point. Here we are waiting for information from the manufacturer.
I would like to tell you about other devices, too. One of the most complex devices available right now is the air conditioner. What do you think an air conditioner should be able to do? Heat or cool. Adjust the temperature. A mode for the louvers. Fan speed. All correct. And an air conditioner, as it can be described with our capabilities today, can do all of this: turn on and off, select the cooling mode, adjust the temperature, and set the speed of the fan with which it blows either cold or hot air. There can also be a fan-only mode that blows plain air without conditioning.
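Describing the kettle and the air conditioner from the audience's answers comes down to picking building blocks. The sketch below uses plain dicts; the device type strings, instance names, mode names, and numeric limits are hypothetical and only illustrate the composition idea.

```python
def describe_device(device_id, device_type, capabilities):
    """Combine building blocks into one device description (plain dict)."""
    return {"id": device_id, "type": device_type, "capabilities": capabilities}


def supports(device, capability_type, instance=None):
    # True if the device declares this capability (and instance, if given).
    return any(
        cap["type"] == capability_type
        and (instance is None or cap.get("instance") == instance)
        for cap in device["capabilities"]
    )


kettle = describe_device("kettle-1", "kettle", [
    {"type": "on_off"},
    {"type": "range", "instance": "temperature", "unit": "celsius",
     "min": 40, "max": 100, "step": 5},
])

air_conditioner = describe_device("ac-1", "air_conditioner", [
    {"type": "on_off"},
    {"type": "mode", "instance": "thermostat",
     "modes": ["auto", "cool", "heat", "fan_only"]},
    {"type": "mode", "instance": "fan_speed",
     "modes": ["auto", "low", "medium", "high"]},
    {"type": "range", "instance": "temperature", "unit": "celsius",
     "min": 16, "max": 30, "step": 1},
])
```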
Let's go lower still and see what answers we expect from manufacturers when they describe devices. This is the Discovery intent. There is a bit of YAML here, but it is fairly easy to read.
At the moment the user links the accounts, we ask the manufacturer which devices this user has, simply to understand how to control them.
First and foremost, we expect the user_id of this user and a list of devices.
Only one device is described here: the user has a light bulb, devices.types.light. By the way, it does not have to be a light bulb. It could be an RGB strip or a lawn mower with backlighting; it makes no difference to us. What matters is that it reacts to the word “light” and that in the interface we can draw the capabilities responsible for light.
Our lawn mower, it seems, can turn on and off. It can change its brightness and it can adjust its color. And not only color: this light fixture also supports color temperature adjustment.
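A discovery answer for that one device might look roughly like the dict below, together with a small check a provider could run before sending it. The user_id, the device list, and devices.types.light come from the talk, and abc-123 is the ID used in its later example; the remaining keys and the validation rules are assumptions made for this sketch.

```python
REQUIRED_DEVICE_KEYS = ("id", "type", "capabilities")


def validate_discovery(response: dict) -> list:
    """Return a list of problems found in a discovery answer (empty if none)."""
    problems = []
    if not response.get("user_id"):
        problems.append("missing user_id")
    devices = response.get("devices")
    if not isinstance(devices, list):
        return problems + ["devices must be a list"]
    for index, device in enumerate(devices):
        for key in REQUIRED_DEVICE_KEYS:
            if key not in device:
                problems.append(f"device {index}: missing {key}")
    return problems


response = {
    "user_id": "user-1",
    "devices": [{
        "id": "abc-123",
        "name": "lawn mower backlight",
        "type": "devices.types.light",
        "capabilities": [
            {"type": "on_off"},
            {"type": "range", "instance": "brightness"},
            {"type": "color_setting",
             "parameters": {"color_model": "hsv",
                            "temperature_k": {"min": 2700, "max": 6500}}},
        ],
    }],
}
```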
Suppose the user asks: “Alice, what state is my light bulb in right now?” or “Alice, is my light bulb on?” Then we forward the question to the provider: we ask what color mode the given device is in and whether the device itself is turned on.
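The question and the provider's answer can be sketched as a pair of functions. Everything here is illustrative: the request and response layout, the `STATE` store, and the `DEVICE_NOT_FOUND` error code are assumptions, not the platform's actual format.

```python
# Hypothetical in-memory device state kept by the provider.
STATE = {"abc-123": {"on_off": True, "color_setting": {"hsv": (30, 80, 100)}}}


def build_query(device_ids):
    # What the platform asks: "what state are these devices in?"
    return {"devices": [{"id": device_id} for device_id in device_ids]}


def answer_query(request):
    # What the provider answers: current values, or an error per device.
    answered = []
    for device in request["devices"]:
        state = STATE.get(device["id"])
        if state is None:
            answered.append({"id": device["id"], "error": "DEVICE_NOT_FOUND"})
        else:
            answered.append({"id": device["id"], "state": dict(state)})
    return {"devices": answered}


reply = answer_query(build_query(["abc-123", "missing-1"]))
```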
If the user wants to change this state, he can say, for example: “Alice, turn off the light.” We then tell the provider: there is a light bulb with the ID abc-123, please turn it off, value false.
We expect the device manufacturer on the other side of the cloud to answer: okay, light abc-123, action_result, status DONE. That means the light has turned off.
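The exchange just described (device abc-123, value false, then action_result with status DONE) can be sketched as follows. The identifiers abc-123, action_result, and DONE are from the talk; the surrounding structure, the `LIGHTS` store, and the `ERROR` status for unknown devices are assumptions for illustration.

```python
LIGHTS = {"abc-123": True}  # hypothetical provider-side state: the light is on


def build_action(device_id: str, on: bool) -> dict:
    # "Alice, turn off the light" -> on_off with value False for abc-123.
    return {"devices": [{"id": device_id,
                         "capabilities": [{"type": "on_off", "value": on}]}]}


def perform_action(request: dict) -> dict:
    results = []
    for device in request["devices"]:
        known = device["id"] in LIGHTS
        for capability in device["capabilities"]:
            if known and capability["type"] == "on_off":
                LIGHTS[device["id"]] = capability["value"]
        status = "DONE" if known else "ERROR"
        results.append({"id": device["id"], "action_result": {"status": status}})
    return {"devices": results}


answer = perform_action(build_action("abc-123", False))
```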
A little more about scenarios. We understand that users want not only to control their devices one by one but also to combine them into groups. For example, when they wake up they say “Alice, good morning” and want a certain combination of actions to be performed.
Accordingly, you can set this up in the Yandex application, and Alice will turn on some music, turn off the night light, and start the kettle boiling water for your favorite coffee.
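A scenario is essentially a phrase mapped to a list of device actions. The sketch below assumes a hypothetical `SCENARIOS` table, made-up device IDs and capability names, and a very naive phrase normalizer; it only illustrates the grouping idea.

```python
SCENARIOS = {
    "good morning": [
        ("speaker-1", "play_music", True),
        ("nightlight-1", "on_off", False),
        ("kettle-1", "on_off", True),
    ],
}


def normalize(phrase: str) -> str:
    # Drop punctuation and the assistant's name:
    # "Alice, good morning!" becomes "good morning".
    words = phrase.lower().replace(",", " ").replace("!", " ").replace(".", " ").split()
    return " ".join(word for word in words if word != "alice")


def run_scenario(phrase: str, send) -> int:
    """Send every step of the matching scenario; return how many were sent."""
    steps = SCENARIOS.get(normalize(phrase), [])
    for device_id, capability, value in steps:
        send(device_id, capability, value)
    return len(steps)


sent = []
count = run_scenario("Alice, good morning!", lambda *step: sent.append(step))
```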
About our plans. We understand that the platform in its current form, although it lets you describe a huge number of devices, does not, for example, let you receive information from sensors. We cannot yet configure an event in a scenario so that a leak sensor, a smoke sensor, or a door-opening sensor tells us that something has happened. We aim to add that. We also understand that at the moment we cannot work with media streams. The simplest example: a courier arrives at your door with hot pizza, and a push notification with a photo of that person comes to your phone. That would be great! It cannot be done with the current capabilities; most likely, capabilities that describe cameras will appear.
Speaking of sensors, I will touch on a slippery topic: IFTTT. With IFTTT we want scenarios to be triggered not only by voice, as they are now, but by different events: a timer, a schedule, sunrise or sunset, and others. And we really want the smart home to be not only for geeks who understand why they have to hand their Wi-Fi password to a light bulb. We want people who have nothing to do with smart homes, or who do not understand how they work, to simply buy a kettle, a washing machine, a light bulb, whatever. And Alice would say: “It looks like there is a device in your house that is new to me. Do you want to connect it?” And it would just work right away.
In conclusion, I want to say: to use this technology, the smart home, you only need to read the documentation and correctly describe your device using the device types and the existing capabilities. Thank you.