Secrets, tips, and tricks: how to create a cool action for Google Assistant

The ecosystem around Google Assistant is growing fast. In April 2017 only 165 actions were available to users; today there are more than 4,500 in English alone. How diverse and interesting the Russian-language corner of this universe becomes depends on developers. Dmitry Chechetkin, director of strategic projects at Just AI and author of “Yes, my lord,” one of the most popular games for the Assistant not only in Russia but worldwide (more than 240 thousand people have already played it), shares important life hacks for creating actions for Google Assistant. Why separate the code and the content from the script? What should you keep in mind when working on a conversational interface? In short, let's figure out how to build the action of your dreams.







Shaken, not stirred: the roles of script, content, and code



Any voice application consists of three components: a conversational script, the content the action works with, and programmable logic, i.e. code.



The script is perhaps the most important part. It describes what phrases a user can say, how the action should react to them, what states it moves through, and how exactly it responds. I have been programming for 12 years, but when it comes to creating a conversational interface, I still turn to visual tools.



To start, it doesn't hurt to sketch a simple outline of the script on paper; that way you decide what follows what in the dialogue. Then you can transfer the script to a product that visualizes it. I use Aimylogic, our visual constructor with NLU (see how to create an action for Google Assistant in Aimylogic), to see how all the transitions in my dialog will work and to check and validate the very hypothesis and idea I want to implement. In Aimylogic you can build a script without any deep programming skills and test the action immediately. Google itself offers the Actions SDK for building customized dialogs, and Dialogflow for simple, short scripts that do not require deep language understanding.







Programmable logic is often required as well. Your site may look great, but for it to actually do anything, it has to call code on a server, and that code can compute something, save it, and return a result. The same goes for an action's script. The code should run reliably, and ideally for free: today there is no need to pay thousands of dollars for 50, 100, or 1,000 lines of code to be available to your action 24/7. I use several services for this at once: Cloud Functions, Heroku, Webtask.io, and AWS Lambda. Free instances are also available on the Google Cloud Platform.



The script can call the code using the simplest HTTP requests we are all used to. At the same time, the code and the script do not mix, which is good: you can keep both components up to date and extend them as you like without complicating work on the action.
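As a sketch of what such server-side code might look like: a stateless function that receives JSON from the script over HTTP, computes something, and returns the result. The payload fields (`user_id`, `points`) and the scoring rule are my own illustrative assumptions, not part of any real action.

```python
import json

def handle_request(body: dict) -> dict:
    """Hypothetical webhook handler: the script POSTs the user's current
    points, the server computes the new score and returns it as JSON."""
    user_id = body.get("user_id", "anonymous")
    points = int(body.get("points", 0))
    return {"user_id": user_id, "score": points + 10}  # illustrative rule

# The script would POST {"user_id": "u1", "points": 5} and read back:
print(json.dumps(handle_request({"user_id": "u1", "points": 5})))
```

On Cloud Functions or AWS Lambda, such a function is exposed behind an HTTPS URL, and the script in Aimylogic simply calls that URL.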







The third component is content: data that can change all the time without affecting the structure of the script itself. Examples are quiz questions, or the episodes in our game “Yes, my lord.” If the content lived together with the script or the code, the script would become cumbersome, and no matter what tool you used to create the action, working with it would be inconvenient. So I recommend storing content separately: in a database, in a file in cloud storage, or in a table the script can reach through an API to receive data on the fly. By separating the content from the script and the code, you can also bring other people in to work on the action; they will be able to add content independently of you. And growing the content matters, because a user who returns to an action time after time expects fresh and varied content.







How can you use ordinary tables in the cloud so that you don't have to store all the content in the script itself? In the game “First or Second,” for example, we used a cloud-based Excel spreadsheet where any project participant could add new questions and answers for the action. The Aimylogic script reads this table with a single HTTP request through a special API. The script itself stays small, because it does not store the data from the table, which is updated every day. Thus we separate the conversational script from the content, which lets us work on the content independently and replenish it collectively with fresh data. By the way, 80 thousand people have already played this game.
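A minimal sketch of this pattern, assuming the spreadsheet is published as CSV and has `question` and `answer` columns (both column names are assumptions for illustration):

```python
import csv
import io

def parse_questions(csv_text: str) -> list[tuple[str, str]]:
    """Turn the CSV export of a cloud spreadsheet into question/answer
    pairs that the script can serve on the fly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["question"], row["answer"]) for row in reader]

# In production the script would fetch the sheet's CSV export URL once
# per session instead of using an inline sample like this one.
sample = "question,answer\nWhich came first?,the egg\n"
print(parse_questions(sample))
```

Editors keep adding rows to the sheet, and the action picks them up on the next request without any redeploy.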



Checklist: Things to Remember When Creating a Conversational Interface



Any interface has components the user interacts with: lists, buttons, pictures, and so on. A conversational interface follows the same laws, but the fundamental difference is that a person communicates with the program by voice. That is the starting point for designing your own action.



The right action should not try to do everything in the world. When people talk to a program, they cannot keep a lot of information in their head (remember how it feels to listen to long-winded personalized offers from a bank or mobile operator over the phone). Discard the superfluous and focus on the one most important function of your service, the one that is most convenient to perform by voice, without touching the screen.

Say you have a ticket service. Don't expect the client to walk through the usual scenario by voice: searching for a ticket by five or six criteria, choosing between carriers, comparing, and paying. But an application that tells you the minimum price for a chosen destination may well come in handy: it is a very fast operation, and it is convenient to perform by voice without opening the site and without going through the form-filling script (fields, filters) every time.



An action is about the voice, not about the service as a whole. The user should not regret launching the action in the Assistant instead of opening, say, the app or the site. But how do you tell that voice is truly indispensable? To start, try the idea out on yourself: if you can easily perform the same task without voice, the action will be pointless. One of my first Assistant apps was Yoga for the Eyes, a virtual personal trainer that helps you do vision exercises. There is no doubt that voice is needed here: your eyes are busy with the exercises, you are relaxed and focused on the spoken instructions. Glancing at a written memo, breaking off from the workout, would be inconvenient and ineffective.



And here is an example of a failed scenario for a voice application. I often hear about yet another online store that wants to sell through a virtual assistant. But filling a shopping cart by voice is inconvenient and impractical, and the client is unlikely to see the point. The ability to repeat the last order by voice, or to toss something onto a shopping list on the go, is another matter.



Remember UX. The action should stay alongside the user: accompany and guide them through the dialogue so that they always understand what they are expected to say. If a person hits a dead end and starts wondering “Now what?”, that is a failure. Don't hope your user will always consult the help. Track the dead ends and help the user out with leading questions or hints. In a voice action, predictability is not a vice. In our game “Yes, my lord,” for example, every phrase ends so that the participant can answer either “yes” or “no.” They are not required to invent anything on their own. And it's not that the game is primitive; the rules are simply organized so that everything is perfectly clear to the user.





"He speaks well!" The action “hears” well thanks to the Assistant, and well “speaks” - thanks to the script developer. A recent update gave Google Assistant new voice options and a more realistic pronunciation. Everything is cool, but the developer should reflect on the phrase, its structure, sound, so that the user can understand everything the first time. Arrange stresses, use pauses to make action phrases sound human.



Never overload the user. For game actions, news-feed readers, or apps that read fairy tales to children, this is not a problem. But listening to a voice assistant talk endlessly when you just want to order pizza is hard. Try to make replies concise but not monosyllabic, and vary them (for example, think through several options for greetings, farewells, and even phrases for when the assistant has misunderstood something). The dialogue should sound natural and friendly; to that end you can add elements of colloquial speech, emotion, and interjections to the phrases.
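One simple way to keep replies varied is to hold several phrasings for each situation and pick one at random. The phrase lists below are illustrative assumptions, not text from any real action:

```python
import random

REPLIES = {
    "greeting": ["Hello!", "Glad to hear you again!", "Welcome back!"],
    "farewell": ["Goodbye!", "See you soon!", "Until next time!"],
    "misheard": ["Sorry, I didn't catch that.", "Could you say that again?"],
}

def reply(kind: str, rng: random.Random = random) -> str:
    """Pick a random variant so the dialogue doesn't repeat itself."""
    return rng.choice(REPLIES[kind])

print(reply("greeting"))
```

Visual tools like Aimylogic and Dialogflow offer the same idea built in: you can usually list several response variants on a node or intent, and one is chosen at random.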



The user does not forgive stupidity. People often accuse voice assistants of being dumb, and mostly this happens when the assistant, or an application built for it, cannot recognize different variations of the same phrase. Even if your action is as simple as setting an alarm, it is important that it understands synonyms and different word forms that are identical in meaning, and does not fail when the user responds unpredictably.



How do you get out of situations where the action refuses to understand? First, you can diversify the answers in the Default Fallback Intent: use not only the standard ones provided but custom ones as well. Second, you can train the Fallback Intent on all sorts of spam phrases unrelated to the game. This teaches the application not only to respond adequately to irrelevant requests but also improves the classification accuracy for other types of requests.
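In a webhook, varied fallbacks can also escalate: each consecutive misunderstanding gets a more directive hint. The three phrases and the escalation scheme below are illustrative assumptions, not Dialogflow defaults:

```python
FALLBACKS = [
    "Sorry, I didn't catch that. You can answer yes or no.",
    "I still didn't understand. Just say yes or no.",
    "Let's keep going. Say yes to continue.",
]

def fallback_reply(attempt: int) -> str:
    """Return a different fallback on each consecutive misunderstanding,
    capped at the last, most directive variant."""
    return FALLBACKS[min(attempt, len(FALLBACKS) - 1)]

print(fallback_reply(0))
print(fallback_reply(7))  # past the end: sticks to the last variant
```

Resetting the attempt counter after any successfully understood phrase keeps the escalation from firing on isolated slips.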







And one more tip: never turn your action into a button menu to make the user's life easier. It annoys, distracts from the dialogue, and makes the user doubt whether voice is needed at all.



Teach your action good manners. Even the coolest action has to end, ideally with a goodbye that makes the user want to come back. By the way, remember that if the action does not ask a question but simply answers the user's question, it must “close the microphone” (otherwise the application will not pass moderation and will not be published). In Aimylogic, you just need to add the “Script Completion” block to the script.
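If the action is built on a webhook rather than in Aimylogic, closing the microphone comes down to returning a final response with `expectUserResponse: false`. A simplified sketch of the conversation-webhook JSON shape; a real response carries more fields:

```python
import json

def close_microphone(text: str) -> dict:
    """Build a minimal final response: expectUserResponse=false tells the
    Assistant to speak the text and then close the microphone."""
    return {
        "expectUserResponse": False,
        "finalResponse": {
            "richResponse": {
                "items": [{"simpleResponse": {"textToSpeech": text}}]
            }
        },
    }

print(json.dumps(close_microphone("Farewell, my lord!"), indent=2))
```

The same structure with `expectUserResponse: true` and a prompt instead of `finalResponse` keeps the conversation open.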

And if you are counting on retention, it is important to observe other rules of good manners in the script: the action should work with context, remembering the user's name and gender and not re-asking what has already been established.



How to work with ratings and reviews



Google Assistant users can rate actions and thereby influence their ranking, so it is important to learn to use the rating system to your advantage. It might seem you just need to give the user a link to your action's page and ask them to leave a review, but there are rules here. For example, do not offer to rate the action in the very first message: the user must understand what they are rating. Wait until the application has actually fulfilled some useful or interesting mission for the user, and only then offer to leave feedback.







And it is better not to read this request aloud with speech synthesis; you would only waste the user's time. Moreover, instead of following the link the user may just say “I give it five,” which is not at all what you need in this case.



In the game “Yes, my lord,” we display the review link only after the user has played another round. And we do not voice the request; we simply show the link on the screen and offer to play again. Let me stress it once more: offer this link when the user is guaranteed to have received some benefit or pleasure. If you do it at the wrong moment, when the action fails to understand something or lags, you can end up with negative feedback.



Three Secrets of Google Assistant



  1. Music. Among Russian-language voice assistants, only Google Assistant lets you use music directly in the action's script. A musical score sounds great in game actions, and doing yoga to music is a completely different experience.



  2. Payments inside the action. For in-app purchases, Google Assistant uses the Google Play platform. The terms for action creators are the same as for mobile app developers: the developer receives 70% of each transaction.
  3. Moderation. To pass moderation, the action must have a personal data processing policy. You need to publish it on sites.google.com, indicate the name of your action and an email (the same as the developer's email in the developer console), and state that the application does not use personal data. Moderation of an action without transactions takes 2-3 days, but moderation of an application with built-in payments can take 4-6 weeks. Details can be found in the description of the review procedure.


In general, try our actions yourself: Yoga for the Eyes and the games First or Second and Yes, my lord (transactions will soon appear in the latter, and it will become easier for milord to maintain power and wealth!).



We have also developed a voice quest for Google Assistant, “Lovecraft World.” It was built not in Aimylogic but on the Just AI Conversational Platform, a professional enterprise solution, so we will cover that action separately. It is an interactive drama in the mystical style of “Call of Cthulhu,” where the scenes are voiced by professional actors, the plot is controlled by voice, and in-game payments are supported.


