Template-based text generation in Russian

When I was just starting work on my text game, I decided that one of its main features should be vivid literary descriptions of the characters' actions. Partly I wanted to “save” effort, because I did not know how to do graphics. The saving did not work out, but what did come out of it is a Python library (github, pypi) for generating texts that takes into account the dependencies between words and their grammatical features.



For example, from the template:

[Hero] [passed|hero] past an inconspicuous courtyard and suddenly [noticed|hero] children playing. They ran around with wooden swords, staves and monster masks. Suddenly one of the players stopped, held out [toy|hero.weapon|vn] [hero.weapon|vn], shouted: “[I|hero] [great|hero] [Hero]! Take that!” and rushed at the “beast”. They fell to the ground, flailed their arms and legs, and then stood up, took off their masks and laughed. [Grunted|hero] and [himself|hero] [Hero], but did not [start|hero] go out to the little ones.
We can get this text:

Hallr walked past an inconspicuous courtyard and suddenly noticed children playing. They ran around with wooden swords, staves and monster masks. Suddenly one of the players stopped, held out a toy gilded sword, shouted: “I am the great Hallr! Take that!” and rushed at the “beast”. They fell to the ground, flailed their arms and legs, and then stood up, took off their masks and laughed. Hallr himself grunted too, but did not go out to the little ones.
Or this one:

Fievara walked past an inconspicuous courtyard and suddenly noticed children playing. They ran around with wooden swords, staves and monster masks. Suddenly one of the players stopped, held out a toy katar, shouted: “I am the great Fievara! Take that!” and rushed at the “beast”. They fell to the ground, flailed their arms and legs, and then stood up, took off their masks and laughed. Fievara herself grunted too, but did not go out to the little ones.


A couple of caveats
Disclaimer 1. I am not a linguist, and the library was written “to work”, not “to comply exactly with all the rules of the language”. So I apologize in advance for inaccuracies in terminology or an incomplete treatment of the rules of Russian.



Disclaimer 2. The library was developed about 5 years ago; alternative text generation tools may have appeared since then (or matured to a usable state). For example, localization software may offer something interesting.



On the complexity of text generation



Russian is complex in many respects. In particular, words have a large number of morphological forms. For example, adjectives have full and short forms and vary by gender, number, case, animacy and degree of comparison. The choice of a specific form depends on other words in the sentence. We say «красивая женщина» (“beautiful woman”) but «красивый мужчина» (“beautiful man”). The adjective here depends on the noun «женщина» / «мужчина»: its form is determined by the gender of the main word.
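The agreement described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not UTG code: the names `ADJECTIVE_FORMS`, `NOUN_GENDER` and `agree` are made up for the example, and only the nominative singular forms of the adjective «красивый» (“beautiful”) are covered.

```python
# Illustrative sketch only; this is not the UTG API.
# Nominative singular forms of the adjective "красивый", keyed by gender.
ADJECTIVE_FORMS = {
    "masculine": "красивый",
    "feminine": "красивая",
    "neuter": "красивое",
}

# Grammatical gender of a few nouns (a hypothetical mini-dictionary).
NOUN_GENDER = {
    "мужчина": "masculine",
    "женщина": "feminine",
    "окно": "neuter",
}


def agree(noun: str) -> str:
    """Return the adjective in the form the noun requires, followed by the noun."""
    gender = NOUN_GENDER[noun]
    return f"{ADJECTIVE_FORMS[gender]} {noun}"


print(agree("женщина"))  # красивая женщина
print(agree("мужчина"))  # красивый мужчина
```

Even this toy version needs a table per adjective and a property per noun. Adding number, case and animacy multiplies the tables, which is why a dedicated dictionary format becomes necessary.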



So difficulties begin as soon as we try to address someone according to their gender. When writing texts for websites, emails or games, one has to either come up with very careful wording (avoiding the user's gender), write several versions of each text, or use markup languages of varying degrees of versatility.



I wanted more than a simple dependence on the player's gender, and I also wanted users to be able to add new texts themselves (and the “average” user is quite illiterate, as we all know :-)). So, having found no suitable software, I decided to write it myself.



Library features



UTG (Universal Text Generator, not a very modest name) lets you create text generation templates with the following features:





When generating text from a template:





Additionally implemented:





The library “knows” about the existence of nouns, adjectives, pronouns, verbs, participles, numbers, prepositions and “quotes” (unchangeable text).



The following word properties are taken into account: part of speech, case, animacy, number, gender, verb form, tense, person, aspect, adjective category, adjective degree, pronoun category, voice, preposition form, adjective form, participle form, noun form (in addition to the normal form, nouns have a countable form).



Template format and usage example



Let's look at a simple template:

Yesterday [mob] [bit|mob] [hero|ext].
Depending on the values of the variables, the template can produce a phrase like this:

Yesterday a hyena bit Hallr.
or like this:

Yesterday fireflies bit a ghost.
Let's look at the template in more detail:





More sample templates.
Some technical examples can be found in the tests.



If you are interested in more examples, you can see them on the game's website. You can find a link to it by digging through my profile, or by sending me a private message.



Both variables and dependent words are marked up in the template in the same way and have the following format:





You can specify as many additional properties as you like; they are applied in the order in which they are defined, for example:



[ 1| 2|,| 3|,,]
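To make the markup more tangible, here is a minimal sketch of how such a template could be split into literal text and bracketed slots. It is not the parser UTG uses, and the name `split_template` is hypothetical. It relies only on what the examples in this article show: slots are written in square brackets and their fields are separated by `|`. What each field means (word, variable or properties) is left to the caller.

```python
import re

# A slot is anything between a pair of square brackets (no nesting assumed).
SLOT_RE = re.compile(r"\[([^\[\]]*)\]")


def split_template(template: str):
    """Split a template into literal strings and slots.

    Returns a list whose items are either str (literal text) or
    list[str] (the '|'-separated fields of one slot, whitespace stripped).
    """
    parts = []
    position = 0
    for match in SLOT_RE.finditer(template):
        if match.start() > position:
            # Literal text between the previous slot and this one.
            parts.append(template[position:match.start()])
        fields = [field.strip() for field in match.group(1).split("|")]
        parts.append(fields)
        position = match.end()
    if position < len(template):
        # Literal text after the last slot.
        parts.append(template[position:])
    return parts


for part in split_template("Yesterday [mob] [bit|mob] [hero|ext]."):
    print(part)
```

A real implementation would then have to resolve each field against the dictionary and the external variables, which is the part that needs grammatical knowledge.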







In most cases, the following formats are enough:





Note:





Code example
Python 3 is required.



Installation



    pip install utg
    python -m unittest discover utg





The code.



    from utg import relations as r
    from utg import dictionary
    from utg import words
    from utg import templates
    from utg import constructors

    #######################################
    # describe a noun for the dictionary
    #######################################

    coins_forms = [
        # singular
        '', '', '', '', '', '',
        # plural
        '', '', '', '', '', '',
        # countable form (filled in here as an example,
        # it can also be filled in by autofill_missed_forms)
        '', '', '', '', '', '',
    ]

    # properties: inanimate, feminine
    coins_properties = words.Properties(r.ANIMALITY.INANIMATE, r.GENDER.FEMININE)

    # create the word from its type, forms and properties
    coins_word = words.Word(type=r.WORD_TYPE.NOUN, forms=coins_forms, properties=coins_properties)

    # The position of each form in the list is significant.
    # The layout can be looked up in these structures:
    # - utg.data.WORDS_CACHES
    # - utg.data.INVERTED_WORDS_CACHES

    ##############################
    # describe a verb for the dictionary
    ##############################

    # fill in only the forms we need
    # (the order of forms is in utg.data.WORDS_CACHES[r.WORD_TYPE.VERB])
    action_forms = (['', '', '', '', ''] + [''] * 15)

    # properties: perfective aspect, direct voice
    action_properties = words.Properties(r.ASPECT.PERFECTIVE, r.VOICE.DIRECT)
    action_word = words.Word(type=r.WORD_TYPE.VERB, forms=action_forms, properties=action_properties)

    # fill in the forms that were left empty
    action_word.autofill_missed_forms()

    ##############################################
    # create the dictionary used by the templates
    ##############################################

    test_dictionary = dictionary.Dictionary(words=[coins_word, action_word])

    ################
    # create the template
    ################

    template = templates.Template()

    # externals: names of the variables whose values are passed in at substitution time
    template.parse('[Npc] [|npc] [hero|] [coins] [|coins|].', externals=('hero', 'npc', 'coins'))

    ##############################
    # describe the external variables
    ##############################

    hero_forms = ['', '', '', '', '', '',
                  '', '', '', '', '', '',
                  '', '', '', '', '', '']

    # properties: animate, masculine
    hero_properties = words.Properties(r.ANIMALITY.ANIMATE, r.GENDER.MASCULINE)
    hero = words.WordForm(words.Word(type=r.WORD_TYPE.NOUN, forms=hero_forms, properties=hero_properties))

    npc_forms = ['', '', '', '', '', '',
                 '', '', '', '', '', '',
                 '', '', '', '', '', '']

    # properties: animate, feminine
    npc_properties = words.Properties(r.ANIMALITY.ANIMATE, r.GENDER.FEMININE)
    npc = words.WordForm(words.Word(type=r.WORD_TYPE.NOUN, forms=npc_forms, properties=npc_properties))

    ##########################
    # substitute the values
    ##########################

    result = template.substitute(externals={'hero': hero,
                                            'npc': npc,
                                            'coins': constructors.construct_integer(125)},
                                 dictionary=test_dictionary)

    ##########################
    # result
    ##########################

    result == '   125 .'
      
      





About dictionaries



As you may have noticed, UTG requires you to build a dictionary. This is done by hand, because at the time of development:





If you want to use the generator with a large number of words, then before entering them manually, try using pymorphy2, or look for a ready-made dictionary and export the words from it.
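As a starting point for the pymorphy2 route, here is a rough sketch that collects the case forms of a noun into a flat list. Several things in it are assumptions rather than facts about UTG: the helper names are made up, the order used (six cases in the singular, then six in the plural) is the conventional school order and should be checked against `utg.data.WORDS_CACHES` before feeding the list to `words.Word`, and the countable forms are not covered at all. The pymorphy2 import is kept inside a function so that the collecting logic can be tried without the library installed.

```python
# Grammemes from the OpenCorpora tag set used by pymorphy2.
CASES = ("nomn", "gent", "datv", "accs", "ablt", "loct")
NUMBERS = ("sing", "plur")


def collect_noun_forms(parse):
    """Build a flat list of forms: all cases in the singular, then in the plural.

    `parse` is any object with an `inflect(grammemes)` method that returns
    an object with a `.word` attribute, or None when the form does not exist
    (this is how pymorphy2 Parse objects behave).
    Missing forms are returned as empty strings.
    """
    forms = []
    for number in NUMBERS:
        for case in CASES:
            inflected = parse.inflect({number, case})
            forms.append(inflected.word if inflected is not None else "")
    return forms


def noun_forms_with_pymorphy2(word):
    """Look the word up with pymorphy2 and collect its forms.

    Requires `pip install pymorphy2`. The first analysis is taken as is;
    it is the most probable one, but it is not guaranteed to be a noun.
    """
    import pymorphy2

    morph = pymorphy2.MorphAnalyzer()
    best_parse = morph.parse(word)[0]
    return collect_noun_forms(best_parse)
```

The output still needs a human check: homonyms, proper names and invented game words are exactly the cases where automatic analysis tends to guess wrong.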



Conclusion



I hope the library will be useful.



If you have ideas for its development (or, even better, a desire to take part in it), write me a private message, send pull requests, or report bugs on GitHub.


