For example, from the template:
[Hero] [passed | hero] past an inconspicuous yard and suddenly [noticed | hero] children playing. They ran around with wooden swords, staves and monster masks. Suddenly one of the players stopped, held out [toy | hero.weapon | vn] [hero.weapon | vn], and shouted: “[I | hero] [great | hero] [Hero]! Take that!” and rushed at the “beast”. They fell to the ground, flailing their arms and legs, and then stood up, took off their masks and laughed. [Grunted | hero] and [himself | hero] [Hero], but did not [start | hero] go out to the little ones.
we can get this text:
Hallr walked past an inconspicuous yard and suddenly noticed children playing. They ran around with wooden swords, staves and monster masks. Suddenly one of the players stopped, held out a toy gilded sword, and shouted: “I am the great Hallr! Take that!” and rushed at the “beast”. They fell to the ground, flailing their arms and legs, and then stood up, took off their masks and laughed. Hallr himself grunted, but did not go out to the little ones.
Or this one:
Fievara walked past an inconspicuous yard and suddenly noticed children playing. They ran around with wooden swords, staves and monster masks. Suddenly one of the players stopped, held out a toy katar, and shouted: “I am the great Fievara! Take that!” and rushed at the “beast”. They fell to the ground, flailing their arms and legs, and then stood up, took off their masks and laughed. Fievara herself grunted, but did not go out to the little ones.
A couple of disclaimers
Disclaimer 1. I am not a linguist, and the library was written to work, not to comply exactly with every rule of the language. I therefore apologize in advance for inaccurate terminology or an incomplete treatment of the rules of Russian.
Disclaimer 2. The library was developed about 5 years ago; alternative text generation tools may have appeared (or matured) since then. For example, localization software may offer something interesting.
On the complexity of text generation
Russian is complex in many respects. In particular, words have a large number of morphological forms. For example, adjectives can have a full and a short form, and vary by gender, number, case, animacy and degree of comparison. The choice of a specific form depends on other words in the sentence. In Russian the adjective takes a different ending in “beautiful woman” (красивая женщина) than in “beautiful man” (красивый мужчина). The word “beautiful” depends on the word “woman” or “man”: its form is determined by the gender of the main word.
Difficulties therefore begin as soon as we try to address someone in a way that depends on their gender. When writing texts for websites, emails or games, one has to either come up with very careful wording (avoiding the user's gender), write several texts at once, or use markup languages of varying degrees of versatility.
I wanted something more than a simple dependence on the player's gender, and I also wanted users to be able to add new texts themselves (and the "average" user is quite illiterate, as we all know :-)). Not finding suitable software, I decided to write it myself.
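The dependence of an adjective on the gender of its noun can be shown with a tiny sketch. This is only an illustration of the idea, not the UTG API; the two lookup tables and the function name `agree` are made up for the example.

```python
# Toy illustration of gender agreement; not the UTG API.

# Nominative singular forms of the adjective "beautiful", by gender.
ADJECTIVE_FORMS = {
    "masculine": "красивый",
    "feminine": "красивая",
    "neuter": "красивое",
}

# Grammatical gender of a few nouns (hypothetical mini-dictionary).
NOUN_GENDER = {
    "мужчина": "masculine",  # man
    "женщина": "feminine",   # woman
    "окно": "neuter",        # window
}


def agree(noun: str) -> str:
    """Pick the adjective form that matches the noun's gender."""
    gender = NOUN_GENDER[noun]
    return f"{ADJECTIVE_FORMS[gender]} {noun}"


print(agree("женщина"))  # красивая женщина
print(agree("мужчина"))  # красивый мужчина
```

A real generator also has to handle number, case and animacy at the same time, which is why a single lookup table per word quickly stops being enough.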
Library features
UTG (universal text generator, not a very modest name) lets you create text generation templates that contain:
- variables (for example, a character name);
- dependencies of words on variables (for example, of an adjective on a noun);
- dependencies of some variables on others;
- explicit properties of words and variables (for example, you can specify that the character name is inserted in the genitive case).
When generating text from a template:
- The necessary properties of the main word are passed on to the dependent words. For example, the gender of a noun is passed on to the adjective.
- Dependent words are made to agree with numerals (taking the form of the dependent words into account).
- Prepositions are modified where necessary (for example, “обо мне” (about me) but “о тебе” (about you)); for this to work, the preposition must be marked up in the template.
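Agreement with numerals is the least obvious of these steps, so here is a minimal sketch of the well-known rule for choosing a Russian noun form after a number in the nominative case. It is a standalone illustration, not UTG code; the function name and its three-form signature are assumptions made for the example.

```python
def noun_form_for_number(n: int, one: str, few: str, many: str) -> str:
    """Choose the noun form that follows the number n (nominative context).

    one  - form used after 1, 21, 31, ...       (nominative singular)
    few  - form used after 2-4, 22-24, ...      (genitive singular)
    many - form used after 0, 5-20, 25-30, ...  (genitive plural)
    """
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return one
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return few
    return many


for number in (1, 3, 11, 22, 125):
    print(number, noun_form_for_number(number, "монета", "монеты", "монет"))
# 1 монета
# 3 монеты
# 11 монет
# 22 монеты
# 125 монет
```

In other cases (for example, the instrumental) the numeral and the noun decline together, so a full implementation needs more than these three forms.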
Additionally implemented:
- A dictionary for storing the necessary words.
- A template repository for storing them by type and choosing random ones.
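The idea of a template repository can be sketched in a few lines. The class below is hypothetical (its name and methods are not taken from UTG); it only shows what "store by type, pick a random one" might look like.

```python
import random
from collections import defaultdict


class TemplateStore:
    """Hypothetical store: templates grouped by type, random choice on demand."""

    def __init__(self, seed=None):
        self._by_type = defaultdict(list)
        self._random = random.Random(seed)  # a seed makes the choice reproducible

    def add(self, template_type, template):
        self._by_type[template_type].append(template)

    def pick(self, template_type):
        candidates = self._by_type.get(template_type)
        if not candidates:
            raise KeyError(f"no templates of type {template_type!r}")
        return self._random.choice(candidates)


store = TemplateStore(seed=42)
store.add("bite", "Yesterday [mob] [bitten | mob] [hero | ext].")
store.add("bite", "[Mob] [bitten | mob] [hero | ext] again.")
print(store.pick("bite"))
```

Keeping several templates per type is what lets the same game event produce different phrases on each occurrence.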
The library “knows” about nouns, adjectives, pronouns, verbs, participles, numerals, prepositions and “quotes” (unchangeable text).
The following word properties are taken into account: part of speech, case, animacy, number, gender, verb form, tense, person, aspect, adjective category, adjective degree, pronoun category, voice, preposition form, adjective form, participle form, and noun form (in addition to the normal form, nouns have a countable form).
Template format and usage example
Let's look at a simple template:
Yesterday [mob] [bitten | mob] [hero | ext].
Depending on the values of the variables, the template may produce a phrase like this:
Yesterday a hyena bit Hallr.
or like this:
Yesterday fireflies bit a ghost.
Let's look at the template in more detail:
- Yesterday - plain text.
- [mob] - a variable that is replaced with the monster's name.
- [bitten | mob] - a word that depends on a variable; some of its properties change depending on the properties of the monster's name (for example, number). The text generator automatically recognizes the properties of the word form and tries to preserve them (for example, the past tense is recognized and kept, so you do not need to specify it).
- [hero | ext] - a variable that is replaced with the hero's name. The additional property says that the name must be in the accusative case.
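To make the structure of such a template concrete, here is a minimal sketch that splits a template into plain text and bracketed slots. It is not the parser used by UTG; the regular expression and the function name `tokenize` are assumptions made for this illustration.

```python
import re

# A slot is anything between square brackets (nested brackets are not supported).
SLOT = re.compile(r"\[([^\[\]]*)\]")


def tokenize(template):
    """Split a template into ("text", str) and ("slot", [parts]) tokens."""
    tokens = []
    position = 0
    for match in SLOT.finditer(template):
        if match.start() > position:
            tokens.append(("text", template[position:match.start()]))
        # Inside a slot, parts are separated by vertical bars.
        parts = [part.strip() for part in match.group(1).split("|")]
        tokens.append(("slot", parts))
        position = match.end()
    if position < len(template):
        tokens.append(("text", template[position:]))
    return tokens


for token in tokenize("Yesterday [mob] [bitten | mob] [hero | ext]."):
    print(token)
# ('text', 'Yesterday ')
# ('slot', ['mob'])
# ('text', ' ')
# ('slot', ['bitten', 'mob'])
# ('text', ' ')
# ('slot', ['hero', 'ext'])
# ('text', '.')
```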
More sample templates.
Some technical examples can be found in the tests.
If you are interested in more examples, you can see them on the game's website. A link to it can be found by digging through my profile, or by sending me a private message.
Variables and dependent words are marked up in the same way in a template and have the following format:
- [ - the opening square bracket.
- The identifier of a dependent word or variable. The generator first checks whether a variable with this name exists; if there is no such variable, the word is looked up in the dictionary.
- | - a vertical bar, the separator; needed only if additional properties are specified.
- The variable on which the form of the word depends; may be omitted.
- | - a vertical bar, the separator; needed only if additional properties are specified.
- A description of the required form of the word (case, gender, and so on). The list of properties can be found on the project pages on GitHub and PyPI.
- ] - the closing square bracket.
You can specify as many additional properties as you like; they are applied in the order in which they are defined, for example:
[ 1| 2|,| 3|,,]
In most cases, the following formats are enough:
- [variable] - insert a variable in its normal form (for example, a noun in the nominative singular).
- [variable|properties] - insert a variable with the specified properties.
- [word|variable] - insert a word, making it agree with a variable (for example, the adjective "beautiful" with a noun in gender and case).
- [word|variable|properties] - insert a word, making it agree with a variable and applying additional properties.
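The lookup rule described above (a variable with that name first, otherwise a dictionary word) can be sketched as a small classifier for a single bracketed slot. This is an illustration under assumptions, not UTG's implementation: the function name, the returned dictionary, and the way the second part is recognized as a variable are all hypothetical.

```python
def classify_slot(slot, variables):
    """Interpret one bracketed slot, given the set of known variable names."""
    parts = [part.strip() for part in slot.strip()[1:-1].split("|")]
    identifier, rest = parts[0], parts[1:]

    if identifier in variables:
        # A variable name wins over a dictionary word with the same spelling.
        kind, depends_on = "variable", None
    else:
        kind = "word"
        # Assumption: the next part names the governing variable, if it is one.
        depends_on = rest.pop(0) if rest and rest[0] in variables else None

    # Whatever is left is a list of properties, possibly comma-separated.
    properties = [p.strip() for group in rest for p in group.split(",") if p.strip()]
    return {"kind": kind, "identifier": identifier,
            "depends_on": depends_on, "properties": properties}


known = {"hero", "mob"}
for slot in ("[hero]", "[hero|ext]", "[bitten|mob]", "[bitten|mob|ext]"):
    print(slot, classify_slot(slot, known))
```

The four slots in the loop correspond to the four common formats: a bare variable, a variable with properties, a dependent word, and a dependent word with properties.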
Note:
- Properties specified for words and variables apply only at the point of insertion. Therefore, to get the phrase “beautiful hero” in the accusative case, we must specify the accusative explicitly for both words: [|hero|] [hero|].
- The text generator can “guess” the properties of a word from its form. For example, in the phrase [hero] [|hero] you do not need to specify the tense of the verb.
- Properties specified later overwrite properties specified earlier. For example, in the phrase [|hero] [hero|] the accusative case of the adjective will not be applied, because it is replaced by the nominative case of the variable hero.
- A list of word properties can be found on the library pages on GitHub and PyPI.
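The "later overwrites earlier" rule behaves like merging dictionaries in order. The sketch below is only a model of that rule; the property names and the function `merge_properties` are hypothetical and do not come from UTG.

```python
def merge_properties(*layers):
    """Merge property layers in order; later layers overwrite earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Properties guessed from the written form of an adjective.
guessed_from_form = {"case": "accusative", "number": "singular"}
# Properties passed on by the variable the adjective depends on.
from_variable = {"case": "nominative", "gender": "masculine", "number": "singular"}
# Properties written explicitly in the template after the variable.
explicit = {"case": "accusative"}

# Without explicit properties the variable's nominative case wins.
print(merge_properties(guessed_from_form, from_variable))
# {'case': 'nominative', 'number': 'singular', 'gender': 'masculine'}

# With an explicit accusative at the end, the accusative is restored.
print(merge_properties(guessed_from_form, from_variable, explicit))
# {'case': 'accusative', 'number': 'singular', 'gender': 'masculine'}
```

This is why the accusative case has to be written out for the adjective as well as for the noun.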
Code example
Python 3 is required.
Installation
```
pip install utg
python -m unittest discover utg
```
The code.
```python
from utg import relations as r
from utg import dictionary
from utg import words
from utg import templates
from utg import constructors

#######################################
# describe a noun for the dictionary
#######################################

coins_forms = [# singular
               '', '', '', '', '', '',
               # plural
               '', '', '', '', '', '',
               # countable form (filled in here for the example;
               # autofill_missed_forms could fill it instead)
               '', '', '', '', '', '']

# properties: inanimate, feminine
coins_properties = words.Properties(r.ANIMALITY.INANIMATE, r.GENDER.FEMININE)

# create the word itself
coins_word = words.Word(type=r.WORD_TYPE.NOUN,
                        forms=coins_forms,
                        properties=coins_properties)

# The order of word forms for each part of speech is defined in:
# - utg.data.WORDS_CACHES
# - utg.data.INVERTED_WORDS_CACHES

##############################
# describe a verb for the dictionary
##############################

# describe only the forms we need
# (for the order, see utg.data.WORDS_CACHES[r.WORD_TYPE.VERB])
action_forms = (['', '', '', '', ''] +
                [''] * 15)

# properties: perfective aspect, direct voice
action_properties = words.Properties(r.ASPECT.PERFECTIVE, r.VOICE.DIRECT)

action_word = words.Word(type=r.WORD_TYPE.VERB,
                         forms=action_forms,
                         properties=action_properties)

# fill in the missing forms from the ones provided
action_word.autofill_missed_forms()

##############################################
# create the dictionary used by the templates
##############################################

test_dictionary = dictionary.Dictionary(words=[coins_word, action_word])

################
# create the template
################

template = templates.Template()

# externals: the external variables
template.parse('[Npc] [|npc] [hero|] [coins] [|coins|].',
               externals=('hero', 'npc', 'coins'))

##############################
# describe the external variables
##############################

hero_forms = ['', '', '', '', '', '',
              '', '', '', '', '', '',
              '', '', '', '', '', '']

# properties: animate, masculine
hero_properties = words.Properties(r.ANIMALITY.ANIMATE, r.GENDER.MASCULINE)

hero = words.WordForm(words.Word(type=r.WORD_TYPE.NOUN,
                                 forms=hero_forms,
                                 properties=hero_properties))

npc_forms = ['', '', '', '', '', '',
             '', '', '', '', '', '',
             '', '', '', '', '', '']

# properties: animate, feminine
npc_properties = words.Properties(r.ANIMALITY.ANIMATE, r.GENDER.FEMININE)

npc = words.WordForm(words.Word(type=r.WORD_TYPE.NOUN,
                                forms=npc_forms,
                                properties=npc_properties))

##########################
# generate the text
##########################

result = template.substitute(externals={'hero': hero,
                                        'npc': npc,
                                        'coins': constructors.construct_integer(125)},
                             dictionary=test_dictionary)

##########################
# check the result
##########################

result == ' 125 .'
```
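The flat lists of noun forms above are easier to read with an explicit index. The sketch below assumes a layout of three groups (singular, plural, countable) with six cases each, in the conventional order of Russian cases; that layout is an assumption for illustration, and the authoritative order is whatever utg.data.WORDS_CACHES defines. The helper names are hypothetical. The second function shows, in the same toy model, how properties can be guessed from a written form and why the guess may be ambiguous.

```python
NUMBERS = ("singular", "plural", "countable")
CASES = ("nominative", "genitive", "dative",
         "accusative", "instrumental", "prepositional")


def form_index(number, case):
    """Position of a (number, case) pair in the assumed flat layout."""
    return NUMBERS.index(number) * len(CASES) + CASES.index(case)


def guess_properties(forms, text):
    """Return every (number, case) pair whose slot holds the given text."""
    matches = []
    for number in NUMBERS:
        for case in CASES:
            index = form_index(number, case)
            if index < len(forms) and forms[index] == text:
                matches.append((number, case))
    return matches


# Singular and plural forms of "монета" (coin); countable forms omitted here.
coin_forms = ["монета", "монеты", "монете", "монету", "монетой", "монете",
              "монеты", "монет", "монетам", "монеты", "монетами", "монетах"]

print(coin_forms[form_index("plural", "genitive")])  # монет
print(guess_properties(coin_forms, "монету"))
# [('singular', 'accusative')]
print(guess_properties(coin_forms, "монеты"))
# [('singular', 'genitive'), ('plural', 'nominative'), ('plural', 'accusative')]
```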
About dictionaries
As you may have noticed, UTG requires you to build a dictionary. This has to be done by hand, because at the time of development:
- I did not find any publicly available, good-quality morphological dictionaries.
- The pymorphy library was still at its first version and made mistakes quite often (especially with the accusative case), so I had to abandon it.
If you want to use the generator with a lot of words, then before typing them in manually, try pymorphy2, or look for a ready-made dictionary and export the words from it.
Summary
I hope the library will be useful.
If you have ideas for its development (or, even better, a desire to take part in it), send me a private message, open pull requests, or report bugs on GitHub.