From Beginner to Style Icons: how we made awards in 2GIS





Every day, 2GIS users help us maintain data accuracy: they inform about new companies, add traffic events, upload photos and write reviews. Previously, we could only thank them with words or arrange a giveaway. But over time, words are forgotten, and not everyone gets gifts. Therefore, we decided to make sure that all those who care about 2GIS see their contribution to the product and our gratitude for this.



So there were awards - virtual medals that we accrue for various kinds of tasks: upload photos to cafe cards, write reviews about theaters, specify the working hours of organizations and so on. Users see the rewards earned in their personal 2GIS profile and on the “My 2GIS” tab in the mobile application. There we show how much is left until the next achievement.



To implement this feature, we learned how to process a stream of events with a volume of 500 thousand records per hour (in places up to 50 thousand records per second) and analyze data from several services. And also - they added a little metaprogramming in order to simplify the configuration when developing new awards.



Together with the Rapter , we ’ll tell you what is under the hood of the award process.



Concept



In order to understand the complexity of the feature, you need to understand how the technical problem sounded. Then - consider the idea of ​​implementation and the general scheme of system components. This is what we will do in this section.



Requirements for Abstracts



Requirements - a rather boring thing, so we will not paint all the nuances, we will concentrate on the most important things:





Architectural idea



The idea of ​​implementation is not very complicated. It can be expressed thesisally:





Key Entities



The diagram below presents the main entities of the subject area and their relationship:







Two zones are distinguished in the diagram:





The entities in the diagram:





Service structure



Now consider the main components of the service and their dependencies:









Each of the components is isolated in terms of logic and areas of responsibility, which avoids unnecessary integrations and deadlocks when modifying data. Below we consider only part of the scheme associated with the processing of events and converting them into rewards.



Event handling



Content



Rewards are primarily a data aggregation service. Each master system generates several types of events. As a rule, each type of event is closely related to the state of the content, its status model. So, a photo can be moderated, deleted, blocked, hidden or active. All these are different events, which are handled by a separate worker who specializes in a particular source. At the moment, there is an interaction with the following sources (master systems):









Events generated by the master system fall into the Kafka topic in the order of changing their statuses, which makes it possible to move the progress of the award for the user not only forward, but also roll it back. For example, if the photo was in the “active” status, and then for some reason acquired the status of “blocked”, the progress on the award should change downward. Award progress is an interpretation of internal objects called content counters.



Counters may vary for different data. For example, for events about a photo, they are as follows: the number of approved, the number on moderation, the number of blocked, and for events of opening cards it is necessary to consider only the number of cards opened by the user. Based on the current values ​​of the content counters, for a particular user, within the framework of a particular award, the answers to the following questions are determined:





Filters and Rules



The job counters of a particular award are changed only if an event has arrived with the desired type of content, as well as with the necessary data necessary to receive the award.



In order to skip only the content that is suitable for the award, we run each event through a series of filters and rules.



A filter is a certain restriction that is imposed on content. He is only concerned with answering the question: “Is a new event suitable for this condition or not?”

A rule is a special filter, the purpose of which is to say: “If an event fits the condition, how should the counters change?” The rule includes an algorithm for changing counters. Each award contains only one rule.



The implementation of filters and rules is in the project code, and the description of which filters (rule) belong to a particular award is in the database in JSON format. We did not come to such a decision right away. Initially, filters and rules could not be set using the configuration through the database, the award was fully described in the code, only its identifier was stored in the table. This decision gave a number of significant drawbacks:





At the start of the application, we match by the name of filters loaded from the database and put them into a specific reward. Filter description example:



[ { "name":"SourceFilter", "config":{ "sources":["reviews"] } }, { "name": "ReviewsLengthFilter", "config": { "allowed_length": 100 } } ]
      
      





In this case, we will take only those reviews (this is indicated by the first description object from the filter array), the text of which contains more than 100 characters (the second filter in the list).



Example rule description:



 {"name": "ReviewUniqueByObjectRule","config":{}}
      
      





This rule will allow you to change the counters only if the user wrote a review for the object, while only one review will be taken into account for one object.



Bss



Let us dwell separately on working with the stream of BSS events. There are at least three reasons for this:





All this greatly influenced the processing of data from the BSS topic, since if we need realtime, then we need very close processing time.



To reduce the described differences, a separate service was created that prepares such events. The service can work with the whole variety of message formats coming from analytics. The essence of his work is as follows: the entire BSS stream of events is read, from which only those that are needed for the Awards are taken. Such a service filter significantly reduces the load (after filtering, the flow rate is ≈300 events per second) from the BSS-stream processor Rewards, and also generates events in a single format, leveling the disadvantage associated with the history of the development of internal analytics.



Awards



So, we figured out how to handle events and calculate progress on assignments. Now it’s time to review the process of issuing awards to users.



The first question that arises is: why allocate the output to a separate worker, can't it be recounted when processing each event? Answer: possible, but not worth it.



There are several reasons for allocating extradition to a separate process:



  1. Transferring the recount to each ConsumingWorker, we get the Race Condition for the operation of updating progress by reward, because each handler will try to update the progress based on the known state of the tasks, and others will actively change this state.
  2. Each ConsumingWorker batch processes events from Kafka in a transaction. By adding an insert to the user’s rewards table, we will call extra locks at the database level, which will inhibit other handlers.
  3. In the process of issuing awards, there is a logic for sending notifications, which will only slow down the processing of the flow of events, which is undesirable.


With the reasons for the emergence of a separate AchievesWorker (handler for issuing awards) figured out. Now you need to deal with two important parts of the processing:



  1. There is a set of quests in the reward. There is a set of counters for these tasks. How to understand how much the reward has been completed and how to express this code?

    Example: You need to write 3 reviews or upload 3 photos. The user has 1 review and 2 photos. What is the progress of the award? Answer: 3, because the user will definitely be sure that you need 3 in total.
  2. We have a separate processor for issuing awards. Each time, recounting several dozen awards for each authorized user, that is, several tens of millions, is unlikely to succeed quickly. How can he learn about the progress of which particular users and on what tasks has changed since the last processing?


We will consider each part separately.



Flow of progress



For a better understanding of how you can describe how to transform the progress of tasks into progress by reward, we divide the rewards into categories and look at the transformations.



“Complete one mission per X units.” Example: drive 10 km on the navigator.







"Complete several tasks for X units each." Example: upload 5 photos and write 5 reviews in cards - only 10 pieces of content.







"Complete several tasks for X units in total." Example: write 5 reviews or upload 5 photos.







“Complete several tasks grouped by type.” Example: upload 5 units of content (photos or reviews) and drive 10 km on the navigator.







Theoretically, there could be more complex nested combinations. However, in real conditions, it is not possible to explain to the user in two or three sentences the complex logical combination that must be performed to receive the award. Therefore, in most cases, these options are enough.



We called the conversion method a strategy and tried to make it more or less universal by working out a formal description in the form of a JSON object. You could, of course, think of writing in the form of a formula, but then you would have to use similarities to eval or describe the grammar and implement it, and this is clearly an overcomplication. Storing the strategy in the source code for each award is not very convenient, because the description of the award will break (part in the database, and part in the code), and it will also not allow collecting awards from ready-made components in the future without participation of the development.



The strategy is presented in the form of a tree, where each node:





To describe the above examples, one operation is enough - sum. Sum is great for clearly showing the user a progress with a single number, but other operations can be used if desired.



Here is an example strategy description for the last category:



 { "goal": 15, "operation": "sum", "strategy": [ { "goal": 5, "operation": "sum", "strategy": [ { "objective_id": "photo" }, { "objective_id": "reviews" } ] }, { "goal": 10, "operation": "sum", "strategy": [ { "objective_id": "navi", "normalization_factor": 0.001 } ] } ] }
      
      





Required Updates



There are several handlers that relentlessly analyze events by users and apply changes to the progress of tasks. A regular search of all users with each award will lead to an analysis of several tens of millions of awards - not very encouraging, provided that real updates will be measured in thousands. How to learn only about thousands and not waste millions of CPU?



The idea of ​​how to recalculate progress only on those awards that have actually changed, came pretty quickly. It is based on the use of vector watches.



Before the description I will remind of entities:





The implementation looks like this:





Immediately it is worth noting that the solution has limitations on the number of entries in the tables of awards and tasks, as well as on the frequency of changes in progress on tasks, but with a couple of tens of millions of awards and the number of updates less than a thousand per minute, it is quite possible to live with such a solution. Somehow we will separately tell about how we optimized the issuance for the contest “ 2GIS Agents ”.



findings



Despite the fact that the article turned out to be quite voluminous, a lot of nuances remained behind the scenes, since it would not be possible to talk about them briefly.



What conclusions have we made thanks to the Awards:






All Articles