How to collect cohorts of users in the form of graphs in Grafana [+ docker image with an example]

How to collect cohorts of users in the form of graphs in Grafana [+ docker image with an example]






How we solved the task of visualizing cohorts of users in the Promopult service using Grafana.







Promopult is a powerful service with a large number of users. Over 10 years of operation, the number of registrations in the system has exceeded one million. Those who have come across similar services know that this array of users is far from homogeneous.







Someone registered and “fell asleep” forever. Someone forgot the password and registered a couple more times in six months. Someone brings money to the cashier, and someone came for free tools . And it would be nice to get some profit from everyone.







On such large data arrays as ours, it makes no sense to analyze the behavior of an individual user and make micro-decisions. But catching trends and working with large groups is possible and necessary. What we, in fact, are doing.







Summary



  1. What is cohort analysis and why is it needed.
  2. How to make cohorts by month of user registration in SQL.
  3. How to transfer cohorts to Grafana .


If you already know what cohort analysis is, and how to do it in SQL, skip to the last section.







1. What is cohort analysis and why is it needed



Cohort analysis is a method based on a comparison of different groups (cohorts) of users. Most often, our groups are formed by the week or month in which the user started using the service. The user’s lifetime is calculated from here, and this is an indicator on the basis of which a rather complicated analysis can be carried out. For example, understand:









2. How to make cohorts in SQL?



The size of the article and common sense do not allow us to give our real data here - in the test dump, statistics for a year and a half are: 1200 users and 53,000 transactions. So that you can play with this data, we have prepared a docker image with MySQL and Grafana, in which you can feel it all yourself. Link to GitHub at the end of the article.







And here we show the creation of cohorts with a simplified example.







Suppose we have a service. Users register in it and spend money on services. Over time, users fall off. We want to know how long users live, and how many of them fall off after the 1st and 2nd month of using the service.







To answer these questions, we need to build cohorts by month of registration. Activity will be measured by expenses in each month. Instead of costs, there may be orders, a monthly fee, or any other time-bound activity.







Initial data



The examples are made in MySQL, but for the rest of the DBMS there should not be significant differences.







User table - users:







userId RegistrationDate
one 2019-01-01
2 2019-02-01
3 2019-02-10
four 2019-03-01


Spending table - billing:







userId Date Sum
one 2019-01-02 eleven
one 2019-02-22 eleven
2 2019-02-12 12
3 2019-02-11 13
3 2019-03-11 13
four 2019-03-01 14
four 2019-03-02 14


Select all user charges and registration date:







SELECT b.userId, b.Date, u.RegistrationDate FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
      
      





Result:







userId Date RegistrationDate
one 2019-01-02 2019-01-02
one 2019-02-22 2019-01-02
2 2019-02-12 2019-02-01
3 2019-02-11 2019-02-10
3 2019-03-11 2019-02-10
four 2019-03-01 2019-03-01
four 2019-03-02 2019-03-01


We build cohorts by months, for this we will convert all dates to months:







 DATE_FORMAT(Date, '%Y-%m')
      
      





Now we need to know how many months the user was active - this is the difference between the month of debiting and the month of registration. MySQL has a function PERIOD_DIFF () - the difference between two months. Add PERIOD_DIFF () to the request:







 SELECT b.userId, DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth, DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth, PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
      
      





userId Billingmonth RegistrationDate MonthsDiff
one 2019-01 2019-01 0
one 2019-02 2019-01 one
2 2019-02 2019-02 0
3 2019-02 2019-02 0
3 2019-03 2019-02 one
four 2019-03 2019-03 0
four 2019-03 2019-03 0


We count the users activated in each month - we group the records by BillingMonth, RegistrationMonth and MonthsDiff:







 SELECT COUNT(DISTINCT(b.userId)) AS UsersCount, DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth, DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth, PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId GROUP BY BillingMonth, RegistrationMonth, MonthsDiff
      
      





Result:







UsersCount Billingmonth Registrationmonth MonthsDiff
one 2019-01 2019-01 0
one 2019-02 2019-01 one
2 2019-02 2019-02 0
one 2019-03 2019-02 one
one 2019-03 2019-03 0


In January, February and March, one new user appeared - MonthsDiff = 0. One January user was active and in February RegistrationMonth = 2019-01, BillingMonth = 2019-02, and one February user was active in March.







On a large data set, patterns are naturally better seen.







How to Transfer Cohorts to Grafana



We learned how to form cohorts, but when there are a lot of records, analyzing them is no longer easy. Records can be exported to Excel and create beautiful tables, but this is not our method!







Cohorts can be displayed as interactive graphs in Grafana .







To do this, add another query to convert the data into a format suitable for Grafana:







 SELECT DATE_ADD(CONCAT(s.RegistrationMonth, '-01'), INTERVAL s.MonthsDiff MONTH) AS time_sec, SUM(s.Users) AS value, s.RegistrationMonth AS metric FROM ( ##  ,   SELECT COUNT(DISTINCT(b.userId)) AS Users, DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth, DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth, PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId WHERE u.RegistrationDate BETWEEN '2018-01-01' AND CURRENT_DATE GROUP BY BillingMonth, RegistrationMonth, MonthsDiff ) AS s GROUP BY time_sec, metric
      
      





And upload the data to Grafana.







Example chart from the demo :







How to collect cohorts of users in the form of graphs in Grafana [+ docker image with an example]








Touch:







The github repository with an example is a docker image with MySQL and Grafana, which can be run on your computer. The database already has demo data for a year and a half, from January 2018 to July 2019.







If you wish, you can upload your data to this image.







PS Articles about cohort analysis in SQL:







https://chartio.com/resources/tutorials/performing-cohort-analysis-using-mysql/







https://www.holistics.io/blog/calculate-cohort-retention-analysis-with-sql/








All Articles