Am I doing pointless work again? How and why to implement quality metrics

Hello, Habr! We used to assess the quality of our releases with the "it seems to have gotten better" metric. Then we decided to rely on something more objective. In this article I'll tell how I searched for a guide to metrics, didn't find one, and created my own.







Does it ever happen that you do seemingly useful work on a project but can't tell whether it brings any benefit? That was us: we wrote autotests, but could not objectively say whether the releases of the monolith and of the other actively developed services had actually become better.



Looking for metrics



I searched the Internet and, surprisingly, found no articles or ready-made guides on how to choose the right metrics, how to collect them, and what to do with them afterwards. While searching, though, I did find useful videos and articles that helped me cope with this difficult task; links to them appear throughout this article.



I hope this article will be useful to anyone who is thinking about measuring something on their project but doesn't know where to start. It is based on personal experience plus information from articles, videos, and paid courses.



A moment before creating a quality measurement system
Before we decided to create a system of quality metrics, we were already measuring, on an ongoing basis:



  • The time spent on a monolith release (from the creation of the release branch to the merge of that branch into master).
  • The number of monolith release rollbacks due to bugs.
  • Time spent in Stop the Line.
  • The number of times the monolith pipeline stage in TeamCity, with all the autotests, had to be run before it turned green.


As you can see, we measured only things related to the monolith. For the other services we didn't measure anything.



We implement a quality measurement system in 11 steps



Here is a checklist of 11 steps that will help you implement everything and not miss anything.



Step 1. Define the purpose of your measurements



Understand why you want to start measuring anything at all. Measuring just for the sake of measuring makes no sense.



For example, we wanted to know how we were progressing toward the quality goals we had set earlier. We also wanted to see how the indicators change in response to our efforts. On their own, the current numbers mean nothing; they are just numbers. But watching the figures over time, we can see the effect of our actions.



Step 2. Define targets



You need to understand what you are striving for. Reduce testing time? Reduce the number of critical bugs in production? Increase test coverage?



In my case there were no problems with setting target indicators, since our company already has quality goals. These goals became the basis for the future metrics. Our goals:





Step 3. Decide on the metrics



Think about how you will know that you are moving toward your goals.

At this stage of the work, the article "The Most Important QA Metrics" helped me.



For our system, I chose the following indicators:
  • Time to release. This measures the time (in working hours) between the merge of the previous release branch into master and the merge of the current release branch into master.



    We split this time into 4 stages: preparing the staging environment, getting the pipeline stage green, manual regression testing, and deployment to production.



    We divided this time into stages in order to see in detail the consequences of our actions and to be able to accurately determine the bottleneck in our process.



    Stages of the "Release Time" metric
  • The "problem releases" coefficient for all services. This is the ratio of "problem releases" to the total number of releases, multiplied by 100%. A "problem release" is one that involved a release rollback, a hotfix, or a datafix.

    Ratio of problem releases to total releases
  • Hotfix density per monolith service. The ratio of the number of hotfixes for a given service to the total number of hotfixes.
  • Manual regression time for the mobile application. The time from the start of manual regression to its completion.
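A minimal sketch of how the two ratio metrics above could be computed; the release record fields (`rolled_back`, `hotfixes`, `datafixes`) are hypothetical, so adapt them to whatever your release tracker actually stores:

```python
def problem_release_ratio(releases: list[dict]) -> float:
    """Share of "problem releases" (rollback, hotfix, or datafix)
    among all releases, as a percentage."""
    if not releases:
        return 0.0
    problems = sum(
        1 for r in releases
        if r["rolled_back"] or r["hotfixes"] > 0 or r["datafixes"] > 0
    )
    return 100.0 * problems / len(releases)

def hotfix_density(hotfixes_per_service: dict[str, int], service: str) -> float:
    """Share of all hotfixes that belong to a given service, as a percentage."""
    total = sum(hotfixes_per_service.values())
    return 100.0 * hotfixes_per_service.get(service, 0) / total if total else 0.0
```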




Important! Do not take on many metrics at once. Three or four are enough to start. Once the process settles, you can add more if necessary.



Too many metrics are hard to manage, and the probability grows that the system will never take off. And if the process fails the first time, starting again will be harder, because you and your colleagues will carry that negative experience.



Step 4. Decide on the units



Different indicators can be counted in different units. Agree right away on the unit of measure for each metric, otherwise you may run into misunderstanding and misinterpretation.



We had problems with this item. We counted release time in hours, including night hours but excluding weekends, while the target was a release in 4 hours. Quite often we would create the release-xxx branch at 16:00 and finish at 10:00 the next day. Our metric counted that as 18 hours, although active work took only about 3 hours, if not less.



If we had continued counting this way, we would never have reached the 4-hour target. Faced with the choice of either raising the target to 12 hours or counting only working hours, we chose the latter.
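Counting only working hours can be sketched like this (assuming a 9:00-18:00 business day, Monday through Friday; public holidays are left out for brevity):

```python
from datetime import datetime, timedelta

WORK_START, WORK_END = 9, 18  # business day: 9:00-18:00

def business_hours(start: datetime, end: datetime) -> float:
    """Working hours between two timestamps, skipping nights and weekends.
    Holidays are ignored here for brevity."""
    total = 0.0
    day = start
    while day.date() <= end.date():
        if day.weekday() < 5:  # Monday..Friday
            opens = day.replace(hour=WORK_START, minute=0, second=0, microsecond=0)
            closes = day.replace(hour=WORK_END, minute=0, second=0, microsecond=0)
            span_start = max(start, opens)
            span_end = min(end, closes)
            if span_end > span_start:
                total += (span_end - span_start).total_seconds() / 3600
        day += timedelta(days=1)
    return total
```

With this function, the branch created at 16:00 and merged at 10:00 the next working day counts as 3 hours, not 18.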



Step 5. Analyze the selected metrics for suitability



In the video "Simple Practice Testing Metrics," the speaker suggested a neat way to check a metric's suitability: answer 9 questions about each metric, then decide.



Suitability Analysis
  • The purpose of the measurement. The indicator should be tied to a business goal. The "Time to Release" metric is tied to the business goal of releasing within 4 hours.
  • Who this metric is for. Who will look at it? The product owner, developers, managers, testers, scrum masters?



    In our case: the product owner (it is important for them to understand how many releases per sprint we manage to roll out), developers (they want to know when their code reaches production) and testers (testing time directly affects this metric).
  • What question the metric answers. Formulate the questions this metric answers for you. The "Release Time" metric answers the question "How often do we release?"
  • The idea of the metric and its description. Describe the metric briefly but clearly. I described the "Release Time" metric as follows: "We want to release as often as possible; this metric shows how quickly we release. Release time is the time in business hours, from 9:00 to 18:00, excluding weekends and holidays. The start of a release is the creation of the release branch, or the merge of the previous release into master; the end of the release is the merge of the release branch into master. Break the time into separate stages, for example: preparation for release, passing the autotests, manual testing, rollout to production."
  • Necessary conditions. List the conditions or restrictions on collecting the metric: who will provide the data, when, and from where. In my case, I know where to look for releases of all the parts. Monolith: merges of release-xxx branches into master. Website: cards on the release board in Kaiten.io. Applications: I don't know yet, but I'll find out.
  • Initial measurements. Honestly, I did not understand this point and do not know how to describe it. If you understand what should go here, write in the comments.
  • The formula for calculating the metric. For "Time to Release": how much time in working hours elapsed from the merge of the previous release into master to the merge of the current release into master (excluding weekends and holidays). The result is the number of working hours we spent on the release.
  • Decision criteria. Decide what you will do when you see this metric change; describe your reaction. My answer for "Time to Release": "React to the metric by finding bottlenecks and eliminating them."
  • Periodicity. How often will you collect the metric? We planned to check ours weekly, but in practice we do it more often.
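One way to keep all nine answers next to the metric itself is a small record type. This is just my own sketch, not anything from the video; the field names mirror the checklist above:

```python
from dataclasses import dataclass

@dataclass
class MetricSpec:
    """The nine suitability questions, kept alongside each metric."""
    purpose: str            # which business goal it serves
    audience: str           # who will look at it
    question: str           # what question it answers
    description: str        # the idea of the metric in a few sentences
    conditions: str         # where the data comes from, restrictions
    baseline: str           # initial measurements
    formula: str            # how it is calculated
    decision_criteria: str  # how to react to changes
    periodicity: str        # how often it is collected

time_to_release = MetricSpec(
    purpose="Business goal: release in 4 working hours",
    audience="Product owner, developers, testers",
    question="How often do we release?",
    description="Working hours from the previous release merge to the current one",
    conditions="Merge timestamps of release-* branches into master",
    baseline="Measured over the last few sprints before any changes",
    formula="merge(current release) - merge(previous release), business hours only",
    decision_criteria="Find and eliminate the bottleneck stage",
    periodicity="Weekly",
)
```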




After such a simple analysis, it immediately becomes clear whether you need a given metric or not. You also gain a deeper understanding of the metric itself and its value for the company and for you.



Step 6. Align metrics with stakeholders



Show the selected metrics to those they will affect. Discuss the limitations you discovered during the analysis phase, and ways to eliminate them, or at least mitigate them. It is especially important to get the consent and approval of those who will collect and fill in the metrics.



I discussed my metrics in three stages: with testers, developers, and product owners. Only after everyone explicitly agreed that these metrics reflect the quality of the system could I move on to the next step.



Step 7. Visualize the results



People will not study tables and track the trends on their own. So you need to take care of visual clarity.



I built a table in Google Sheets, wrote the formulas, and proudly presented it to my colleagues. Our CTO suggested going further and visualizing the metrics, so that the current state of the system is clear within 15 seconds: has it improved since the previous sprint, or has quality dropped?



Together, we visualized the indicators. Then I asked people to describe what they saw on the chart. Judging by their answers, we achieved the goal.







This is what the visualization of the release quality metric looks like. At a glance you can see the current state versus the past, whether the number of problems exceeds the number of releases, and whether things got better or worse compared to previous releases. On an ideal chart, the blue line (releases) tends to infinity, and the red line (problems) tends to zero.





Visualization of the relationship of ā€œproblematic releasesā€ to the total number of releases
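A chart like the one described can be sketched in a few lines of matplotlib; the sprint numbers and counts below are made up purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

sprints = ["42", "43", "44", "45", "46"]  # hypothetical sprint numbers
releases = [5, 6, 4, 7, 8]                # total releases per sprint
problems = [2, 1, 2, 1, 0]                # "problem releases" per sprint

fig, ax = plt.subplots()
ax.plot(sprints, releases, "o-", color="tab:blue", label="Releases")
ax.plot(sprints, problems, "o-", color="tab:red", label="Problem releases")
ax.set_xlabel("Sprint")
ax.set_ylabel("Count")
ax.set_title("Problem releases vs. total releases")
ax.legend()
fig.savefig("release_quality.png")
```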



Step 8. Keep collecting metrics regularly



It is important to establish a process for collecting metrics and to stick to the cadence. Without a process, your dashboard will lose relevance and die. There must be someone who cares enough to keep it going; and if you are worried about this, that person is already you.



Step 9. Inform people about the results, again and again



No matter how beautiful your dashboard is, people will not visit it and study the metrics on their own. They will look once, because it is something new, but not on a regular basis.



We solve this problem in three ways.
  • We talk about the metrics during the common part of our sprint review.
  • We display the graphs on a monitor in the hallway that everyone passes every day, so the numbers and charts are always in sight.
  • We publish a short dashboard summary in Slack. The key is to show the dynamics in such reports: better or worse than the previous sprint. Publishing it right before the team retro can also give the team topics for discussion.




Step 10. Analyze and make decisions



Look at the metrics and make decisions based on them. They can serve as an extra argument for writing more tests or for focusing on technical debt instead of business features, and so on.



Step 11. Automate



Automate metric collection as much as possible. If you use popular task trackers, test management systems, version control, and CI/CD systems, they most likely all have an open API from which you can easily pull this information. If you can't do it yourself, ask the developers for help. You may need to change some processes along the way. That's normal, and a low price for the benefits you get once you start collecting metrics.
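As a sketch of such automation: the monolith's release times could be pulled straight from git history. This assumes the default "Merge branch 'release-...'" merge-commit messages; adapt the --grep pattern and branch name to your own conventions:

```python
import subprocess
from datetime import datetime

def release_merge_dates(repo: str) -> list[datetime]:
    """Timestamps of merge commits of release-* branches into master,
    oldest first. Assumes default merge-commit message conventions."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--merges", "--first-parent", "master",
         "--grep=release-", "--format=%cI"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted(datetime.fromisoformat(line) for line in log.splitlines() if line)

def release_intervals_hours(dates: list[datetime]) -> list[float]:
    """Calendar hours between consecutive release merges (feed these into
    the business-hours calculation of your choice)."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(dates, dates[1:])]
```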



For example, we have a bot that helps release managers roll out releases and reduces their routine work.



Summary and Conclusions



Making decisions that affect product quality based on gut feelings is a bad idea. Feelings deceive and can push you toward the wrong decision. So go ahead and set up a system of metrics and quality assessment.



But remember that adopting metrics is like adopting a pet. Along with the joy of a new friend, you take on responsibility and obligations. So adopt metrics consciously, understanding why you need them and being ready to overcome the difficulties that await you along the way.


