Henry Ford once said: "The best car is a new car." We at the Tinkoff group of companies think the same way about software releases. Inertia in delivering features and urgent fixes sooner or later builds up a large technical debt to the customer and most often ends in the stagnation of the project as a whole.
Guaranteeing a short time to market while maintaining quality is no easy task. From my point of view, you cannot lay down, right at the start, rails that will still let you deliver changes quickly and conveniently many months later. A project's growth is usually accompanied by an increase in the number of people working on it, which creates a potential source of chaos in your releases.
Our experience should hardly be treated as a recipe for success, but a number of ideas seemed interesting enough to share with you. Let's get started.
Technically, our starting point is as follows. The system consists of several services whose code base is worked on by about 30 people: several teams divided by business area, for example, servicing individuals and legal entities. I will try to leave aside the technical and architectural details of the project in order to focus on the release process itself.
We work with Gitflow, so the article will be of interest primarily to those who have chosen this particular workflow.
Problems
The first problem we encountered is insufficient automation of processes. In our team, all release activities are tied to the role of the release manager (RM).
Here are some of them:
- Cutting release branches and creating tags.
- Merging into the master and develop branches.
- Building and deploying the artifacts included in the release.
- Communicating with the specialists involved in the process, for example, QA or admins.
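The branch and tag activities listed above can be sketched as a small planner that only builds the list of git commands and does not run them. This is a minimal illustration, not our actual tooling: the `release/<version>` and `v<version>` naming, the `origin` remote, and the function name `plan_release` are all assumptions made for the example.

```python
from typing import List


def plan_release(version: str, remote: str = "origin") -> List[List[str]]:
    """Return the git commands for a Gitflow release, in order, without running them.

    Branch and tag naming here is an assumption made for this sketch.
    """
    branch = f"release/{version}"
    tag = f"v{version}"
    return [
        ["git", "fetch", remote],
        # Cut the release branch from the current state of develop.
        ["git", "checkout", "-b", branch, f"{remote}/develop"],
        # (Stabilization and test builds would happen on the branch here.)
        # Merge the release into master and tag the result.
        ["git", "checkout", "master"],
        ["git", "merge", "--no-ff", branch],
        ["git", "tag", "-a", tag, "-m", f"Release {version}"],
        # Bring any release fixes back into develop.
        ["git", "checkout", "develop"],
        ["git", "merge", "--no-ff", branch],
        # Publish both branches and the new tag.
        ["git", "push", remote, "master", "develop", tag],
    ]


if __name__ == "__main__":
    for command in plan_release("1.4.0"):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to review or log before a CI job executes it, and the merge steps are exactly where the release manager would still step in if a conflict occurs.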
These routine tasks require more and more resources as the project expands (in our case, as the number of services grows), so the first step is to automate everything that can be automated. We integrate with the CI/CD tool to cut and merge release branches, launch test builds and deploy artifacts, and with the task tracker and the corporate messenger to promptly notify the participant responsible for the next step: for example, a change in a task's status triggers a hook that sends a notification.
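The status-change hook can be pictured roughly as follows. This is a sketch under stated assumptions: the status names, the roles, and the `send` callback standing in for the messenger integration are all hypothetical and do not describe any specific tracker or messenger API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical mapping: new task status -> role that has to act next.
NEXT_RESPONSIBLE: Dict[str, str] = {
    "Ready for QA": "qa",
    "Tested": "release-manager",
    "Ready for deploy": "admins",
}


@dataclass(frozen=True)
class StatusChange:
    task_id: str
    new_status: str


def build_notification(event: StatusChange) -> Optional[str]:
    """Build the message for the next responsible role, or None if nobody needs to act."""
    role = NEXT_RESPONSIBLE.get(event.new_status)
    if role is None:
        return None
    return f"@{role}: task {event.task_id} is now '{event.new_status}'"


def handle_status_change(event: StatusChange, send: Callable[[str], None]) -> bool:
    """Hook body: notify the responsible role; return True if a message was sent."""
    message = build_notification(event)
    if message is None:
        return False
    send(message)
    return True


if __name__ == "__main__":
    # print stands in for the messenger client in this sketch.
    handle_status_change(StatusChange("TASK-17", "Ready for QA"), print)
```

Passing the sender in as a callback keeps the routing rule testable without a real messenger.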
The release manager will still have to resolve manually any conflicts that arise when merging branches, but quick and frequent releases should keep their number to a minimum.
The next problem is testing.
One of our build criteria is that the tests run successfully. Tests are customarily divided into at least two types: unit and integration.
Unit tests check the correctness of individual modules of the program's source code, which in practice most often comes down to checking one or more methods that have an obvious logical connection.
Integration tests usually check that a whole cascade of such modules works, that is, that an entire feature functions from the client's side. For example, if a task implemented a REST interface, we check authorization, deserialization of the request itself, the validity of the passed fields, integration with other services and databases, and the business logic itself. At first glance, such tests may seem entirely self-sufficient and able to cover all potential problem areas. There is no need to understand how each individual brick works: the interface under test encapsulates all the logic and yields a simple answer, works or does not work.
In fact, they create a number of deferred problems. Here are some of them:
- The more components a test involves, the longer such tests take to build and run.
- Encapsulating the logic under test often makes it difficult to guarantee that the test result is correct. Often we fit the test to the result, and even more often the result matches the expectation only because of random side effects.
- Test data goes stale.
- Integrations with third-party systems, especially in test environments, often fail. This wastes the time spent on the run, because it is not always obvious whether the failure is a temporary outage or a breakage caused by our changes.
Solutions have already been devised for most of these problems. But, as usual, solutions do not come without additional restrictions or new problems.
Choosing the tests that are right for you and implementing them correctly is a very difficult task. In addition, it is important to strike a balance between coverage quality and build speed in order to optimize your releases.
In our case, we settled on a hybrid. We still bring up all the components needed for a full feature test, while mocking all possible integrations. We use Pact to preserve API contracts and Testcontainers to test integration with the database.
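The idea behind mocking an integration while still holding it to a contract can be shown in plain Python. This sketch deliberately does not use the Pact or Testcontainers APIs. The contract fields, the `stub_account_service` stand-in, and the function names are hypothetical and serve only to illustrate the principle.

```python
from typing import Any, Dict, List

# Hypothetical contract: the fields the consumer relies on and their expected types.
ACCOUNT_CONTRACT: Dict[str, type] = {
    "id": str,
    "balance": int,
    "currency": str,
}


def contract_violations(response: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """List every way a response breaks the contract; extra fields are allowed."""
    problems: List[str] = []
    for field, expected in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected):
            actual = type(response[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    return problems


def stub_account_service(account_id: str) -> Dict[str, Any]:
    """Stand-in for the real provider, so the feature test needs no live integration."""
    return {"id": account_id, "balance": 1500, "currency": "RUB", "owner": "n/a"}


if __name__ == "__main__":
    print(contract_violations(stub_account_service("42"), ACCOUNT_CONTRACT))
```

Running the same check against both the stub and the real provider's responses is what keeps the mock honest: if the provider changes a field, the contract check fails even though the feature test with the stub still passes.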
This approach to writing tests eventually led to a solution to the third problem: the long time needed to test a task manually. The stability of the hybrid tests gave us the idea of involving a QA engineer at the task specification stage to write test cases, which can then be skipped at the manual testing stage. Integration with useful products such as TestRail and Allure has become a kind of bridge between the developer and the tester. A contract is created, and its execution is reflected step by step in the report generated during the test build.
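A step-by-step report of this kind can be imagined along the following lines. This is a self-contained sketch and not the Allure or TestRail API: the `CaseReport` class, the case identifier `C-101`, and the step titles are all hypothetical.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CaseReport:
    """Records the outcome of each step of one test case agreed with QA."""

    case_id: str  # hypothetical identifier of the test case written by QA
    steps: List[Tuple[str, str]] = field(default_factory=list)

    @contextmanager
    def step(self, title: str) -> Iterator[None]:
        """Run a block of test code as a named step and record whether it passed."""
        try:
            yield
        except Exception:
            self.steps.append((title, "failed"))
            raise
        else:
            self.steps.append((title, "passed"))

    @property
    def passed(self) -> bool:
        """True only if at least one step ran and every recorded step passed."""
        return bool(self.steps) and all(status == "passed" for _, status in self.steps)

    def render(self) -> str:
        """Produce a plain-text report, one line per step."""
        lines = [f"Case {self.case_id}"]
        lines += [f"  [{status}] {title}" for title, status in self.steps]
        return "\n".join(lines)


if __name__ == "__main__":
    report = CaseReport("C-101")
    with report.step("Authorize the client"):
        assert "token" != ""
    with report.step("Validate the request fields"):
        assert len("RUB") == 3
    print(report.render())
```

Because every step carries the wording of the test case, the tester can read the report without opening the test code.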
What remains is to connect the reports to your task tracker for transparent task tracking. A clear history will also reduce the time it takes to write and implement tests for related tasks in the future.
Thus, QA engineers save enough time to focus on checking exceptional cases and integrations with other systems.
That brings us to the last problem. For manual testing, all tasks are merged from their feature branches into develop, and a deploy to the test bench is launched.
Firstly, with this approach you cannot speak of testing a feature cleanly, since related changes from other developers may land in develop in parallel.
Secondly, when the release branch is cut, it may turn out that QA did not have time to test some of the tasks. There is a choice: roll back the changes made by these tasks, or hold the release until testing is finished.
This choice will have to be made constantly unless you isolate your test environment. It is important that the component under test contains no changes other than those made by the task. In our case, we need to be able to bring up one or several services whose branches contain changes, and to manage routing inside the cluster so that only the requests QA needs are directed to these instances. The task is more complicated if load-balancing mechanisms are already used in the cluster, since they also have to be taken into account.
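The routing rule itself can be sketched as a small function. This is an illustration under assumptions, not a description of our cluster setup: marking QA requests with an `X-Feature-Branch` header, the service names, and the instance addresses are all hypothetical.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical header that QA tooling adds to requests meant for a feature instance.
FEATURE_HEADER = "X-Feature-Branch"


def choose_upstream(
    service: str,
    headers: Mapping[str, str],
    stable: Mapping[str, str],
    feature_instances: Mapping[Tuple[str, str], str],
) -> str:
    """Pick the instance that should serve a request to `service`.

    Requests marked with a feature branch go to that branch's instance, if one
    is deployed for this service; everything else goes to the stable instance.
    """
    branch = headers.get(FEATURE_HEADER)
    if branch is not None:
        instance = feature_instances.get((service, branch))
        if instance is not None:
            return instance
    return stable[service]


if __name__ == "__main__":
    stable: Dict[str, str] = {"accounts": "accounts.stable:8080"}
    features = {("accounts", "feature/limits"): "accounts-limits.test:8080"}
    print(choose_upstream("accounts", {FEATURE_HEADER: "feature/limits"}, stable, features))
    print(choose_upstream("accounts", {}, stable, features))
```

Falling back to the stable instance matters when a marked request passes through services that have no instance for that branch: only the services changed by the task are swapped, and the rest of the environment stays shared.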
Having implemented this capability, we began to conduct manual testing directly on separate feature branches: we deploy the required service for the duration of the test and plug it into the shared environment in isolation. By merging only finished tasks into the develop branch and getting rid of blockers, we began to decide for ourselves which changes should be included in a release and which should not.
It is worth noting that such a solution is unlikely to be a silver bullet for teams that make changes to almost the same files. In this case, the benefit of keeping features isolated is inversely proportional to the number of conflicts that arise during merging. In practice, with fairly frequent releases, this happens extremely rarely in a team of fifty developers.
In place of a conclusion
Summarizing, we can identify the main approaches that can help with accelerating releases:
- Automation of routine operations.
- Transparency of processes for all involved.
- The necessary balance between speed and quality in writing automated tests.
- Reduced time for manual testing.
- Isolation of the environment for testing.
It is important to understand that when choosing among approaches, principles and strategies, you should always start from the context of your own problem. There are many equally reliable and more sophisticated ways to speed up the delivery of your software to the client. The conclusions described above came from our attempts to respond to difficulties as they arose. For us, they were the first step towards a bolder approach to releasing software.