Legacy services in your infrastructure

Hello! My name is Pasha Chernyak, I am a lead developer at QIWI, and today I want to talk about the inevitable: Legacy.



Let's start with a question: what is a Legacy service? Is it a service that no developer has touched for a week, a month, a year? Is it a service written by a less experienced programmer, for example by you, a year ago, before you became cooler and more experienced? Or is it a service you have decided never to commit to again while you slowly prepare its replacement? Whichever definition you pick, a service left unattended and never updated is a time bomb that can go off later.







Before moving on to how we work with Legacy services at QIWI, I will tell you how we put the services of the Wallet in order. For two years now, I have been responsible for keeping it running. If there is any problem, I am always the first one they call. I usually don't have the nerve to call someone else at 11 pm, so I had to sit down and figure out every service in our domain.



But I, like anyone, like to sleep at night, so I tried to sort it out with the operations team: "Guys, why are you calling me?" The answer was rather concise: "Who else?" I am the one who repairs the services, and the guys simply don't know who else to call.



So at one of the Wallet backend team's retrospectives, we decided to compile a table listing our services, the Wallet's microservices and monoliths, together with the people responsible for them. Tables are generally useful, within reason.



Besides who is responsible for what, the table answered other questions: who owns the service, and who is responsible for its development, its architecture, and its life cycle. The people responsible for a service are the ones who can repair it if something happens. The owner of the service has the right to give +2 on commits, and the responsible people must also take part in the review before the service accepts a new commit.
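To make the roles concrete, here is a minimal sketch of one way such a table could be modelled. The names `ServiceEntry`, `who_to_call`, and `can_accept_commit`, and the sample people and service, are hypothetical and not taken from QIWI's actual tooling. The review rule is one possible reading of the text: the owner has given +2 and at least one responsible person has voted.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One row of the table: a service, its owner, and the people who can repair it."""
    name: str
    owner: str  # the person allowed to give +2 on commits
    responsible: list[str] = field(default_factory=list)  # people who can repair it


def who_to_call(table: list[ServiceEntry], service: str) -> list[str]:
    """Return contacts for a broken service: responsible people first, then the owner."""
    for entry in table:
        if entry.name == service:
            contacts = list(entry.responsible)
            if entry.owner not in contacts:
                contacts.append(entry.owner)
            return contacts
    return []  # the service is not in the table: nobody knows who to call


def can_accept_commit(entry: ServiceEntry, votes: dict[str, int]) -> bool:
    """Assumed rule: the owner voted +2 and at least one responsible person reviewed."""
    owner_approved = votes.get(entry.owner) == 2
    responsible_reviewed = any(person in votes for person in entry.responsible)
    return owner_approved and responsible_reviewed


# Hypothetical sample data.
table = [ServiceEntry("wallet-payments", owner="alice", responsible=["bob", "carol"])]
print(who_to_call(table, "wallet-payments"))
```

With a table like this, the operations team can look up a contact per service instead of calling the same person every night.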



As time went on, we adopted new practices: migration to Kubernetes, all sorts of checkstyle, spotbugs, and ktlint checks, logs available in Kibana, service autodiscovery instead of hard-coded addresses, and other useful things. In every case the table helped us keep our services up to date. For us it is a kind of checklist that says this service already does one thing but does not yet do another.

But we went further, because we realized we were missing information about our services: what is monitored, where the service's source code lives, where its build tasks run in TeamCity, how it is deployed, where the source code of the end2end tests is, and where the photos from grooming sessions about architecture and the decisions made are stored. Ideally, we wanted all of this to live in one place and be at hand when needed. So our table became the starting point for finding information.
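The checklist idea can be sketched as a small function that reports what a service still lacks. The practice and link identifiers below are made up for illustration, modelled on the items listed in the text, and the dictionary layout is an assumption rather than the team's real format.

```python
# Practices and links named in the text; the identifiers are hypothetical.
PRACTICES = ("kubernetes", "linters", "logs_in_kibana", "autodiscovery")
LINKS = ("source_code", "teamcity_build", "deploy", "end2end_tests", "architecture_notes")


def gaps(service: dict) -> dict:
    """Return the practices a service has not adopted and the links it is missing."""
    adopted = set(service.get("practices", ()))
    links = service.get("links", {})
    return {
        "missing_practices": [p for p in PRACTICES if p not in adopted],
        "missing_links": [name for name in LINKS if not links.get(name)],
    }


# Hypothetical row: an inherited service that only has a known repository.
inherited = {
    "name": "old-gateway",
    "practices": ["linters"],
    "links": {"source_code": "git@example.invalid:wallet/old-gateway.git"},
}
print(gaps(inherited))
```

Running this over every row gives a quick view of which services lag behind the current standards.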



But QIWI, while retaining the spirit of a startup, is a big company. We are already 12 years old, and teams change: people leave, people arrive, new teams are formed. And in our domain we found several services that we had inherited. Some came along with developers from other teams, some were only indirectly related to the Wallet, and so they ended up on our books. Why bother figuring out what such a service does and how? It works, and we have product features to ship.



How it usually happens



But at some point we discover that the service no longer does its job. Something has broken, so what do we do? The service simply stopped working. Completely. And we learned about it, first, by accident and, second, six months later. It happens. All we knew was which virtual machines the service was deployed on and where its sources were. We run git clone and dive into the mind of the person who wrote this several years ago, and what do we see? No familiar Spring Boot, although we are used to all sorts of things, being full-stack and all that. Maybe there is at least Spring Framework? No.



The guy who wrote all this was hardcore and wrote everything in pure Java. None of the tools a developer is used to are there, and the thought arises: we should rewrite it all. We have microservices, after all, and from every corner we hear the familiar "Guys, microservices are what you need!" If something is wrong, you just calmly pick any language and everything will be fine.



The catch is that there is no longer a customer responsible for this service. What were the business requirements, and what is this service supposed to do at all? Meanwhile, the service is tightly integrated into your business processes.



Now tell me, how easy is it to rewrite a service without knowing its business requirements? It is unclear how the service logs, and whether it has metrics at all. What those metrics are, if there are any, is even less clear. On top of that, the service contains a huge number of classes with obscure business logic, and something is written to some database that we also know nothing about yet.



Where to start?



With the most logical thing: the tests. They usually capture at least some of the logic, so you can draw conclusions about what is going on. TDD is fashionable now, but we see that 5 years ago things were much the same as today: there are almost no unit tests, and the few that exist tell us practically nothing. Except perhaps a check of how some XML is signed with some custom certificate.



We couldn't understand anything from the code, so we went to look at what was on the virtual machine. We opened the service logs and found an HTTP client error: a self-signed certificate embedded in the application's resources had quietly expired. We contacted our analysts, they requested a new certificate, it was issued to us, and the service works again.

That would seem to be all. Or is it? The service works, and it performs a function that our business needs. We have application development standards, and you most likely have them too. For example, do not keep logs in a folder on the node; ship them to a storage such as Elasticsearch and view them in Kibana. Remember the golden metrics as well: the load on the service, the number of requests it receives, whether it is alive, and how its health check behaves. At the very least, these metrics will tell you when the service can be decommissioned and forgotten like a bad dream, with a clear conscience.
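The expired certificate from this story is the kind of failure a small periodic check can catch months in advance. Below is a minimal sketch using only the Python standard library. It assumes you already have the certificate's `notAfter` value as a string in the format returned by `ssl.SSLSocket.getpeercert()`, for example `"Jun  1 00:00:00 2024 GMT"`. How you extract that value from a certificate packed into application resources is out of scope here, and the 30-day threshold is an arbitrary assumption.

```python
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and the certificate's notAfter time."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires_at - now.timestamp()) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate has expired or expires in fewer than `warn_days` days."""
    return days_until_expiry(not_after, now) < warn_days


# Hypothetical expiry date, checked against the current time.
print(needs_renewal("Jun  1 00:00:00 2024 GMT", datetime.now(timezone.utc)))
```

A check like this could feed an alert or a dashboard, so that an expiring certificate becomes a ticket rather than a six-month outage.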



What to do



So we add the old service to the table and then look for volunteers among the developers who will take the service under their wing and put it in order: write down at least some information about it, add links to its Grafana dashboards and build tasks, and figure out how to deploy the application properly instead of uploading files over FTP by hand.



The main question is how much time all this useful volunteering takes. One sprint for a reasonably experienced developer, for example within the 20% of time set aside for technical debt. And how long would it take to understand all the deep-rooted logic of talking to a certain government system and move it to newer technologies? I can't vouch for a number: maybe a month, maybe two, of the whole team's work. I say this from the experience of a current integration with a new service.



And such a rewrite delivers no business value. None at all. Taking a service under support and spending a little time on it, on the other hand, is fine. After our standard routine, we added the service to the table, documented it, and perhaps someday we will rewrite it. But it already meets our service standards.



To sum up, here is a plan for what to do with Legacy services.



Rewriting legacy from scratch is a bad idea

Seriously, don't even think about it. Of course you want to, and you can see some advantages, but usually nobody needs it, including you.



Directory

Dig out the source code of your applications and make a directory that says what lives where and how it works. Put a project description there (a readme.md, say) so that anyone can quickly find the logs and metrics. The developer who deals with this after you will thank you.
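As an illustration of how little such a description needs to contain, here is a sketch that renders a readme.md stub for one service. The section titles, the `render_readme` name, and the sample service are assumptions for this example. Anything unknown is marked TODO so the gap stays visible.

```python
# Fields every service description should answer; the list is a hypothetical choice.
README_FIELDS = ("Source code", "Build", "Deploy", "Logs", "Metrics")


def render_readme(name: str, description: str, links: dict) -> str:
    """Render a minimal readme.md with one line per field; unknown fields become TODO."""
    lines = [f"# {name}", "", description, ""]
    for title in README_FIELDS:
        lines.append(f"- {title}: {links.get(title, 'TODO')}")
    return "\n".join(lines) + "\n"


# Hypothetical inherited service where only the repository is known so far.
print(render_readme(
    "old-gateway",
    "Signs XML requests and sends them to an external system.",
    {"Source code": "git@example.invalid:wallet/old-gateway.git"},
))
```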



Understand the domain

If you own a domain, keep your finger on the pulse. It sounds corny, yes, but not everyone makes sure that all their services follow a single standard. And working within one standard really is significantly easier.


