I Moved from Terraform to CloudFormation and Regretted It

Describing infrastructure in a repeatable text format is a basic best practice for systems you don't want to babysit. The practice has a name, Infrastructure as Code, and so far there are two popular tools for implementing it, especially on AWS: Terraform and CloudFormation.









Comparing my experience with Terraform and CloudFormation







Before joining Twitch (aka Amazon Jr.), I worked at a startup and used Terraform for three years. At the new job I also used Terraform heavily, and then the company pushed a move to doing everything the Amazon way, including CloudFormation. I worked hard to develop best practices for both, and I used both tools in very complex workflows across the organization. Later, after carefully weighing the consequences of switching from Terraform to CloudFormation, I became convinced that Terraform was probably the better choice for the organization.







Terraform is terrible



Beta software



Terraform has not even released version 1.0, and that is a good reason not to use it. It has changed a lot since I first tried it, but back then terraform apply often broke after a few updates or simply after a couple of years of use. I would like to say that "everything is different now", but ... doesn't everyone say that? There are breaking changes, although they are justified, and it does feel like the syntax and the resource abstractions are now what they should be. The tool does seem to have gotten better, but ... :-0







AWS, on the other hand, has done a good job of maintaining backward compatibility. That is probably because its services are often tested thoroughly inside the organization and only then renamed and published. So "a good job" is an understatement. Maintaining backward-compatible APIs for a system as varied and complex as AWS is incredibly difficult. Anyone who has had to support public APIs that are used this widely will understand how hard it is to keep that up for so many years. In my experience, CloudFormation's behavior has never changed over the years.







Foot, meet bullet



As far as I know, there is no way to delete a resource that belongs to someone else's CloudFormation stack from my own CF stack. Terraform is different: it lets you import existing resources into your stack. It is an awesome feature, but with great power comes great responsibility. Once a resource has been added to your stack, anything you do with the stack can delete or change that resource. This came back to bite us once. At Twitch, someone who meant no harm accidentally imported another team's AWS security group into their own Terraform stack. A few commands later ... the security group (along with the incoming traffic) was gone.







Terraform is great



Partial recovery



Sometimes CloudFormation cannot fully transition from one state to another. When that happens, it tries to roll back to the previous state. Unfortunately, that is not always feasible. Debugging what happened is then scary: you never know whether CloudFormation will tolerate being tampered with, even when you are only trying to repair it. It also cannot really tell whether the rollback will succeed, and by default it hangs for hours waiting for a miracle.







Terraform, by contrast, tends to recover from failed transitions much more gracefully and offers advanced debugging tools.







Clearer changes in document state



“Okay, load balancer, you're changing. But how?”



—A worried engineer ready to press the accept button.

Sometimes I need to change a load balancer in a CloudFormation stack, for example to add a port number or change a security group. CloudFormation shows changes poorly. I am on pins and needles, double-checking the YAML file ten times to make sure I have neither erased anything needed nor added anything extra.







Terraform is much more transparent in this regard. Sometimes it is even too transparent (read: annoyingly noisy). Fortunately, the latest version includes an improved display of changes, so you can now see exactly what is changing.







Flexibility



Write software backwards.

To put it bluntly, the most important trait of long-lived software is its ability to adapt to change. Write any software backwards. My most frequent mistake was taking a "simple" service and cramming everything into a single CloudFormation or Terraform stack. And of course, months later it turned out that I had misunderstood everything and the service was not simple at all! So now I need to somehow break a large stack into smaller components. With CloudFormation, the only way to do that is to recreate the existing stack, and I don't do that with my databases. Terraform, on the other hand, let me dissect the stack and split it into smaller, more understandable parts.
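As a rough sketch of what such a split can look like: the fragment below moves a database out of an oversized stack without recreating it. It assumes a recent Terraform release (the removed block needs 1.7 or later and the import block needs 1.5 or later; older versions relied on the terraform state mv and terraform import commands instead). The resource address aws_db_instance.main and the identifier orders-db are hypothetical.

```hcl
# --- old, oversized stack ---
# Stop tracking the database in this stack's state,
# but leave the real database untouched.
removed {
  from = aws_db_instance.main

  lifecycle {
    destroy = false
  }
}

# --- new, smaller stack ---
# Adopt the already-existing database into this stack's state.
# The resource "aws_db_instance" "main" block itself moves here
# unchanged from the old stack.
import {
  to = aws_db_instance.main
  id = "orders-db" # hypothetical RDS instance identifier
}
```

Running terraform plan in both stacks before applying should show the database being forgotten in one and imported in the other, with no destroy or create action.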







Modules in git



Sharing Terraform code across multiple stacks is much easier than sharing CloudFormation code. With Terraform, you can put code in a git repository and reference it using semantic versioning. Anyone with access to the repository can reuse the shared code. CloudFormation's equivalent is S3, but it does not have the same advantages, and there is no reason we should abandon git in favor of S3.







As the organization grew, the ability to share common stacks became critical. With Terraform this is easy and natural, whereas CloudFormation makes you jump through hoops to get something similar.
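A minimal sketch of a stack consuming a shared module from git, pinned to a semver tag. The repository URL, the ecs-service subdirectory, and the input variable names are all hypothetical; only the git:: source syntax and the ?ref= query parameter are Terraform's own.

```hcl
module "checkout_service" {
  # "//ecs-service" selects a subdirectory of the repository,
  # "?ref=v1.4.2" pins the module to a git tag.
  source = "git::https://example.com/platform/terraform-modules.git//ecs-service?ref=v1.4.2"

  # Hypothetical inputs declared by the shared module.
  name        = "checkout"
  environment = "staging"
}
```

Upgrading the module is then an explicit, reviewable change to the ref value in each stack that uses it.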







Operations as code



"Let's just script it and be done with it."



—An engineer, 3 years before reinventing the Terraform wheel.

When it comes to software development, a Go or Java program is more than just code.









Code as Code







After all, there is also the infrastructure it runs on.









Infrastructure as code







But where does that infrastructure come from? How do you monitor it? Where does your code live? Do developers need access permissions?









Operations as code







Being a software developer is not just about writing code.

AWS is not the only one: you are surely using other providers too, such as SignalFx, PagerDuty, or GitHub. Maybe you have an internal Jenkins server for CI/CD or an internal Grafana dashboard for monitoring. Infrastructure as Code is chosen for various reasons, and each of them matters just as much for everything else related to your software.







When I worked at Twitch, we ran services on a mix of in-house systems and Amazon's AWS. We created and maintained many microservices, which drove up operational costs. The discussions went roughly like this:









... 3 years later:









The moral of the story is this: even if you are all in on Amazon, you still use something that is not AWS, and those services have state that a configuration language can keep in sync.
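To illustrate, here is a small sketch of one configuration that manages state in AWS and in a non-AWS service side by side. The repository name is hypothetical, and the GitHub provider is assumed to pick up its credentials from the environment.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    github = {
      source = "integrations/github"
    }
  }
}

# State that lives outside AWS: the service's source repository.
resource "github_repository" "service" {
  name        = "checkout-service" # hypothetical
  description = "Checkout microservice"
  visibility  = "private"
}

# State that lives inside AWS: the image registry for the same
# service, named after the repository so the two stay in sync.
resource "aws_ecr_repository" "service" {
  name = github_repository.service.name
}
```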







CloudFormation Lambda vs Terraform git modules



Lambda is CloudFormation's answer to the problem of custom logic. With Lambda you can create macros or custom resources. This approach brings extra difficulties that do not exist with semantically versioned git modules in Terraform. For me, the most pressing problem was managing permissions for all these custom Lambdas (across dozens of AWS accounts). The next most important was a chicken-and-egg problem with the Lambda code: the function is itself infrastructure and code, and it needs its own monitoring and updates. The final nail in the coffin was the difficulty of semantically versioning changes to the Lambda code; we also had to make sure that a stack's behavior did not change between runs unless explicitly requested.







I remember once wanting to create a canary deployment for an Elastic Beanstalk environment with a classic load balancer. The easiest way would be to create a second EB deployment next to the production environment and take one more step: attach the canary deployment's auto scaling group to the production deployment's load balancer. And since Terraform exposes the Beanstalk ASG as an output, this takes 4 extra lines of Terraform code. When I asked whether there was a comparable solution in CloudFormation, I was pointed to a whole git repository with a deployment pipeline and more, all to do what those 4 measly lines of Terraform could do.
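Those four lines might look roughly like the sketch below. It is not the author's original code: it assumes both environments are declared in the same configuration under the hypothetical names production and canary, and that the AWS provider version in use exposes the autoscaling_groups and load_balancers attributes on Elastic Beanstalk environments and accepts the elb argument for classic load balancers.

```hcl
# Attach the canary environment's auto scaling group to the
# production environment's classic load balancer.
resource "aws_autoscaling_attachment" "canary" {
  autoscaling_group_name = aws_elastic_beanstalk_environment.canary.autoscaling_groups[0]
  elb                    = aws_elastic_beanstalk_environment.production.load_balancers[0]
}
```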







Better drift detection



Make sure reality meets expectations.

Drift detection is a very powerful feature of operations as code because it helps make sure that reality matches expectations. It is available in both CloudFormation and Terraform. But as the production stack grew, CloudFormation's drift detection returned more and more false positives.







With Terraform, you have much more advanced lifecycle hooks for drift detection. For example, you can add ignore_changes directly to an ECS task definition if you want to ignore changes to a specific task definition without ignoring changes to the entire ECS deployment.
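One common shape of this, sketched with hypothetical resource names: an ECS service whose task definition revision is bumped by a deploy pipeline outside Terraform. Only that one attribute is excluded from drift detection; every other change to the service still shows up in the plan.

```hcl
resource "aws_ecs_service" "api" {
  name            = "api"                               # hypothetical
  cluster         = aws_ecs_cluster.main.id             # hypothetical cluster
  task_definition = aws_ecs_task_definition.api.arn     # hypothetical task definition
  desired_count   = 2

  lifecycle {
    # New task definition revisions are registered outside
    # Terraform, so do not report them as drift.
    ignore_changes = [task_definition]
  }
}
```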







CDK and the future of CloudFormation



CloudFormation is difficult to manage at large, cross-infrastructure scale. Many of these difficulties are acknowledged, and the tool needs things like aws-cdk, a framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. It will be interesting to see what the future holds for aws-cdk, but it will have a hard time competing with Terraform's other advantages; bringing CloudFormation up to par would require sweeping changes.







How to keep Terraform from disappointing you



This is "infrastructure as CODE," not "as text."

My first impression of Terraform was pretty bad. I think I just did not understand the approach. Almost all engineers at first instinctively treat it as a text format that has to be translated into the desired infrastructure. DON'T DO THIS.







Common truths of good software development apply to Terraform



I have seen many practices for writing good code ignored in Terraform. You studied for years to become a good programmer. Don't throw that experience away just because you are working with Terraform. The common truths of good software development apply to Terraform too.







How can code go undocumented?



I have come across huge Terraform stacks with no documentation at all. How can you write pages of code with no documentation whatsoever? Add documentation that explains your Terraform code (emphasis on the word "code"), why a given section matters, and what it does.
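A small sketch of what that can look like in practice, using comments for the "why" and description fields for the "what". The variable name, default, and wording are hypothetical.

```hcl
# The load balancer is internal-only by default. Widening this list
# exposes the service to other networks, so treat any change here
# as a security review item.
variable "allowed_cidr_blocks" {
  description = "CIDR ranges allowed to reach the load balancer over HTTPS."
  type        = list(string)
  default     = ["10.0.0.0/8"]
}

output "allowed_cidr_blocks" {
  description = "The CIDR ranges currently allowed through the load balancer."
  value       = var.allowed_cidr_blocks
}
```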







Would you deploy services that are one big main() function?



I have seen very complex Terraform stacks written as a single module. Why don't we deploy software like that? Why do we break large functions into smaller ones? The same answers apply to Terraform. If your module is too large, break it into smaller modules.







Doesn't your company use libraries?



I have seen engineers spin up a new project with Terraform by mindlessly copy-pasting huge chunks from other projects and then poking at them until things started working. Would you work that way with production code at your company? We use libraries for a reason. No, not everything has to be a library, but where would we be without shared libraries at all?!







Don't you use PEP8 or gofmt?



Most languages have a standard, accepted formatting scheme. In Python it is PEP8. In Go it is gofmt. Terraform has its own: terraform fmt. Enjoy!







Would you use React without knowing JavaScript?



Terraform modules can simplify parts of the complex infrastructure you are building, but that does not mean you can skip understanding it altogether. Want to use Terraform properly without understanding the resources? You are doomed: time will pass, and you still won't have mastered Terraform.







Do you code with singletons or with dependency injection?



Dependency injection is a recognized software development best practice and is preferred over singletons. How does this apply to Terraform? I have seen Terraform modules that depend on remote state. Instead of writing modules that pull from remote state, write a module that accepts parameters, and then pass those parameters to the module.
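A sketch of the difference, shown as two files in one listing. The module path, variable names, bucket, and remote state outputs are hypothetical. The module only declares what it needs; the calling stack decides where the values come from, which here happens to be another stack's remote state.

```hcl
# --- modules/service/variables.tf ---
# The module knows nothing about where these values originate.
variable "vpc_id" {
  description = "VPC the service is deployed into."
  type        = string
}

variable "subnet_ids" {
  description = "Subnets for the service's tasks."
  type        = list(string)
}

# --- root stack: main.tf ---
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # hypothetical
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

module "service" {
  source = "./modules/service"

  # Dependencies are injected by the caller.
  vpc_id     = data.terraform_remote_state.network.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

The same module can then be reused in a stack that creates its own VPC or hard-codes the IDs, without any change to the module.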







Do your libraries do ten things well, or one thing great?



Libraries that focus on a single task and do it really well work best. Instead of writing large Terraform modules that try to do everything at once, build pieces that each do one thing well, and then combine them however you like.







How do you make backward-incompatible changes to libraries?



A shared Terraform module, like a regular library, needs some way to tell its users about backward-incompatible changes. Such changes are annoying in libraries, and they are just as annoying in Terraform modules. I recommend using git tags and semver when working with Terraform modules.







Does your production service run on your laptop or in a data center?



HashiCorp has tools such as Terraform Cloud for running your Terraform. These centralized services make it easier to manage, audit, and approve Terraform changes.







Don't you write tests?



Engineers agree that code needs to be tested, yet they often skip testing when working with Terraform. With infrastructure, that can lead to nasty surprises. My advice is to write "test" or "example" stacks that use your modules and can be deployed for verification during CI/CD.
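A minimal sketch of such an example stack, assuming a hypothetical layout in which the module lives at the repository root and the example sits in examples/basic. The input name and the service_name output are assumptions about the module under test.

```hcl
# examples/basic/main.tf (hypothetical path)
provider "aws" {
  region = "us-east-1"
}

# Instantiate the module exactly as a consumer would.
module "under_test" {
  source = "../../"

  name = "ci-example" # hypothetical input
}

# Expose something the CI job can check after apply.
output "service_name" {
  value = module.under_test.service_name # hypothetical module output
}
```

A CI job can apply this stack in a sandbox account, check the output, and destroy it again, which catches broken modules before they reach a real stack.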







Terraform and microservices



Microservice companies live and die by how quickly they can create, update, and tear down microservice stacks.

The most common drawback of microservice architectures, and one that cannot be eliminated, has to do with operations rather than code. If you treat Terraform only as a way to automate the infrastructure side of a microservice architecture, you are depriving yourself of the true benefits of the system. Now everything is as code.







