“Hope is a bad strategy.” SRE intensive in Moscow, February 3-5

We are announcing the first practical SRE course in Russia: Slurm SRE.







Over the three days of the intensive, we will build, break, repair, and improve an aggregator site that sells movie tickets.













We chose a ticket aggregator because it has many failure scenarios: a surge of visitors and DDoS attacks, the failure of one of many critical microservices (authorization, reservation, payment processing), the unavailability of one of the many cinemas (which exchange data about available seats and reservations), and so on down the list.







We will formulate a Reliability concept for our aggregator Site and carry it through into Engineering: we will analyze the design from an SRE point of view, select metrics, set up monitoring for them, resolve the incidents that arise, run team incident-response drills under near-real conditions, and hold a debriefing.







The program is led by engineers from Booking.com and Google.

This time there will be no remote participation: the course is built on face-to-face interaction and teamwork.














Speakers



Ivan Kruglov

Principal Developer at Booking.com (Netherlands)

Since joining Booking.com in 2013, he has worked on infrastructure projects such as distributed message delivery and processing, big data, the web stack, and search.

He now works on building an internal cloud and a service mesh.







Ben Tyler

Principal Developer at Booking.com (USA)

Works on internal development of the Booking.com platform.

Specializes in service mesh and service discovery, batch job scheduling, incident response, and the postmortem process.

Speaks and teaches in Russian.







Eugene Varavva

Generalist developer at Google (San Francisco).

His experience ranges from high-load web projects to research in computer vision and robotics.

Since 2011, he has been developing and operating distributed systems at Google, taking part in the full project life cycle: conceptualization, design and architecture, launch, wind-down, and all the stages in between.







Eduard Medvedev

CTO at Tungsten Labs (Germany)

As an engineer at StackStorm, he was responsible for the platform's ChatOps functionality. He developed and rolled out ChatOps for data center automation. He speaks at Russian and international conferences.







Program



The program is still under active development. This is how it looks now; by February it may be refined and expanded.







Topic 1: Basic principles and methods of SRE

Topic 2: Design of distributed systems

Topic 3: How to accept a project into SRE

Topic 4: Design and launch of a distributed system

Topic 5: Monitoring, observability, and alerting

Topic 6: The practice of testing system reliability

Topic 7: Incident response practice

Topic 8: Workload management practice

Topic 9: Incident response

Topic 10: Diagnosing and solving problems

Topic 11: System reliability testing

Topic 12: Independent work and review







Recommendations and requirements for participants



SRE is teamwork. We strongly recommend that the whole team take the course, so we offer large discounts for existing teams.







Course price: 60,000 ₽ per person.

If a company sends a group of five or more people, the price drops to 40,000 ₽.







The course is built on Kubernetes. To take it, you need to know Kubernetes at a basic level. If you don't work with it, you can take Slurm Basic (online, or the intensive on November 18-20).

In addition, you need a good command of Linux and familiarity with GitLab and Prometheus.







Register



If you have a more complex participation plan in mind, for example, having the CEO, the technical director, and the development team attend the course together and practice along their management chain, write to me in a private message.







