Slurm DevOps. 3rd day. ELK, ChatOps, SRE. And the secret prayer of the developer

The third and final day of the first, but not the last, Slurm DevOps has arrived.







We did not expect to be able to repeat Slurm DevOps. But, unexpectedly for us, all the speakers agreed to come to Slurm in February, and the feedback showed us how to refine the program. We now understand how to make the intensive more coherent and detailed, and some topics more practical. So in February we will hold Slurm DevOps in Moscow. Details will follow closer to December. The announcement will certainly appear on Habr.













On September 6, the third day of Slurm, four speakers took the floor.







Vladimir Guryanov, engineer and team lead at Southbridge, whose talk on the second day of Slurm DevOps the participants of the intensive really liked. Vladimir is an active supporter of the DevOps approach and tries to apply it everywhere.







Pavel Selivanov, a recognized Slurm star and the mastermind of the first Slurm on Kubernetes. Students wrote about him that "it would be great if he led the entire program." Pavel is a Certified Kubernetes Administrator. He has extensive practical experience implementing Kubernetes: more than 25 projects, both in a team and individually.







Eduard Medvedev, CTO at Tungsten Labs, who developed and implemented ChatOps for data center automation. After his talk at Slurm, many participants began thinking about implementing ChatOps in their own companies. He now works successfully as a security consultant.







Ivan Kruglov, Principal Developer at Booking.com, the true guest star of the conference. Some participants signed up for Slurm DevOps specifically for his talk. At Booking.com he has worked on infrastructure projects such as distributed message delivery and processing, BigData and the web stack, and search. His current tasks include building an internal cloud and a Service Mesh.







We recorded extensive interviews with Eduard Medvedev and Ivan Kruglov and will publish them on Habr as soon as possible.













For all their thoughtful looks, the audience showed slight fatigue. The previous two days of the intensive had pushed everyone to the limit, and heads demanded rest and a weekend. But the topics and speakers of the third day dispelled the fatigue and drowsiness. Especially Site Reliability Engineering and Ivan Kruglov.







At the end of Slurm's second day, it was decided to postpone the topic of infrastructure monitoring with Prometheus to the next day. The intensive turned out to be too intense - not all participants could keep pace.













And so the third day began with a talk by Vladimir Guryanov. He briefly explained why monitoring is needed at all, described and classified the types of monitoring, and raised the issue of monitoring notifications.







The topics “How to build a healthy monitoring system” and “Human-readable notifications” were received by the audience with lively interest. Vladimir concluded with the topic of Health Checks: what is worth paying attention to and how to build automation on top of monitoring data.













To stir up the sleepy participants and bring their learning abilities to the maximum, Pavel Selivanov took over the audience's attention after Vladimir Guryanov with the topic “Application Logging with ELK”. He showed the Slurm participants our logging best practices and gave an overview of the ELK stack.







After the first coffee break, full of conversation and cookies, the Slurm participants took their seats in the auditorium.







The talks by Guryanov and Selivanov, together with caffeine, a purine alkaloid, did their insidious work. The caffeine reached the brain’s adenosine receptors and displaced the purine nucleoside adenosine, which is responsible for inhibition processes - which simply deprived the Slurm participants of any chance to “get lazy” or “take a nap”. Not everyone understood what had happened. But everyone cheered up.







Thus, the audience was one hundred percent ready for further training and active absorption of knowledge. And for the talk by Eduard Medvedev.













Eduard spoke about infrastructure automation with ChatOps and about integrating instant messengers with pipelines.













The finale of the third day, and of Slurm DevOps as a whole, was the talk by Ivan Kruglov, Principal Developer at Booking.com. Ivan immediately grabbed the audience’s attention by confessing that his presentation had more than 140 slides, gently hinting that the Slurm participants should not make plans for either Friday or the weekend.













In an intense, lengthy and deep presentation, Ivan Kruglov covered DevOps and SRE: what they are to each other and how they relate. He talked about “scary terms from the world of SRE”: SLA, SLO, Error Budget and some others.
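For readers who have not met these terms, here is a minimal sketch of the arithmetic behind an SLO and its Error Budget. It is not taken from Ivan's slides: the function names and the example numbers (a 99.9% availability target over one million requests) are our own assumptions, chosen only for illustration.

```python
# Illustrative sketch only: names and numbers are assumptions, not from the talk.

def availability_sli(total_requests: int, failed_requests: int) -> float:
    """SLI: the measured fraction of requests that succeeded."""
    return (total_requests - failed_requests) / total_requests


def error_budget(slo_target: float, total_requests: int) -> int:
    """Error budget: how many requests may fail while the SLO is still met."""
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        return 0.0
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    slo = 0.999          # hypothetical target: 99.9% of requests succeed
    total = 1_000_000    # hypothetical traffic for the period
    failed = 250         # hypothetical number of failed requests
    print("SLI:", availability_sli(total, failed))
    print("Error budget (requests):", error_budget(slo, total))
    print("Budget remaining:", budget_remaining(slo, total, failed))
```

The idea is that the SLO is a target for the SLI, and the Error Budget is simply the gap between that target and 100%, expressed in something a team can spend, such as failed requests.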



















Then came practice and even more practice: monitoring SLIs and SLOs, applying the Error Budget, and managing interrupts and operational load (API gateway, service mesh, circuit breakers). And much, much more.
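As a reminder of what a circuit breaker does, here is a rough sketch of the pattern. It is not the code shown at Slurm: the class name, the thresholds and the injectable clock are all assumptions made for this illustration, and a production service would more likely rely on its service mesh or an existing library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of loading a struggling dependency.
                raise CircuitOpenError("circuit is open, call rejected")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Any success closes the circuit and resets the failure counter.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a dependency in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back or return an error immediately.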















Secret prayer of the developer.







Since the SRE topic is extremely broad and its nuances could be discussed for several days at least, it was decided that in February, at the next Slurm DevOps, we will devote even more time to SRE and its practical application, as the most relevant and in-demand subject.







Sabbath, [6 Sep 2019, 18:25:30]: […] :)
aaa, [6 Sep 2019, 18:27:07]: […] UI\UX […] mr.
Dmitry, [6 Sep 2019, 18:28:47]: […]





After the speeches, a series of questions came up, both offline and in the Slurm working chat:







[…], [6 Sep 2019, 23:24:54]: […] items […]: 297 432
Maksim Aleksandrov, [7 Sep 2019, 0:11:58]: […] (nvps)? […] prometheus?
[…], [7 Sep 2019, 0:24:15]: 2.21K […] prometheus? […] service discovery […] zabbix […] docker […] k8s […] zabbix […] zabbix.





Slurm participants shared their impressions:







Alexander B, [6 Sep 2019, 21:11:03]: […]
Roman D, [6 Sep 2019, 20:49:05]: […]
[…], [6 Sep 2019, 20:49:30 (06.09.2019, 20:50:07)]: […]





Max Grechnev, [6 Sep 2019, 19:42:57]: […]
Smith Wesson, [6 Sep 2019, 19:58:11]: […]
Igor Averin, [6 Sep 2019, 19:58:12]: […]





After the conference, we asked the participants to leave feedback in a Google Docs form. The results pleased and inspired us.

















Thanks to everyone who was with us - offline, in the Selectel conference room, and online. And many thanks to the readers of Habr. “Slurm inspires!” (c)







