A Journey Through the Selectel Data Center: a Dinosaur on Fire, VMware, C2F5H, and the Invisible Werewolf

A system administrator's work rests on the belief that data center engineers know their job. We build failover clusters, but what is that failover worth if the power goes out? What does it matter how quickly a server processes a request if the link from the data center to the traffic exchange point drops? How do you bring a server back up if it has physically overheated?







But I would rather know than believe: how exactly is fault tolerance built at the hardware level? Where do those "nines" of equipment reliability come from that we cite when drafting SLAs for Kubernetes clusters? And what happens when a project burns in the most literal sense of the word?







On the third day of Slurm DevOps we were lucky enough to tour a Selectel data center, to peek into the holy of holies, and even to take a few photos as keepsakes. We also asked about the company legends that Selectel employees never tell anyone. As it turned out, they barely remember them themselves.







Our company, Southbridge, has had a long-standing partnership with Selectel. We currently support 58 projects hosted on the provider's servers. When a client needs a server located in Russia, we recommend Selectel, because in our experience it is the most reliable and convenient IT infrastructure provider.







Go!













On the way to the fourth floor (the craftiest of us took the elevator, the most athletic took the stairs), colleagues from Southbridge reminded me to ask about the Selectel legends: the werewolf, and the restless spirit that wandered and howled while the new data center building was under construction. I have always been curious about the mythology of large companies, left over from the turbulent era of their birth and first growth.







In the very beginning, the company had a single data center, at Tsvetochnaya 1 in St. Petersburg. It served VKontakte. We saw it from the window as we climbed to the fourth floor. It was last shut down for modernization nine or ten years ago, and it has been running continuously ever since. In terms of reliability it is rated Tier II.







Information for consideration (as they said in "Seventeen Moments of Spring"):



The key indicator of a data center is its fault tolerance. There are four categories, from Tier I to Tier IV. A data center's tier reflects its level of redundancy, physical security, and reliability.



Tier I (redundancy: N, availability: 99.671%) — no raised floors, no backup power sources or uninterruptible power supplies, and no redundancy in the engineering infrastructure. Scheduled or emergency maintenance shuts the data center down.



Tier II (redundancy: N+1, availability: 99.749%) — a modest level of redundancy; raised floors and backup power sources are installed, but maintenance still shuts the data center down, as with Tier I.



Tier III (redundancy: 2N, availability: 99.982%) — maintenance (replacing system components, adding and removing failed equipment) can be carried out without stopping the data center. All systems are redundant, with multiple power distribution and cooling paths.



Tier IV (redundancy: 2(N+1), availability: 99.995%) — double redundancy is required; any work can be carried out without interrupting the data center's operation. Engineering systems are doubly redundant, that is, both the primary and the backup systems are duplicated.
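Those availability percentages translate directly into allowed downtime per year, which is the form most SLAs quote. Here is a minimal sketch of that conversion; the tier figures come from the list above, the rest is plain arithmetic:

```python
# Convert a tier's availability percentage into allowed downtime per year.
HOURS_PER_YEAR = 24 * 365

tiers = {
    "Tier I": 99.671,
    "Tier II": 99.749,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}

for tier, availability in tiers.items():
    downtime_h = HOURS_PER_YEAR * (1 - availability / 100)
    print(f"{tier}: {downtime_h:.1f} h of downtime per year")

# Tier I: 28.8 h, Tier II: 22.0 h, Tier III: 1.6 h, Tier IV: 0.4 h
```

So a Tier III data center is allowed roughly an hour and a half of downtime per year, versus more than a day for Tier I.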

Ahead of us was a massive grille, a door with an electronic lock, and a full-height turnstile of thick metal bars. Behind it lay the data center floor itself.













The data center we were visiting is newer than its neighbor: it was built in 2015 and is rated Tier III.







Selectel now has two operating data centers on Tsvetochnaya, three more in Dubrovka, and two in Moscow that the company counts as a single site. Six in total.







The building has four floors. The first floor houses offices and some equipment. The fourth floor is partly given over to offices, but most of it is occupied by technical rooms.







Before the provider moved in, the building housed a factory. The data center's own employees no longer remember what exactly it produced — either film or clothing. The company bought the building outright to avoid the risks of complicated property relations that arise when a third party owns the premises.







Even though the building had already housed production, with machine tools and other heavy machinery, Selectel reinforced the floors further. Even in the ground-floor conference room where the Slurm DevOps intensive (1, 2, 3) took place, we noticed the reinforced supports.







You enter the data center only in shoe covers — the usual rule for such premises. A "shoe-cover machine" stands ready to fit the plastic wraps. We were genuinely impressed. Our escort offered us a choice: put the covers on ourselves, or entrust our extremities to the hungry-looking apparatus.













Our choice was predictable. Igor Olemsky, director of Southbridge: "We're all for automation." Anton Tarasov, administrator at Southbridge: "If it worked like this with socks, I'd be the happiest person on the planet."













While the shoe covers were going on, the Southbridge developers were eagerly wondering exactly where the VMware servers were. Everyone wanted to see what hardware that technology runs on.







As soon as we entered the technical area, the rules were announced at once: "No eating, no drinking, no smoking. We don't stick our hands anywhere — not into panels, not into racks, air conditioners, or consoles. We hold our hands in front of us, like a tyrannosaurus."







The fourth floor holds three server rooms. All equipment stands on a raised floor. It lets cold air flow in from below and hides the utilities that don't require constant access: power lines and cooling pipework.













As soon as we entered the small server room, the hum hit us. The famous cartoon bear with sawdust in his head would surely have said: "That buzz-buzz-buzzing is not for nothing!" Unaccustomed to it, we could barely hear each other for the first couple of minutes. The guide's explanations were also hard to make out, so we had to crowd in closer.







All around are racks, racks, and more racks, lined up in strict rows. In the server rooms we saw rows of different lengths: 10 racks, 12, 20, 30 — depending on the room's layout, the area a client rents, and the workload.



















The cooling system looks the same in every server room: the cooled space is bounded above and on the sides by the rack structure, and the front is closed off by perforated doors. Air conditioners force cold air under the raised floor, and the pressurized air rises up into the racks.













Step between the rows and you can feel the air temperature drop sharply by five degrees; you can almost feel the boundary. The raised-floor joints are fitted so tightly that the conditioned air has nowhere to go except the path laid out for cooling.







The server room itself is kept at around 22 ± 2 °C. In a "cold" aisle the temperature can drop to 16–17 °C. The small server room had two cold aisles; the aisles between them are, accordingly, called "hot." They are slightly warmer than the room average: the air passes through the racks and picks up heat from the equipment.
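That cold-aisle-to-hot-aisle temperature spread is what determines how much air a rack needs. A back-of-the-envelope sketch using the standard heat balance Q = ρ·V̇·c_p·ΔT; only the aisle temperatures come from the tour, the 5 kW rack load is an assumed figure for illustration:

```python
# Airflow needed to carry a rack's heat away, from Q = rho * V * cp * dT.
rho = 1.2      # air density, kg/m^3
cp = 1005      # specific heat of air, J/(kg*K)

rack_load_w = 5000   # assumed rack power draw; not a Selectel figure
delta_t = 27 - 17    # assumed exhaust temp minus cold-aisle temp, K

flow_m3_s = rack_load_w / (rho * cp * delta_t)
print(f"required airflow: {flow_m3_s:.2f} m^3/s "
      f"(~{flow_m3_s * 3600:.0f} m^3/h per rack)")  # ~0.41 m^3/s, ~1490 m^3/h
```

Roughly 1,500 cubic meters of air per hour for a single mid-density rack — which is why the raised-floor plenum has to be sealed so carefully.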













Racks are available for customer rental. The engineers hook up the power; the client comes in with their equipment and does whatever they like, within the rules and the law. Racks come in different sizes: a full 47 units, a half rack, a rack split into four sections. The sections are physically separated, each with its own lock. You can rent as little as 10 units — plenty for someone with very little equipment, and less power means a lower price.







If a client rents, say, a "quarter" in the lower section and a cable needs to be run, it is pulled through a dedicated metal channel. Customers in the upper parts of the rack have no way of reaching other people's wiring: not the power, not the copper, not the optics.







The server room has three air conditioners, of which only two run at a time. If one is taken down for maintenance or breaks, the engineers switch on the spare. That reserve is a Tier III requirement.







The same goes for the uninterruptible power supplies. Say there are 12 of them, but only 6 are running. The server room can run on batteries for an hour if power to the data center is cut. And if those 6 UPSs hypothetically fail, the engineers switch on the other six. For reliability, the data center always has twice as many nodes as it needs.
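It's worth seeing why "twice as many nodes" buys so much. A minimal sketch under a simple binomial model — independent failures and an assumed 99% per-unit availability, neither of which is a Selectel figure:

```python
from math import comb

def availability(n: int, k: int, a: float) -> float:
    """Probability that at least k of n independent units are up,
    each with availability a (a simple binomial model)."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))

# Illustrative numbers only: 99% per-unit availability is an assumption.
print(f"single unit:          {availability(1, 1, 0.99):.6f}")   # 0.990000
print(f"N+1, 1 of 2 needed:   {availability(2, 1, 0.99):.6f}")   # 0.999900
print(f"2N, 6 of 12 needed:   {availability(12, 6, 0.99):.12f}") # ~0.999999999992
```

Under these assumptions, doubling a single unit already adds two nines, and a 2N pool of twelve units pushes the failure probability down to the level where other factors (humans, fiber-hunting crowds) dominate.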













By design, this data center can draw up to 10 MW; right now it draws only 1.5. So far only the fourth floor holds equipment, while the second and third are still under construction. Even the fourth is not completely full: it is designed for 250 racks, and 200 are occupied. There is room to grow.







Across all of its data centers, Selectel draws a total of 14.4 MW, with about 1,200 racks in operation.
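Those two figures imply an average gross draw per rack. A quick sanity check; note the 14.4 MW is total facility draw (cooling and UPS losses included), and the PUE below is an assumed value, not a Selectel figure:

```python
# Back-of-the-envelope: average power per rack from the quoted totals.
total_power_mw = 14.4
racks = 1200

gross_per_rack_kw = total_power_mw * 1000 / racks
print(f"gross draw per rack: {gross_per_rack_kw:.1f} kW")  # 12.0 kW

# Assuming a PUE of 1.5 purely for illustration:
assumed_pue = 1.5
it_per_rack_kw = gross_per_rack_kw / assumed_pue
print(f"implied IT load per rack at PUE {assumed_pue}: {it_per_rack_kw:.1f} kW")  # 8.0 kW
```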













Besides the main racks, used for various projects and mostly rented out to customers, the server rooms also contain service racks holding only Selectel's own equipment, and cross-connect racks for passive links. The latter carry no power, only optical fiber, for connecting equipment between sites and between rooms. Every server room has the same cross-connect cabinet. A cross-connect can run to another room, to the ground-floor server room as soon as it is built, to the neighboring data center, or even to the data center in Dubrovka.







The company has several such fibers. If one is cut, the data center switches to another without a pause. Every path that is laid is always redundant.







When a connection is made between this data center and its neighbor, the engineers run one link through a cross-connect overhead, between the buildings, and the second through another cross-connect via the underground cable duct. Whatever happens, there is always a backup channel.







With so much equipment in the data center, the staff watch fire safety strictly. The data center has several firefighting scenarios. Selectel keeps fire extinguishers in every room, office and technical alike, and people are specially trained to use them. A local fire can be handled on the spot.













But if something burns hard — say, a server power supply or an oil-filled compressor circuit — extinguishers cannot always cope. For such cases the data center has a gas fire-suppression station, from which yellow pipes run along the ceiling into every room.







In a serious fire, everyone is evacuated from the server room. Next to each door is a yellow button. The door seals shut, the button is pressed, a 30-second countdown starts, and Khladon-125 gas is released — pentafluoroethane, chemical formula C2F5H, also known as HFC-125. It inhibits the combustion process, and the fire stops immediately. Neither liquids nor powder are ever used to extinguish fires in the data center, because they would ruin the equipment.







In the large server rooms we were not allowed to take photographs, so I will describe from memory what we saw. In total, this data center has one small server room and two large ones.







The first large server room has one "cold" aisle, built for Selectel's own projects and for customer rental. It is much longer than the one in the small server room. Some racks carry individual security measures: on one of them we noticed an electronic lock with a PIN pad and a video camera mounted on top.







We saw what the "dedicated space lease" service looks like from the inside. You can rent any amount of floor space — from whatever is available, of course — and place on it any racks and equipment that meet the standards.







Through the surrounding fence we examined a very large area belonging to a single client. It held German racks made to special order, plus a small separate warehouse.







According to our guide, the service doesn't have to be that large. You can put up two racks and surround them with a cage, and only you will have access to them. Such requirements typically come from banks or clients who work with financial institutions.













We looked into the fire-suppression station itself. This is where the cylinders of Khladon-125 stand. The equipment is configured so that each room, depending on its size, receives gas from a set number of cylinders.













To the left along the corridor is the electrical switchboard room. We had no access there; tours are kept out just in case — otherwise things could get uncomfortable, and the smell wouldn't fade for a long time.







It holds the uninterruptible power supplies and switchboards. This is the room where power for the entire building arrives, and from here it is distributed to all the rooms. Busbar trunking runs to the server rooms; you can see it under the corridor ceiling.







Two bus ducts feed each server room: one runs under the ceiling, the other under the raised floor, which satisfies the redundancy requirement. The whole building is fed by two incoming lines from the power station. If one input fails, the data center runs on the second.













If both fail at once, all equipment switches over to batteries. A special room holds 750 of them; a little further on there is another room just like it, with as many again. The data center can live on the batteries for one to three hours, depending on load, though switching to diesel takes only about two minutes.
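A rough sanity check on that one-to-three-hour figure. The article gives only the battery count and the ~1.5 MW current draw; the per-battery rating and inverter efficiency below are pure assumptions for illustration:

```python
# Estimate battery runtime: stored energy times efficiency, divided by load.
batteries = 750 * 2            # two identical battery rooms (from the tour)
volts, amp_hours = 12, 100     # assumed per-battery rating, not a Selectel spec
inverter_efficiency = 0.9      # assumed

stored_kwh = batteries * volts * amp_hours / 1000   # ~1800 kWh
load_kw = 1500                                      # current facility draw, from the article

runtime_h = stored_kwh * inverter_efficiency / load_kw
print(f"estimated runtime: {runtime_h:.1f} h")      # ~1.1 h, within the quoted 1-3 h
```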







Gigantic diesel generator sets occupy separate rooms. Each stands on a platform about knee height; as I understood from the explanations, that platform is each diesel's own fuel tank. On top of that, the data center has several tanks buried underground, sized for several tens of tons of fuel.







Since fuel degrades, it is replaced periodically. If the diesel's own tank runs dry, a pump refills it from the underground tanks. And if misfortune strikes and the pump breaks, there is a spare.













Absolutely every system is duplicated: internet uplinks, cooling, power supply, emergency fire suppression, and backup power.







We asked about telecom operators. The company's engineer said they keep 5–6 operators in constant use for uplinks, over quite a few routes. On top of that, the provider connects to almost all the traffic exchange points in St. Petersburg and Moscow: in Moscow the largest is M9, and in St. Petersburg, B18 and Kantemirovskaya.







If the fuel in the underground tanks runs low, another tanker is brought in; Selectel has a contract with a fuel company. The data center can live on diesel indefinitely — it's just more expensive.







We asked how Selectel deals with the human factor, since that is the greatest danger of all, and no amount of redundancy will save you from it.







- How do you work with human errors?







- We try not to repeat them. We anticipate possible errors. We run training and drills — for example, drills on switching to the diesel generators: we test people, actually switch over to the diesels in the process, and sometimes transfer the entire load to them. Plus there is a knowledge base.







We got to the VMware racks. The cloud servers use only Intel platforms, with 2 TB of SSD. Naturally, everything is redundant here too. We saw it up close: each server has two network cards with two links plugged into each. One link goes to the switch at the top of the rack, the other to the switch in the next rack over. Each module has two power supplies.













The data center mostly uses Russian-made CMO racks. In client areas on rented floor space you see a variety of solutions.







A little further down the corridor from the second large server room we came to an elevator. There are two freight elevators for equipment, rated at one and two tons. The loading bay is separate, next to the ground-floor conference room.













In the elevator lobby we saw a "small" crate holding a Juniper MX 2010 router. Any admin's dream: three AC power supplies, one RE (Routing Engine) module — 1800x4 (1.8 GHz quad-core CPU, 16 GB RAM) — and one SFB (Switch Fabric Board) module.







Colleagues argued over where to put it. We decided it would look best at home; it could hand out Wi-Fi to the household appliances. Bulky and solid — a serious router for serious admins. And when you tire of it, you can sell it and buy an apartment in a major city.

























There is an even bigger, more powerful and more capable model — the MX 2020.







How does the router work? Line-card modules slot into it — unusually tall and very narrow. The line cards vary widely: they can carry 8, 24, or 48 ports, and the ports can be "tens" or "hundreds" (10 GbE or 100 GbE), depending on your needs and your budget.







The MX 2020 has 32 line-card slots: 16 up top and 16 below. Roughly speaking, if you insert 10 line cards with 48 ports each, you get 480 ports. Plug in "twenty-five" (25 Gbit) transceivers, and you multiply 480 ports by 25 gigabits. That's just one option; you can also go with "hundreds" (100 GbE).
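The guide's port arithmetic, spelled out. The 10-card, 48-port, 25-Gbit configuration is his illustrative example, not a statement about Selectel's actual setup:

```python
# Aggregate bandwidth from the example configuration above.
line_cards = 10
ports_per_card = 48
gbit_per_port = 25

ports = line_cards * ports_per_card   # 480 ports
total_gbit = ports * gbit_per_port    # 12,000 Gbit/s
print(f"{ports} ports x {gbit_per_port} Gbit = "
      f"{total_gbit / 1000:.0f} Tbit/s aggregate")  # 12 Tbit/s
```

And that still leaves 22 of the 32 slots free.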







Leaving the technical rooms, we lingered a bit at the "snack point" where Selectel engineers refuel at night. We asked whether the coffee machines were duplicated to Tier III. Two coffee machines at each point — each with two power supplies... and so on.













Igor Olemsky asked:







- How do you achieve a higher level of staff qualification than other data centers?







- There's a certain entry bar; not everyone is accepted. Naturally, everyone who joins gets further training, in many different areas. There is constant internal training given by various people — the technical director, for example. The conference room hosts plenty of external events, and employees often take part in them and pick up knowledge. After coming back from a conference, an employee prepares a talk and tells colleagues what they learned there.







The guide also said they are now rolling out DCIM (Data Center Infrastructure Management) — a system into which absolutely everything in the data center is entered so that life cycles can be managed and tracked: all the server hardware, network equipment, racks, clients, and cable runs.
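To make the idea concrete, here is a minimal sketch of what a DCIM inventory record might look like. The fields and lifecycle states are our own guesses for illustration, not Selectel's actual schema or any particular DCIM product:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical lifecycle states; real DCIM products define their own.
class Lifecycle(Enum):
    ORDERED = "ordered"
    INSTALLED = "installed"
    IN_SERVICE = "in_service"
    MAINTENANCE = "maintenance"
    DECOMMISSIONED = "decommissioned"

@dataclass
class Asset:
    """One inventory record: a server, switch, rack, or cable run."""
    asset_id: str
    kind: str              # e.g. "server", "switch", "rack", "cable_run"
    location: str          # building / floor / room / rack / unit
    client: Optional[str]  # owning client, if rented out
    state: Lifecycle

# Illustrative records with made-up identifiers and locations.
inventory = [
    Asset("srv-0001", "server", "floor4/room1/rack12/U10", "client-42", Lifecycle.IN_SERVICE),
    Asset("sw-0002", "switch", "floor4/room1/rack12/U48", None, Lifecycle.MAINTENANCE),
]

# A DCIM system answers questions like "what is in maintenance right now?"
print([a.asset_id for a in inventory if a.state is Lifecycle.MAINTENANCE])
```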







Asked how many servers Selectel has, our escort looked up at the ceiling — either counting, or requesting clearance from the highest authorities — and answered after a short pause: "We now have about 40,000 servers across 6 data centers."







Our guide works in the engineering department: he maintains the server rooms and sees to it that everything runs without failures — that the cooling meets spec, the diesels don't leak, and the batteries hold their voltage.













He told us a little about how the work is organized. On holidays and weekends there is always someone on site who can, if push comes to shove, oversee the data center's switchover to the diesel generators. If something happens, one person always stays in the control room while a second goes out to check the server rooms.







For safety reasons, the duty staff eat different food. If they order in, it's from two different places.



















Of course, we asked what he could tell us about force majeure and failures. The dialogue itself was simply wonderful.







- Tell us about something bad?

"Autumn has come," the Selectel engineer answered honestly.

- Tell us about some failure that you resolved well.

- Or resolved badly.

- Or didn't resolve at all.







Under such pressure our escort gave in and told two stories.







In the data center's entire history there has been only one fire: a technical room on the fourth floor caught light. Nothing serious, but everyone remembers it, because one employee ran out of the building clutching the most valuable thing the company owns — the big plush dinosaur, Selectel's mascot. That toothy fire survivor now stands in the conference room.













We immediately wondered how Selectel employees fight not the fire itself, but the firefighters:







- And when the firefighters arrive, do they know a data center must not be flooded with water?

- They don't care. They're firefighters, — the engineer answered sadly.







But, as he explained, the data center tries to cope on its own. Besides the gas suppression stations there are all kinds of fire extinguishers, including ones that throw a stream 8 meters. Staff drills are held constantly. If anything flammable happens, Selectel employees meet the firefighters, explain that they have it under control, and insist that no water is needed — none at all.







Our escort told us about one more incident, when everything went down hard and completely: the fiber-optic cable between Tsvetochnaya and Dubrovka failed. It turned out that a crowd had gathered near one section of the line and started shooting the aerial cables down — and they killed the fiber. The company spent a long time fixing the problem, and people online began saying the provider's engineers had crooked hands, like a dinosaur's. Since then the plush dinosaurs have been made with straight paws. Just in case.







Back when we first met Selectel employees, they told us the story of the Selectel werewolf: either a boy raised by big data who speaks Brainfuck, or a system administrator who turns into a developer at full moon. Supposedly it howls at night, rustles above the false ceiling, and can sometimes be glimpsed out of the corner of your eye: shaggy, unkempt, red-eyed.







We decided to sort it out.







- What's the story with the werewolf? They say someone howls at night, and a bearded creature with red eyes has been seen in the corridors? Maybe during the data center's reconstruction you walled up a room with an administrator inside?







- Lies. The red-eyed bearded man is a projection of Satoshi Nakamoto; he exists in every data center in the world at once. And the howling comes from those who bought bitcoin in December 2017.







Judging by the evasive answers, the subject of the Selectel werewolf is shrouded in the darkness of an NDA. We never did find out whether it exists — but we did see the data center from the inside.







