First, we will explain how we concluded that conventional antivirus tools are not suitable for a public cloud and that other approaches to protecting resources are required.
First, providers as a rule take the measures needed to protect their cloud platforms at a high level. For example, we at #CloudMTS analyze all network traffic, monitor the security logs of our cloud, and regularly run penetration tests. The cloud segments allocated to individual customers must be protected just as reliably.
Second, the classic approach to fighting cyber risks is to install an antivirus and its management tools on every virtual machine. With a large number of virtual machines, however, this practice can be inefficient and can consume significant computing resources, which puts extra load on the customer’s infrastructure and reduces the overall performance of the cloud. This became the key reason to look for new approaches to building effective antivirus protection for customers' virtual machines.
In addition, most antivirus solutions on the market are not adapted to protecting IT resources in a public cloud environment. As a rule, they are heavyweight Endpoint Protection Platforms (EPP), which moreover do not give the cloud provider’s clients the customization options they need.
Traditional antivirus solutions are therefore poorly suited to the cloud: they put a heavy load on the virtual infrastructure during updates and scans, and they lack the necessary levels of role management and settings. Next, we look in detail at why the cloud needs new approaches to antivirus protection.
What an antivirus should be able to do in a public cloud
So, let's pay attention to the specifics of working in a virtual environment:
Efficiency of updates and mass scheduled scans. If a large number of virtual machines running a traditional antivirus start an update at the same time, a so-called update “storm” occurs in the cloud. An ESXi host that runs several virtual machines may not have enough capacity to handle a burst of identical tasks launched by default. For the cloud provider, this means additional load on a number of ESXi hosts and, ultimately, lower performance of the cloud's virtual infrastructure, which can also affect the virtual machines of other cloud clients. A similar situation can arise when a mass scan starts: when the disk subsystem processes many requests of the same type from different users at once, the performance of the entire cloud suffers. A drop in storage performance will very likely affect all customers. Such load spikes please neither the provider nor its customers, because they affect the "neighbors" in the cloud. From this point of view, a traditional antivirus can be a big problem.
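The difference between simultaneous and spread-out update starts can be illustrated with a small simulation. This is a minimal sketch and not part of any vendor product: the VM names, the 60-minute window, and the 5-minute job duration are assumptions chosen only for illustration.

```python
import hashlib

WINDOW_MINUTES = 60  # assumed length of the update window
JOB_MINUTES = 5      # assumed duration of one update job


def start_offset_minutes(vm_name: str, window_minutes: int) -> int:
    """Derive a stable start offset in [0, window_minutes) from the VM name."""
    digest = hashlib.sha256(vm_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % window_minutes


def peak_concurrency(start_offsets, duration_minutes: int) -> int:
    """Return the largest number of jobs that run at the same moment."""
    events = []
    for start in start_offsets:
        events.append((start, 1))                      # job starts
        events.append((start + duration_minutes, -1))  # job ends
    # At equal timestamps an end (-1) sorts before a start (+1),
    # so back-to-back jobs are not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


vms = [f"vm-{i:03d}" for i in range(200)]      # hypothetical VM names
simultaneous = [0 for _ in vms]                # everyone starts at minute 0
staggered = [start_offset_minutes(name, WINDOW_MINUTES) for name in vms]

print("peak, simultaneous start:", peak_concurrency(simultaneous, JOB_MINUTES))
print("peak, staggered start:   ", peak_concurrency(staggered, JOB_MINUTES))
```

With a simultaneous start, every job overlaps with every other one; spreading the starts over the window lowers the peak that the hosts and the storage system have to absorb.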
Safe quarantine. If a file or document that may be infected is detected in the system, it is sent to quarantine. An infected file could of course be deleted immediately, but that is unacceptable for most companies. Corporate antiviruses that are not adapted to a provider's cloud usually have one shared quarantine zone that receives all infected objects, for example those found on the computers of a company's users. Clients of a cloud provider, however, "live" in their own segments (tenants). These segments are opaque and isolated: customers do not know about each other and, of course, do not see what others place in the cloud. In a shared quarantine accessible to every antivirus user in the cloud, a document containing confidential information or trade secrets could end up visible to others. This is unacceptable for both the provider and its customers. The only solution is therefore a personal quarantine for each client inside its own segment, to which neither the provider nor other clients have access.
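The isolation requirement can be expressed as a simple access rule: only the owning tenant may write to or read from its quarantine. The sketch below is a toy model under that assumption; the class, the tenant identifiers, and the file name are hypothetical and do not describe how any real product stores quarantined objects.

```python
from dataclasses import dataclass, field


@dataclass
class Quarantine:
    """A quarantine that belongs to exactly one tenant (hypothetical model)."""
    owner_tenant: str
    _items: dict = field(default_factory=dict, init=False, repr=False)

    def _check(self, caller_tenant: str) -> None:
        # Neither other tenants nor the provider pass this check.
        if caller_tenant != self.owner_tenant:
            raise PermissionError(
                f"{caller_tenant!r} has no access to the quarantine of "
                f"{self.owner_tenant!r}"
            )

    def put(self, caller_tenant: str, name: str, payload: bytes) -> None:
        self._check(caller_tenant)
        self._items[name] = payload

    def names(self, caller_tenant: str) -> list:
        self._check(caller_tenant)
        return sorted(self._items)


quarantine_a = Quarantine(owner_tenant="tenant-a")
quarantine_a.put("tenant-a", "contract.docx", b"suspicious bytes")
print(quarantine_a.names("tenant-a"))

for outsider in ("tenant-b", "provider"):
    try:
        quarantine_a.names(outsider)
    except PermissionError as error:
        print("denied:", error)
```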
Individual security policies. Each client in the cloud is a separate company whose IT department sets its own security policies. For example, administrators define scan rules and scan schedules. Accordingly, each organization needs its own control center for configuring antivirus policies. At the same time, these settings must not affect other cloud clients, and the provider must be able to verify that, for example, antivirus updates run normally on all client virtual machines.
Organization of billing and licensing. The cloud model is flexible: the customer pays only for the IT resources actually used. When needed, for example because of seasonal demand, resources can be quickly scaled up or down according to the current need for computing power. A traditional antivirus is not so flexible: as a rule, a client buys a one-year license for a predetermined number of servers or workstations. Cloud users regularly add and remove virtual machines depending on their current needs, so antivirus licensing must support the same model.
The second question is what exactly the license applies to. A traditional antivirus is licensed per server or workstation. Licensing per protected virtual machine does not fit the cloud model well: a client can create any number of virtual machines from the resources available to it, for example five or ten. For most customers this number is not constant, and we as a provider cannot track how it changes. Licensing per physical CPU is not technically possible either, because clients receive virtual processors (vCPUs), so it is the vCPUs that should be licensed. The new antivirus protection model should therefore let the customer specify the number of vCPUs for which they need antivirus licenses.
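Counting licenses by vCPU rather than by machine can be sketched in a few lines. This is only an illustration of the counting rule: the `VirtualMachine` type, the VM names, and the `protected` flag are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachine:
    """Hypothetical description of a client VM."""
    name: str
    vcpus: int
    protected: bool = True  # whether antivirus protection is enabled


def required_licenses(vms) -> int:
    """One license per vCPU of every protected virtual machine."""
    return sum(vm.vcpus for vm in vms if vm.protected)


fleet = [
    VirtualMachine("web-1", vcpus=2),
    VirtualMachine("web-2", vcpus=2),
    VirtualMachine("db-1", vcpus=8),
    VirtualMachine("sandbox", vcpus=4, protected=False),
]
print(required_licenses(fleet))  # 2 + 2 + 8 = 12

# Seasonal peak: the client adds one more web server.
fleet.append(VirtualMachine("web-3", vcpus=2))
print(required_licenses(fleet))  # 14
```

The number of machines can change freely; only the total number of protected vCPUs matters for the count.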
Compliance with the law. This is an important point, since the solutions used must meet the regulator's requirements. For example, cloud “inhabitants” often work with personal data. In that case the provider must have a separate certified cloud segment that fully complies with the requirements of the Law on Personal Data. Companies then do not need to “build” the entire system for working with personal data on their own: purchase certified equipment, connect and configure it, and pass certification. To protect the personal data information systems (ISPD) of such clients, the antivirus must also comply with Russian law and hold an FSTEC certificate.
We have covered the mandatory criteria that antivirus protection must meet in a public cloud. Next, we share our own experience of adapting an antivirus solution to work in the provider's cloud.
How to make antivirus and cloud work together
As our experience has shown, choosing a solution on paper is one thing; putting it into practice in an already running cloud environment is a task of a completely different complexity. Below we describe what we did in practice and how we adapted the antivirus to work in the provider's public cloud. The vendor was Kaspersky, whose portfolio includes antivirus protection for cloud environments. We settled on Kaspersky Security for Virtualization (Light Agent).
It includes a single console, Kaspersky Security Center (KSC), as well as the Light Agent, the Security Virtual Machine (SVM), and the KSC Integration Server.
After we had studied the architecture of Kaspersky’s solution and run the first tests together with the vendor’s engineers, the question of integrating the service into the cloud arose. The first implementation was carried out jointly at the Moscow cloud site. Here is what we learned.
To minimize network traffic, we decided to place an SVM on each ESXi host and “bind” it to that host. The light agents of the protected virtual machines then access the SVM of the ESXi host on which they are running. A separate administrative tenant was allocated for the main KSC. Subordinate KSC servers are located in the tenant of each individual client and report to the main KSC in the management segment. This scheme makes it possible to quickly resolve problems that arise in client tenants.
Besides deploying the components of the antivirus solution itself, we had to organize network interaction by creating additional VXLANs. Although the solution was originally intended for enterprise clients with private clouds, engineering ingenuity and the flexibility of NSX Edge allowed us to solve all the problems related to tenant separation and licensing.
We worked closely with Kaspersky engineers. While analyzing the solution architecture in terms of network interaction between the system components, we found that, in addition to access from light agents to the SVM, a reverse connection was also required, from the SVM to the light agents. Such connectivity is not possible in a multitenant environment, because virtual machines in different tenants of the cloud may have identical network settings. At our request, the vendor's engineers therefore reworked the interaction mechanism between the light agent and the SVM so that network connectivity from the SVM to the light agents is no longer needed.
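Why identical network settings rule out connections initiated by the SVM can be shown with Python's standard `ipaddress` module. The tenant names and address ranges below are hypothetical; the sketch only demonstrates that an IP address alone does not identify a machine when tenants reuse the same private range.

```python
import ipaddress

# Hypothetical tenants; two of them use the same private range.
tenant_networks = {
    "tenant-a": "192.168.0.0/24",
    "tenant-b": "192.168.0.0/24",
    "tenant-c": "10.10.0.0/16",
}


def tenants_owning(address: str, networks: dict) -> list:
    """Return every tenant whose network contains the given address."""
    ip = ipaddress.ip_address(address)
    return sorted(
        tenant
        for tenant, cidr in networks.items()
        if ip in ipaddress.ip_network(cidr)
    )


# An SVM that wanted to open a connection to 192.168.0.15 could not
# tell which tenant's virtual machine is meant.
print(tenants_owning("192.168.0.15", tenant_networks))

# An address from a range used by one tenant only is unambiguous.
print(tenants_owning("10.10.3.7", tenant_networks))
```

When the light agent opens the connection itself, this ambiguity does not arise, because the agent always knows which SVM it is addressing.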
After the solution was deployed and tested on the Moscow cloud site, we replicated it to other sites, including the certified cloud segment. Now the service is available in all regions of the country.
Architecture of the information security solution under the new approach
The general scheme of the anti-virus solution in a public cloud environment is as follows:
The anti-virus solution working scheme in a public cloud environment #CloudMTS
Below we describe how the individual elements of the solution work in the cloud:
• A single console that lets customers manage the protection system centrally: run scans, monitor updates, and monitor quarantine zones. Individual security policies can be configured within the customer's own segment.
It should be noted that although we are a service provider, we do not interfere with the settings chosen by clients. The only thing we can do is reset the security policies to the defaults if necessary, for example if the client has accidentally tightened them too much or significantly weakened them. A company can always get a control center with default policies, which it can then configure on its own. The downside of Kaspersky Security Center is that so far the platform is available only for Microsoft Windows, although the light agents work on both Windows and Linux machines. Kaspersky Lab promises, however, that KSC will run on Linux in the near future. One of the important features of KSC is quarantine management. Each client company in our cloud has its own personal quarantine. This approach rules out the situation in which a document infected with a virus accidentally becomes accessible to others, as could happen with a classic corporate antivirus with a shared quarantine.
• Light agents. Under the new model, a Kaspersky Security light agent is installed on each virtual machine. This removes the need to store an antivirus database on every VM, which reduces the disk space used. The service is integrated with the cloud infrastructure and works through the SVM, which increases the density of virtual machines on an ESXi host and the performance of the entire cloud system. The light agent builds a job queue for each virtual machine: check the file system, memory, and so on. The operations themselves are performed by the SVM, which is described below. The agent also acts as a firewall, monitors security policies, sends infected files to quarantine, and monitors the overall “health” of the operating system on which it is installed. All of this can be managed from the single console mentioned above.
• Security Virtual Machine. All resource-intensive tasks (antivirus database updates, scheduled scans) are handled by a separate Security Virtual Machine (SVM). It runs the full antivirus engine and its databases. A company's IT infrastructure may include multiple SVMs. This increases the reliability of the system: if an SVM fails and does not respond for thirty seconds, the agents automatically start looking for another one.
• KSC integration server. A component of the main KSC that assigns SVMs to light agents according to the algorithm specified in its settings and monitors SVM availability. This software module thus balances the load across all SVMs in the cloud infrastructure.
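The thirty-second failover rule mentioned for the SVM can be sketched as a small selection function. Only the thirty-second threshold comes from the description above; the function name, the SVM names, and the rule of preferring the most recently seen SVM are assumptions made for illustration, since the actual assignment algorithm is configured in the integration server.

```python
SVM_TIMEOUT_SECONDS = 30.0  # threshold taken from the description above


def choose_svm(current, last_seen, now, timeout=SVM_TIMEOUT_SECONDS):
    """Keep the current SVM while it responds; otherwise pick another one.

    last_seen maps an SVM name to the time (in seconds) of its last response.
    Returns None when no SVM has responded within the timeout.
    """
    def alive(name):
        return name in last_seen and now - last_seen[name] < timeout

    if current is not None and alive(current):
        return current

    candidates = [name for name in last_seen if name != current and alive(name)]
    if not candidates:
        return None
    # Assumption for this sketch: prefer the most recently seen SVM.
    return max(candidates, key=lambda name: (last_seen[name], name))


last_seen = {"svm-host-1": 100.0, "svm-host-2": 125.0}

print(choose_svm("svm-host-1", last_seen, now=120.0))  # still within 30 s
print(choose_svm("svm-host-1", last_seen, now=131.0))  # silent for 31 s, switch
print(choose_svm("svm-host-1", last_seen, now=200.0))  # nobody responds
```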
Cloud Algorithm: Reducing Infrastructure Load
In general, the antivirus algorithm works as follows. The agent intercepts access to a file in the virtual machine and submits it for scanning. The verdict is stored in a centralized SVM verdict database called the Shared Cache, in which each entry identifies a unique file sample. This ensures that the same file is not scanned several times in a row (for example, when it is opened on different virtual machines). A file is scanned again only if it has been modified or if a scan is started manually.
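The idea of a shared verdict cache can be modeled in a few lines. This is a sketch under stated assumptions, not the vendor's implementation: keying entries by a SHA-256 hash of the file content, the `SharedVerdictCache` class, and the toy scanner are all hypothetical.

```python
import hashlib


class SharedVerdictCache:
    """Toy model of a verdict cache shared by many virtual machines."""

    def __init__(self, scanner):
        self._scanner = scanner  # callable: bytes -> verdict string
        self._verdicts = {}      # content hash -> verdict
        self.scans = 0           # number of real scans performed

    def check(self, content: bytes, force: bool = False) -> str:
        # Assumption: a file sample is identified by the hash of its content,
        # so a modified file gets a new key and is scanned again.
        key = hashlib.sha256(content).hexdigest()
        if force or key not in self._verdicts:
            self._verdicts[key] = self._scanner(content)
            self.scans += 1
        return self._verdicts[key]


def toy_scanner(content: bytes) -> str:
    """Placeholder for a real engine: flags a made-up marker string."""
    return "infected" if b"MALWARE-MARKER" in content else "clean"


cache = SharedVerdictCache(toy_scanner)
document = b"quarterly report"

cache.check(document)               # opened on VM 1: scanned
cache.check(document)               # opened on VM 2: served from the cache
cache.check(document + b" v2")      # modified file: scanned again
cache.check(document, force=True)   # manual scan: scanned again
print(cache.scans)                  # 3
```

Four checks result in three real scans; the more virtual machines open identical files, the larger the share of checks answered from the cache.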
Implementing an antivirus solution in the provider's cloud
The image shows the general scheme for implementing the solution in the cloud. The main Kaspersky Security Center is deployed in the management zone of the cloud, and an individual SVM is deployed on each ESXi host using the KSC integration server (each ESXi host has its own SVM bound to it through special settings in VMware vCenter Server). Clients work in their own cloud segments, which host the virtual machines with agents. These are managed through individual KSC servers subordinate to the main KSC. If only a small number of virtual machines (up to 5) need protection, the client can instead be given access to the virtual console of a dedicated KSC server. Networking between the client KSCs and the main KSC, and between light agents and SVMs, goes through NAT on the clients' EdgeGW virtual routers.
According to our estimates and the vendor's test results, Light Agent reduces the load on clients' virtual infrastructure by about 25% compared with a system that uses traditional antivirus software. In particular, the standard Kaspersky Endpoint Security (KES) antivirus for physical environments consumes almost twice as much server processor time (2.95%) as the light-agent virtualization solution (1.67%).
CPU load comparison graph
A similar picture is seen for disk writes: 1011 IOPS for the classic antivirus versus 671 IOPS for the cloud antivirus.
Graph comparing disk access rates
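For convenience, the relative differences implied by the quoted measurements can be computed directly. The sketch uses only the four figures given in the text; the helper name is hypothetical, and these per-metric values are not the same quantity as the overall estimate of about 25%, whose methodology is not described here.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Share by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


cpu_traditional, cpu_light = 2.95, 1.67   # % of server processor time
iops_traditional, iops_light = 1011, 671  # disk write IOPS

print(f"CPU time ratio:      {cpu_traditional / cpu_light:.2f}x")                         # 1.77x
print(f"CPU time reduction:  {relative_reduction(cpu_traditional, cpu_light):.1%}")       # 43.4%
print(f"Write IOPS reduction: {relative_reduction(iops_traditional, iops_light):.1%}")    # 33.6%
```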
These performance gains help keep the infrastructure stable and make better use of computing power. Because it is adapted to a public cloud environment, the solution does not reduce cloud performance: file scanning and update downloads are centralized, which distributes the load. On the one hand, threats relevant to the cloud infrastructure will not be missed; on the other, virtual machine resource requirements fall by an average of 25% compared with a traditional antivirus.
In terms of functionality the two solutions closely resemble each other; a comparative table follows. In the cloud, however, as the test results above show, the solution for virtual environments is the better choice.
A note on pricing under the new approach. We decided to use a model in which licenses are issued by the number of vCPUs: the number of licenses equals the number of vCPUs. The antivirus can be tested by leaving a request on the site.
In the next article on cloud-related topics, we’ll talk about the evolution of cloud-based WAFs and what’s best to choose: hardware, software, or the cloud.
The text was prepared by employees of the #CloudMTS cloud provider: Denis Myagkov, lead architect, and Alexey Afanasyev, information security product development manager.