This note concludes the backup series. It covers a logical layout for a dedicated server (or VPS) that is convenient to back up, and also proposes a way to quickly restore the server from a backup after a failure, with virtually no downtime.
Initial data
A dedicated server most often has at least two hard drives combined into a first-level RAID array (mirror). This makes it possible to keep the server running if one drive fails. On a regular dedicated server there may be a separate hardware RAID controller with active SSD caching, so that one or more SSDs can be connected in addition to the ordinary hard drives. Sometimes dedicated servers are offered whose only local disks are SATADOMs (small disks that are structurally a flash drive plugged into a SATA port), or even an ordinary small (8-16 GB) USB flash drive connected to a special internal port, while the data lives on a storage system connected over a dedicated storage network (10G Ethernet, FC, etc.); there are also dedicated servers that boot directly from the storage system. I will not consider such options, since in those cases the task of backing up the server smoothly passes to the specialist maintaining the storage system, which usually has various proprietary technologies for state snapshots, built-in deduplication, and the other joys of the system administrator discussed in the previous parts of this series.

The disk array of a dedicated server can reach several tens of terabytes, depending on the number and size of the disks connected to it. For a VPS the volumes are more modest: usually no more than 100 GB (though larger ones exist), and the price of such a VPS can easily exceed that of the cheapest dedicated servers from the same hoster. A VPS most often has a single disk, because underneath it there is usually a storage system (or something hyperconverged). Sometimes a VPS has several disks with different characteristics, for different purposes:
- a small system disk, for installing the operating system;
- a large disk, for storing user data.
When the system is reinstalled through the control panel, the disk with user data is not overwritten, while the system disk is rewritten from scratch. Also, in the case of a VPS, the hoster may offer a button that takes a snapshot of the VPS (or disk) state; however, if you install your own operating system or forget to activate the required service inside the VPS, some data may still be lost. In addition to the button, a storage service is usually offered, most often quite limited. Usually it is an account with FTP or SFTP access, sometimes together with SSH but with a restricted shell (for example rbash), or with a restriction on the commands that can be run via authorized_keys (using ForcedCommand).
A dedicated server is connected to the network via two ports at 1 Gbit/s; sometimes these are 10 Gbit/s cards. A VPS usually has a single network interface. Most often, data centers do not limit network speed within the data center, but do limit the speed of Internet access.
A typical load on such a dedicated server or VPS is a web server, a database and an application server. Sometimes various auxiliary services are installed as well, including ones supporting the web server or database: a search engine, a mail system, and so on.
A specially prepared server acts as the backup storage; it is described in more detail below.
Logical Disk Organization
If there is a RAID controller, or it is a VPS with a single disk, and there are no special requirements for the disk subsystem (for example, a separate fast disk for the database), all free space is divided as follows: one partition is created, an LVM volume group is created on top of it, and several logical volumes are created inside it:
- two small volumes of the same size, used as the root file system (they are swapped on each update to allow a quick rollback; the idea was borrowed from the Calculate Linux distribution);
- one volume for swap;
- the remaining free space is divided into small volumes used as root file systems for full containers, disks for virtual machines, file systems for the accounts in /home (one file system per account), and file systems for application containers.
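The layout above can be sketched with standard LVM commands. This is a dry-run sketch: the device name, volume group name, volume names and sizes are all assumptions for illustration, and the `run` helper prints commands instead of executing them.

```shell
# Dry-run sketch of the LVM layout; device and volume names are assumptions.
run() { printf '+ %s\n' "$*"; }   # print commands instead of executing them

plan=$(
run pvcreate /dev/sda1
run vgcreate vg0 /dev/sda1
run lvcreate -L 10G -n root-a vg0      # two equal root volumes, swapped on upgrades
run lvcreate -L 10G -n root-b vg0
run lvcreate -L 4G  -n swap vg0
run lvcreate -L 50G -n home-site1 vg0  # one volume per account / container / VM
)
printf '%s\n' "$plan"
```

Dropping the `run` wrapper (or replacing its body with `"$@"`) would execute the commands for real.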
An important note: the volumes must be completely self-contained, i.e. they must not depend on each other or on the root file system. In the case of virtual machines or containers this is satisfied automatically. If these are application containers or home directories, you should think about separating the configuration files of the web server and other services so that dependencies between volumes are minimized. For example, each site runs under its own user and keeps its configuration files in that user's home directory; in the web server settings the site configuration files are then included not from /etc/nginx/conf.d/*.conf but, for example, from /home/*/configs/nginx/*.conf.
If there are several disks, you can create a software RAID (and configure SSD caching for it, if there is a need and an opportunity), and assemble LVM on top of it according to the rules suggested above. In this case you can also use ZFS or BtrFS, but here it is worth thinking twice: both require a much more serious approach to resources, and in addition ZFS does not ship with the Linux kernel.
Regardless of the scheme used, it is always worth estimating in advance the approximate speed at which changes are written to disk, and then calculating the amount of free space that must be reserved for snapshots. For example, if our server writes data at 10 megabytes per second and the whole data array is 10 terabytes, synchronization can take up to a day (about 22 hours to transfer that volume over a 1 Gbit/s network), so it is worth reserving about 800 GB for snapshots. In practice the figure will be smaller, and you can safely divide it across the logical volumes.
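The arithmetic behind those figures can be checked directly; a 1 Gbit/s link moves roughly 125 MB/s, and the snapshot reserve is the write rate multiplied by the synchronization time:

```shell
# Figures from the example: 10 MB/s sustained writes, 10 TB array, 1 Gbit/s link.
WRITE_MBPS=10
ARRAY_TB=10
LINK_MBPS=125   # 1 Gbit/s is roughly 125 MB/s

# time to push the whole array over the network, in whole hours
SYNC_HOURS=$(( ARRAY_TB * 1000 * 1000 / LINK_MBPS / 3600 ))
# space written during that window = snapshot reserve, in GB
RESERVE_GB=$(( WRITE_MBPS * SYNC_HOURS * 3600 / 1000 ))
echo "sync ~ ${SYNC_HOURS}h, snapshot reserve ~ ${RESERVE_GB}GB"
```

This yields about 22 hours and just under 800 GB, matching the estimate in the text.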
The backup storage server
The main distinguishing feature of a backup storage server is large, cheap and relatively slow disks. Since modern HDDs have already passed the 10 TB mark per disk, it is mandatory to use file systems or RAID with checksums, because during an array rebuild or a file system repair (which takes several days!) a second disk may fail under the increased load. On disks of up to 1 TB this was not as critical. For simplicity of description, I assume that the disk space is divided into two roughly equal parts (again, for example, using LVM):
- volumes corresponding to the production servers, used to store user data (the most recent backup will be restored to them for verification);
- volumes used as BorgBackup repositories (the backup data itself goes here).
The principle of operation is that a separate volume is created for each server as a BorgBackup repository, which receives the data from the production server. The repositories operate in append-only mode, which rules out deliberate deletion of data, while deduplication and periodic pruning of old backups keep storage consumption moderate (yearly copies, monthly copies for the last year, weekly for the last month, daily for the last week, and in special cases hourly for the last day: 24 + 7 + 4 + 12 + yearly, roughly 50 copies per server).
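The retention policy above maps directly onto `borg prune` flags. Pruning cannot be done over the append-only connection, so it runs locally on the storage server; the repository path and the number of yearly copies here are assumptions, and the `run` helper only prints the command:

```shell
# Dry-run sketch of the retention policy (repo path and yearly count are assumptions).
run() { printf '+ %s\n' "$*"; }

plan=$(
run borg prune --list \
    --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --keep-yearly 5 \
    /home/servername/borgbackup
)
printf '%s\n' "$plan"
```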
Append-only mode is not enabled in the BorgBackup repository configuration itself; instead, a ForcedCommand along the following lines is used in .ssh/authorized_keys:
from=" ",command="/usr/local/bin/borg serve --append-only --restrict-to-path /home/servername/borgbackup/",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc AAAAA.......
At the specified path, a wrapper script is placed on top of borg which, in addition to launching the binary with the right parameters, triggers the process of restoring the backup once the data transfer is finished. To do this, the wrapper script creates a flag file next to the corresponding repository. After the upload completes, the last backup made is automatically restored to the corresponding logical volume.
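Such a wrapper might look like the sketch below. The repository path and the flag-file convention are assumptions; the demonstration stands in `:` (a shell no-op) for the real borg binary so the flow can be seen without borg installed:

```shell
# Hypothetical ForcedCommand wrapper placed in front of borg serve;
# repository path and flag-file naming are assumptions from the text.
serve_and_flag() {
    repo="$1"
    # serve the repository append-only; in production BORG would be /usr/bin/borg
    "${BORG:-borg}" serve --append-only --restrict-to-path "$repo"
    # the client has disconnected: leave a flag file for the verification job
    touch "$repo.done"
}

# demonstration with ':' (a no-op) standing in for the real borg binary
BORG=: serve_and_flag /tmp/demo-repo
ls /tmp/demo-repo.done
```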
This design makes it possible to periodically prune unneeded backups, while preventing the production servers from deleting anything on the backup storage server.
Backup process
The backup is initiated by the dedicated server or VPS itself, since this scheme gives that server more control over the backup process. First, a snapshot of the active root file system is taken; it is mounted and uploaded with BorgBackup to the backup storage server. Once the data has been captured, the snapshot is unmounted and deleted.
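The steps above can be sketched as a dry run; volume names, the mountpoint and the repository address are assumptions, and `run` prints each command instead of executing it:

```shell
# Dry-run sketch of the root-fs backup performed on the production server
# (volume names, mountpoint and repository address are assumptions).
run() { printf '+ %s\n' "$*"; }

plan=$(
run lvcreate --snapshot --size 10G --name root-snap vg0/root-a
run mount -o ro /dev/vg0/root-snap /mnt/root-snap
run borg create --stats 'servername@backup:borgbackup::root-{now}' /mnt/root-snap
run umount /mnt/root-snap
run lvremove -f vg0/root-snap
)
printf '%s\n' "$plan"
```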
If the databases are small (up to 1 GB per site), a database dump is made and stored on the same logical volume as the rest of that site's data, but in a location not accessible through the web server. If the databases are large, you should set up "hot" backups instead, for example with xtrabackup for MySQL, or WAL archiving via archive_command in PostgreSQL. In that case the database is restored separately from the site data.
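For the small-database case, a per-site dump might look like this; user names, database names and dump paths are assumptions, and the dump directory must sit outside the web root:

```shell
# Dry-run sketch of small per-site database dumps (names and paths are assumptions;
# the dumps directory must not be served by the web server).
run() { printf '+ %s\n' "$*"; }

plan=$(
run sh -c 'mysqldump --single-transaction site1_db | gzip > /home/site1/dumps/site1_db.sql.gz'
run sh -c 'pg_dump site2_db | gzip > /home/site2/dumps/site2_db.sql.gz'
)
printf '%s\n' "$plan"
```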
If containers or virtual machines are used, you should configure qemu-guest-agent, CRIU or another appropriate technology. In other cases additional settings are usually not needed: just create snapshots of the logical volumes, which are then processed in the same way as the root file system snapshot. Once the data has been captured, the snapshots are deleted.
Further work happens on the backup storage server:
- the latest backup in each repository is checked;
- the presence of the flag file indicating that data capture is complete is checked;
- the data is restored to the corresponding local volume;
- the flag file is deleted.
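That verification loop might be scripted roughly as follows; the repository path, volume name and archive name are assumptions, and the flag file is created here only to demonstrate the flow:

```shell
# Dry-run sketch of the verification job on the backup storage server
# (repository path, volume and archive names are assumptions).
run() { printf '+ %s\n' "$*"; }

repo=/tmp/demo-borgrepo
touch "$repo.done"    # simulate the flag left by the wrapper after an upload

plan=$(
if [ -f "$repo.done" ]; then
    run borg check --last 1 "$repo"                  # verify the newest archive
    run mount /dev/vg0/servername-data /mnt/verify   # local volume for the test restore
    run borg extract "$repo::some-archive"           # hypothetical archive name; run from /mnt/verify
    run umount /mnt/verify
    run rm "$repo.done"
fi
)
printf '%s\n' "$plan"
```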
Server recovery process
If the main server dies, a similar dedicated server is brought up and booted from some standard image. Most likely it will boot over the network, though the data-center engineer doing the setup can also copy the standard image onto one of the disks right away. The image is loaded into RAM, after which the recovery process starts:
- a request is made to attach, via iSCSI, NBD or another similar protocol, the block device of the logical volume containing the root file system of the dead server; since the root file system should be small, this step should take only a few minutes. The bootloader is also restored;
- the structure of the local logical volumes is recreated, and the remote logical volumes are attached from the backup server using the dm_clone kernel module: data recovery begins, and changes are written straight to the local disks;
- a container is launched with all the physical disks available: the server is fully operational again, though with reduced performance;
- after data synchronization completes, the logical volumes from the backup server are detached, the container is stopped, and the server is rebooted.
After the reboot, the server will have all the data that existed at the moment of the backup, plus all the changes made during the recovery process.
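The dm_clone step can be sketched as follows. Device names, sizes and the table layout are assumptions; consult the kernel's dm-clone documentation for the exact table format before using this for real:

```shell
# Dry-run sketch of attaching a remote volume and hydrating it locally via dm-clone
# (device names, sizes and table arguments are assumptions).
run() { printf '+ %s\n' "$*"; }

plan=$(
# attach the backup server's volume over the network, e.g. via nbd
run nbd-client backup-server 10809 /dev/nbd0
# local destination volume plus a small metadata volume for dm-clone
run dmsetup create restored-root --table \
    '0 20971520 clone /dev/vg0/clone-meta /dev/vg0/root-a /dev/nbd0 8'
# start copying the remote data to the local disk in the background
run dmsetup message restored-root 0 enable_hydration
)
printf '%s\n' "$plan"
```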
Backup, Part 1: Why you need backups, an overview of methods and technologies
Backup, Part 2: Reviewing and testing rsync-based backup tools
Backup, Part 3: Reviewing and testing duplicity, duplicati
Backup, Part 4: Reviewing and testing zbackup, restic, borgbackup
Backup, Part 5: Testing Bacula and Veeam Backup for Linux
Backup, a part requested by readers: reviewing AMANDA, UrBackup, BackupPC
Backup, Part 6: Comparing backup tools
Backup, Part 7: Conclusions
I invite you to discuss the proposed option in the comments; thank you for your attention!