What's wrong with Copy-on-Write on Linux when copying files





Warning: this article applies to all CoW file systems on Linux that support reflink copies. At the moment, these are Btrfs, XFS, and OCFS2.



Please refrain from holy wars about which FS is better: Btrfs, XFS, Reiser4, NILFS2, ZFS, or some other one not mentioned here.



Background





Copy Expectations in CoW File Systems



When Reiser4 was announced in 2001, I was inspired and excited about Copy-on-Write. Just think: you could easily keep as many copies of different projects as you like, while the disk physically stores only the differences between them!



In addition, copy speed should grow dramatically, because copying would only create a reflink to the old file. When writing to the new file, sectors for the changed data would be allocated automatically. As a result, the common parts of the files would share the same sectors, while the differing parts would be written to different ones.
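As a rough illustration of the idea (a toy model, not how any real file system is implemented), extent sharing can be sketched like this:

```python
# Toy model of CoW extent sharing: a "file" is a list of references to
# blocks; a reflink copy shares the same blocks, and a write allocates
# a new block only for the data that actually changed.

blocks = {}   # block id -> data
next_id = 0

def alloc(data):
    global next_id
    blocks[next_id] = data
    next_id += 1
    return next_id - 1

def reflink_copy(file):
    return list(file)  # share the block references, copy no data

def write(file, index, data):
    file[index] = alloc(data)  # CoW: the changed data goes to a new block

original = [alloc(b"aaaa"), alloc(b"bbbb")]
copy = reflink_copy(original)
write(copy, 1, b"BBBB")

# The unchanged part is still shared; only one new block was allocated.
assert copy[0] == original[0]
assert copy[1] != original[1]
print(len(blocks))  # 3 blocks total for two 2-block files
```

Two two-block files occupy only three blocks, because the unchanged block is stored once.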



Back then it looked like a panacea for creating accounts on shared hosting, and today it is the best fit for lightweight virtual machines and containers. After all, we could avoid wasting space on identical files while still letting users modify them easily.



However, Hans Reiser went off the rails and murdered his wife, and his brainchild (most likely for political reasons) was never merged into the kernel. Perhaps history really does depend on a single person?



ZFS was published under a license incompatible with the Linux kernel's and therefore could not be merged either, which slowed its adoption on Linux for a long time.



After that I began to wait for Btrfs. Only six years after its announcement was it recognized as stable by the Linux kernel developers.



Even then I was in no hurry to adopt CoW systems, since the Copy-on-Write paradigm itself implies increased fragmentation: every change to the data is written to a new place.



On an HDD, fragmentation kills performance, since repositioning the block of read heads is a very slow operation.



Therefore, I personally postponed deploying Btrfs on my machines until they switched to SSDs.



Note that SSDs do not like fragmentation either: everyone knows that on an SSD, sequential reads and writes can be tens of times faster than random access.



But the performance drop on a fragmented SSD is not as dramatic as on an HDD.



So, what about copy speed with CoW?



And finally, the time came. When SSDs became reliable enough, I started using CoW file systems in earnest — more specifically, Btrfs and NILFS2.



While mastering the exciting possibilities of snapshots, I forgot for a while about my expectations from the 2000s of ultra-fast file copying.



After some time, I decided to run some tests. To my great disappointment, I saw no speed increase from CoW when copying. Here is the result on a SATA SSD:



 time cp -a /usr /usr1

 real    0m15,572s
 user    0m0,240s
 sys     0m4,739s





It turned out that to use CoW you need to pass a special flag:



 time cp -a --reflink=auto /usr /usr2

 real    0m3,166s
 user    0m0,178s
 sys     0m2,891s





Only then do we see a five-fold advantage. By the way, the advantage grows without bound as file size increases; the /usr folder I copied consists mostly of small files.



I was incredibly surprised that the key advantage of CoW file systems is not used by default. After all, this is exactly what they were created for!



In this case, I tested copying on a Btrfs partition, but you will get a similar result with any other CoW file system that supports reflink.



What is the problem, Billy?



The problem is in cp. By default it does not use CoW when copying, although it can.



You might say: "I don't use cp." However, under the hood of Linux it is used almost everywhere. Many programs, when they need to copy something, call cp rather than reinventing the wheel.


Why did the coreutils developers make such a questionable decision, one that wipes out half the benefits of CoW file systems?



It turns out this was the decision of Pádraig Brady, the maintainer of GNU coreutils.



Here is his reasoning:



  1. By default, cp does not use CoW, because someone may rely on copying to improve the chances of the data surviving file system corruption.
  2. As for performance: you may want the writes to happen at copy time, so that a latency-sensitive process working on the CoW file later is not delayed by writes, possibly to a different part of a mechanical disk. Note that starting with coreutils 8.24, mv uses reflink by default.


Pádraig Brady's answer in the original English:
It's not the default since for robustness reasons one may want a copy to take place to protect against data corruption. Also for performance reasons you may want the writes to happen at copy time rather than some latency sensitive process working on a CoW file and being delayed by the writes possibly to a different part of a mechanical disk. Note that from coreutils v8.24 mv will reflink by default, since it doesn't have the above constraints.


As for mv, there is practically no speed benefit from CoW when moving files: within a single file system, mv is almost always very fast anyway, because it amounts to a simple rename.
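Within one file system, a move boils down to the rename() system call, which only updates directory entries and never touches the data blocks. A quick sketch:

```python
import os
import tempfile

# mv within a single file system is essentially rename(2): only the
# directory entry changes, the file's data blocks stay where they are.
d = tempfile.mkdtemp()
src = os.path.join(d, "old.txt")
dst = os.path.join(d, "new.txt")

with open(src, "w") as f:
    f.write("hello")

st_before = os.stat(src)
os.rename(src, dst)            # what mv does within one file system
st_after = os.stat(dst)

# Same inode before and after: the data was never copied.
assert st_before.st_ino == st_after.st_ino
assert not os.path.exists(src)
```

This is why reflink gains mv nothing within one file system; it only matters for cross-device moves, where mv has to fall back to a real copy.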



Again, the question arises of the influence of an individual on history. By changing a few characters in a program's source code, you can slow down copying for tens or hundreds of millions of users (bear in mind that most people now use cloud services in one form or another, even if they don't own a computer), reduce the efficiency of their drives, and increase drive sales worldwide.



Analyzing Pádraig's arguments



The first argument makes some sense: inexperienced users may indeed back up valuable files onto the same file system.



The second argument: if you read Pádraig's further comments about the lags, it turns out he had database scenarios in mind, where writing to an existing file may be delayed while the file system looks for free space. But on a CoW file system, new sectors are sought on every write anyway, by the very nature of CoW, as Jan Kanis noted. Therefore, in my opinion, the second argument does not hold.



However, on CoW systems you really can get delays, or even an "Out of space" error, when writing to a database file. To avoid this, create the database directory with CoW disabled from the start.



You can disable CoW on a file or directory like this:



 chattr +C /dir/file
 chattr +C /dir/dir





There is also the nodatacow mount option, which applies to all newly created files.



But what if we want to use CoW when copying by default?



  1. The radical way is to patch coreutils, perhaps building your own package in a private repository.



    Patch for the cp.c file:



     -x->reflink_mode = REFLINK_NEVER;
     +x->reflink_mode = REFLINK_AUTO;



  2. A less radical solution is to write a cp alias for your shell. For bash, for example:



    In the /etc/profile.d directory, create a file cp_reflink.sh with the contents:



     #!/bin/bash
     alias cp='cp --reflink=auto'





    This solution will work in almost all cases when cp is invoked from a shell by name. But if scripts call /bin/cp by full path, the alias will not apply and copying will proceed as usual.


File managers and reflink



Status as of October 31, 2019:





Programming languages, system calls and reflink



Most programming languages do not support reflink.

In C, many programmers still copy files using read/write loops and buffers.



The sendfile system call does not use reflink.

cp uses the ioctl system call with the FICLONE flag.



I think that if you need to copy something in your code, it is advisable to do it the way cp does, or simply call cp --reflink=auto.
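A minimal sketch of that approach in Python: try the FICLONE ioctl first (its numeric value below is taken from linux/fs.h), and fall back to an ordinary byte copy on file systems without reflink support, which mirrors the behavior of cp --reflink=auto.

```python
import errno
import fcntl
import os
import shutil
import tempfile

FICLONE = 0x40049409  # _IOW(0x94, 9, int) from <linux/fs.h>

def reflink_copy(src, dst):
    """Clone src into dst via FICLONE; fall back to a plain byte copy."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError as e:
            # Non-CoW file system, cross-device clone, etc. -> fall back
            if e.errno not in (errno.EOPNOTSUPP, errno.ENOTTY,
                               errno.EXDEV, errno.EINVAL):
                raise
    shutil.copyfile(src, dst)  # ordinary copy on non-CoW file systems
    return "copy"

d = tempfile.mkdtemp()
src = os.path.join(d, "a")
dst = os.path.join(d, "b")
with open(src, "wb") as f:
    f.write(b"some data" * 1000)

method = reflink_copy(src, dst)
print(method)  # "reflink" on Btrfs/XFS with reflink, "copy" elsewhere
```

On Btrfs the clone is instantaneous regardless of file size, since no data blocks are read or written.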



Conclusions



With the advent of the era of ubiquitous virtualization and SSDs, CoW file systems have become very important. They are convenient for snapshots and fast copying. In fact, using CoW when copying gives us automatic data deduplication.



Currently, only three file systems support this type of copying: Btrfs, XFS, and OCFS2.



I sincerely hope that reflink support will be added to ZFS and NILFS2, since their internal mechanisms already rely on CoW.



However, in all Linux distributions CoW is disabled when copying files, and we have to either pass the corresponding flags explicitly or resort to tricks such as aliases or patches.



Eighteen years have passed since the announcement of Reiser4, yet lightweight CoW copying has still not become ubiquitous.



PS Docker and CoW



Did you know that Docker can use Btrfs for its storage? This option must be enabled explicitly; it is not the default.



In theory, a CoW file system is the perfect complement to lightweight virtualization, where different containers share the same kernel.



In my opinion, it is much more natural than OverlayFS and AUFS, which are technological crutches that simulate CoW.



To use Btrfs in Docker you need:



  1. On a fresh Docker installation (no images or containers), mount a separate Btrfs volume at /var/lib/docker.



  2. Add the options to the Docker launch configuration file. For Gentoo and Calculate Linux, this is /etc/conf.d/docker:







 DOCKER_OPTS="--storage-driver btrfs --data-root /var/lib/docker"





And here are the complete instructions for all distributions.



Additions from comments



XFS and CoW



constb (source): On XFS, CoW is not always available either: the file system must be created with mkfs.xfs -m reflink=1.

You cannot "enable" CoW support on an already-created file system. Also, on kernels before 4.11, enabling this option produces red warnings in dmesg saying the feature is experimental.



MacOS and CoW



MMik (source): In OS X, the analogous option of the cp command is -c.



It uses the atomic clonefile() system call, which creates a new file entry whose references to data blocks are identical to the original's. File attributes, extended attributes, and ACL entries are copied, except for the file owner information, and the setuid/setgid bits are reset. It works only within a single file system and requires file system support (the ATTR_VOL_CAPABILITIES attribute with the VOL_CAP_INT_CLONE flag).



Supported since OS X 10.12 (Sierra) and only on the APFS file system.









PPS: Please report any errors you notice via private message; I will boost your karma in return.











