The story of a Docker storage (docker root) migration problem

A couple of days ago, on one of our servers, we decided to move the Docker storage (the directory where Docker keeps all of its container and image files) to a separate partition with more capacity. The task seemed trivial and did not portend any trouble ...



Getting started:



1. Stop and remove all of our application's containers:



docker-compose down
      
      





If there are many containers spread across different compose projects, you can do it like this instead:



 docker rm -f $(docker ps -q)
      
      





2. Stop the docker daemon:



 systemctl stop docker
      
      





3. Copy the directory to the new location:



 cp -r /var/lib/docker /docker/data/storage
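In hindsight, this command is worth a closer look. Plain `cp -r` does not preserve modes faithfully: the permissions of each file and directory it creates are filtered through the current umask (and ownership is not preserved when run as a non-root user). A small self-contained sketch with throwaway `mktemp` paths shows the effect; `cp -a` (archive mode) is the permission-preserving variant:

```shell
# Demonstration with throwaway paths: plain `cp -r` applies the current
# umask to the modes of the files it creates, while `cp -a` preserves them.
# A world-writable directory inside a copied image layer can silently
# become 755 this way.
umask 022
src=$(mktemp -d)
chmod 777 "$src"
cp -r "$src" "$src.copy-r"   # umask applied: 777 becomes 755
cp -a "$src" "$src.copy-a"   # mode preserved: stays 777
stat -c '%a' "$src.copy-r"   # prints 755
stat -c '%a' "$src.copy-a"   # prints 777
```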
      
      





4. Point the Docker daemon at the new directory. There are several options: pass the -g flag (nowadays --data-root) to the daemon, use systemd configs (which is what we did), or simply symlink. I won't go into much detail here — the Internet is full of guides on moving the docker root to a new location.
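For completeness, here is a sketch of the two common variants, using the /docker/data/storage path from this migration. Pick one — setting data-root both in daemon.json and on the command line makes the daemon refuse to start with a conflicting-options error:

```shell
# Option A: daemon.json, read by dockerd at startup
cat >/etc/docker/daemon.json <<'EOF'
{ "data-root": "/docker/data/storage" }
EOF

# Option B: a systemd drop-in that overrides ExecStart
# (the first empty ExecStart= clears the unit's original one)
mkdir -p /etc/systemd/system/docker.service.d
cat >/etc/systemd/system/docker.service.d/data-root.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --data-root=/docker/data/storage
EOF
systemctl daemon-reload
```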



5. Start the Docker daemon and check that it is looking where it should:



 systemctl status docker
      
      





In one of the output lines we should see:

 ├─19493 /usr/bin/dockerd --data-root=/docker/data/storage
      
      





We have made sure the option was passed to the daemon; now let's check that the daemon actually applied it (thanks, inkvizitor68sl!):

 docker info | awk '/Root Dir/ {print $NF}'
      
      





6. Start our application:

 docker-compose up -d
      
      





7. Check that everything works



And here the fun begins. The DBMS, the MQ — all is well! The database is intact, everything works ... except nginx. We have our own nginx build, with Kerberos and other bells and whistles. The container's logs showed that it could not write to /var/tmp — Permission denied. I rub my temples and try to analyze the situation ... How can that be? The Docker image did not change. We only moved the directory. It had always worked, and now this ... For the sake of experiment, I went into the container by hand and changed the permissions on that directory: they were root:root 755, I made them root:root 777. And everything came to life ... A thought crossed my mind — this is nonsense ... Then I figured, well, maybe I missed something ...



I decided that the file permissions must have been mangled during the transfer. We stopped the application and the Docker daemon, deleted the new directory, and made a fresh copy of /var/lib/docker, this time using rsync -a.



Now, I think, everything will surely be fine — we bring up Docker, then the application.



Aaand ... the problem is still there ... My eye twitched. I rushed to the console of my virtual machine, where I run various tests — I had this same nginx image there — and climbed inside the container: there, the /var/tmp directory has root:root 777 permissions. That is, exactly the ones I had to set by hand on the server. But the images are identical!



The filesystem everywhere was xfs.



I compared the two images using the command:



 docker inspect my-nginx:12345
      
      





All hashes were identical, one to one, both on the server and on my virtual machine. I deleted the local nginx image and pulled it again from the registry, which for several reasons lives on the same machine. Same problem ... Now my second eye twitched.



I don’t remember what thoughts went through my head, apart from screams of “AAAAAAA” and the like. It was 4 o'clock in the morning; I dug into the Docker sources to understand how image layers are hashed, and opened my third can of energy drink. In the end it dawned on me that the hashing takes into account the file and its contents, but NOT its access permissions! That is, in some mysterious way the permissions got mangled — and selinux is disabled, no ACLs are in use, there is no sticky bit.



I deleted the local image, deleted the image from the docker registry as well, and recreated it. And everything worked. It turns out that during the transfer the permissions got broken both inside the local image and inside the image stored in the registry. As I said, for a number of reasons the registry was located on the same box — and, as a result, under the same /var/lib/docker directory.



And anticipating the question of whether we tried pointing Docker back at the old directory — no, we did not; alas, circumstances did not allow it. Besides, I really wanted to get to the bottom of it.



Now that I have written this article, the solution to the problem seems obvious to me, but at the time of the analysis it did not. I honestly googled and found no similar cases.



Bottom line: I solved the problem, but I did not understand the cause =(



If you know / can guess / have had a vision about the possible causes of this problem — I will be extremely glad to see you in the comments!


