Docker Storage

In this tutorial we discuss Docker storage drivers and file systems. We will see where and how Docker stores data and how it manages the file systems of containers.

File System

Let us start with how Docker stores data on the local file system. When you install Docker on a system, it creates a folder structure at /var/lib/docker.

Under it there are multiple folders such as aufs, containers, image and volumes. This is where Docker stores all its data by default.

When I say data, I mean files related to images and containers running on the Docker host.

For example, all files related to containers are stored under the containers folder, and files related to images are stored under the image folder. Any volumes created by Docker containers are stored under the volumes folder.

Now let’s understand where Docker stores its files and in what format.

So how exactly does Docker store the files of an image and a container? To answer that, we need to understand Docker’s layered architecture.

Layered Architecture

Let’s quickly recap something we learned earlier: when Docker builds images, it builds them in a layered architecture.

FROM ubuntu:18.04
RUN apt-get update && apt-get -y install python
RUN pip install flask flask-mysql
COPY . /opt/myapp/src/
ENTRYPOINT FLASK_APP=/opt/myapp/src/app.py flask run

Each instruction in the Dockerfile creates a new layer in the Docker image, containing just the changes from the previous layer.

For example, the first layer is the base Ubuntu operating system. The second instruction creates a second layer, which installs all the APT packages.

The third instruction creates a third layer with the Python packages, followed by the fourth layer, which copies the source code over.

Finally, the fifth layer updates the entry point of the image.

Since each layer stores only the changes from the previous layer, this is reflected in the size as well.

If you look at the base Ubuntu image, it is around 120 MB in size. The APT packages that are installed add around 300 MB, and the remaining layers are small.

Advantages of layered architecture

To understand the advantages of this layered architecture, let’s consider a second application. This application has a different Dockerfile but is very similar to our first application.

It uses the same Ubuntu base image and the same Python and Flask dependencies, but it uses different source code to create a different application, and so it has a different entry point as well.

FROM ubuntu:18.04
RUN apt-get update && apt-get -y install python
RUN pip install flask flask-mysql
COPY app2.py /opt/myapp/src/
ENTRYPOINT FLASK_APP=/opt/myapp/src/app2.py flask run

When I run the docker build command to build a new image for this application, Docker does not rebuild the first three layers, since they are the same for both applications.

Instead, it reuses the three layers it built for the first application from the cache and creates only the last two layers, with the new sources and the new entry point. This way Docker builds images faster and uses disk space efficiently.

The same applies when you update your application code. Whenever you update the application code, such as app.py in this case, Docker reuses all the previous layers from the cache and quickly rebuilds the application image with the latest source code, saving a lot of time during rebuilds and updates.
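The cache reuse described in this section can be approximated with a small shell sketch. It writes the two Dockerfiles from this tutorial to temporary files and counts how many leading instructions are identical, which is roughly how many layers the second build could take from the cache. The helper name shared_layers and the temporary file names are made up for this illustration. The model is also simplified: the real build cache compares more than the instruction text, for example the contents of files added by COPY.

```shell
#!/bin/sh
# Simplified model of Docker's build cache: a layer can be reused only while
# every instruction so far matches the earlier build, line for line.

workdir=$(mktemp -d)

cat > "$workdir/Dockerfile.app1" <<'EOF'
FROM ubuntu:18.04
RUN apt-get update && apt-get -y install python
RUN pip install flask flask-mysql
COPY . /opt/myapp/src/
ENTRYPOINT FLASK_APP=/opt/myapp/src/app.py flask run
EOF

cat > "$workdir/Dockerfile.app2" <<'EOF'
FROM ubuntu:18.04
RUN apt-get update && apt-get -y install python
RUN pip install flask flask-mysql
COPY app2.py /opt/myapp/src/
ENTRYPOINT FLASK_APP=/opt/myapp/src/app2.py flask run
EOF

# shared_layers FILE1 FILE2 (hypothetical helper)
# Prints the number of leading instructions the two files have in common.
shared_layers() {
  count=0
  while IFS= read -r first <&3 && IFS= read -r second <&4; do
    [ "$first" = "$second" ] || break
    count=$((count + 1))
  done 3<"$1" 4<"$2"
  echo "$count"
}

reused=$(shared_layers "$workdir/Dockerfile.app1" "$workdir/Dockerfile.app2")
echo "Layers reused from cache: $reused"
echo "Layers built again: $((5 - reused))"
```

For these two Dockerfiles the sketch reports three reused layers and two rebuilt ones, matching the description above.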

Understanding the layered architecture better

Let’s rearrange the layers bottom up so we can understand them better.

At the bottom we have the base Ubuntu layer, then the packages, then the dependencies, then the source code of the application and finally the entry point.

All of these layers are created when we run the docker build command to form the final Docker image, so all of these are the Docker image layers.

Once the build is complete, you cannot modify the contents of these layers. They are read only, and you can change them only by initiating a new build.

When you run a container based on this image using the docker run command, Docker creates the container from these layers and adds a new writable layer on top of the image layers.

The writable layer is used to store data created by the container, such as log files written by the applications, temporary files generated by the container, or any file modified by the user in that container.

The life of this layer, though, is only as long as the container is alive. When the container is destroyed, this layer and all of the changes stored in it are also destroyed.

Remember that the same image layers are shared by all containers created using this image.

If I were to log into the newly created container and create a new file called temp.txt, Docker would create that file in the container layer, which is read-write.

We just said that the files in the image layers are read only, meaning you cannot edit anything in those layers.

Simple example

Let’s take the example of our application code. Since we bake our code into the image, the code is part of the image layer and as such is read only.

After running a container, what if I wish to modify the source code, say to test a change? Remember that the same image layer may be shared between multiple containers created from this image.

So does it mean that I cannot modify this file inside the container? No, I can still modify it. But before I save the modified file, Docker automatically creates a copy of the file in the read-write layer, and I then modify that copy rather than the original in the image layer.

All future modifications are made to this copy of the file in the read-write layer. This is called the copy-on-write mechanism.
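As a rough illustration of copy-on-write, the sketch below models the two kinds of layers with two ordinary directories. It is a toy model, not how Docker implements it: real storage drivers do this inside a union or snapshotting file system. The helper names cow_read and cow_append, the directory variables and the file contents are all made up for the example.

```shell
#!/bin/sh
# Toy model of copy-on-write using two plain directories.
# image_layer stands in for the read-only image layers,
# container_layer for the container's writable layer.

image_layer=$(mktemp -d)
container_layer=$(mktemp -d)

# The application code is baked into the image.
echo 'print("hello from the image")' > "$image_layer/app.py"

# cow_read FILE: prefer the container's copy, fall back to the image.
cow_read() {
  if [ -e "$container_layer/$1" ]; then
    cat "$container_layer/$1"
  else
    cat "$image_layer/$1"
  fi
}

# cow_append FILE LINE: copy the file up on first write, then modify the copy.
cow_append() {
  if [ ! -e "$container_layer/$1" ] && [ -e "$image_layer/$1" ]; then
    cp "$image_layer/$1" "$container_layer/$1"
  fi
  printf '%s\n' "$2" >> "$container_layer/$1"
}

cow_append app.py '# test change made inside the container'

echo "Container view:"
cow_read app.py
echo "Image layer (unchanged):"
cat "$image_layer/app.py"
```

The container view shows the appended line while the file in the image directory keeps its original content. Deleting the container_layer directory makes cow_read fall back to the image copy again, which mirrors what happens to these changes when a container is removed.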

The image layer being read only just means that the files in these layers will not be modified in the image itself. So the image will remain the same all the time until you rebuild the image using the docker build command.

What happens when we get rid of the container? All of the data that was stored in the container layer also gets deleted. The change we made to the app.py and the new temp file we created will also get removed.

Volumes

So what if we wish to persist this data? For example, if we were working with a database and wanted to preserve the data created by the container, we could add a persistent volume to the container.

To do this, first create a volume using the docker volume create command. When we run docker volume create data_volume, Docker creates a folder called data_volume under the /var/lib/docker/volumes directory.

Then, when I run the container using the docker run command, I can mount this volume inside the container using the -v option like this.

$ docker run -v data_volume:/var/lib/mysql mysql

So I run docker run -v, then specify my newly created volume name followed by a colon (:) and the location inside my container, which is /var/lib/mysql, the default location where MySQL stores data, and then the image name mysql.

This creates a new container and mounts the data volume we created at the /var/lib/mysql folder inside the container.

So all data written by the database is in fact stored on the volume created on the Docker host. Even if the container is destroyed, the data remains intact.

Now what if you didn’t run the docker volume create command to create the volume before the docker run command?

For example, if I run the docker run command to create a new MySQL container with the volume data_volume2, which I have not created yet, Docker automatically creates a volume named data_volume2 and mounts it to the container.

Volume mounting

You should be able to see all these volumes if you list the contents of the /var/lib/docker/volumes folder. This is called volume mounting, as we are mounting a volume created by Docker under the /var/lib/docker/volumes folder.

But what if our data is already at another location? For example, let’s say we have some external storage on the Docker host at /data, and we would like to store database data there and not in the default /var/lib/docker/volumes folder.

In that case we would run a container using the command docker run -v, but this time we provide the complete path to the folder we would like to mount, that is /data/mysql. Docker creates a container and mounts the folder into the container. This is called bind mounting.

$ docker run -v /data/mysql:/var/lib/mysql mysql

So there are two types of mounts:

Volume mount
Bind mount

A volume mount mounts a volume from the volumes directory, and a bind mount mounts a directory from any location on the Docker host.

Remember that -v is the old style. The newer way is the --mount option. --mount is preferred because it is more explicit and verbose: you have to specify each parameter in key=value format.

For example, the previous command can be written with the --mount option as follows.

$ docker run --mount type=bind,source=/data/mysql,target=/var/lib/mysql mysql
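To make the mapping between the two syntaxes concrete, here is a small sketch that rewrites a -v argument in --mount form. The helper name to_mount_flag is made up for this example. It assumes a plain two-part source:target argument with no extra options, and it assumes that a source starting with / is a host path (bind mount) while anything else is a named volume.

```shell
#!/bin/sh
# to_mount_flag SPEC (hypothetical helper)
# Rewrites a two-part -v argument "source:target" in --mount syntax.
# Assumption: a source starting with "/" is a host path (bind mount);
# anything else is treated as a named volume.
to_mount_flag() {
  src=${1%%:*}   # text before the first colon
  dst=${1#*:}    # text after the first colon
  case $src in
    /*) kind=bind ;;
    *)  kind=volume ;;
  esac
  echo "--mount type=$kind,source=$src,target=$dst"
}

to_mount_flag data_volume:/var/lib/mysql
to_mount_flag /data/mysql:/var/lib/mysql
```

The first call yields a type=volume mount and the second a type=bind mount, matching the two docker run examples in this tutorial. One behavioural difference worth knowing: with -v, Docker creates a missing host directory for a bind mount, whereas --mount reports an error instead.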

So who is responsible for all of these operations: maintaining the layered architecture, creating a writable layer, moving files across layers to enable copy-on-write and so on? It’s the storage drivers.

Storage Drivers

Docker uses storage drivers to enable the layered architecture. Some of the common storage drivers are:

AUFS
BTRFS
ZFS
Device Mapper
Overlay
Overlay2

The selection of the storage driver depends on the underlying OS being used.

For example, on Ubuntu the default storage driver was historically AUFS, whereas this storage driver is not available on other operating systems like Fedora or CentOS. In that case Device Mapper may be a better option. Current Docker releases default to overlay2 on most Linux distributions.

Docker chooses the best storage driver available automatically, based on the operating system. The different storage drivers also provide different performance and stability characteristics.

So you may want to choose one that fits the needs of your application and your organization.
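To see which storage driver a host actually uses, you can ask the Docker CLI. The sketch below wraps docker info --format '{{.Driver}}' in a small function. The function name and the "unavailable" fallback text are made up for this example, and a real answer requires that the docker CLI is installed and can reach a running daemon.

```shell
#!/bin/sh
# current_storage_driver (hypothetical helper)
# Prints the storage driver reported by the Docker daemon, or "unavailable"
# when the docker CLI is missing or the daemon cannot be reached.
current_storage_driver() {
  docker info --format '{{.Driver}}' 2>/dev/null || echo "unavailable"
}

echo "Storage driver: $(current_storage_driver)"
```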
