Docker in Simple Terms - Dockerfile
In the follow-up part of the series, we'll delve into one of the core mechanics of Docker - the Dockerfile - exploring how it works and best practices.
Introduction
In the previous part, we explored the basics of Docker and the benefits of containerization. In this follow-up, we'll shift our focus to Dockerfiles - the essential tool for building and configuring images according to our needs.
What is a Dockerfile
A Dockerfile (named Dockerfile by convention, or given a .Dockerfile extension, as in app.Dockerfile) is a text file that defines how a Docker image is built and configured. It contains instructions that:
- specify the base image
- install software dependencies
- set environment variables
- copy files into the image
To build an image from the Dockerfile, execute the following command from the directory containing the file:
docker build -t image_name .
Structure
A Dockerfile does not follow any of the common formats you might be familiar with, such as YAML, XML, or JSON. Instead, it has its own distinct structure as follows:
FROM node:latest
ENV NODE_ENV=production
WORKDIR /app
COPY *.json ./
RUN npm install --production
COPY . .
CMD ["node", "server.js"]
As you can see, it's rather straightforward. Regardless, let's break it down instruction by instruction.
Note: These are by no means all of the available instructions, only the most basic ones needed to build an image and run a container from it. For a full list of instructions, refer to docs.docker.com.
FROM
In the most basic scenario, this instruction allows us to define the base image from which our image will be built. Think of it as telling Docker where to get the template from.
In this scenario, it follows the <image>:<tag> format, where image has to match an image hosted in the registry, and tag (optional) has to match one of that image's tags, usually a version. If the tag is omitted, Docker falls back to latest.
Besides that, it can also be used in multi-stage builds to define additional stages, but you'll learn about that further into this article.
Note: The aforementioned registry is Docker Hub (hub.docker.com) by default. To pull from a different registry, such as one you host yourself, prefix the image name with the registry's address.
ENV
The ENV instruction allows us to set environment variables according to the following format:
ENV key=value
The value will then be set for all subsequent instructions and will persist when the container is run from the resulting image.
Note: Alternatively, you can use the ARG instruction, but values set with ARG exist only during the image build process. They are not set as environment variables in containers run from the final image.
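The difference between the two can be sketched as follows. This is only a minimal illustration: APP_VERSION and the node:14 tag are placeholders chosen for the example, not names Docker prescribes.

```dockerfile
FROM node:14

# ARG: available only while the image is being built.
# It can be overridden with: docker build --build-arg APP_VERSION=2.0.0 .
ARG APP_VERSION=1.0.0

# ENV: available during the build and in every container run from the image.
ENV NODE_ENV=production

# Both values are visible here, at build time.
RUN echo "Building version $APP_VERSION for $NODE_ENV"

# At runtime only NODE_ENV is set in the environment; APP_VERSION is not.
CMD ["node", "-e", "console.log(process.env.NODE_ENV, process.env.APP_VERSION)"]
```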
WORKDIR
This instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, or ADD instructions that follow it. A later WORKDIR with a relative path is resolved against the previous one.
This is primarily used to simplify subsequent instructions. Consider the following examples:
Before:
FROM python:3
RUN mkdir -p /host/apps/service
COPY . /host/apps/service
CMD ["python3", "/host/apps/service/app.py"]
After:
FROM python:3
WORKDIR /host/apps/service
COPY . .
CMD ["python3", "app.py"]
Note: If the path defined by WORKDIR doesn't exist, it'll be created regardless of whether it's used in subsequent instructions or not.
COPY
The COPY instruction copies files and directories from the build context (the directory on the host machine that is passed to docker build) into the image during the build process. It follows the format:
COPY <source> <destination>
It's commonly used to include application code, configuration files, and other necessary resources in the image so that the container can access and use them at runtime.
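A few common forms of COPY are sketched below. The file and directory names (package.json, src/) are hypothetical and only stand in for whatever your project contains.

```dockerfile
FROM node:14
WORKDIR /app

# Copy a single file into the working directory (/app).
COPY package.json ./

# Copy the contents of a directory into /app/src (created if it is missing).
COPY src/ ./src/

# Copy everything in the build context into /app.
COPY . .
```

Relative destinations are resolved against the current WORKDIR, which is why ./ points at /app here.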
RUN
The RUN instruction allows us to execute commands during the image build process. It can run any valid shell command, which makes it the tool for installing dependencies, configuring the environment, and performing other actions required to set up the application inside the image.
The format of RUN is as follows:
RUN <command>
You can use RUN multiple times in a Dockerfile to execute different commands and build up the image layer by layer. Each RUN instruction creates a new layer in the image - and as you'll learn later, that is not always what you want.
The changes made by RUN instructions are captured in the image and persist when containers are created from that image.
CMD
The CMD instruction is used to set the default command or executable that will run when a container is started from the image. It allows you to define the primary process that runs inside the container.
The format of the instruction is as follows:
CMD ["executable", "param1", "param2", ...]
For example:
CMD ["python", "app.py"]
Note: Only one CMD takes effect per Dockerfile. If more than one instance of this instruction is found, the last one wins.
Note: Alternatively, one can use ENTRYPOINT. It is less flexible, because arguments passed to docker run are appended to it rather than replacing it, which is preferable when the container should always run the same executable.
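The interplay between the two can be sketched like this. It assumes a hypothetical app.py that accepts a --port option; neither the file nor the flag comes from Docker itself.

```dockerfile
FROM python:3
WORKDIR /app
COPY app.py .

# ENTRYPOINT fixes the executable; arguments given to `docker run` are appended to it.
ENTRYPOINT ["python", "app.py"]

# CMD supplies default arguments, replaced by whatever is passed to `docker run`.
CMD ["--port", "8000"]
```

With this setup, `docker run image_name` starts `python app.py --port 8000`, while `docker run image_name --port 9000` starts `python app.py --port 9000`.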
Best Practices
Enforcing best practices and conventions often leads to compromising flexibility. This holds true in this context, but the advantages are worth the time investment.
In this section, you'll learn about the common approaches to designing Dockerfiles and the advantages that come with them.
Use Official Images
Using official Docker images and specifying version tags improves security, reliability, and consistency. Official images are curated and regularly maintained, which reduces exposure to known vulnerabilities. Specific version tags (for example, node:14 rather than node:latest) prevent unintended updates and make builds reproducible, for stable and predictable deployments.
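As a small illustration, the only difference is the tag on the FROM line. The node:14 tag is used here purely as an example; pick whichever version your application targets.

```dockerfile
# Floating tag: may resolve to a different image each time you build.
# FROM node:latest

# Version tag from the official node repository: stays on the same major release.
# A more specific tag narrows it down further.
FROM node:14
```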
Use Layer Caching
Docker's built-in layer caching mechanism takes advantage of the layered file system. During a rebuild, Docker reuses the cached layer for every instruction whose inputs are unchanged; once one instruction's layer has to be rebuilt, all layers after it are rebuilt as well. This results in faster and more efficient builds, especially for large projects, which speeds up development and saves resources.
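In practice, this mostly comes down to instruction order. The sketch below assumes a typical Node.js project with a package.json and a server.js entry point; adapt the file names to your own setup.

```dockerfile
FROM node:14
WORKDIR /app

# Dependency manifests change rarely, so this layer and the install below
# are reused from cache as long as the package*.json files are unchanged.
COPY package*.json ./
RUN npm install

# Source code changes often; only the layers from here on are rebuilt.
COPY . .
CMD ["node", "server.js"]
```

If COPY . . came before npm install, every source change would invalidate the cache for the dependency installation as well.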
Use Multi-Stage Builds
Multi-stage builds in Docker allow you to create more efficient and smaller images by separating the build environment from the runtime environment. They offer the following advantages:
- Reduced Image Size: Multi-stage builds discard unnecessary build-time dependencies, resulting in smaller final images.
- Improved Security: The final image contains only runtime components, reducing potential vulnerabilities from development tools.
- Build Isolation: Each build stage is independent, making it easier to manage dependencies and isolate changes.
To achieve multi-stage builds:
- Use Multiple FROM Statements: Define a different base image for each stage using FROM.
- Copy Build Artifacts: In the build stage, copy the required files, install dependencies, and build your application.
- Final Stage: In the final stage, use a minimalist base image and copy the built artifacts from the build stage using COPY --from=<stage_name>.
Here's a concise example:
# Build Stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Final Stage
FROM nginx:latest
COPY --from=build /app/dist /usr/share/nginx/html
EXPOSE 80
Use HEALTHCHECK
The HEALTHCHECK instruction defines a command that Docker periodically runs inside the container to check whether the application is still working correctly. If the check fails a configured number of times in a row, the container is marked as unhealthy, a status that orchestrators and monitoring tools can act on, for example by restarting or replacing the container.
For more details, refer to docs.docker.com.
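A health check for a web server might look roughly like this. The sketch assumes that curl is available inside the image and that the server answers on port 80; the interval, timeout, and retry values are arbitrary examples.

```dockerfile
FROM nginx:latest

# Every 30s, request the root page; the check fails if there is no answer within 3s.
# After 3 consecutive failures the container is marked as unhealthy.
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost/ || exit 1
```

The current status is shown in the STATUS column of docker ps.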
Use .dockerignore
Using .dockerignore is crucial for controlling the contents of your Docker image and optimizing the build process. It allows you to exclude specific files and directories from the build context, so they can never be copied into the image. This reduces image size, improves build speeds, and enhances security. Think of .dockerignore as similar to .gitignore for Docker.
Avoid Hardcoding Secrets
Avoiding hardcoding secrets in Dockerfiles is essential for maintaining security and protecting sensitive information. Docker images may be shared, version-controlled, or accessible to unauthorized users, making hardcoding unsafe.
Instead, use external mechanisms to inject secrets at runtime:
- Using Environment Variables: Pass secrets as environment variables at container runtime using the -e flag of docker run.
- Using Management Tools: Utilize tools like Azure Key Vault, HashiCorp Vault, or Git-crypt to securely store and manage secrets for retrieval during runtime.
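The environment variable approach can be sketched as follows. API_KEY is a hypothetical variable name, and server.js stands in for your application's entry point.

```dockerfile
FROM node:14
WORKDIR /app
COPY . .

# Avoid: a secret written here is stored in the image and readable by anyone who can pull it.
# ENV API_KEY=my-secret-value

# Instead, leave the variable unset and supply it when the container starts, e.g.:
#   docker run -e API_KEY=<value> image_name
CMD ["node", "server.js"]
```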
Follow Best Practices for RUN
When using RUN instructions, consider the following best practices:
- Combine Commands: Minimize RUN instructions by chaining commands with && to reduce image layers and optimize the build process.
- Clean Up: Remove unnecessary files and caches in the same RUN instruction after installing packages or dependencies to reduce image size.
- Sort Commands: Order commands based on frequency of change to benefit from Docker's caching mechanism. Place less frequently changing commands before more frequently changing ones.
- Break Lines: Use \ to break lines and enhance the clarity of your Dockerfile.
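Put together, these points could look like the sketch below. It assumes a Debian-based image with apt-get; the installed packages are arbitrary examples.

```dockerfile
FROM ubuntu:22.04

# One RUN instruction, so one layer: update, install, and clean up together.
# Removing the apt lists in the same instruction keeps them out of the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

Had the cleanup been placed in a separate RUN instruction, the package lists would still be stored in the earlier layer and the image would not get any smaller.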
Avoid Root Access
Avoiding root access in Docker images is essential for security reasons. Containers run as root by default, which increases the damage a compromised process can do and can help attackers gain more control over the host system. Running the application as a non-root user, set with the USER instruction, reduces the attack surface, enhances container portability, and mitigates security risks.
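A minimal sketch of this is shown below. It relies on the unprivileged node user that the official node images ship with; on other base images you would typically create a user first. The server.js entry point is an assumption.

```dockerfile
FROM node:14
WORKDIR /app

# Copy the application and hand ownership to the unprivileged user.
COPY --chown=node:node . .

# Switch away from root; the container's main process runs as "node".
USER node
CMD ["node", "server.js"]
```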
Final Thoughts
In this brief article, we've covered the basics of Dockerfiles, learning how to build custom images effectively and efficiently.
Next, we'll delve deeper into Docker Compose, Docker's networking features, and its storage solutions. To be notified when the next part is out, consider subscribing to our newsletter!