Post

Dockerfile

Creating efficient and optimized Dockerfiles is the key to building reliable containers. Whether you’re containerizing a simple web app or a complex microservice, this comprehensive guide walks you through everything you need to know about writing Dockerfiles - from basic syntax to advanced optimization techniques and real-world examples that you can reference whenever you get stuck.

The Basics of Dockerfiles

A Dockerfile is a text file that contains instructions for building a Docker image. It’s written in a simple syntax that’s easy to understand, even for those new to Docker. Each instruction in a Dockerfile creates a new layer in the image, which allows for efficient caching and rebuilding.

Dockerfile Fundamentals

  • Base Image: Every Dockerfile starts with a FROM instruction that specifies the base image to build upon.
  • Instructions: A series of commands that modify the image (installing packages, copying files, etc.).
  • Layer Caching: Docker caches each layer to speed up subsequent builds.
  • Build Context: The set of files available to the Docker daemon during the build process.

Anatomy of a Dockerfile

At a minimum, a Dockerfile needs to specify:

  1. Base Image: The starting point for your container (e.g., FROM ubuntu:20.04).
  2. Dependencies: Any additional packages or libraries your application requires.
  3. Application Code: Your actual application files copied into the image.
  4. Runtime Configuration: Commands to run when the container starts, environment variables, exposed ports, etc.

Building an Image

Once you’ve created a Dockerfile, you can build an image using the docker build command:

1
2
3
4
5
6
7
8
# Basic build command
docker build -t my-image:tag .

# Build with custom Dockerfile name
docker build -t my-image:tag -f CustomDockerfile .

# Build with build arguments
docker build -t my-image:tag --build-arg VERSION=1.0 .

Dockerfile Instructions Cheat Sheet

Here’s a list of the most common Dockerfile instructions:

Instruction Description
FROM Sets the base image
LABEL Adds metadata to the image
ENV Sets environment variables
WORKDIR Sets the working directory inside the container
COPY Copies files/directories to the image
ADD Like COPY, but supports remote URLs and archives
RUN Executes a command during build
CMD Specifies default command to run when container starts
ENTRYPOINT Configures a container to run as an executable
EXPOSE Documents which port the app will listen on
VOLUME Defines mount points
ARG Defines build-time variables
USER Specifies user to run commands as
ONBUILD Adds a trigger instruction for child builds
SHELL Changes the default shell used in RUN
HEALTHCHECK Configures how Docker checks container health
STOPSIGNAL Sets the system call signal for container exit

Best Practices for Writing Dockerfiles

When writing Dockerfiles, it’s important to follow best practices to ensure that your containers are reliable, efficient, and secure. Here’s a comprehensive guide to Dockerfile best practices:

1. Use Minimal Base Images

The smaller the base image, the faster it downloads, deploys, and the smaller your attack surface.

1
2
3
4
5
6
7
8
# GOOD: Use minimal images when possible
FROM alpine:3.16

# BETTER: Use distroless images for even smaller footprint
FROM gcr.io/distroless/static-debian11

# AVOID: Using full OS images when not needed
# FROM ubuntu:22.04  # Much larger than necessary for most applications

2. Leverage Multi-Stage Builds

Multi-stage builds allow you to use one image for building and another for running, resulting in a much smaller final image.

1
2
3
4
5
6
7
8
9
10
11
12
13
# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Runtime stage - only copy what's needed
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

3. Order Instructions by Stability

Place instructions that change less frequently at the top of your Dockerfile to maximize layer caching.

1
2
3
4
5
6
7
8
9
10
# GOOD: Dependencies first, code changes later
FROM node:18-alpine
WORKDIR /app

# These rarely change
COPY package*.json ./
RUN npm install

# These change frequently
COPY . .

4. Use COPY Instead of ADD

ADD has more features (like auto-extracting archives), but COPY is more explicit and predictable.

1
2
3
4
5
# GOOD: Use COPY for simple file copying
COPY ./app /app

# Only use ADD when you need its special features
ADD https://example.com/big.tar.gz /tmp/

5. Create a .dockerignore File

Exclude files not needed in the build context to speed up builds and reduce image size.

1
2
3
4
5
6
7
# Example .dockerignore file
node_modules
npm-debug.log
.git
.env
*.md
tests/

6. Run as Non-Root User

Running containers as a non-root user improves security by preventing privilege escalation.

1
2
3
4
5
6
7
8
9
10
11
12
13
FROM node:18-alpine
WORKDIR /app

# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

COPY --chown=appuser:appgroup . .
RUN npm install

# Switch to non-root user
USER appuser

CMD ["node", "app.js"]

7. Use Specific Tags for Base Images

Avoid using latest tags as they can lead to unexpected changes and break your builds.

1
2
3
4
5
# GOOD: Specific version tag
FROM python:3.10.8-slim

# AVOID: Latest tag can change unexpectedly
# FROM python:latest

8. Combine RUN Instructions

Combine related RUN commands with && to reduce the number of layers and image size.

1
2
3
4
5
6
7
8
9
10
11
# GOOD: Single RUN instruction with cleanup
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# AVOID: Multiple RUN instructions for related operations
# RUN apt-get update
# RUN apt-get install -y python3
# RUN apt-get install -y python3-pip

9. Use HEALTHCHECK Instruction

Implement health checks to monitor container health and enable automatic restarts.

1
2
3
4
5
6
FROM nginx:alpine
COPY ./app /usr/share/nginx/html

# Add a health check
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

10. Use Environment Variables for Configuration

Make your containers configurable through environment variables.

1
2
3
4
5
6
7
8
9
10
11
FROM node:18-alpine
WORKDIR /app
COPY . .

# Define default environment variables
ENV NODE_ENV=production \
    PORT=3000 \
    LOG_LEVEL=info

EXPOSE ${PORT}
CMD ["node", "app.js"]

11. Add Useful Metadata with LABEL

Labels help with image organization, versioning, and documentation.

1
2
3
4
5
6
FROM alpine:3.16

LABEL maintainer="Your Name <your.email@example.com>" \
      version="1.0" \
      description="This image runs our awesome app" \
      org.opencontainers.image.source="https://github.com/yourusername/repo"

12. Use Docker Compose for Multi-Container Applications

Docker Compose simplifies managing multi-container applications with dependencies.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# docker-compose.yml example
version: '3.8'
services:
  webapp:
    build: ./webapp
    ports:
      - "8080:80"
    depends_on:
      - database
  database:
    image: postgres:14
    volumes:
      - db-data:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: example
volumes:
  db-data:

Dockerfile Structure and Examples

Basic Structure

Here’s a minimal Dockerfile example for a Node.js application:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# Use an official base image
FROM node:18-alpine

# Set working directory
WORKDIR /app

# Copy package files and install dependencies
COPY package*.json ./
RUN npm install

# Copy source code
COPY . .

# Expose app port
EXPOSE 3000

# Start the application
CMD ["npm", "start"]

Real-World Examples

Python Flask Application

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment variables
ENV FLASK_APP=app.py \
    FLASK_ENV=production \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# Expose the port
EXPOSE 5000

# Run the application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Java Spring Boot Application

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Build stage
FROM maven:3.8.6-openjdk-18-slim AS build
WORKDIR /app

# Copy pom.xml and download dependencies
COPY pom.xml .
RUN mvn dependency:go-offline

# Copy source code and build
COPY src ./src
RUN mvn package -DskipTests

# Runtime stage
FROM openjdk:18-jdk-slim
WORKDIR /app

# Copy the built JAR file
COPY --from=build /app/target/*.jar app.jar

# Set JVM options
ENV JAVA_OPTS="-Xms512m -Xmx1024m"

# Expose the port
EXPOSE 8080

# Run the application
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]

Go Application

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Build stage
FROM golang:1.19-alpine AS build

WORKDIR /app

# Download dependencies
COPY go.mod go.sum ./
RUN go mod download

# Copy source code and build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server

# Runtime stage
FROM alpine:3.16

WORKDIR /app

# Install CA certificates for HTTPS
RUN apk --no-cache add ca-certificates

# Copy the binary from the build stage
COPY --from=build /app/server /app/server

# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# Expose the port
EXPOSE 8080

# Run the application
CMD ["/app/server"]

React Frontend with Nginx

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Build stage
FROM node:18-alpine AS build

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci

# Copy source code and build
COPY . .
RUN npm run build

# Runtime stage
FROM nginx:alpine

# Copy nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf

# Copy built files from build stage
COPY --from=build /app/build /usr/share/nginx/html

# Expose port
EXPOSE 80

# Start nginx
CMD ["nginx", "-g", "daemon off;"]

Specialized Examples

Database Container (PostgreSQL)

1
2
3
4
5
6
7
8
9
10
11
12
FROM postgres:14

# Set environment variables
ENV POSTGRES_USER=myuser \
    POSTGRES_PASSWORD=mypassword \
    POSTGRES_DB=mydb

# Copy initialization scripts
COPY ./init-scripts/ /docker-entrypoint-initdb.d/

# Expose the PostgreSQL port
EXPOSE 5432

Machine Learning Model Serving

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libgomp1 && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and application code
COPY ./model/ ./model/
COPY ./src/ ./src/

# Expose API port
EXPOSE 8000

# Start the model server
CMD ["uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "8000"]

Microservice with Health Checks

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
FROM node:18-alpine

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci --only=production

# Copy application code
COPY . .

# Expose service port
EXPOSE 3000

# Add health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Start the service
CMD ["node", "server.js"]

Advanced Dockerfile Techniques

1. BuildKit Features

BuildKit is Docker’s next-generation builder with advanced features. Enable it with:

1
2
3
4
5
# Linux/macOS
export DOCKER_BUILDKIT=1

# Windows PowerShell
$env:DOCKER_BUILDKIT=1

Secret Mounting

Securely use secrets during build without including them in the final image:

1
2
3
4
5
6
7
# syntax=docker/dockerfile:1.4
FROM alpine

# Mount a secret during build time only
RUN --mount=type=secret,id=mysecret cat /run/secrets/mysecret

# Build with: docker build --secret id=mysecret,src=./secret.txt .

SSH Agent Forwarding

Use your local SSH keys during build for private repository access:

1
2
3
4
5
6
# syntax=docker/dockerfile:1.4
FROM alpine

RUN --mount=type=ssh git clone git@github.com:private/repo.git

# Build with: docker build --ssh default .

2. Multi-Architecture Builds

Create images that work on multiple CPU architectures:

1
2
3
4
5
# Create and use a multi-architecture builder
docker buildx create --name mybuilder --use

# Build for multiple platforms
docker buildx build --platform linux/amd64,linux/arm64 -t username/image:latest --push .

3. Optimizing Layer Caching

Selective File Copying

Copy only what’s needed to maximize cache hits:

1
2
3
4
5
6
7
# Copy files that rarely change first
COPY go.mod go.sum ./
RUN go mod download

# Copy files that change more frequently later
COPY internal/ ./internal/
COPY cmd/ ./cmd/

Using Build Arguments for Versioning

1
2
3
4
5
6
7
8
9
10
# Define build arguments
ARG NODE_VERSION=18
ARG APP_VERSION=1.0.0

# Use them in the image
FROM node:${NODE_VERSION}-alpine

LABEL version="${APP_VERSION}"

# Build with: docker build --build-arg NODE_VERSION=16 --build-arg APP_VERSION=1.1.0 .

4. Entrypoint Scripts

Use entrypoint scripts for initialization and configuration:

1
2
3
4
5
6
7
FROM alpine:3.16

COPY entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/entrypoint.sh

ENTRYPOINT ["entrypoint.sh"]
CMD ["echo", "Hello World"]

Example entrypoint script (entrypoint.sh):

1
2
3
4
5
6
7
8
9
10
11
12
#!/bin/sh
set -e

# Initialize application
if [ "$1" = 'init' ]; then
  echo "Initializing..."
  # Setup code here
  exit 0
fi

# Execute the command passed as arguments
exec "$@"

5. Squashing Layers

Reduce the number of layers in your final image:

1
2
# Build with squash option
docker build --squash -t myapp:latest .

6. Using .dockerignore Effectively

Example of a comprehensive .dockerignore file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# Version control
.git
.gitignore
.svn

# Build artifacts
node_modules
dist
build
target
*.o
*.obj

# Development files
.env*
*.log
n.vscode
.idea
*.md
DOCKER*
docker-compose*

# Testing
test
tests
*.test.js
*.spec.js

# OS specific
.DS_Store
Thumbs.db

Troubleshooting Common Dockerfile Issues

When working with Dockerfiles, you might encounter various issues. Here’s how to troubleshoot and fix common problems:

1. Build Failures

Issue: Package Installation Failures

1
2
3
4
5
# PROBLEM: Package not found
RUN apt-get install -y package-name

# SOLUTION: Update package lists first
RUN apt-get update && apt-get install -y package-name

Issue: Permission Denied

1
2
3
4
5
6
7
# PROBLEM: Permission denied when executing a script
COPY ./script.sh /app/
RUN ./script.sh

# SOLUTION: Make the script executable
COPY ./script.sh /app/
RUN chmod +x /app/script.sh && ./script.sh

2. Container Runtime Issues

Issue: Container Exits Immediately

This often happens when the main process exits or there’s no foreground process.

1
2
3
4
5
# PROBLEM: Container exits immediately
CMD node app.js

# SOLUTION: Use proper JSON array format for CMD
CMD ["node", "app.js"]

Issue: Cannot Connect to Service

1
2
3
4
5
6
7
8
# PROBLEM: Service not accessible from outside
# Missing EXPOSE directive or not publishing port

# SOLUTION: Expose port in Dockerfile
EXPOSE 8080

# And when running:
# docker run -p 8080:8080 your-image

3. Image Size Issues

Issue: Image Too Large

1
2
3
4
5
6
7
8
# PROBLEM: Installing unnecessary packages
RUN apt-get update && apt-get install -y build-essential python3 vim git

# SOLUTION: Only install what's needed and clean up
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

4. Caching Issues

Issue: Cache Not Being Used Effectively

1
2
3
4
5
6
7
8
# PROBLEM: Copying all files before installing dependencies
COPY . /app/
RUN npm install

# SOLUTION: Copy only package files first
COPY package*.json /app/
RUN npm install
COPY . /app/

5. Security Issues

Issue: Running as Root

1
2
3
4
5
6
7
# PROBLEM: Running container as root
CMD ["node", "app.js"]

# SOLUTION: Create and use non-root user
RUN useradd -r -u 1001 -g appgroup appuser
USER appuser
CMD ["node", "app.js"]

Docker Compose Integration

While Dockerfiles define individual container images, Docker Compose allows you to define and run multi-container applications. Here’s how they work together:

Basic Docker Compose Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# docker-compose.yml
version: '3.8'

services:
  web:
    build: 
      context: ./web
      dockerfile: Dockerfile
    ports:
      - "8080:80"
    environment:
      - NODE_ENV=production
    depends_on:
      - api
      - db
  
  api:
    build: ./api
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://postgres:password@db:5432/mydb
    depends_on:
      - db
  
  db:
    image: postgres:14
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=mydb

volumes:
  postgres_data:

Advanced Docker Compose Features

Using Build Arguments

1
2
3
4
5
6
7
8
services:
  web:
    build:
      context: ./web
      dockerfile: Dockerfile
      args:
        - NODE_VERSION=18
        - ENVIRONMENT=production

Health Checks

1
2
3
4
5
6
7
8
9
services:
  api:
    build: ./api
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Resource Constraints

1
2
3
4
5
6
7
8
9
10
11
services:
  api:
    build: ./api
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M

Development vs. Production

Using multiple compose files for different environments:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Base configuration
docker-compose.yml

# Development overrides
docker-compose.dev.yml

# Production overrides
docker-compose.prod.yml

# Run development environment
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up

# Run production environment
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up

Example development overrides (docker-compose.dev.yml):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
services:
  web:
    build:
      context: ./web
      dockerfile: Dockerfile.dev
    volumes:
      - ./web:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
  
  api:
    volumes:
      - ./api:/app
    environment:
      - DEBUG=true

Additional Resources

Official Documentation

Tools

Tutorials and Guides

Conclusion

Writing efficient and optimized Dockerfiles is critical for building reliable, secure, and efficient containers. By following the best practices and examples in this guide, you can create containers that are:

  • Smaller - Reducing download times and attack surface
  • Faster - Building and deploying more quickly
  • More secure - Following security best practices
  • More maintainable - Using clear, consistent patterns

Remember that Dockerfile optimization is an iterative process. As your application evolves, revisit your Dockerfiles to ensure they remain efficient and follow current best practices. With the examples and troubleshooting tips in this guide, you should be well-equipped to handle most Dockerfile challenges you encounter.

Final Words

Happy Containerizing! Whether you’re a seasoned Docker user or just getting started, this guide has provided you with everything you need to know to create efficient and optimized Dockerfiles. Remember, practice makes perfect, so start containerizing today!

This post is licensed under CC BY 4.0 by the author.