MLOps · Beginner
Docker Deployment
Containerize AI applications with Docker and Docker Compose
Containerize your AI applications for consistent, reproducible deployments.
TL;DR
Use multi-stage Dockerfiles (builder → production) for small images. Docker Compose orchestrates API + Redis + Prometheus + Grafana. Mount models as volumes (not baked into image), run as non-root user, and add health checks for orchestration.
What You'll Learn
- Multi-stage Docker builds for ML apps
- Docker Compose for multi-service deployments
- Volume management for models and data
- Environment configuration
- Production best practices
Tech Stack
| Component | Technology |
|---|---|
| Containerization | Docker |
| Orchestration | Docker Compose |
| Registry | Docker Hub / ECR |
| Base Images | Python 3.11 slim |
Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ DOCKER COMPOSE ARCHITECTURE │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ Client │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DOCKER COMPOSE STACK │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ FastAPI │────────▶│ Redis Cache │ │ │
│ │ │ Service │ │ (persistence) │ │ │
│ │ │ :8000 │ │ :6379 │ │ │
│ │ └────────┬────────┘ └─────────────────┘ │ │
│ │ │ │ │
│ │ │ /metrics │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Prometheus │────────▶│ Grafana │ │ │
│ │ │ (scraping) │ │ (dashboards) │ │ │
│ │ │ :9090 │ │ :3000 │ │ │
│ │ └─────────────────┘ └─────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VOLUMES │ │
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │
│ │ │ model-data │ │ redis-data │ │ logs │ │ │
│ │ │ (read-only) │ │ (persistent) │ │ (write) │ │ │
│ │ └───────────────┘ └───────────────┘ └───────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘

Project Structure
docker-deployment/
├── src/
│ ├── __init__.py
│ ├── main.py
│ └── config.py
├── models/
├── docker/
│ ├── Dockerfile
│ ├── Dockerfile.dev
│ └── entrypoint.sh
├── docker-compose.yml
├── docker-compose.dev.yml
├── .dockerignore
├── .env.example
└── requirements.txt

Implementation
Step 1: Production Dockerfile
# Build stage
FROM python:3.11-slim as builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim as production

# Build metadata (passed by the build script via --build-arg)
ARG BUILD_DATE
ARG VERSION
LABEL build_date=$BUILD_DATE version=$VERSION

WORKDIR /app
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install runtime dependencies only
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    netcat-openbsd \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean
# Create non-root user
RUN groupadd --gid 1000 appgroup && \
useradd --uid 1000 --gid appgroup --shell /bin/bash --create-home appuser
# Copy application code
COPY --chown=appuser:appgroup src/ ./src/
COPY --chown=appuser:appgroup docker/entrypoint.sh ./entrypoint.sh
# Create directories for volumes
RUN mkdir -p /app/models /app/data /app/logs && \
chown -R appuser:appgroup /app
# Switch to non-root user
USER appuser
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Entrypoint
RUN chmod +x ./entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
# Default command
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

Understanding Multi-Stage Builds:
┌─────────────────────────────────────────────────────────────────────────────┐
│ MULTI-STAGE BUILD FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ STAGE 1: BUILDER STAGE 2: PRODUCTION │
│ ┌───────────────────────────┐ ┌───────────────────────────┐ │
│ │ FROM python:3.11-slim │ │ FROM python:3.11-slim │ │
│ │ as builder │ │ as production │ │
│ │ │ │ │ │
│ │ • Install build-essential │ │ • NO build tools │ │
│ │ • Create venv │ │ • COPY venv from builder │ │
│ │ • pip install deps │ │ • Create non-root user │ │
│ │ • Compile C extensions │ │ • Copy app code only │ │
│ │ │ │ │ │
│ │ Size: ~1.5GB │ ──► │ Size: ~500MB │ │
│ │ (Not shipped) │ │ (Final image) │ │
│ └───────────────────────────┘ └───────────────────────────┘ │
│ │
│ Key: Builder stage is DISCARDED. Only /opt/venv is copied to production. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘

Security Best Practices in Dockerfile:
| Practice | Implementation | Benefit |
|---|---|---|
| Non-root user | USER appuser | Limits damage if container compromised |
| Read-only models | Volume :ro flag | Prevents accidental modification |
| No cache dirs | --no-cache-dir | Smaller image, no pip cache |
| Health check | HEALTHCHECK instruction | K8s/Compose knows when ready |
| Slim base | python:3.11-slim | Smaller attack surface |
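The HEALTHCHECK above curls a /health endpoint, so the application must actually serve one. A minimal stdlib sketch of that contract (the real src/main.py here is a FastAPI app, so this is illustrative only):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep container logs quiet for probe traffic

# Bind to an ephemeral port and probe it, as Docker's health check would
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    health_status = resp.status
server.shutdown()
```

In FastAPI this is a one-line `@app.get("/health")` route; the important design point is that the probe stays cheap and never runs model inference.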
Step 2: Development Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install development dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
curl \
git \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements.txt -r requirements-dev.txt
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Expose port
EXPOSE 8000
# Development command with hot reload
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

Step 3: Entrypoint Script
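The entrypoint below gates startup on Redis with nc. The same readiness check can be sketched in pure Python (a hypothetical helper, handy if you would rather not install netcat in the image):

```python
import socket
import sys
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll host:port until it accepts TCP connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False

if __name__ == "__main__" and len(sys.argv) >= 3:
    # e.g. python wait_for.py "$REDIS_HOST" "${REDIS_PORT:-6379}"
    sys.exit(0 if wait_for_port(sys.argv[1], int(sys.argv[2])) else 1)
```

Either way, bound the wait: an entrypoint that loops forever hides a misconfigured dependency behind a silently hung container.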
#!/bin/bash
set -e

echo "Starting AI Service..."

# Wait for dependencies if needed (bounded, so a missing Redis fails fast)
if [ -n "$REDIS_HOST" ]; then
    echo "Waiting for Redis at $REDIS_HOST:${REDIS_PORT:-6379}..."
    retries=30
    until nc -z "$REDIS_HOST" "${REDIS_PORT:-6379}" 2>/dev/null; do
        retries=$((retries - 1))
        if [ "$retries" -le 0 ]; then
            echo "Redis did not become reachable in time" >&2
            exit 1
        fi
        sleep 1
    done
    echo "Redis is ready!"
fi

# Download model if not present (requires a writable /app/models mount)
if [ -n "$MODEL_URL" ] && [ ! -f "/app/models/model.pt" ]; then
    echo "Downloading model from $MODEL_URL..."
    curl -fL -o /app/models/model.pt "$MODEL_URL"
fi

# Run database migrations if requested
if [ "$RUN_MIGRATIONS" = "true" ]; then
    echo "Running migrations..."
    alembic upgrade head
fi

# Hand off to the main command (uvicorn by default)
exec "$@"

Step 4: Docker Compose for Production
version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: docker/Dockerfile
    image: ai-service:latest
    container_name: ai-api
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/app/models/model.pt
      - LOG_LEVEL=INFO
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    volumes:
      - model-data:/app/models:ro
      - logs:/app/logs
    depends_on:
      redis:
        condition: service_healthy
    networks:
      - ai-network
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  redis:
    image: redis:7-alpine
    container_name: ai-redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    networks:
      - ai-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    command: redis-server --appendonly yes

  prometheus:
    image: prom/prometheus:latest
    container_name: ai-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks:
      - ai-network
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'

  grafana:
    image: grafana/grafana:latest
    container_name: ai-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/grafana/datasources:/etc/grafana/provisioning/datasources:ro
    networks:
      - ai-network
    depends_on:
      - prometheus

volumes:
  model-data:
    driver: local
  redis-data:
    driver: local
  prometheus-data:
    driver: local
  grafana-data:
    driver: local
  logs:
    driver: local

networks:
  ai-network:
    driver: bridge

Understanding Docker Compose Service Dependencies:
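depends_on effectively defines a dependency graph that Compose starts in topological order. A sketch of how the stack above resolves (graphlib is in the Python standard library since 3.9):

```python
from graphlib import TopologicalSorter

# Service -> set of services it depends on, mirroring the compose file above
depends_on = {
    "api": {"redis"},
    "grafana": {"prometheus"},
    "redis": set(),
    "prometheus": set(),
}

# Dependencies always come before their dependents, e.g. redis before api
startup_order = list(TopologicalSorter(depends_on).static_order())
```

Note that `condition: service_healthy` adds a second gate on top of this ordering: the dependent service is not started until the dependency's healthcheck passes, not merely until its container is running.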
┌─────────────────────────────────────────────────────────────────────────────┐
│ SERVICE STARTUP ORDER │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ depends_on with condition: service_healthy │
│ │
│ Time ────────────────────────────────────────────────────────────► │
│ │
│ Redis: [Starting]──[Running]──[Health OK ✓] │
│ │ │
│ API: [Waiting...................]──[Starting]──[Running] │
│ │
│ WITHOUT health condition: │
│ Redis: [Starting]──[Running but not ready] │
│ │ │
│ API: [Starting]──[CRASH! Redis not accepting connections] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘

Volume Types Explained:
┌──────────────────────────────────────────────────────────────────────────────┐
│ VOLUME TYPES │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ NAMED VOLUMES (managed by Docker) │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ volumes: │ │
│ │ model-data: # Docker manages location │ │
│ │ redis-data: # Persists between restarts │ │
│ │ │ │
│ │ Use: model-data:/app/models:ro # :ro = read-only │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ BIND MOUNTS (host path mapped to container) │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ volumes: │ │
│ │ - ./src:/app/src:ro # For development hot-reload │ │
│ │ - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro │ │
│ │ │ │
│ │ Use when: You need to edit files on host and see changes │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘

Step 5: Development Compose
version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: docker/Dockerfile.dev
    container_name: ai-api-dev
    ports:
      - "8000:8000"
    environment:
      - DEBUG=true
      - LOG_LEVEL=DEBUG
      - REDIS_HOST=redis
    volumes:
      - ./src:/app/src:ro
      - ./models:/app/models:ro
      - ./tests:/app/tests:ro
    depends_on:
      - redis
    networks:
      - ai-network-dev

  redis:
    image: redis:7-alpine
    container_name: ai-redis-dev
    ports:
      - "6379:6379"
    networks:
      - ai-network-dev

networks:
  ai-network-dev:
    driver: bridge

Step 6: Dockerignore
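The ignore file below keeps the build context small. Conceptually, each path in the context is matched against the patterns in order, the last match wins, and a leading `!` re-includes a path. A rough Python sketch of that rule (Docker's real matcher is Go's filepath.Match plus `**` handling, so treat this as illustration, not a reimplementation):

```python
import fnmatch

def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Rough sketch of .dockerignore semantics: patterns apply in order,
    the last match wins, and '!' re-includes a previously excluded path."""
    excluded = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        pat = pattern.lstrip("!").rstrip("/")
        # A pattern matches the path itself or anything beneath a matched dir
        if fnmatch.fnmatch(rel_path, pat) or rel_path.startswith(pat + "/"):
            excluded = not negated
    return excluded
```

For example, with the patterns `*.md` then `!README.md`, `README.md` survives while other Markdown files are dropped from the context.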
# Git
.git
.gitignore
# Python
__pycache__
*.py[cod]
*$py.class
*.so
.Python
.venv
venv/
ENV/
.eggs/
*.egg-info/
dist/
build/
# IDE
.idea/
.vscode/
*.swp
*.swo
# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
# Docker
docker-compose*.yml
Dockerfile*
.docker/
# Documentation
docs/
*.md
!README.md
# Local files
.env
.env.local
*.log
logs/
# Models (use volumes instead)
models/*.pt
models/*.onnx
# Misc
.DS_Store
Thumbs.db

Step 7: Prometheus Configuration
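The scrape config below assumes the API serves the Prometheus text exposition format at /metrics. A minimal sketch of that format (the metric name is a made-up example; a real app would use the prometheus_client library):

```python
def render_metrics(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format:
    a '# TYPE' line followed by 'name value' for each metric."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

metrics_page = render_metrics({"inference_requests_total": 42})
```

In practice, prometheus_client's `start_http_server` or `generate_latest` produce this format for you; rolling your own is only worthwhile for understanding what Prometheus scrapes.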
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'ai-api'
    static_configs:
      - targets: ['api:8000']
    metrics_path: '/metrics'
    scrape_interval: 10s

Step 8: Environment Configuration
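The environment file below feeds src/config.py. A minimal stdlib loader sketch whose defaults mirror the values shown (the real config.py might use pydantic settings instead, so treat the names as assumptions):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    debug: bool
    log_level: str
    model_path: str
    redis_host: str
    redis_port: int
    max_workers: int

def load_settings(env=os.environ) -> Settings:
    """Read configuration from environment variables with safe defaults."""
    return Settings(
        debug=env.get("DEBUG", "false").lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),
        model_path=env.get("MODEL_PATH", "/app/models/model.pt"),
        redis_host=env.get("REDIS_HOST", "redis"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        max_workers=int(env.get("MAX_WORKERS", "4")),
    )
```

Centralizing env parsing in one place means the rest of the code never reads os.environ directly, which keeps container configuration auditable.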
# API Configuration
DEBUG=false
LOG_LEVEL=INFO
MODEL_PATH=/app/models/model.pt
# Redis
REDIS_HOST=redis
REDIS_PORT=6379
# Monitoring
GRAFANA_PASSWORD=secure_password_here
# Model Download (optional)
# MODEL_URL=https://example.com/model.pt
# Resource Limits
MAX_WORKERS=4
MAX_BATCH_SIZE=32

Step 9: Build and Run Scripts
#!/bin/bash
set -e
IMAGE_NAME=${1:-ai-service}
IMAGE_TAG=${2:-latest}
echo "Building $IMAGE_NAME:$IMAGE_TAG..."
docker build \
--file docker/Dockerfile \
--tag "$IMAGE_NAME:$IMAGE_TAG" \
--build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--build-arg VERSION="$IMAGE_TAG" \
.
echo "Build complete!"
docker images "$IMAGE_NAME:$IMAGE_TAG"

#!/bin/bash
set -e
ENV=${1:-production}
echo "Deploying to $ENV..."
if [ "$ENV" = "development" ]; then
docker-compose -f docker-compose.dev.yml up -d
else
docker-compose -f docker-compose.yml up -d
fi
echo "Waiting for services to be healthy..."
sleep 10
# Check health
curl -f http://localhost:8000/health || {
echo "Health check failed!"
docker-compose logs api
exit 1
}
echo "Deployment successful!"

Usage Commands
# Build production image
docker build -f docker/Dockerfile -t ai-service:latest .
# Run development environment
docker-compose -f docker-compose.dev.yml up
# Run production environment
docker-compose up -d
# View logs
docker-compose logs -f api
# Scale API service (first remove container_name and the fixed "8000:8000" port mapping, which block multiple replicas)
docker-compose up -d --scale api=3
# Stop all services
docker-compose down
# Clean up volumes
docker-compose down -v

Best Practices
Image Optimization
| Practice | Benefit |
|---|---|
| Multi-stage builds | Smaller final image |
| Slim base images | Reduced attack surface |
| Layer caching | Faster builds |
| .dockerignore | Smaller build context |
Security
- Run as non-root user
- Use read-only volumes where possible
- Scan images for vulnerabilities
- Don't store secrets in images
Resource Management
- Set memory and CPU limits
- Use health checks
- Configure restart policies
- Monitor resource usage
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| Multi-Stage Build | Builder stage installs deps, production stage copies result | Smaller images (no build tools in final) |
| Non-Root User | Run container as appuser, not root | Security best practice, limits damage |
| Health Check | HEALTHCHECK instruction in Dockerfile | Docker/K8s knows when container is ready |
| Volumes | Mount models/data from host or named volumes | Don't bake large files into images |
| Entrypoint Script | Shell script that runs before CMD | Wait for deps, download models, run migrations |
| .dockerignore | Files to exclude from build context | Faster builds, smaller context |
| Compose depends_on | Service startup order with health condition | API waits for Redis to be healthy |
| Resource Limits | CPU/memory limits in compose deploy | Prevent runaway containers |
Next Steps
- LLM Caching - Add intelligent caching
- Monitoring Dashboard - Monitor your deployments