Why Containerisation?
- Everyone has a different operating system
- Steps to run a project can vary based on the operating system.
- It becomes much harder to keep track of dependencies as a project grows.
- What if there were a way to describe your project’s configuration in a single file?
- What if that could be run in an isolated environment?
- Makes local setup of open-source projects a breeze
- Makes installing auxiliary services very simple
Definition
Containerization involves building self-sufficient software packages that perform consistently, regardless of the machines they run on.
It’s taking a snapshot of a machine’s filesystem and letting you use and deploy it as a single construct.
Note: this also allows for container orchestration, which makes deployment a breeze.
Docker has 3 parts
- CLI
  - The CLI is where Docker commands are executed.
- Engine
  - Docker Engine is the heart of Docker and is responsible for running and managing containers. It includes:
  - Docker Daemon: Runs on the host, managing images, containers, networks, and storage.
- Registry
  - Docker Registry is a system for storing and sharing Docker images. It can be public or private, allowing users to upload and download images for easy collaboration and deployment.
  - Docker Hub: The default public registry with millions of images.
  - Private Registries: Custom, secure repositories organizations use to control their images.
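A quick way to see the three parts working together from the CLI (a minimal sketch, assuming Docker is installed and the daemon is running):
# The CLI sends the command to the daemon, which pulls the image from Docker Hub (the registry)
docker pull node:20
# The daemon creates and runs a container from the pulled image
docker run --rm node:20 node --version
# List the images the daemon now has stored locally
docker image ls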
Images v/s Containers
A Docker image behaves like a template from which consistent containers can be created.
If Docker were a traditional virtual machine, the image could be likened to the ISO used to install your VM. This isn’t a robust comparison, as Docker differs from VMs in concept and implementation, but it’s a useful starting point nonetheless.
Images define the initial filesystem state of new containers. They bundle your application’s source code and its dependencies into a self-contained package ready for the container runtime. Within the image, filesystem content is represented as multiple independent layers.
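As a small illustration (not part of the original notes), several containers can be started from the same image; each gets its own writable layer on top of the shared, read-only image layers:
# Two independent containers created from one nginx image
docker run -d --name web1 nginx
docker run -d --name web2 nginx
# One image, two running containers
docker image ls
docker ps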
How to Containerize an App
Below is an example of a simple Dockerfile for a Node.js backend application:
# Use Node.js version 20 as the base image
FROM node:20
# Set up a working directory inside the container
WORKDIR /usr/src/app
# Copy the contents of the current directory to the working directory in the container
COPY . .
# Install dependencies specified in package.json
RUN npm install
# Document that the app inside the container listens on port 3000 (published to the host with -p at run time)
EXPOSE 3000
# Define the command to run the application when the container starts
CMD ["node", "index.js"]
The first four instructions, i.e.
FROM node:20
WORKDIR /usr/src/app
COPY . .
RUN npm install
run while the image is being built, but the line
CMD ["node", "index.js"]
executes only when the container starts. EXPOSE 3000 merely declares a port, so we won’t count it here.
Build and Run the Docker Image
# Build the Docker image
docker build -t my-node-app .
# Run the Docker container
docker run -p 3000:3000 my-node-app
docker build -t my-node-app .
- here the -t flag tags the image with a name (my-node-app)
docker run -p 3000:3000 my-node-app
- maps port 3000 on the host to port 3000 in the container, so all requests arriving at my machine on port 3000 are routed to the container
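A quick sanity check after docker run, assuming index.js starts an HTTP server on port 3000:
# Confirm the container is up
docker ps
# Hit the app through the published host port
curl http://localhost:3000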
Caching and Layers
FROM node:20 #layer1
WORKDIR /usr/src/app #layer2
COPY . . #layer3
RUN npm install #layer4
EXPOSE 3000
CMD ["node", "index.js"]
When building Docker images, each command in the Dockerfile creates a new layer. Docker caches these layers to speed up future builds. However, if one layer changes, all layers after it must be rebuilt.
Why layers?
- caching
- Re-using layers
- Faster build time
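To see the layers of an already-built image (for example the my-node-app image from above), docker history lists them along with the instruction that created each one:
# Show each layer of the image and the Dockerfile instruction that created it
docker history my-node-app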
Problem: Layer Dependency in Docker Images
Layer 3: The COPY . . command copies your entire project into the container, so this layer depends on every file in the project.
Issue: If you update any file (like index.js), Docker detects the change and rebuilds Layer 3 and every layer after it, such as RUN npm install. This can slow down builds, especially if the later steps are time-consuming.
# Solution to the above-mentioned problem statement
FROM node:20 #layer1
WORKDIR /usr/src/app #layer2
COPY package*.json ./ #layer3
RUN npm install #layer4
COPY . . #layer5
EXPOSE 3000
CMD ["node", "index.js"]
How Reordering Solves the Problem
- Layer 1: FROM node:20
  - Base Image: Sets up the environment. Rarely changes, so it’s cached.
- Layer 2: WORKDIR /usr/src/app
  - Working Directory: Stable and rarely changes.
- Layer 3: COPY package*.json ./
  - Copy Dependencies: Copies package.json (and package-lock.json). Rebuilt only if dependencies change.
- Layer 4: RUN npm install
  - Install Dependencies: Installs Node.js packages. Cached unless dependencies change.
- Layer 5: COPY . .
  - Copy Project Files: Copies the rest of the project. Rebuilt only if files change.
Benefits:
- Faster Rebuilds: Only the final layer (COPY . .) rebuilds on code changes.
- Dependency Isolation: Keeps npm install cached unless package.json changes.
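A rough way to observe the caching win, assuming my-node-app has already been built once with this reordered Dockerfile:
# Simulate a code change without touching package.json
touch index.js
# Rebuild: layers 1-4 come from cache; only COPY . . (and anything after it) reruns
docker build -t my-node-app .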
Volumes & Networks
- Docker is used to run DBs/Redis/Auxiliary services locally.
- This is useful when we don’t want to pollute our filesystem with unnecessary dependencies.
- We can bring up or bring down those services to clean our machine.
There is a problem
- We want the local databases to retain information across restarts (can be achieved using volumes).
- We want to allow one Docker container to talk to another Docker container (can be achieved using networks).
We shall discuss these below:
Volumes
- Used for persisting data across starts.
- Specifically useful for things like a database.
docker volume create volume_db
docker run -v volume_db:/data/db -p 27017:27017 mongo
- Purpose: Runs a MongoDB container.
- Volume: Mounts the volume_db volume to store MongoDB data at /data/db inside the container.
- Port: Maps port 27017 on the host to port 27017 in the container, allowing access to MongoDB from the host machine.
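A short sketch of why this matters: the data lives in the volume, not in the container, so it survives the container being removed.
# Remove the MongoDB container (replace <container_id> with the ID shown by docker ps)
docker stop <container_id>
docker rm <container_id>
# Start a fresh container with the same volume; the previously stored data is still there
docker run -v volume_db:/data/db -p 27017:27017 mongo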
Networks
- Each container has its own localhost, so we need a network for the containers to communicate.
- Containers get their own network namespace.
- By default, one container can’t reach the host machine’s localhost or talk to other containers by name.
docker network create my-custom-network
docker run -p 3000:3000 --name backend --network my-custom-network <image_tag>
docker run -v volume_name:/data/db --name mongo --network my-custom-network -p 27017:27017 mongo
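On a user-defined network, containers can reach each other by container name. As a sketch (assuming the backend reads its connection string from a MONGO_URI environment variable), the backend can reach the mongo container like this:
# "mongo" resolves to the mongo container because both are on my-custom-network
docker run -p 3000:3000 --name backend --network my-custom-network -e MONGO_URI=mongodb://mongo:27017/my_db <image_tag>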
Multi-Stage Builds
What if we want the development backend to hot reload, but the production build not to?
Hot Reloading: Ensure your npm run dev script in package.json uses a tool like nodemon for hot reloading.
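For reference, the relevant package.json scripts might look like this (a hypothetical example, assuming nodemon is installed as a dev dependency):
"scripts": {
  "dev": "nodemon index.js",
  "start": "node index.js"
}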
FROM node:20 AS Base
WORKDIR /usr/src/app
COPY . .
RUN npm install
FROM Base AS development
COPY . .
CMD ["npm", "run", "dev"]
FROM Base AS production
COPY . .
RUN npm prune --production
CMD ["npm", "run", "start"]
While building dev:
docker build . --target development -t myapp:dev
docker run -e MONGO_URI=mongodb://mongo:27017/my_db --network my-custom-network -p 3000:3000 -v .:/usr/src/app myapp:dev
While building prod:
docker build . --target production -t myapp:prod
docker run -e MONGO_URI=mongodb://mongo:27017/my_db --network my-custom-network -p 3000:3000 myapp:prod
Docker Compose & YAML Files
Docker Compose
Docker Compose is a tool for defining and running multi-container Docker applications. With Docker Compose, you can use a YAML file to configure your application’s services, networks, and volumes. Then, with a single command, you can create and start all the services from your configuration.
Key commands:
- Start services: docker compose up
- Stop and remove services: docker compose down
- View logs: docker compose logs
- List services: docker compose ps
Example of a docker-compose.yml file:
version: '3'
services:
  web:
    build: .
    ports:
      - "3000:3000"
    networks:
      - frontend
      - backend
    depends_on:
      - db
    environment:
      DB_HOST: db
      DB_PORT: 5432
      REDIS_HOST: redis
      REDIS_PORT: 6379
  db:
    image: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - backend
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
  redis:
    image: redis
    networks:
      - backend
  nginx:
    image: nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    networks:
      - frontend
volumes:
  db_data:
networks:
  frontend:
  backend:
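With this file saved as docker-compose.yml in the project root, a typical workflow looks like:
# Build images and start all services in the background
docker compose up -d --build
# Follow the logs of the web service
docker compose logs -f web
# Tear everything down (add -v to also remove the db_data volume)
docker compose down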