Docker toolbelt

Table of Contents

1 Intro

Hey! In this doc we're going to learn a few things about docker, and about development environments. Stay tuned.

2 What do we want?

The aim of this doc is to learn about docker and docker compose through a practical project. This project is a development environment generator. Which I'll be building with you as we go.

3 When do we want it?

ASAP

4 Why do we want it?

To have awesome pipelines, development environments, and to learn about containers.

5 How are we going to do it?

At first, it's going to be about docker. Then we'll use docker-compose, and finally we'll generate docker-compose files automatically, like there's no tomorrow.

6 What is a container?

Do you know chroot? ok, imagine you create a chroot and you run a something inside it. You are going to run it successfully, and that process is only going to have access to that part of the filesystem. But, the process you're running, will it be able to see other processes? what would htop say? Also, what about ping www.google.com?

Containers is how we name cgroups and networking and lxc working together to isolate processes from other processes, or group them together.

7 ETOOMANYFLAGS

we'll use docker. All container systems have too many flags. There's a lot of things they can tune, but the amount of flags is scary. There's no way around it.

8 What are containers made of?

A container needs a few things to run.

  • filesystem. The same as you'd need an .img which is the "saved state" that the vm runner will use as a filesystem, container running systems need also a fake filesystem to use as its main filesystem.
  • networking. Different containers can be running at the same time, and you can choose to make them see each other or not. Or you can choose to give each one an IP on the same network as the host, or not. You can even give them the --net host option so that all ports opened will live in the same host as the host machine. In that case, if you run 1 container that opens port 3000 and another that opens 4000, you can reach all services from all containrs, or even the host, using localhost:XXXX.
  • cpu

9 can I run 2 containers on the same filesystem?

you can run 2 containers from the same Image, making them 2 different running containers, and they should be independent from one another. They will have access to the same filesystem (except for the last layer, which is the RW one, which is independent for each container).

If you modify the filesystem, one container won't see the modification of the other, because they run "copies" of the Image filesystem. The shared layers are read only, so each container feels it runs in complete isolation (unless you mount parts of the filesystems from one to the other)

10 patterns

We'll start seeing different patterns, like small exercises, and we'll start adding them up. It's a bit like the * schemer style, only worse (because I'm not Dan).

11 Build

docker build image-directory will create an image from a list of commands in a Dockerfile.

Images are the filesystem of the container. Think a chroot, or think the .img or .vdi file you download to run virtualbox on it.

12 Dockerfile

A dockerfile contains the recipe to create the image and leave the image in your desired state, ready to be run later on a container.

It's like a shellscript, with some special commands, but it's an imperative script that gets run by docker build. Its goal is to leave a filesystem in the state that it can run your app when docker run img.

Q: What runs the dockerfile? in which environment?

13 images can extend other images

Dockerfiles can be divided in 3 parts

FROM XXXYYYY     # <- 1 From
.....            # <- 2 Build more things
.....
.....
.....
.....
CMD/ENTRYPOINT/RUN #<- run command

You usually start from an image that already has some contents in its filesystem. That's why usually Dockerfiles start with the name of a distro FROM ubuntu:20.04. If you'd like a completely empty filesystem to start from, there's an images called scratch, that looks like a recently formatted filesystem.

CMD/ENTRYPOINT/RUN part is used to give a default entrypoint or command to run when we run a new container from that image. The command can be overriden, but it's nice to have good defaults.

14 can I only "inherit" from an official image?

You can inherit from any image. Images can be in a container registry (usually on the internet), or locally, where you have a local collection of already downloaded images, ready to be used.

Also, when you're using docker build, you are creating a new image.

15 Examples?

echo <<EOF
  FROM ubuntu:20.04
EOF >Dockerfile
docker build .

....
Successfully build 7e0aa2e69a15

At this point, 7e0aa2e69a15 is the exact filestystem as the ubuntu:20.04. it's like subclassing a class without changing any method.

16 How do we know it?

diff <(docker inspect 7e0aa2e69a15) <(docker inspect ubuntu:20.04)

17 Let's play with docker inspect

Let's try to change the default command that our container will run.

echo <<EOF
  FROM ubuntu:20.04
  CMD ["tail -f /dev/null"]
EOF >Dockerfile
docker build .

....
Successfully build aabbccdd

Now, let's diff them:

diff <(docker inspect aabbccdd) <(docker inspect ubuntu:20.04)

Now we see the differences, and they mostly make sense. I guess you now can be confident with inspect. Everything should make sense.

18 Where/how are those images stored?

Instead of having a .iso, .img, or .vdk, images are stored as a directory with a bunch of data and metadata.

19 Dockerfiles are like

git clone ubuntu:20.04 --depth=1
cmd1
git add -A; git commit -m 'step1'
cmd2
git add -A; git commit -m 'step2'
....

If you imagine 2 dockerfiles that are equal except in the last line, all but last lines are producing the same images. Docker is smart enough to share them, so the "blobs" (in git parlance) are unique (to keep the analogy working, you've got to obviate the fact that git would create different commits because they happen on different dates)

20 Running a container

Once we have the image filesystem, we're ready to bring its contents to life. When we run a process inside that image, jailed in a docker network (described in the docker run command), we are "starting" the container.

In that moment, you can think of a last ephemeral commit in that chain of commits being added. We could be modifying files there, and we would see them, but when we stop and kill the container, that layer would disappear.

21 Minimal flags

  • docker run --rm -ti image command . Those flags the basic combination. --rm tells it to clean after itself, removing whatever it created to make a container from image. -ti binds an interactive terminal, so we can communicate with it. You want that when running things locally 90% of the time.
  • docker run --net=host . This flattens all networking of that container to use the same ip as the host, so everything lives in the same machine, and can get to the other via localhost. Problem with it is that ports may collide. just be aware it's a shortcut and you'll probably want to fix it at some point.
  • docker run -v $PWD:/my-dir mounts a directory from host to container.

22 testing images and containers

docker run --rm -ti ubuntu:20.04 bash. This will open a bash starting from an ubuntu:20.04 image.

In another terminal, run docker run --rm -ti ubuntu:20.04 bash again.

You can see that both containers are running independently (files created in one are not seen in the other one).

And each one of them thinks it's unique. try to run top in each one of them. They shouldn't see each other.

But still, if you run docker ps from the host, you can see both containers run from the same image.

23 Running multiple commands

docker run --rm -ti ubuntu "apt update | apt install foo" doesn't work, but if you want to run several commands at the same time from the "run" command, To test things out from a script, you can use

docker run --rm -ti ubuntu bash -ci "apt update && apt install net-tools"

24 Commit

I said that a running container has that ephemeral last layer, where your modifications happen, and they survive a stop/start, but they don't survive a kill or rm.

But there's a way to make the current state permanent, and effectively create an image, from where new containers can be spawned or new Dockerfiles can start FROM.

docker run -ti ubuntu:20.04 bash
# inside the container
echo 'hi' >/tmp/foo

And in the host

$ docker ps
CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS                                            NAMES
c6e13a79c7b3        ubuntu                      "bash"                   15 seconds ago      Up 14 seconds                                                        determined_wu
$ docker commit c6e13a79c7b3
sha256:d6581aae94a58d9be27b6fecf576630fead7a05b003be12796152110d7f0b010
$ docker run --rm -ti d6581aae94a58d9be27b6fecf576630fead7a05b cat /tmp/foo
hi

See? we created a new image, and we started a container from it

25 Env vars

Build time .env, runtime -e .

Env vars that appear in the compose can be overriden, others no (imagine if we could be able to mount and update LD_PRELOAD, not good).

In projects/empty-dict-value there's an example of different ways to interact with them:

  • docker-compose run -e foo=123 mything0 ; docker eec -ti … bash ; echo $foo

26 If I am a program running on a container, what do I see?

try it: We said there are mostly 3 things that get isolated: network, filesystem and processes.

26.1 Processes

Instead of bash, you can run any other program as the main one. For example, docker run -ti --rm ubuntu top. This will run top, and list all processes top can see. Not many, really.

26.2 Network

Let's play with netstat. Netstat is not installd by default, so we'll have to install it each time.

# isolated
docker run --rm -ti ubuntu bash -ci "apt update && apt install net-tools && netstat -atunp"
# shared network
docker run --net=host --rm -ti ubuntu bash -ci "apt update && apt install net-tools && netstat -atunp"

26.3 filesystem

You should be able to peek through directories by mounting volumes.

docker run --rm -ti -v /tmp:/mything ubuntu ls /mything

27 Can a container run more than one process?

Yes. Let's try something

docker run --rm -ti ubuntu bash
# inside the container's shell
top

There you see 2 processes, bash and top.

28 Using cat from ubuntu container

docker run --rm ubuntu cat /etc/lsb-release

29 mount a file and cat it from inside

docker run --rm -v /etc/lsb-release:/etc/lsb-release ubuntu cat /etc/lsb-release

30 Playing with scratch image

We'll create a filesystem starting from the absolute minimum. scratch, looks like an empty filesystem.

to do the bare minimum explorations possible, we'll get a busybox from our host and add it to the image.

cp /usr/bin/busybox ./busybox

FROM scratch
COPY ./busybox /usr/bin/busybox
COPY ./busybox /bin/sh
CMD ["/usr/bin/busybox", "pwd"]

After that, docker build --tag bare-min . to build the image, and then you can start playing with docker run --rm -ti bare-min

If you try other command, like docker run --rm -ti bare-min ls, docker will complain that "ls" executable is not in $PATH. Here we see that if we override the command from the shell, CMD is replaced by what we entered. The way to fix it is to enter the full command, so docker run --rm -ti bare-min /usr/bin/busybox ls will work.

31 Entrypoint

A way to lock the main program that runs, but let users modify the arguments to this, is to use ENTRYPOINT

32 Network

Footgun Alert:

  • If you create a docker network via docker network create foo, the name stays exactly foo
  • In a docker compose, the network is named after the project, so if the structure is like: myproject/docker-compose.yml, and you docker-compose up and then docker network ls, you'll see a myproject_default network.
  • You can name networks in docker-compose:

    version: '3.5'
    services:
      httpbin:
        # this should basically match docker_reverse_proxy.tf
        image: kennethreitz/httpbin
        ports:
          - 8888:80
        networks:
          my_net:
    networks:
      my_net:
    

    Run docker network ls and you'll see that the network is named…. myproject_my_net.

  • In order to give it an absolute name, you can add a "name" to the network.

    version: '3.5'
    services:
      httpbin:
        # this should basically match docker_reverse_proxy.tf
        image: kennethreitz/httpbin
        ports:
          - 8888:80
        networks:
          my_net:
    networks:
      my_net:
        name: my_net
    

    Now docker network ls will tell us that the network is called my_net. For real :+1:

33 Deploy SPA Frontend and Backend

In an SPA that makes calls to the backend, the isolation that we would love in a docker-compose falls appart, because the external world has to have access to both frontend AND the API backend.

docker-compose can publish random free ports, but in this case, the frontend has to know beforehand what is the port number that backend is gonna use. That means that for the practical perspective, we have to publish "8888" in a fixed place. That means that we'll have a harder time running multiple instances of the stack (ports will collide unless you play around with passing env_vars with random ports to publish in the backend AND frontend).

        +----------------------------------+
        |     Browser                      |
        |   http://localhost:3000          |
        |   js:ajax(http://localhost:8888) |
        |                                  |
        +----------------------------------+
            ^                  ^
            | 3000             | 8888
            |                  |
+-----------+------------------+---------+
|           |           +------+------+  |
| +---------+-------+   |             |  |
| |                 |   | backend     |  |
| |  frontend       |   |             |  |
| |                 |   |             |  |
| |                 |   |             |  |
| |                 |   +-------------+  |
| +-----------------+                    |
|                                        |
|                       +-------------+  |
|                       |             |  |
|                       |  db         |  |
|                       |             |  |
|                       |             |  |
|                       |             |  |
|                       +-------------+  |
|                                        |
|                                        |
+----------------------------------------+

34 container names in a docker-compose

A very similar situation to the network one happens with containers themselves. That previous service httpbin results in a container named myproject_httpbin_1. You can fix the name of the container with container_name: httpbin1. That way you can have global names.

35 Traefik and whole whole shebang

So, Traefik is this reverse proxy that is able to manage your containers in a fairly automatic way.

For every container in a docker network that you manage with traefik, traefik will check the container's labels, and check for traefik labels that are used for configuring the routes. It's kinda distributed in the sense that you don't go poke traefik app to configure it, but it detects when containers are upped and checks by itself.

It also has some default rules, so you can, for example, easily match the host header to redirect to the container_name.

But, how are the container names matched? is it the foo, or myproject_foo? or myproject_foo_1?

It turns out it's smart enough to figure out what you mean.

version: '3.5'
services:
  traefik:
    image: traefik:v2.0
    ports:
      - "8888:80"
      - "8080:8080"
    command:
      # https://doc.traefik.io/traefik/v2.0/providers/docker/#defaultrule
      # - --providers.docker.defaultRule=Host(`{{ normalize .Name }}`
      - --api.insecure=true
      - --providers.docker=true
      # - --entrypoints.web.address=:80
      # - --providers.docker.network="my_net",
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      hosting:

  httpbin1:
    image: kennethreitz/httpbin
    # container_name: httpbin2
    networks:
      hosting:
    labels:
      - traefik.enable=true
      - traefik.http.routers.httpbin1.rule=Host(`httpbin1`)
      # - traefik.http.routers.httpbin1.entrypoints=web

networks:
  hosting:
    name: my_net

with this, and docker-compose up, http :8888/status/202 Host:httpbin1 you can start trying things:

If you use a container_name, the Host will try to match it exactly. If there is no container_name, traeffik still recognizes Host with just the service name (httpbin1 in this case). If you use myproject_httpbin_1, it WON'T work, so it's DWIM, but it has its own warts.

https://doc.traefik.io/traefik/user-guides/docker-compose/basic-example/ https://doc.traefik.io/traefik/v2.0/providers/docker/#defaultrule

36 can I join?

New containers can join the network, and will be playing the game without having been started on the same docker-compose.

  • docker run --rm --net my_net --name httpbin2 --hostname httpbin2 kennethreitz/httpbin,
  • http :8888/status/202 Host:httpbin2

37 http_proxy, dns, and networks

./projects/network_proxy contains a testing project that has:

  • coredns with a fixed ip
  • httpbin
  • ubuntu (we'll use it as our shell)
  • tinyproxy. Lives in the same network
  • tinyproxy_host lives in the host namespace, so it's accessible from localhost:8888 from the host, or host.docker.internal from the other containers.

    We can start testing things around, by doing docker-compose up, and on another shell, docker-compose exec main bash. you can run ./script.sh once inside

38 Dok

39 gojira

40 ryu

41 mba

42 hma

So far, the most minimalistic approach.

The goal for this project is to have a dev environment for our devs, with as little code besides the docker-compose.yml file. There are a few basic traits:

  • Support for local non-commited customizations. The 3 files involved are hma.yml hma.mac.yml hma.override.yml. First one is the main docker-compose file, the .mac. has mac-only modifications, and hma.override.yml is a local non-commited file that will be merged on top of the other two.
  • Augment docker-compose, but keep docker-compose knowledge relevant and useful.

    hma uses docker-compose under the hood, and proxies all commands to docker-compose. It adds the command do, that tries to be smart about the thing to execute, but all old docker-compose knowledge is directly applicable here.

    hma up docker-compose -f hma.yml -f hma.mac.yml -f hma.override.yml up
    hma exec ... docker-compose -f hma.yml -f hma.mac.yml -f hma.override.yml exec ...
    hma do docker-compose -f hma.yml -f hma.mac.yml -f hma.override.yml exec <main_service> $HMA_CLI
    hma do db docker-compose -f hma.yml -f hma.mac.yml -f hma.override.yml exec db $HMA_CLI
    hma do db bash docker-compose -f hma.yml -f hma.mac.yml -f hma.override.yml exec db bash
  • Sane defaults. The <main_service> is guessed by the name of the directory

Key points (tricks) of this approach:

42.1 Mount every container's root to ~.hma/.hma-home/

We'll mount every services' /root directory to a common directory in our host machine.

This way, all your hma containers will have a persistent $HOME, meaning that you'll get persistent history, and a fully customizable environment. Try modifying ~/.hma/.hma-home/.bashrc to add a new alias, or add a script to ~/.hma/.hma-home/bin/ (and add /root/bin to the $PATH in .bashrc).

An extra benefit is that all containers share this space at the same time, so you can copy files among containers and the host without needing to remember docker cp. It's just like a shared drive.

volumes:
  - $HOME/.hma/.hma-home:/root:rw

42.2 Customize the main app's service

The main container needs a few modifications. Change the command and/or entrypoint to a waiting loop like tail -f /dev/null.

This way we keep the container alive and we can start our app manually from within a hma do , which will be an equivalent of docker-compose exec $HMA_CLI.

We should add the environment variable HMA_CLI, pointing to some meaninful entrypoint for our dev container.

Build a developer friendly container to run the app. This container can come FROM a microsoft devcontainer, or you can build your own. It should be dev-friendly to do some file navigation and basic development in.

The volumes to mount are the main one for the app code (where we'll overwrite the working_dir to), aws credentials, and of course, the trick to mount the $HOME dir of the user to our "portal" directory in our host.

services:
  #....
  harbormaster:
    command: tail -f /dev/null
    build:
      context: $PWD/.devcontainer
      dockerfile: Dockerfile
    volumes:
      - $HOME/.aws/credentials:/root/.aws/credentials:rw
      - $PWD:/app/:rw
      - $HOME/.hma/.hma-home:/root:rw
    working_dir: /app
    environment:
      HMA_CLI: bash

42.3 Dockerfile

To prepare your main container, ($PWD/.devcontainer/Dockerfile in the example above), you should leave space for extension on the .override.yml file. Here's an example:

FROM mcr.microsoft.com/vscode/devcontainers/java:11

ARG EXTRA_DEPS
RUN apt-get update && apt-get -y install postgresql-client-12 $EXTRA_DEPS

With this trick, the final user will be able to install custom packages by editing the hma.override.yml

harbormaster:
  build:
    args:
      EXTRA_DEPS: nvim iputils-ping httpie

42.4 Development

Since we mount the app directory from the host to the container, development can keep happening on the host, using your favourite editor.

In case of clojure there are a few tunnings more to do:

  • Expose a port from the container to the host
  • Configure lein/deps.edn to use that port for the repl.
  • go-to-definition will break using this approach, because when your editor (emacs in my case) will ask for a function location through cider, it will get a path and location according to the machinery inside the container, that most likely won't match your external path.

    environment:
      LEIN_REPL_PORT: 7888
      LEIN_REPL_HOST: "0.0.0.0"
    expose:
      - '7888'
    ports:
      - '7888:7888'
    
    (setq cider-path-translations '(("/app/harbormaster/source" . "~/workspace/harbormaster")
                                    ("/app/metabase/source" . "~/workspace/metabase")
                                    ("/root/.m2/" . "~/.hma/.hma-home/.m2/")))
    

42.5 Code

#!/usr/bin/env bash
die()  { echo "$*" ; exit 1; }
mkcd() { mkdir -p "$1"; cd "$1"; }

pg-create-dbs-file() {
  cat <<'EOF'
#!/bin/bash

pg_conf_file=/var/lib/postgresql/data/postgresql.conf

echo "\
log_statement = 'all'
log_disconnections = off
log_duration = on
log_min_duration_statement = -1
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 2048
pg_stat_statements.track = all
pg_stat_statements.max = 10000
" >>$pg_conf_file

for database in $(echo $POSTGRES_MULTIPLE_DATABASES | tr ',' ' '); do
  echo "Creating database $database"
  psql -U $POSTGRES_USER <<-EOSQL
    CREATE DATABASE $database;
    GRANT ALL PRIVILEGES ON DATABASE $database TO $POSTGRES_USER;
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
EOSQL

  psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" -d "$database" <<-EOSQL
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
EOSQL
done
EOF
}

hma-setup() {
  [ -f $1 ] && grep -siq 'HMA_CLI' $1 || die "Not an hma-compatible repo."
  [ -d $HOME/.mba/.mba-home/ ] || mkdir -p $HOME/.mba/.mba-home
  [ -d $HOME/.mba/resources/postgres/docker-entrypoint-initdb.d/ ] || (
    mkcd $HOME/.mba/resources/postgres/docker-entrypoint-initdb.d/ &&
      pg-create-dbs-file > 00-create-pg-db.sh && chmod +x 00-create-pg-db.sh
  )
}
hma-exec () {
  local where=${1:-$(basename $PWD)}
  shift
  local cmd=${@:-'$HMA_CLI'}
  hma exec "$where" sh -lic "$cmd"
}

# hma up [maildev db]
# hma down
# hma do [service] [$HM_CLI]
# hma do db
# hma do db bash
hma() {
  fname="hma"
  hma-setup "${fname}.yml"
  if [[ $1 == "do" ]]; then
    shift
    hma-exec "$@"
  else
    local files=("-f ${fname}.yml")
    [[ "$OSTYPE" == "darwin"* ]] && [ -f "./${fname}.mac.yml" ] && files+=("-f ./${fname}.mac.yml")
    [ -f "./${fname}.override.yml" ] && files+=("-f ./${fname}.override.yml")
    docker-compose ${files[*]} "$@"
  fi
}

hma "$@"

43 Docker as a tool to help you hack on development

44 dev

45 see logs for a container

Author: Raimon Grau

Emacs 26.1 (Org mode 9.1.9)

Validate