Why Docker?

Docker is a great containerization technology for running your applications in a stateless manner.

This way, all builds have similar behavior. All you need to do is spin up a docker container that runs ubuntu and you’re good to go. There’s no source of confusion by running docker because there’s no uncertainty to whether your system’s architecture or configurations is causing some bug.

The most basic way to spin up a docker container is either taking a docker image and docker run‘ing it directly:

$docker run -it ubuntu:latest will let us run interactive tty into the ubuntu image. As you can see, the result is that we have a brand new ubuntu image.

Dockerfile

FROM hiimivantang/ubuntu-anaconda:latest
MAINTAINER Ray Zhang "peifeng2005@gmail.com"

COPY . /app

WORKDIR /app
RUN /root/anaconda3/bin/python setup.py install
ENTRYPOINT ["/root/anaconda3/bin/python", "core_module/app.py"]

Say we wanted to use docker for ML. We would want the docker container to setup with all the required installations. We can either do this in two ways:

  1. Use a docker image that contains whatever you needed.
  2. Take some blank docker image like ubuntu:latest, and run your custom installations, like installing pip.

I opted to try a bit of both: Whatever I could get out of the image, I took. I used anaconda because of the scipy package issues(python package management takes care of most things well, just not native binaries like BLAS or other C libraries).

In the first line, I took some guy’s anaconda installation running on ubuntu version 14.04:

FROM hiimivantang/ubuntu-anaconda:latest

You can find many of these docker images on dockerhub. It’s like a github but images for docker online.

Copy the URL {dockerurl}.com/{user}/{image_name}. In this case we can see {user} is hiimivantang, and {image_name} is ubuntu-anaconda. Take any version(displayed on dockerhub), like latest. It will be latest by default.

MAINTAINER Ray Zhang "peifeng2005@gmail.com"

If you decide to publish your modified image, then you should have a MAINTAINER field. This is self explanatory.

COPY . /app

Copies everything from the current directory and writes it into the docker container’s /app directory.

Be careful, If you call docker run from some other location that’s not its $PWD, then you won’t copy the correct files.

In our case, I’m confident enough that we won’t call it outside. For docker-compose, it also cd’s into its directory beforehand.

WORKDIR /app

WORKDIR moves our current directory to /app. Why don’t we just call a RUN cd /app? Refer to the next section.

RUN /root/anaconda3/bin/python setup.py install

Whatever is called in RUN does not persist. If we modified a file in RUN, it will, but our current $PWD in our environment will be lost everytime we call RUN.

However, if you wanted to temporarily cd somewhere, you can do RUN cd some_dir && perform_action, and it will be executed correctly.

In this case, we are doing a setup.py install, which will install the necessary packages from PyPI, so we’re good.

ENTRYPOINT ["/root/anaconda3/bin/python", "core_module/app.py"]

Now, this one’s a bit tricky. There are actually 2 commands that do almost identical things (CMD and ENTRYPOINT), and each of them have modes called shell mode and exec mode.

shell mode is when you’re running the ENTRYPOINT from the bash shell, as in $/bin/bash -c "your commands here", and exec mode is when you’re running the ENTRYPOINT directly, as in $"your commands here".

Our example is in exec mode. To execute shell mode, simply run:

ENTRYPOINT /root/anaconda3/bin/python core_module/app.py

instead. You can see in your $ps -aux that the command from the above is actually prepended with a /bin/bash -c.

It’s generally suggested to run exec mode, since it’s more consistent.

Entrypoint vs CMD

If you’re familiar with Docker, you’d be wondering why I chose ENTRYPOINT over CMD. They both do “pretty much” the same thing, except that:

  1. CMD and ENTRYPOINT set different parts of the docker container metadata.

  2. CMD can be used to pass in extra commands to ENTRYPOINT’s exec mode.
    1. Done by: CMD ['params', 'here', 'for', 'entrypoint'].
  3. CMD is the default command run by docker containers if ENTRYPOINT does not exist.

Docker Compose

Docker-compose is a wrapper around Dockerfiles. It doesn’t provide a lot of utility over Dockerfiles, but is a very convenient way to tie two containers together, in say, a network. Note that everything I wrote below can be done in Dockerfile. I was just playing around with the YAML format.

In addition, take using docker-compose.yml with a grain of salt. Professionals have criticized the use of compose, and suggest either using pure Dockerfile or docker-swarm/kubernetes for complex applications. However, compose is a great tool for testing and development as the setup time and typing is close to none.

A very common thing people use docker for is to run a server, that connects to a DB in the backend. On docker’s website, there’s an example of flask-redis stack for backend server development. However, people usually use MySQL as the backend, so I wanted to modify that “hello world” example.

Here is my docker-compose.yml:

version: '1'
services:
  web:
    build: server/
    ports:
        - "5000:5000"
    volumes:
        - ./server/:/app
    links:
        - db
  db:
    build: db/
    environment:
        MYSQL_ALLOW_EMPTY_PASSWORD: "yes"
    ports:
        - "3306:3306"
    volumes:
        - ./db/localdb-run.sh:/localdb-run.sh:rw
        - ./db/data/:/var/lib/mysql:rw

This YAML file declares a couple things:

  1. 'web', 'db' are names of my services. They can connect to each other easily via a DNS resolution setup by Docker. For example, the hostname server will have a specific IP associated with it when MySQL tries to contact it, and vice versa.

  2. ports: are exposed ports in docker. By default, docker does not expose its ports to the outside world. We can map our host environment’s port to docker’s port if that port is available. If you have MySQL running on 3306 on your computer, and you try to start this container, you will fail!

  3. build: are where Dockerfiles are. This is a relative file path.

  4. environment: are environment variables that bash can utilize in docker’s environment. In addition, things like MySQL also inject their environment variables, so this is actually required to set up our db. In a real application setting, don’t set that field to “yes” :).

  5. volumes: are mounts. We can mount a file system from our host env to the docker containers. Mounting allows us to share files with the docker container. However, files that existed previous to mounting in either docker or the host will not be reflected unless you do a COPY command or something of the sort in your Dockerfile.

  6. links: are linkages of network between services. This is not necessary for that purpose, but it is necessary if you want to impose a bootup order. It’s like defining a DAG and running toposort through it for bootup.

Here are my 2 docker files:

server:

FROM hiimivantang/ubuntu-anaconda:latest
MAINTAINER Ray Zhang "peifeng2005@gmail.com"

COPY . /app

WORKDIR /app
RUN /root/anaconda3/bin/python setup.py install

ENTRYPOINT ["/root/anaconda3/bin/python", "core_module/app.py"]

db:

FROM mysql:latest
MAINTAINER Ray Zhang "peifeng2005@gmail.com"

RUN apt-get update && apt-get install vim -y

COPY . /db/files
WORKDIR /db

I installed vim in my docker container for convenience, you don’t actually need that there. I’m using the image’s default ENTRYPOINT as well, which calls $mysqld -u'$user' -p'$pwd' ..., with the list of configurations in the environment. Note that $user is probably not the actual variable, just figuratively.

If you want to see exactly how the wrapper works, and how I use mysql and flask(which are not the topics of this blog), check out my github:

Flask-MySQL-Docker

Footnote

Thanks to my friend Jesse Cai for debriefing me on some of these topics!