How long does it take to set up your current project locally? Two days? Two hours?
With Docker, this can be done with virtually a single command; only your network bandwidth is the limit.

In this article, I show how Docker can help you set up a runtime environment and a database with a predefined dataset, tie them together, and run everything isolated from the rest of your machine.

Let’s start with the goals:

  1. I want an isolated Java SDK, Scala SDK and SBT (build tool)
  2. I want to be still able to edit and rebuild my project code easily from my IDE
  3. I need a MongoDB instance running locally
  4. Last but not least, I want to have some minimal data set in my MongoDB out of the box
  5. Ah, right, all of the above must be downloaded and configured in a single command

All these goals can be achieved by running and tying together just three Docker containers. Here is a high-level overview of these containers:

High-level design overview

It’s impressive that such a simple setup brings all listed benefits, isn’t it? Let’s dive in.

Step I: Development Container Configuration

The first two goals are covered by the Dev Container, so let's start with that one.
Our minimal project structure should look like this:
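A minimal layout could look something like this (the project name and source paths are illustrative; the only file that matters at this step is the Dockerfile):

```
my-fancy-project/
├── Dockerfile
├── build.sbt
└── src/
    └── main/
        └── scala/
```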

The structure will become more sophisticated as the article progresses, but for now it's sufficient. The Dockerfile is where the Dev Container's image is described (if you aren't yet familiar with images and containers, you should read this first). Let's look inside:
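Here is a sketch of what such a Dockerfile can look like. The SBT version, the download URL (dated, from the Bintray era) and the SBT_OPTS value are assumptions; only the overall structure matters:

```dockerfile
# Official OpenJDK 1.8 image as the base
FROM openjdk:8

# Install SBT and set SBT_OPTS (version, URL and options are illustrative)
ENV SBT_VERSION 0.13.15
ENV SBT_OPTS "-Xmx2G"
RUN curl -fsL -o sbt.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
    dpkg -i sbt.deb && rm sbt.deb

# Warm up SBT so that its own dependencies become part of the image
RUN sbt sbtVersion

# Port the application inside the container listens on
EXPOSE 8080

# Volume shared with the host, used as the working directory
VOLUME /app
WORKDIR /app

# Start SBT in interactive mode by default
CMD ["sbt"]
```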

The official OpenJDK 1.8 image is specified as the base image in the first line of the Dockerfile.
Note: official repositories are available for many other popular platforms.
They are easily recognizable by their naming convention: an official repository name never contains slashes, i.e. it's always just a repository name without an author prefix.

Install SBT and set the SBT_OPTS environment variable.

An SBT-specific trick (skip it safely if you don't use SBT) to speed up container start-up.
As you may know, nothing is eternal in a container: it can be (and normally is) destroyed every now and then.
By making some required dependencies part of the image, we download them just once and thus significantly speed up builds inside the container.

Declare port 8080 as the one the application inside the container listens on. We'll refer to it later to access the application.

Declare a new volume under the /app folder and run subsequent commands from there. We will use it in a moment to make all project files accessible from both worlds: the host and the container.

The default command runs SBT in interactive mode on container startup. For build tools without an interactive mode (like Maven), this can be CMD ["/bin/bash"].

Now we can already test the image:
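The commands can look like this (the -it and --rm flags are my own convenience additions):

```bash
docker build -t myfancyimage .
docker run -it --rm -v "$(pwd)":/app -p 9090:8080 myfancyimage
```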

The first command builds an image from our Dockerfile and names it myfancyimage. The second command creates and starts a container from that image: it binds the current folder to the container's volume ($(pwd):/app) and maps host port 9090 to the container's exposed port 8080.

Step II: MongoDB Container Configuration

OK, now it's time to bring in some data. We start by adding the MongoDB engine container, and later supply it with a sample data snapshot. As we're about to run multiple containers linked together, it's convenient to describe how to run them in a docker-compose configuration file. Let's add a docker-compose.yml to the project's root with the following content:
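A sketch of such a docker-compose.yml (the compose file version and exact option spelling are assumptions; the structure follows the description below):

```yaml
version: '2'
services:

  dev-container:
    build: .
    volumes:
      - .:/app
    ports:
      - "9090:8080"
    links:
      - mongo-engine

  mongo-engine:
    image: mongo:3.2
```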

The commands for building and running myfancyimage are translated into the dev-container service definition.

The mongo-engine container runs MongoDB 3.2 from the official Docker Hub repository.

Link mongo-engine to dev-container: mongo-engine will start before dev-container and the two will share a network. MongoDB is available to dev-container at the URL "mongodb://mongo-engine/".

Let’s try it:
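```bash
docker-compose run --service-ports dev-container
```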

It's important to add the --service-ports flag to enable the configured port mappings.

Step III: Data Container Configuration

All right, here comes the hardest part: sample data distribution. Unfortunately, there's no built-in mechanism for distributing Docker data volumes. Although a few Docker volume managers exist (e.g. Flocker, the Azure volume driver, etc.), these tools serve other goals.
Note: an alternative solution would be to restore data from a DB dump programmatically, or even generate it randomly. But this approach is not generic (it involves specific tools and scripts for each DB) and is generally more complicated.

The data distribution mechanism we’re seeking must support two operations:

  1. replicate a fairly small dataset from a remote shared repository to the local environment
  2. publish a new or modified dataset back to the remote repository

One obvious approach is to distribute the data via Docker images. In this case the remote repository is the same place we store our Docker images: it can be either Docker Hub or a private Docker Registry instance. The solution described below works with both.

Meeting the 1st requirement is easy: we run a container from a data image, mark the data folder as a volume, and link that volume (via the --volumes-from argument) to the MongoDB engine container.
The 2nd requirement is more complicated. After making changes inside the volume, we cannot simply commit those changes back to a Docker image: a volume is technically not part of the modifiable top layer of a container. In simpler words, the Docker daemon just doesn't see any changes to commit.

Here's the trick: if we can read the changed data but cannot commit it from the volume, then we first need to copy it somewhere outside of all volumes so that the daemon detects the changes. Applying the trick has a not-so-obvious consequence: we cannot create a volume directly from the data folder, but have to use another path for it and copy all data to the volume when the container starts. Otherwise, we would have to alternate the volume path depending on where the data is currently stored, and that is hard to automate.

The whole process of cloning and saving a dataset is shown in the diagrams below:

Execute 'apply_snapshot.sh' script to copy snapshot from '/data/snapshot' to a volume '/data/active'

Making Data Snapshot Available to Other Containers as a Volume on Startup

Run snapshot-taker container, copy data from the volume, commit changes and push new image to Docker Registry

Committing Changes to New Image and Pushing It to Storage

We'll dig into the scripts for taking and applying data snapshots a bit later; for now, let's assume they are present in the data-snapshot container's /usr folder. Here is how the docker-compose.yml is updated with the data container definition:
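A sketch of the updated file. The snapshot image name and tag, as well as the --dbpath override for MongoDB, are assumptions on my side:

```yaml
version: '2'
services:

  dev-container:
    build: .
    volumes:
      - .:/app
    ports:
      - "9090:8080"
    links:
      - mongo-engine

  mongo-engine:
    image: mongo:3.2
    # Link volumes from the data-snapshot container defined below
    volumes_from:
      - data-snapshot
    # Point MongoDB at the snapshot data (the --dbpath override is an assumption)
    command: mongod --dbpath /data/active

  data-snapshot:
    image: yourname/data-snapshot:essential-data-set   # illustrative image name and tag
    # Make a volume from the /data/active folder
    volumes:
      - /data/active
    # Apply the snapshot on container startup
    command: /usr/apply_snapshot.sh
```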

Link volumes from the data-snapshot container defined below.

Make a volume from the /data/active folder in the data-snapshot container.

Run the /usr/apply_snapshot.sh script on data-snapshot container startup.

Now let's see what the scripts are doing. apply_snapshot.sh simply copies the contents of the /data/snapshot folder into the volume folder /data/active (see the 2nd diagram). Here's its full listing:
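A minimal version of the script could look like this (the exact copy flags are an assumption):

```bash
#!/bin/sh
# Copy the snapshot baked into the image onto the volume shared with mongo-engine
mkdir -p /data/active
cp -R /data/snapshot/. /data/active/
```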

The accompanying script take_snapshot.sh does the opposite: it replaces the contents of /data/snapshot with the contents of the /data/active folder. It also removes files with the .lock extension, which is the only MongoDB-specific action here (and more of a precaution than a necessity). The listing of take_snapshot.sh is shown below:
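A minimal sketch, assuming plain cp and find are enough:

```bash
#!/bin/sh
# Replace the stored snapshot with the current contents of the volume
rm -rf /data/snapshot
mkdir -p /data/snapshot
cp -R /data/active/. /data/snapshot/

# MongoDB-specific precaution: drop lock files from the snapshot
find /data/snapshot -type f -name "*.lock" -delete
```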

The Dockerfile for the data-snapshot container and its image can be found on GitHub and Docker Hub, respectively.

Taking a snapshot is orchestrated externally by publish_snapshot.sh:
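A sketch of such a script. The repository name, the tag handling and the container name are assumptions (with docker-compose the actual data-snapshot container name usually carries a project prefix):

```bash
#!/bin/sh
# Usage (illustrative): ./publish_snapshot.sh yourname/data-snapshot essential-data-set
repo=$1
tag=$2

# Run a temporary container with the data-snapshot volume attached and take
# a snapshot; the container stops as soon as the script finishes
docker run --volumes-from data-snapshot --name snapshot-taker \
    "$repo:scratch" /usr/take_snapshot.sh

# Commit the changes to a new local image tagged with $tag
docker commit snapshot-taker "$repo:$tag"

# Push the new data snapshot image to the repository
docker push "$repo:$tag"

# Remove the temporary container
docker rm snapshot-taker
```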

Run a temporary container from the data-snapshot:scratch image with the data-snapshot container's volume attached, execute the /usr/take_snapshot.sh script on startup, and stop the container (it stops automatically because no other processes run in it). I run the container from my image on Docker Hub, but most likely you'll want to use your own copy.

Commit the changes to a new local image tagged with $tag.

Push the new data snapshot image to your repository.

Remove the temporary container.

Now imagine you've just published a shiny new data snapshot tagged essential-data-set.
Then you simply update the data-snapshot definition in docker-compose.yml with the new tag and make a git push.
Your teammate pulls those changes and can re-establish the whole dev environment, including your new dataset, just by running a single command:
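```bash
# the same command as in Step II
docker-compose run --service-ports dev-container
```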

As a final step, you can add some scripting to remove existing containers and volumes before updating the environment, so that docker-compose works smoothly on every run.

Did you like the post? Please spread the word 🙂