ETH Research Data Management Summer School 2021

Workshop 4 - Tools and Platforms for Reproducible Data Analysis

Andrei Plamada - andrei.plamada@id.ethz.ch

Henry Lütcke - henry.luetcke@id.ethz.ch

Scientific IT Services, ETH Zurich

16 June 2021

1 Introduction to Conda

1.1 Reference

1.2 Starting conda

Command Line Interface

$ conda deactivate
Conda on Windows
Conda on macOS

1.3 Version, Info and Help

conda Environment

A directory that contains specific installed packages (system libraries, python/R modules, …).

base Environment

The default environment that is created when conda is installed.

$ conda --version

$ conda --help

$ conda info
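Before running the commands above in a script, it can help to confirm that conda is reachable at all. A minimal sketch; the variable name conda_status is illustrative and not part of conda:

```shell
# Check whether the conda command is reachable from the current shell.
if command -v conda >/dev/null 2>&1; then
    # conda --version prints a single line with the installed version.
    conda_status="found ($(conda --version 2>&1))"
else
    conda_status="not found on PATH"
fi
echo "conda: ${conda_status}"
```

If conda is reported as not found, the shell was probably not initialised for conda yet.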

1.3.1 Exercise

  1. Which version of conda are you running?

  2. Scan the output of conda info, especially: platform, Python version, active environment, base environment and channel URLs.

1.4 Create / Clone and List all Envs

$ conda create --name py38 python=3.8

$ conda create --name py38-clone --clone py38

$ conda env list

1.5 Activate and Deactivate Env

Check Python version on:

  1. the system,
  2. the base env, and
  3. the new py38 env.
      $ python --version

      $ conda activate 
(base)$ python --version

(base)$ conda activate py38
(py38)$ python --version

(py38)$ conda deactivate --help
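Besides the prompt prefix, the active environment can also be read from the CONDA_DEFAULT_ENV variable that conda activate exports. A small sketch, assuming the variable is unset when no environment is active; active_env is an illustrative name:

```shell
# conda activate exports CONDA_DEFAULT_ENV with the name of the active env.
# Fall back to a placeholder when the variable is unset or empty.
active_env="${CONDA_DEFAULT_ENV:-<none>}"
echo "active conda env: ${active_env}"

# Show which python the shell would run, if any.
command -v python || echo "python not found on PATH"
```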

1.6 List Installed Packages

      $ conda activate py38
(py38)$ conda list
(py38)$ conda list --name base

(py38)$ conda list python
(py38)$ conda list --name base python

1.7 Search and Install Packages

Conda channels

Locations where packages are stored, e.g. conda-forge, bioconda, r.

(py38)$ conda deactivate && conda deactivate
      $ conda search numpy=1.20
      $ conda search --channel conda-forge numpy=1.20

      $ conda install --channel conda-forge numpy=1.20.3 --name py38
      $ conda activate py38
(py38)$ conda install pandas="<1"
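The quotes around <1 in the last command matter: unquoted, the shell would interpret < as input redirection from a file named 1. The sketch below only prints what conda would receive as its argument; spec and spec_range are illustrative variable names:

```shell
# With quotes, the whole version constraint stays one argument.
spec=pandas="<1"
printf 'argument passed to conda: %s\n' "$spec"

# Other comparison operators need the same protection,
# e.g. > would otherwise be read as output redirection.
spec_range='numpy>=1.20,<1.21'
printf 'argument passed to conda: %s\n' "$spec_range"
```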

1.8 Update Packages and Revisions

(py38)$ conda update --channel conda-forge pandas
(py38)$ conda update --all

(py38)$ conda list --revisions
(py38)$ conda install --revision 3
PackagesNotFoundError
(py38)$ conda install --revision 3 --channel conda-forge
(py38)$ conda list --revisions

Note: When restoring a revision, any additional channels must be passed explicitly.

Warning: conda install --revision can be problematic.

Warning: pip install in a conda env is a dead end: avoid running conda install/update in that env afterwards.

1.9 Remove Package / Env and Clean

(py38)$ conda env list
(py38)$ conda remove --name py38 pandas
(py38)$ conda remove --name py38-clone --all
(py38)$ conda env list

(py38)$ conda clean --all

Note: It is not possible to remove the active environment.

1.9.1 Exercise

1.10 Environment File - YAML Format

Environment File

A YAML file that contains all information to create an environment.

YAML file: src/mamba.yml

name: mamba
channels:
  - conda-forge
dependencies:
  - mamba

Note: mamba is like conda but faster.
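The environment file can also be written and inspected from the command line. A minimal sketch that recreates src/mamba.yml in a scratch directory and reads back the environment name; workdir and env_name are illustrative names:

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir="$(mktemp -d)"
mkdir -p "$workdir/src"

# Write the environment file shown above.
cat > "$workdir/src/mamba.yml" <<'EOF'
name: mamba
channels:
  - conda-forge
dependencies:
  - mamba
EOF

# Read back the environment name from the top-level "name:" key.
env_name="$(sed -n 's/^name:[[:space:]]*//p' "$workdir/src/mamba.yml")"
echo "environment name: ${env_name}"
```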

(py38)$ cd src

1.11 Environment File to Environment and back

 (py38)$ conda env create --file mamba.yml
 (py38)$ conda activate mamba

(mamba)$ conda env export
(mamba)$ conda env export --no-builds
(mamba)$ conda env export --from-history
(mamba)$ conda env export --no-builds > mamba_export.yaml
(mamba)$ conda env create --file mamba_export.yaml
CondaValueError: prefix already exists: ...
(mamba)$ conda env create --file mamba_export.yaml --name mamba_export

Note: The build specification is generally platform dependent.

Note: The packages might not be available on all platforms.

Warning: With --from-history, pip packages and additional channels are not included in the export.
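The CondaValueError above arises because the exported file still carries the name of the existing environment. Besides passing --name, the file itself can be edited before sharing. A sketch on a small stand-in file; the file content and the prefix path are hypothetical, and a real export also pins package versions:

```shell
workdir="$(mktemp -d)"

# Illustrative stand-in for the output of: conda env export --no-builds
cat > "$workdir/mamba_export.yaml" <<'EOF'
name: mamba
channels:
  - conda-forge
dependencies:
  - mamba
prefix: /home/user/miniconda3/envs/mamba
EOF

# Rename the environment and drop the machine-specific prefix line.
sed -e 's/^name:.*/name: mamba_export/' -e '/^prefix:/d' \
    "$workdir/mamba_export.yaml" > "$workdir/mamba_export_portable.yaml"

cat "$workdir/mamba_export_portable.yaml"
```

The edited file could then be passed to conda env create --file without an extra --name argument.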

1.12 Identical Environments - Spec File

Spec File

A text file that contains all information to create an identical environment.

(mamba)$ conda list --explicit 

(mamba)$ conda list --explicit > mamba_spec_file.txt
(mamba)$ conda create --name mamba_explicit --file mamba_spec_file.txt

Warning: Use the spec file only on a similar platform.
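A spec file written by conda list --explicit records the platform in a comment line near the top. Assuming that header layout, the platform can be checked before the file is reused. The sketch works on an illustrative header only; a real file lists one package URL per line after the @EXPLICIT marker:

```shell
workdir="$(mktemp -d)"

# Illustrative header of a spec file (package URLs omitted).
cat > "$workdir/mamba_spec_file.txt" <<'EOF'
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
EOF

# Extract the platform recorded in the header.
spec_platform="$(sed -n 's/^# platform:[[:space:]]*//p' "$workdir/mamba_spec_file.txt")"
echo "spec file was written on: ${spec_platform}"
```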

1.12.1 Exercise

  1. List all environments.
  2. Read the help for conda remove and pay attention to the flag -y, --yes.
  3. Remove mamba_export and mamba_explicit environments.

1.13 Conda Optional Arguments

Optional arguments have a long form (--) and, in many cases, an equivalent short form (-).

long: --      short: -    Remark
--name        -n
--channel     -c
--help        -h
--file        -f          not always
--all         -a          not always
--revisions   -r
--yes         -y

Note: If an argument does not work as expected, check --help.

1.14 What Did We Learn?

2 Introduction to Docker

2.1 Reference

2.2 Version, Info, Daemon

Docker Daemon

A service that runs on the host operating system (Linux), is usually started by a system utility, and makes sure Docker works as expected.

$ docker --version

$ docker info
...
ERROR: ... Is the docker daemon running? ...

You need to start the Docker daemon. Start Docker, or on Linux try one of:

$ docker info
ERROR: Got permission denied

$ sudo docker info

Warning: To use Docker you need privileged access (even if you don’t need sudo).
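The two error cases above can be told apart from a missing installation with a short check. A sketch, assuming docker info exits with a non-zero status when it cannot reach the daemon; docker_state is an illustrative variable name:

```shell
# Classify the state of the Docker setup.
if ! command -v docker >/dev/null 2>&1; then
    docker_state="docker client not installed"
elif docker info >/dev/null 2>&1; then
    docker_state="daemon reachable"
else
    # Either the daemon is not running or this user lacks permission.
    # On systemd-based Linux, "sudo systemctl start docker" is a common
    # way to start the daemon (assumption, depends on the distribution).
    docker_state="daemon not reachable (not running, or permission denied)"
fi
echo "docker: ${docker_state}"
```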

2.3 List Images and Containers

[Docker] Container Image - REPOSITORY[:TAG]

A binary that includes all requirements for running a single container - a packaging technology for the computational environment. It is organized as a stack of read-only layers that can be efficiently reused.

[Docker] Container

An “instance” of a [Docker] Container Image. It adds a writable layer on top of the image layers.

$ sudo docker run hello-world

$ sudo docker image ls --help
$ sudo docker image ls

$ sudo docker container ls --help
$ sudo docker container ls --all

$ sudo docker run hello-world
$ sudo docker container ls -a

2.4 Registry, Pull, Tag, and Run

[Docker] Container Registry

A repository for storing [Docker] Container Images.

$ sudo docker pull conda/miniconda3-centos7
$ sudo docker tag conda/miniconda3-centos7 conda:1.0

$ sudo docker image ls

$ sudo docker run conda:1.0
$ sudo docker container ls -a

$ sudo docker run conda
Unable to find image 'conda:latest' locally
$ sudo docker tag conda:1.0 conda
$ sudo docker run conda

Note: If no tag is provided, latest is used as the tag in repository[:tag].
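The default-tag rule can be written down as a small shell function. This is a sketch of the rule only, not Docker's own parser: normalize_image_ref is a hypothetical helper, and references by digest (@sha256:...) are not handled:

```shell
# Append ":latest" when the last path component of an image reference has no tag.
normalize_image_ref() {
    _ref=$1
    _last=${_ref##*/}    # part after the last slash, e.g. "miniconda3-centos7"
    case $_last in
        *:*) printf '%s\n' "$_ref" ;;          # tag already present
        *)   printf '%s:latest\n' "$_ref" ;;   # no tag: latest is implied
    esac
}

for image in conda conda:1.0 conda/miniconda3-centos7; do
    printf '%-28s -> %s\n' "$image" "$(normalize_image_ref "$image")"
done
```

Only the last path component is inspected, so a registry address with a port, such as localhost:5000/conda, is not mistaken for a tagged reference.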

2.5 Remove Containers and Images

$ sudo docker container ls -a
$ sudo docker container rm CONTAINER_ID
$ sudo docker container ls -a
$ sudo docker image ls
$ sudo docker image rm IMAGE_ID
Error ... image is being used by stopped container NEW_CONTAINER_ID

$ sudo docker container rm NEW_CONTAINER_ID
$ sudo docker image rm IMAGE_ID

$ sudo docker container ls -a
$ sudo docker image ls -a

2.6 Interactive Entrypoint and Clean Up

Entrypoint

Default command to execute when running a container.

 $ echo "Hello World"
 $ sudo docker run --entrypoint echo conda "Hello World"

 $ sudo docker run -it --entrypoint bash conda
/# echo "Hello World"
/# cat /etc/centos-release
/# exit

 $ sudo docker container ls -a

 # !!! DELETE all containers !!!
 $ sudo docker container rm $(sudo docker container ls -aq)

 $ sudo docker run --rm conda
 $ sudo docker container ls -a

$ sudo docker run --rm -it --entrypoint bash conda
/# exit

2.7 Mount a Volume

 $ mkdir my_data
 $ ls -lt my_data

 $ sudo docker run --rm -it conda
/# conda --version
conda 4.6.11
/# ls -lt /my_data
ls: cannot access '/my_data': No such file or directory
/# exit

 $ sudo docker run -it --rm -v $(pwd)/my_data:/my_data conda
/# ls -lt /my_data
/# touch /my_data/newfile.txt
/# ls -lt /my_data
/# exit

 $ ls -lt my_data

Warning: -v requires an absolute path on the host, and changes made in the mounted directory persist on the host.
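One way to make sure the host side of -v is absolute is to resolve the directory first. A minimal sketch in a scratch directory; it prints the docker command instead of running it, so it works without Docker. workdir, host_dir and mount_arg are illustrative names:

```shell
workdir="$(mktemp -d)"
mkdir -p "$workdir/my_data"

# Resolve the directory to an absolute path before handing it to -v.
host_dir="$(cd "$workdir/my_data" && pwd)"
mount_arg="${host_dir}:/my_data"

# Print the command rather than executing it.
echo "sudo docker run -it --rm -v ${mount_arg} conda"
```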

2.8 Push, Save and Load

2.8.1 Push (account on Docker Hub required)

Note: You can delete what you push using https://hub.docker.com/ .
Note: If you want to use Docker Hub, check its policy carefully.
Note: There are also other container registries, e.g. GitLab can provide a container registry.

$ sudo docker login
$ sudo docker tag conda username/conda
$ sudo docker push username/conda
$ sudo docker image ls 
$ sudo docker image rm username/conda

2.8.2 Save and Load

$ sudo docker image rm conda/miniconda3-centos7 conda:1.0
$ sudo docker save --output conda.tar conda
$ ls -lh

# !!! DELETE all containers and images !!!
$ sudo docker container rm $(sudo docker container ls -aq)
$ sudo docker image rm -f $(sudo docker image ls -aq)

$ sudo docker load --input conda.tar 
$ sudo docker image ls

2.9 Why Bother with Containers?

Conda Environment in a Container Image

https://commons.wikimedia.org/wiki/File:Recursive_Matrena_01.jpg

2.10 Dockerfile

Dockerfile

A text file that contains all the information to build a [Docker] Container Image.

FROM centos:centos7

COPY . /source_files

RUN yum -y update \
    && yum -y install curl bzip2 \
    && curl -sSL https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh \
    && bash /tmp/miniconda.sh -bfp /usr/local/ \
    && rm -rf /tmp/miniconda.sh \
    && conda init \
    && conda install -y python=3 \
    && conda update --yes conda \
    && conda env create --file  /source_files/mamba.yml \
    && conda clean --all --yes \
    && rpm -e --nodeps curl bzip2 \
    && yum clean all

Note: The files that you COPY are part of the image (even if you rm them in RUN).

2.11 Build a Container Image from a Dockerfile

 $ cd src/

 $ sudo docker build -t my_conda .
 # Be patient - it might take minutes

 $ sudo docker run --rm -it --entrypoint bash my_conda
/# conda env list
/# conda --version
/# exit

2.12 repo2docker: Container Image with Ease

 # add your user to the docker group -> no need to use sudo for docker
 $ sudo groupadd docker
 $ sudo gpasswd -a $USER docker 
 # reboot

 # repo2docker is installed
 $ jupyter-repo2docker --no-run --user-name jovyan \
   --user-id 1000 --image-name r2d_conda .
 
 $ docker image ls

 $ docker run --rm -it --entrypoint bash r2d_conda
/# conda env list
/# conda list mamba
/# exit

2.12.1 How Does It Work?

2.13 What Did We Learn?