Workshop 4 - Tools and Platforms for Reproducible Data Analysis
16 June 2021
Command Line Interface
Open a terminal:
macOS: command + Space and search for Terminal
Windows: search for anaconda and select Anaconda Prompt or Anaconda Powershell Prompt
Conda environment: a directory that contains specific installed packages (system libraries, Python/R modules, …).
Base environment: the default environment that you have when installing conda.
Which version of conda are you running?
Scan the output of conda info, especially: platform, Python version, active environment, base environment and channel URLs.
$ conda create --name py38 python=3.8
$ conda create --name py38-clone --clone py38
$ conda env list
--name py38 - py38 is the name of the environment
python=3.8 - Python 3.8 is installed in the new environment
--clone py38 - py38 is the name of the existing environment that is cloned
conda env list - is equivalent to conda info --envs
Check the Python version:
$ python --version
$ conda activate
(base)$ python --version
(base)$ conda activate py38
(py38)$ python --version
(py38)$ conda deactivate --help
conda activate - the default (base) environment is activated
conda activate py38 - py38 is the name of the environment
$ conda activate py38
(py38)$ conda list
(py38)$ conda list --name base
(py38)$ conda list python
(py38)$ conda list --name base python
list --name base - base is the name of the environment
list python - only the packages matching the string python (a regular expression) in the active environment (or in base if no environment is active)
Channels: locations where packages are stored, e.g. conda-forge, bioconda, r;
channels have priority in the order listed (left to right: the first has the highest priority)
search for packages in the browser at anaconda.org
(py38)$ conda deactivate && conda deactivate
$ conda search numpy=1.20
$ conda search --channel conda-forge numpy=1.20
$ conda install --channel conda-forge numpy=1.20.3 --name py38
$ conda activate py38
(py38)$ conda install pandas="<1"
install --channel conda-forge ... - conda-forge is added to the list of channels, with higher priority
install pandas="<1" - the quotes are mandatory
(py38)$ conda update --channel conda-forge pandas
(py38)$ conda update --all
(py38)$ conda list --revisions
(py38)$ conda install --revision 3
PackagesNotFoundError
(py38)$ conda install --revision 3 --channel conda-forge
(py38)$ conda list --revisions
update ... pandas - update pandas to the latest compatible version
update --all - update all packages
list --revisions - list the history of changes
install --revision 3 - restore revision 3
Note: When restoring a revision, the additional channels must be given explicitly.
Warning: conda install --revision can be problematic.
Warning: pip install in a conda env is a dead-end - no future conda install/update.
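The quotes in conda install pandas="<1" are mandatory because, without them, the shell reads `<` as an input redirection, so conda never receives the version constraint. Below is a minimal POSIX shell demonstration. It uses a hypothetical helper, show_args, that only prints the arguments it receives. No conda is needed.

```shell
#!/bin/sh
# Hypothetical helper: print every argument it receives, one per line, in brackets
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the version constraint reaches the command intact
show_args install pandas="<1"
# [install]
# [pandas=<1]

# Unquoted: the shell parses "<1" as "read standard input from a file named 1",
# so the command would only receive "pandas=". In an empty temporary directory
# there is no file named 1, so the redirection fails and the command is not run.
tmp_dir="$(mktemp -d)"
( cd "$tmp_dir" && show_args install pandas=<1 ) || echo "redirection failed: no file named 1"
rmdir "$tmp_dir"
```

The same rule applies to any constraint that contains `<`, `>` or `|`.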
(py38)$ conda env list
(py38)$ conda remove --name py38 pandas
(py38)$ conda remove --name py38-clone --all
(py38)$ conda env list
(py38)$ conda clean --all
remove --name py38 pandas - the pandas package is removed
conda remove --name py38-clone --all - the py38-clone environment is removed
clean --all - remove unused packages and caches
Note: It is not possible to remove the active environment.
py38 environment. Do you recognize the changes in the last revision?
Environment file: a YAML file that contains all the information needed to create an environment.
YAML file: src/mamba.yml
Note: mamba is like conda but faster.
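The content of src/mamba.yml is not reproduced in these notes. An environment file generally has three parts: a name, a list of channels in priority order, and a list of dependencies. The sketch below writes a hypothetical example_env.yml from the shell. The environment name and package list are assumptions for illustration only, not the workshop file.

```shell
#!/bin/sh
# Write a hypothetical environment file (NOT the content of src/mamba.yml)
cat > example_env.yml <<'EOF'
name: example_env        # used by `conda env create` unless --name is given
channels:                # searched in this order (highest priority first)
  - conda-forge
dependencies:
  - python=3.8
  - mamba
EOF

# Show the result; on a machine with conda it can then be used with:
#   conda env create --file example_env.yml
cat example_env.yml
```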
(py38)$ conda env create --file mamba.yml
(py38)$ conda activate mamba
(mamba)$ conda env export
(mamba)$ conda env export --no-builds
(mamba)$ conda env export --from-history
(mamba)$ conda env export --no-builds > mamba_export.yaml
(mamba)$ conda env create --file mamba_export.yaml
CondaValueError: prefix already exists: ...
(mamba)$ conda env create --file mamba_export.yaml --name mamba_export
env export - all installed packages are exported (in YAML format)
env export --no-builds - the build specification is removed
env export --from-history - only the packages that you explicitly asked for are exported
Note: The build specification is generally platform dependent.
Note: The packages might not be available on all platforms.
Warning: With --from-history, pip packages and additional channels are not included.
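The build specification is the third `=`-separated field in an export entry such as package=version=build, and --no-builds drops that field. conda does this itself. The sed sketch below only makes the transformation visible. strip_builds is a hypothetical helper, and both the input fragment and its build strings are made up.

```shell
#!/bin/sh
# Hypothetical helper: drop the build string (third "="-separated field) from
# top-level dependency lines such as "  - zlib=1.2.11=h_example_0".
# Nested pip entries (indented deeper) and lines without a build are left unchanged.
strip_builds() {
  sed 's/^\(  - [^=[:space:]]*=[^=[:space:]]*\)=[^[:space:]]*$/\1/'
}

# Made-up export fragment (build strings are hypothetical)
cat <<'EOF' | strip_builds
name: mamba
channels:
  - conda-forge
dependencies:
  - python=3.8.10=h_example_0
  - zlib=1.2.11=h_example_1
EOF
```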
Spec file: a text file that contains all the information needed to create an identical environment.
(mamba)$ conda list --explicit
(mamba)$ conda list --explicit > mamba_spec_file.txt
(mamba)$ conda create --name mamba_explicit --file mamba_spec_file.txt
conda create --file mamba_spec_file.txt - conda checks neither the platform nor the dependencies
Warning: Use the spec file on a similar platform.
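conda does not check the platform of a spec file, so a small guard can compare the platform recorded in the file with the one you expect before running conda create --file. This sketch assumes that the header written by conda list --explicit contains a line of the form `# platform: linux-64`. spec_platform and check_spec_platform are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the platform recorded in an explicit spec file,
# assuming a header line like "# platform: linux-64"
spec_platform() {
  sed -n 's/^# platform: *//p' "$1"
}

# Hypothetical helper: succeed only if the spec file targets the expected platform
check_spec_platform() {
  spec_file="$1"
  expected="$2"
  actual="$(spec_platform "$spec_file")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $spec_file was created on $expected"
  else
    echo "WARNING: $spec_file was created on '$actual', not on '$expected'" >&2
    return 1
  fi
}
```

For example: check_spec_platform mamba_spec_file.txt linux-64 && conda create --name mamba_explicit --file mamba_spec_file.txt. The platform of the current machine is listed in the output of conda info.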
conda remove and pay attention to the flag -y, --yes.
mamba_export and mamba_explicit environments.
There are long (--) and, potentially equivalent, short (-) optional arguments.
| Long: -- | Short: - | Remark |
|---|---|---|
| --name | -n | |
| --channel | -c | |
| --help | -h | |
| --file | -f | not always |
| --all | -a | not always |
| --revisions | -r | |
| --yes | -y | |
Note: If an argument is not working as expected use --help.
Docker daemon: a service that runs on your host operating system (Linux). It is usually started by a system utility (e.g. systemd), and it builds, runs and manages the containers.
[Docker] Container Image: a binary that includes all requirements for running a single container, i.e. a packaging technology for the computational environment. It is organized as a stack of read-only layers that can be efficiently reused.
[Docker] Container: an “instance” of a [Docker] Container Image. It adds a writable layer on top of the image layers.
$ sudo docker run hello-world
$ sudo docker image ls --help
$ sudo docker image ls
$ sudo docker container ls --help
$ sudo docker container ls --all
$ sudo docker run hello-world
$ sudo docker container ls -a
docker image ls - is equivalent to docker image list or docker images
docker container ls - is equivalent to docker container list or docker ps
Container registry: a repository for storing [Docker] Container Images.
Docker Hub is a container registry
$ sudo docker pull conda/miniconda3-centos7
$ sudo docker tag conda/miniconda3-centos7 conda:1.0
$ sudo docker image ls
$ sudo docker run conda:1.0
$ sudo docker container ls -a
$ sudo docker run conda
Unable to find image 'conda:latest' locally
$ sudo docker tag conda:1.0 conda
$ sudo docker run conda
docker pull repository[:tag] - pulls the image from Docker Hub
Note: If no tag is provided in repository[:tag], latest is used.
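The "no tag means latest" rule can be written as a small shell function. This is a simplified sketch. normalize_image_ref is a hypothetical helper, and it does not add the registry prefix (docker.io/library/) that Docker also assumes for short names like conda.

```shell
#!/bin/sh
# Hypothetical helper: append ":latest" to an image reference without a tag.
# Only the last path component is inspected, so a registry port such as
# "localhost:5000/conda" is not mistaken for a tag. Digest references
# ("repo@sha256:...") are returned unchanged.
normalize_image_ref() {
  ref="$1"
  last="${ref##*/}"
  case "$ref" in
    *@*) printf '%s\n' "$ref"; return ;;
  esac
  case "$last" in
    *:*) printf '%s\n' "$ref" ;;
    *) printf '%s\n' "$ref:latest" ;;
  esac
}

normalize_image_ref conda                     # conda:latest
normalize_image_ref conda:1.0                 # conda:1.0
normalize_image_ref conda/miniconda3-centos7  # conda/miniconda3-centos7:latest
normalize_image_ref localhost:5000/conda      # localhost:5000/conda:latest
```

This explains the error above: after docker tag conda/miniconda3-centos7 conda:1.0, only conda:1.0 exists, so docker run conda looks for conda:latest.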
hello-world image
hello-world
$ sudo docker image ls
$ sudo docker image rm IMAGE_ID
Error ... image is being used by stopped container NEW_CONTAINER_ID
$ sudo docker container rm NEW_CONTAINER_ID
$ sudo docker image rm IMAGE_ID
$ sudo docker container ls -a
$ sudo docker image ls -a
docker container rm - is equivalent to docker rm
docker image rm - is equivalent to docker rmi
Entrypoint: the default command to execute when running a container.
$ echo "Hello World"
$ sudo docker run --entrypoint echo conda "Hello World"
$ sudo docker run -it --entrypoint bash conda
/# echo "Hello World"
/# cat /etc/centos-release
/# exit
$ sudo docker container ls -a
# !!! DELETE all containers !!!
$ sudo docker container rm $(sudo docker container ls -aq)
$ sudo docker run --rm conda
$ sudo docker container ls -a
$ sudo docker run --rm -it --entrypoint bash conda
/# exit
--rm - automatically clean up (remove) the container when it exits
-it - for an interactive process, like a shell
$ mkdir my_data
$ ls -lt my_data
$ sudo docker run --rm -it conda
/# conda --version
conda 4.6.11
/# ls -lt /my_data
ls: cannot access '/my_data': No such file or directory
/# exit
$ sudo docker run -it --rm -v "$(pwd)/my_data":/my_data conda
/# ls -lt /my_data
/# touch /my_data/newfile.txt
/# ls -lt /my_data
/# exit
$ ls -lt my_data
Warning: An absolute host path is required in -v, and changes made in the mounted directory are persistent.
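Because -v needs an absolute host path, a script can resolve the directory first instead of relying on $(pwd) at the call site. This sketch uses a hypothetical abs_dir helper. It is a dry run: it only prints the docker run command and does not execute it.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of an existing directory
abs_dir() {
  (cd -- "$1" && pwd)
}

mkdir -p my_data
host_dir="$(abs_dir my_data)"

# Dry run: print the command that mounts the host directory at /my_data
echo "sudo docker run -it --rm -v \"$host_dir:/my_data\" conda"
```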
username is the Docker Hub username
Note: You can delete what you push using https://hub.docker.com/.
Note: If you want to use Docker Hub, check its policies carefully.
Note: there are also other container registries, e.g. GitLab can provide a container registry.
$ sudo docker login
$ sudo docker tag conda username/conda
$ sudo docker push username/conda
$ sudo docker image ls
$ sudo docker image rm username/conda
$ sudo docker image rm conda/miniconda3-centos7 conda:1.0
$ sudo docker save --output conda.tar conda
$ ls -lh
# !!! DELETE all containers and images !!!
$ sudo docker container rm $(sudo docker container ls -aq)
$ sudo docker image rm -f $(sudo docker image ls -aq)
$ sudo docker load --input conda.tar
$ sudo docker image ls
Conda Environment in a Container Image
Dockerfile: a text file that contains all the information needed to build a [Docker] Container Image.
src/Dockerfile
FROM centos:centos7
COPY . /source_files
RUN yum -y update \
&& yum -y install curl bzip2 \
&& curl -sSL https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh \
&& bash /tmp/miniconda.sh -bfp /usr/local/ \
&& rm -rf /tmp/miniconda.sh \
&& conda init \
&& conda install -y python=3 \
&& conda update -y conda \
&& conda env create --file /source_files/mamba.yml \
&& conda clean --all --yes \
&& rpm -e --nodeps curl bzip2 \
&& yum clean all
COPY, RUN, ... each create a new image layer
Note: The files that you COPY are part of the image (even if you rm them in a later RUN).
$ cd src/
$ sudo docker build -t my_conda .
# Be patient - it might take minutes
$ sudo docker run --rm -it --entrypoint bash my_conda
/# conda env list
/# conda --version
/# exit
sudo docker build -t my_conda . - my_conda is the name of the new image; . is the build context, and a Dockerfile is expected in it
COPY . /source_files - copies from the same build context
# add your user to the docker group -> no need to use sudo for docker
$ sudo groupadd docker
$ sudo gpasswd -a $USER docker
# log out and log back in (or reboot) for the group change to take effect
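After gpasswd, the new group only applies to new login sessions. This sketch checks whether the change has been recorded, using a hypothetical user_in_group helper. It relies on id -nG USER, which reads the group database, whereas id -nG without a user name reports the groups of the current session.

```shell
#!/bin/sh
# Hypothetical helper: succeed if USER (default: current user) is listed in
# GROUP (default: docker) in the system group database
user_in_group() {
  group="${1:-docker}"
  user="${2:-$(id -un)}"
  id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

if user_in_group docker; then
  echo "$(id -un) is in the docker group (log in again if docker still needs sudo)"
else
  echo "$(id -un) is not in the docker group yet"
fi
```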
# repo2docker is installed
$ jupyter-repo2docker --no-run --user-name jovyan \
--user-id 1000 --image-name r2d_conda .
$ docker image ls
$ docker run --rm -it --entrypoint bash r2d_conda
/# conda env list
/# conda list mamba
/# exit
jupyter-repo2docker uses the environment.yml file (same content as mamba.yml) from the binder directory
repo2docker supports Python (pip, pipenv, conda), R (conda, MRAN + internal file) and Julia (Pkg)
It can also install system packages (apt-get) and run non-privileged bash scripts