Workshop 4 - Tools and Platforms for Reproducible Data Analysis
Andrei Plamada - andrei.plamada@id.ethz.ch
Henry Lütcke - henry.luetcke@id.ethz.ch
Scientific IT Services, ETH Zurich
16 June 2021
Command Line Interface
macOS: press command + Space and search for Terminal.
Windows: search for anaconda and select Anaconda Prompt or Anaconda Powershell Prompt.
Conda environment: a directory that contains specific installed packages (system libraries, Python/R modules, …).
Base environment: the default environment that you have after installing conda.
Which version of conda are you running? ($ conda --version)
Scan the output of conda info, especially: platform, Python version, active environment, base environment and channel URLs.
$ conda create --name py38 python=3.8
$ conda create --name py38-clone --clone py38
$ conda env list
--name py38 - py38 is the name of the environment
python=3.8 - version 3.8 of the python package is installed in the new environment
--clone py38 - py38 is the name of the existing environment that is cloned
conda env list - is equivalent to conda info --envs
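To avoid re-creating an environment that already exists, you can check the output of conda env list first. The sketch below assumes the usual output layout: comment lines starting with #, then one environment per line with its name in the first column. env_exists is a hypothetical helper, not a conda command.

```shell
# Hypothetical helper: succeed if an environment named $1 appears in the
# `conda env list` output read from stdin (assumed layout: "# ..." comment
# lines, then "name [*] /path" per environment).
env_exists() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Usage (requires conda):
#   conda env list | env_exists py38 || conda create --yes --name py38 python=3.8
```

Environments created with --prefix instead of --name show only a path, so they are not matched by name here.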
Check the Python version outside conda, in base and in py38:
$ python --version
$ conda activate
(base)$ python --version
(base)$ conda activate py38
(py38)$ python --version
(py38)$ conda deactivate --help
conda activate - the default (base) environment is activated
conda activate py38 - py38 is the name of the environment
$ conda activate py38
(py38)$ conda list
(py38)$ conda list --name base
(py38)$ conda list python
(py38)$ conda list --name base python
list --name base - base is the name of the environment
list python - only packages matching the string python (a regular expression) in the active environment (or in base if none is active)
Channels: locations where packages are stored, e.g. conda-forge, bioconda, r;
channels have priority (from left to right);
search for packages in the browser at anaconda.org
(py38)$ conda deactivate && conda deactivate
$ conda search numpy=1.20
$ conda search --channel conda-forge numpy=1.20
$ conda install --channel conda-forge numpy=1.20.3 --name py38
$ conda activate py38
(py38)$ conda install pandas="<1"
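The quotes in pandas="<1" matter because, without them, the shell treats < as an input redirection, so conda never sees the version constraint. The sketch below uses printf as a stand-in for conda to show which argument actually reaches the program. show_args is a hypothetical helper.

```shell
# Hypothetical helper: print each argument the shell passes, one per line.
show_args() {
  printf '%s\n' "$@"
}

# Quoted: the program receives the single argument pandas=<1
show_args pandas="<1"

# Unquoted, `show_args pandas=<1` would instead try to read stdin from a
# file named "1" and pass only "pandas=" (typically failing with
# "No such file or directory").
```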
install --channel conda-forge ... - conda-forge is used in addition to the list of channels, with higher priority
install pandas="<1" - the quotes are mandatory
(py38)$ conda update --channel conda-forge pandas
(py38)$ conda update --all
(py38)$ conda list --revisions
(py38)$ conda install --revision 3
PackagesNotFoundError
(py38)$ conda install --revision 3 --channel conda-forge
(py38)$ conda list --revisions
update ... pandas - update pandas to the latest compatible version
update --all - update all packages
list --revisions - list the history of changes
install --revision 3 - restore revision 3
Note: When restoring a revision, the additional channels have to be given explicitly.
Warning: conda install --revision can be problematic.
Warning: pip install in a conda environment is a dead end: no further conda install/update.
(py38)$ conda env list
(py38)$ conda remove --name py38 pandas
(py38)$ conda remove --name py38-clone --all
(py38)$ conda env list
(py38)$ conda clean --all
remove --name py38 pandas - the pandas package is removed
remove --name py38-clone --all - the py38-clone environment is removed
clean --all - remove unused packages and caches
Note: It is not possible to remove the active environment.
Check the revisions of the py38 environment. Do you recognize the changes in the last revision?
Environment file: a YAML file that contains all the information needed to create an environment.
YAML file: src/mamba.yml
Note: mamba is like conda, but faster.
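For orientation, a minimal environment file might look like the one written below. The content is an assumption, since the actual src/mamba.yml is not reproduced here. The helper write_env_file and the package list are hypothetical.

```shell
# Hypothetical helper: write a minimal conda environment file to $1.
# Keys: name (environment name), channels (in priority order), dependencies.
write_env_file() {
  cat > "$1" <<'EOF'
name: mamba
channels:
  - conda-forge
dependencies:
  - python=3.8
  - mamba
EOF
}

# Usage (the second command requires conda):
#   write_env_file example_env.yml
#   conda env create --file example_env.yml
```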
(py38)$ conda env create --file mamba.yml
(py38)$ conda activate mamba
(mamba)$ conda env export
(mamba)$ conda env export --no-builds
(mamba)$ conda env export --from-history
(mamba)$ conda env export --no-builds > mamba_export.yaml
(mamba)$ conda env create --file mamba_export.yaml
CondaValueError: prefix already exists: ...
(mamba)$ conda env create --file mamba_export.yaml --name mamba_export
env export - all installed packages are exported (in YAML format)
env export --no-builds - the build specification is removed
env export --from-history - only the packages that you explicitly asked for are exported
Note: The build specification is generally platform dependent.
Note: The packages might not be available on all platforms.
Warning: With --from-history, pip packages and additional channels are not exported.
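To see what --no-builds removes, note that each conda dependency in an export has the form name=version=build. The sed filter below is only an illustration of that difference, because conda does this itself with --no-builds. It assumes conda-style single = separators and leaves pip entries (name==version) untouched. strip_builds is a hypothetical helper and the build strings in the example are made up.

```shell
# Illustration only: drop the third "=build" field from conda dependency
# lines of an exported YAML file read on stdin, e.g.
#   "  - numpy=1.20.3=py38habc_0"  ->  "  - numpy=1.20.3"
# Lines with "==" (pip entries) and all other lines pass through unchanged.
strip_builds() {
  sed -E 's/^([[:space:]]*- [^=]+=[^=]+)=[^=]+$/\1/'
}

# Usage (requires conda):
#   conda env export | strip_builds
```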
Spec file: a text file that contains all the information needed to create an identical environment.
(mamba)$ conda list --explicit
(mamba)$ conda list --explicit > mamba_spec_file.txt
(mamba)$ conda create --name mamba_explicit --file mamba_spec_file.txt
conda create --file mamba_spec_file.txt - conda checks neither the platform nor the dependencies
Warning: Use the spec file on a similar platform.
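Because conda does not check the platform of a spec file, it can help to compare it yourself before creating the environment. conda list --explicit writes a comment line of the form # platform: linux-64 near the top of the file. The hypothetical helper below only extracts that value.

```shell
# Hypothetical helper: print the platform recorded in a spec file ($1),
# i.e. the value of the "# platform: ..." comment written by
# `conda list --explicit`.
spec_platform() {
  sed -n 's/^# platform: *//p' "$1"
}

# Usage (requires conda): compare the result with the "platform" shown by
# `conda info` before running
#   spec_platform mamba_spec_file.txt
#   conda create --name mamba_explicit --file mamba_spec_file.txt
```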
Use conda remove to delete the mamba_export and mamba_explicit environments, and pay attention to the flag -y, --yes.
There are long (--) and potentially equivalent short (-) optional arguments.
long: -- | short: - | Remark
---|---|---
--name | -n |
--channel | -c |
--help | -h |
--file | -f | not always
--all | -a | not always
--revisions | -r |
--yes | -y |
Note: If an argument is not working as expected, use --help.
Docker daemon: a service that runs on your host operating system (Linux), is usually started by a system utility, and makes sure Docker is working as expected.
[Docker] Container Image: a binary that includes all requirements for running a single container, i.e. a packaging technology for the computational environment. It is organized as a stack of read-only layers that can be efficiently reused.
[Docker] Container: an “instance” of a [Docker] Container Image. It adds a writable layer on top of the image layers.
$ sudo docker run hello-world
$ sudo docker image ls --help
$ sudo docker image ls
$ sudo docker container ls --help
$ sudo docker container ls --all
$ sudo docker run hello-world
$ sudo docker container ls -a
docker image ls - is equivalent to docker image list or docker images
docker container ls - is equivalent to docker container list or docker ps
Container registry: a repository for storing [Docker] Container Images.
Docker Hub is a container registry.
$ sudo docker pull conda/miniconda3-centos7
$ sudo docker tag conda/miniconda3-centos7 conda:1.0
$ sudo docker image ls
$ sudo docker run conda:1.0
$ sudo docker container ls -a
$ sudo docker run conda
Unable to find image 'conda:latest' locally
$ sudo docker tag conda:1.0 conda
$ sudo docker run conda
docker pull repository[:tag] - pulls the image from Docker Hub
Note: If no tag is provided in repository[:tag], latest is used.
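The default-tag rule can be written as a small string function. This is a simplified sketch: it assumes a reference without a digest (@sha256:...) and looks for a : only after the last /, so that a registry port such as localhost:5000/conda is not mistaken for a tag. with_default_tag is a hypothetical name.

```shell
# Hypothetical helper: append ":latest" to an image reference without a tag.
# Simplified: digests (name@sha256:...) are not handled.
with_default_tag() {
  ref="$1"
  last="${ref##*/}"   # part after the last "/", e.g. "miniconda3-centos7"
  case "$last" in
    *:*) printf '%s\n' "$ref" ;;          # a tag is already present
    *)   printf '%s\n' "$ref:latest" ;;   # no tag -> latest
  esac
}
```

For example, with_default_tag conda prints conda:latest, which is the reference that appears in the Unable to find image 'conda:latest' locally message.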
Remove the hello-world image:
$ sudo docker image ls
$ sudo docker image rm IMAGE_ID
Error ... image is being used by stopped container NEW_CONTAINER_ID
$ sudo docker container rm NEW_CONTAINER_ID
$ sudo docker image rm IMAGE_ID
$ sudo docker container ls -a
$ sudo docker image ls -a
docker container rm - is equivalent to docker rm
docker image rm - is equivalent to docker rmi
Entrypoint: the default command to execute when running a container.
$ echo "Hello World"
$ sudo docker run --entrypoint echo conda "Hello World"
$ sudo docker run -it --entrypoint bash conda
/# echo "Hello World"
/# cat /etc/centos-release
/# exit
$ sudo docker container ls -a
# !!! DELETE all containers !!!
$ sudo docker container rm $(sudo docker container ls -aq)
$ sudo docker run --rm conda
$ sudo docker container ls -a
$ sudo docker run --rm -it --entrypoint bash conda
/# exit
--rm - automatically clean up the container
-it - for an interactive process, like a shell
$ mkdir my_data
$ ls -lt my_data
$ sudo docker run --rm -it conda
/# conda --version
conda 4.6.11
/# ls -lt /my_data
ls: cannot access '/my_data': No such file or directory
/# exit
$ sudo docker run -it --rm -v $(pwd)/my_data:/my_data conda
/# ls -lt /my_data
/# touch /my_data/newfile.txt
/# ls -lt /my_data
/# exit
$ ls -lt my_data
Warning: an absolute host path is required in -v, and changes are persistent.
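Since -v needs an absolute host path, a relative directory has to be resolved first, as $(pwd)/my_data does in the command above. The hypothetical helper below resolves an existing directory with cd and pwd and prints the resulting HOST:CONTAINER argument.

```shell
# Hypothetical helper: print a bind-mount argument "HOST_ABS:CONTAINER"
# for an existing host directory $1 and a container path $2.
volume_arg() {
  host_abs=$(cd "$1" && pwd) || return 1   # fails if $1 does not exist
  printf '%s:%s\n' "$host_abs" "$2"
}

# Usage:
#   mkdir -p my_data
#   sudo docker run -it --rm -v "$(volume_arg my_data /my_data)" conda
```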
username - is the Docker Hub username
Note: You can delete what you push using https://hub.docker.com/ .
Note: If you want to use Docker Hub, check its policy carefully.
Note: There are also other container registries, e.g. GitLab can provide a container registry.
$ sudo docker login
$ sudo docker tag conda username/conda
$ sudo docker push username/conda
$ sudo docker image ls
$ sudo docker image rm username/conda
$ sudo docker image rm conda/miniconda3-centos7 conda:1.0
$ sudo docker save --output conda.tar conda
$ ls -lh
# !!! DELETE all containers and images !!!
$ sudo docker container rm $(sudo docker container ls -aq)
$ sudo docker image rm -f $(sudo docker image ls -aq)
$ sudo docker load --input conda.tar
$ sudo docker image ls
Conda Environment in a Container Image
Dockerfile: a text file that contains all the information needed to build a [Docker] Container Image.
src/Dockerfile
FROM centos:centos7
COPY . /source_files
RUN yum -y update \
&& yum -y install curl bzip2 \
&& curl -sSL https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh \
&& bash /tmp/miniconda.sh -bfp /usr/local/ \
&& rm -rf /tmp/miniconda.sh \
&& conda init \
&& conda install -y python=3 \
&& conda update -y conda \
&& conda env create --file /source_files/mamba.yml \
&& conda clean --all --yes \
&& rpm -e --nodeps curl bzip2 \
&& yum clean all
COPY, RUN, ... - each of these instructions adds a new image layer
Note: The files that you COPY are part of the image (even if you rm them in RUN).
$ cd src/
$ sudo docker build -t my_conda .
# Be patient - it might take minutes
$ sudo docker run --rm -it --entrypoint bash my_conda
/# conda env list
/# conda --version
/# exit
sudo docker build -t my_conda . - my_conda is the name of the new image, and . is used as the build context; a Dockerfile is expected in it
COPY . /source_files - uses the same context
# add your user to the docker group -> no need to use sudo for docker
$ sudo groupadd docker
$ sudo gpasswd -a $USER docker
# reboot
# repo2docker is installed
$ jupyter-repo2docker --no-run --user-name jovyan \
--user-id 1000 --image-name r2d_conda .
$ docker image ls
$ docker run --rm -it --entrypoint bash r2d_conda
/# conda env list
/# conda list mamba
/# exit
jupyter-repo2docker - uses the environment.yml file (same content as mamba.yml) from the binder directory
Supported configuration: Python (pip, pipenv, conda), R (conda, MRAN + internal file), Julia (Pkg), system packages (apt-get) and nonprivileged bash scripts
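repo2docker looks for its configuration files in a binder directory at the root of the repository (or in the root itself). Assuming the layout used here, with the environment file in src/mamba.yml, binder/environment.yml can be prepared as shown below. prepare_binder is a hypothetical helper.

```shell
# Hypothetical helper: copy an existing conda environment file ($1) to
# binder/environment.yml inside the repository root ($2), where
# repo2docker looks for it.
prepare_binder() {
  mkdir -p "$2/binder" && cp "$1" "$2/binder/environment.yml"
}

# Usage, run from the repository root (repo2docker must be installed):
#   prepare_binder src/mamba.yml .
#   jupyter-repo2docker --no-run --user-name jovyan --user-id 1000 --image-name r2d_conda .
```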