class: center, middle, inverse

# **Introduction to git**

## Scientific IT Services ETH

### Contact: uwe.schmitt@id.ethz.ch

### 17th of June 2020

---
class: middle

## Structure of the course:

1. ### General introduction and basic concepts of `git`.
2. ### Hands-on exercises for basic `git` commands.
3. ### Introduction to "branching" with exercises.
4. ### Basic introduction to `gitlab` / `github`.

---
class: middle

## Installers

- ### On Mac: https://git-scm.com/download/mac
- ### On Windows: https://git-scm.com/download/windows
  + make the following choice when running the installer
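---
class: middle

## Checking the installation

After running the installer you can check that the command line client is found on your `PATH`, for example with `git --version` in a shell. The sketch below does the same from Python. The helper name `git_version` is made up for this illustration and is not part of `git`.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if git is not usable."""
    executable = shutil.which("git")  # look up `git` on the PATH
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


print(git_version() or "command line git client not found")
```

If this reports that the client was not found, open a new terminal after the installation and try again before you continue.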
---
class: middle

## Prerequisites

- You know basic shell commands to work on the command line (`ls`, `cd`, `rm`, ...)
- You installed git on your computer (https://git-scm.com/download)
- You installed the command line git client, not a GUI client!

---
class: middle

## About git

- `git` supports software development by allowing version control of files in a repository.
- `git` supports distributed software development in teams.
- `git` is one of the most commonly used version control systems (VCS).

## Effects

- no version tags in file names any more
- never lose old files
- undo changes efficiently
- distribute code in a controlled way

---
class: middle

## What git can manage

- program code
- text files used for documentation (LaTeX files, markdown, html, ...)
- textual data files (like .csv, .txt, ...)

## Notebooks?

- using `git` to manage Jupyter notebooks is inconvenient

---
class: middle

## Caution!

- `git` is not suited to track huge data files
  - ... you need a special solution such as `git lfs` or `git annex` instead.
- `git` is not suited to track binary files (.xlsx, .zip, .gz, .doc, ...)
- `git` has **many** features
- You have to practice to learn it!

---
class: middle

## About git repositories

- `git` can be used on a single computer (without github et al.)
- A local `git` repository is a folder, including its subfolders, which holds a hidden `.git` subfolder.
- `git` manages local git repositories
- `git` allows synchronization of git repositories (e.g. with github)

## About gitlab / github / ...

- In a team, developers have their own local git repositories
- `github` / `gitlab` offer just another git repository, hosted on a remote computer.
- `git` assists in synchronizing repositories with such a remote.

---
class: middle

## Distributed software development without a version control system

Setting:

- one person is the main maintainer of a code base

Workflow:

1. a contributor asks for an up-to-date version
1. works on the copy
1. submits the **changes** as a "patch" to the maintainer
1. the maintainer checks the changes and applies them to the code base
1. go to 1.

---
class: middle

## Example patch

`print_squares.py`:

````python
for i in range(10):
    # display i and its square
    print("i is", i)
    print("i^2 is", i ** 2)
````

`print_squares.py` after modification:

````python
for i in range(10):
    # this is the magic computation
    print("i^2 is", i ** 2)
    print("i is", i)
````

and this is the corresponding patch:

````bash
--- a/git/sandbox/print_squares.py
+++ b/git/sandbox/print_squares.py
@@ -1,4 +1,4 @@
 for i in range(10):
-    # display i and its square
-    print("i is", i)
+    # this is the magic computation
     print("i^2 is", i ** 2)
+    print("i is", i)
````

---
class: middle

## git's most important internals

- `git` allows management, contribution and manipulation of patches.
- `git` chains patches over time
- if you know the initial state of a folder and this chain, you can reconstruct the state of every file at any point in time.
- that's it

---
class: middle

## Example list of commits

- A **commit** consists of a single or multiple related patches.
- The **git history** is a list of commits:

````bash
$ git log
````

````git
commit 8ab13c9a930d4b0fd1bc1937f382196bc3a1bbdc (HEAD -> master)
Author: Uwe
Date:   Fri Feb 9 12:50:00 2018 +0100

    changed order of output in print_squares.py

commit 0daab54f48ac33b5cfa71f90634ad82dd655e33c
Author: Uwe
Date:   Fri Feb 9 12:49:16 2018 +0100

    initial version of print_squares.py
````

* starts with the most recent commit on top
* commits are identified using ids like `8ab13c9a930d4b0fd1bc1937f382196bc3a1bbdc`
* Usually the first eight characters are enough to identify the id.

---
class: middle

## Example list of commits including patches

````bash
$ git log -p
````

(Inverted order, older commit first)

This patch adds four new lines to an empty file (marked as `/dev/null`):

````git
commit 90b3a581cae221342ca82fe2862fd5daa219a4be
Author: Uwe
Date:   Fri Feb 9 16:47:49 2018 +0100

    initial version of print_squared.py

diff --git a/print_squares.py b/print_squares.py
new file mode 100644
index 0000000..d45e7ab
--- /dev/null
+++ b/print_squares.py
@@ -0,0 +1,4 @@
+for i in range(10):
+    # print i and i squared
+    print("i is", i)
+    print("i^2 is", i ** 2)
````

---
class: middle

## ... log with patches continued

````git
commit b3ed251fa0701ee5f986adc7a63eafd87d7d0bd2 (HEAD -> master)
Author: Uwe
Date:   Fri Feb 9 16:48:20 2018 +0100

    changed order of output in print_squares.py

diff --git a/print_squares.py b/print_squares.py
index d45e7ab..893f18d 100644
--- a/print_squares.py
+++ b/print_squares.py
@@ -1,4 +1,4 @@
 for i in range(10):
-    # print i and i squared
-    print("i is", i)
+    # print i squared and i
     print("i^2 is", i ** 2)
+    print("i is", i)
````

---
class: middle

## The staging area (also named "index")

Related patches (maybe over multiple files) are grouped as *commits*

1. First `git add`
   1. creates patches
   2. places them in the *staging area*
2. Then `git commit` adds them as a single entry (*commit*) to git's history chain

---
class: middle

## Important git commands

The `git` command line interface uses your current working directory to detect the repository you want to work with. This also works from within a subfolder.

- `git init` initializes the given folder and its subfolders as a local git repository
- `git status` shows the state of the current git repository
- `git diff` shows changes in tracked files that are not staged yet
- `git add` creates patches to create current versions of file(s) and places them into the staging area
- `git commit` creates a commit from the current staging area
- `git log` shows the history of commits
- `git show` shows details of single commits

---
class: middle, inverse, center

# Hands on

## **Be exact when you type the examples!**

---
class: middle

## Configuring `git`

Before you use `git` for the first time, we configure some settings:

````bash
$ git config --global user.name "Uwe Schmitt"
$ git config --global user.email "uwe.schmitt@id.ethz.ch"
````

Then we set your default text editor.

- On Mac / Linux (replace `nano` by `vim` if you like):

````bash
$ git config --global core.editor /usr/bin/nano
$ git config --global core.autocrlf input
````

- On Windows:

````bash
$ git config --global core.editor /usr/bin/notepad
$ git config --global core.autocrlf true
````

Now check if the configured editor works, but **don't modify the shown file**!
````bash
$ git config --global -e
````

---
class: middle

## Creating a local repository

Every folder on your file system can be turned into a local git repository:

````bash
$ cd
$ mkdir git_demo
$ cd git_demo
$ git init .
Initialized empty Git repository in ~/git_demo/.git/
$ ls -a
.  ..  .git
````

- This means: the folder and all its subfolders are under version control now!
- The folder does not have to be empty, you can also put existing folders under version control.
- Use separate repositories for separate software projects.

---
class: middle

## First steps with `git`

Now create a new file `greet.py` in the folder `git_demo`:

````python
def greet(name):
    return "hi %s" % name
````

If we ask `git` about the status of the repository, we see that we have one file which is not under version control yet:

````bash
$ git status
On branch master

No commits yet

Untracked files:
  (use "git add ..." to include in what will be committed)

    greet.py

nothing added to commit but untracked files present (use "git add" to track)
````

---
class: middle

## Your first commit

`git add` creates a patch to build the current version of `greet.py` and places this patch into the index:

````bash
$ git add greet.py
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached ..." to unstage)

    new file:   greet.py
````

* `git add` also works with multiple files and / or folders
* `git add` is required to put files under version control.

---
class: middle

## Your first commit (continued)

* Now commit the patch in the index with a message describing the commit:

````bash
$ git commit -m "first version of greet.py"
[master (root-commit) 8175092] first version of greet.py
 1 file changed, 2 insertions(+)
 create mode 100644 greet.py
````

- `8175092` are the first characters of the commit id.
- If you omit the `-m "...."` option, `git` will open the configured text editor where you can enter the message.
- Think about your commit messages. You might want to understand them in one year or later.

---
class: middle

## Inspecting changes

Show the history:

````bash
$ git log
commit 8175092830d7722c30d339106f3738a7ff5f53fc
Author: Uwe Schmitt
Date:   Wed Sep 30 21:58:30 2015 +0200

    first version of greet.py
````

---
class: middle

## Inspecting changes continued

Show which change was introduced by the latest commit:

````bash
$ git show
commit 8175092830d7722c30d339106f3738a7ff5f53fc
Author: Uwe Schmitt
Date:   Thu Oct 1 15:47:11 2015 +0200

    first version of greet.py

diff --git a/greet.py b/greet.py
new file mode 100644
index 0000000..7a875b9
--- /dev/null
+++ b/greet.py
@@ -0,0 +1,2 @@
+def greet(name):
+    return "hi %s" % name
````

---
class: middle

## More code changes for the next commit

First we add a new feature to `greet.py`:

````python
def greet(name):
    return "hi %s" % name

def say_hello(name):
    print(greet(name))
````

And we create a new file `run.py`:

````python
import greet

greet.say_hello("monty")
````

---
class:middle

## Which changes did we introduce in already committed files?

````bash
$ git diff
diff --git a/greet.py b/greet.py
index ffc42de..c886ace 100644
--- a/greet.py
+++ b/greet.py
@@ -1,3 +1,6 @@
 def greet(name):
     return "hi %s" % name
+
+def say_hello(name):
+    print(greet(name))
````

---
class:middle

## What is the current status of your repository?

````bash
$ git status
On branch master
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

    modified:   greet.py

Untracked files:
  (use "git add ..." to include in what will be committed)

    run.py

no changes added to commit (use "git add" and/or "git commit -a")
````

---
class:middle

## `git add` to set up the staging area

We now place multiple patches in the staging area:

````bash
$ git add run.py
$ git add greet.py
````

Now we have two changes in our staging area:

````bash
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

    modified:   greet.py
    new file:   run.py
````

---
class:middle

## Check staging area

We use `git diff --cached` to see the patches in the staging area:

````bash
$ git diff --cached
diff --git a/greet.py b/greet.py
index 7a875b9..53bb92b 100644
--- a/greet.py
+++ b/greet.py
@@ -1,2 +1,5 @@
 def greet(name):
     return "hi %s" % name
+
+def say_hello(name):
+    print(greet(name))
diff --git a/run.py b/run.py
new file mode 100644
index 0000000..1b804f3
--- /dev/null
+++ b/run.py
@@ -0,0 +1,3 @@
+import greet
+
+greet.say_hello("monty")
````

---
class:middle

## Our next commit

We create a commit now:

````bash
$ git commit -m "improved greet.py and added test script"
[master c3cdcfa] improved greet.py and added test script
 2 files changed, 6 insertions(+)
 create mode 100644 run.py
````

And inspect the history:

````bash
$ git log
commit c3cdcfa04b7de45403a8291c567fba6ff4ae0b25
Author: Uwe Schmitt
Date:   Wed Sep 30 22:14:10 2015 +0200

    improved greet.py and added test script

commit 8175092830d7722c30d339106f3738a7ff5f53fc
Author: Uwe Schmitt
Date:   Wed Sep 30 21:58:30 2015 +0200

    first version of greet.py
````

---
class:middle

## Inspect the last commit

````bash
$ git show
commit c3cdcfa04b7de45403a8291c567fba6ff4ae0b25
Author: Uwe Schmitt
Date:   Thu Oct 1 16:03:30 2015 +0200

    improved greet.py and added test script

diff --git a/greet.py b/greet.py
index 7a875b9..2cd4ad9 100644
--- a/greet.py
+++ b/greet.py
@@ -1,2 +1,5 @@
 def greet(name):
     return "hi %s" % name
+
+def say_hello(name):
+    print(greet(name))
diff --git a/run.py b/run.py
new file mode 100644
index 0000000..3fb5e9b
--- /dev/null
+++ b/run.py
@@ -0,0 +1,3 @@
+import greet
+
+greet.say_hello("monty")
````

---
class:middle

## ... and we add another new feature to our code

Edit `greet.py`:

````python
def greet(name):
    return "hi %s" % name

def go_to_hell():
    print("to hell with python")

def say_hello(name):
    print(greet(name))
````

We check the changes again:

````bash
$ git diff
diff --git a/greet.py b/greet.py
index 2cd4ad9..e6b949a 100644
--- a/greet.py
+++ b/greet.py
@@ -1,5 +1,8 @@
 def greet(name):
     return "hi %s" % name

+def go_to_hell():
+    print("to hell with python")
+
 def say_hello(name):
     print(greet(name))
````

---
class:middle

## ... and another commit

````bash
$ git add greet.py
$ git commit -m "added go_to_hell"
````

Here we use `git log --oneline` to reduce output:

````bash
$ git log --oneline
523eb90 (HEAD -> master) added go_to_hell
c3cdcfa improved greet.py and added test script
8175092 first version of greet.py
````

---
class:middle

## Typical pattern:

- you worked on a new feature
- `git diff` and `git status` to inspect changes
- `git add` to add files to index (staging area)
- `git diff --cached` to check staging area
- `git commit` to add index to history

---
class:middle

## But one of the commits was crap!

First look up the commit id of your last commit. Replace the `COMMIT_ID_HERE` placeholder by this id.
````bash
$ git revert COMMIT_ID_HERE --no-edit
````

git created an "inverse" patch of commit COMMIT_ID_HERE and committed it:

````bash
$ git log --oneline
d3cf980 Revert "added go_to_hell"
523eb90 added go_to_hell
c3cdcfa improved greet.py and added test script
8175092 first version of greet.py
````

And we see that the feature is removed again:

````bash
$ cat greet.py
def greet(name):
    return "hi %s" % name

def say_hello(name):
    print(greet(name))
````

---
class:middle

## Inspect what `git revert` did

The last commit holds the mentioned "inverse patch":

````bash
$ git show
commit f4ba88676875a087ac7fac3bee2b0dbb676202d2
Author: Uwe Schmitt
Date:   Thu Oct 1 16:20:40 2015 +0200

    Revert "added go_to_hell"

    This reverts commit COMMIT_ID_HERE

diff --git a/greet.py b/greet.py
index e6b949a..2cd4ad9 100644
--- a/greet.py
+++ b/greet.py
@@ -1,8 +1,5 @@
 def greet(name):
     return "hi %s" % name

-def go_to_hell():
-    print("to hell with python")
-
 def say_hello(name):
     print(greet(name))
````

---
class:middle

## More about `git revert`

- `git revert` may be used not only to revert the most recent commit.
- even reverted commits can be reverted again!
- if the "inverse" patch cannot be applied, git indicates a so-called "merge conflict"

---
class:middle

## Recap

* `git status` shows current status of repository
* `git add PATH PATH ...` adds files to staging area
* `git diff` shows current changes for files already tracked by git
* `git commit -m "...."` commits changes with given commit message
* `git log` shows history
* `git log --oneline` shows only the unique commit ids plus messages from history
* `git show` shows changes introduced in most recent commit
* `git show COMMIT_ID` shows changes introduced in given commit
* `git revert COMMIT_ID` reverts changes introduced in given commit.

---
class:middle

## Not handled yet

* `git rm` to remove a file under version control.
* `git mv` to move / rename a file under version control.
* `git commit -a ....` adds all changes of tracked files + creates commit.
* `git reset` clears staging area in case of unintended added files.
* `git log -N` only shows the last `N` commits in the history
* `git checkout FILE_NAME` overwrites the current changes in the given file by its last recorded contents. (undo current edits)
* `git checkout .` resets the current folder and its subfolders to the state of the last commit.
* `.gitignore` is a file where you can specify file names, folders and patterns ignored by `git` (e.g. binary files or backup files from your editor)

---
class:middle,inverse,center

# **Working with branches**

---
class:middle

## Branching

Up to now the evolution of our repository was linear. Our repository has only one branch, the `master` branch:

````bash
A --> B --> C --> D   master (*)
````

(the * marks the active branch)

To create a new branch and work with it we enter:

````bash
$ git checkout -b new_feature
````

And if we check the status we get:

````bash
$ git status
On branch new_feature
...
````

---
class:middle

## The history is still the same

Our last commit `D` now is the last commit for both branches, `new_feature` is active.

````bash
A --> B --> C --> D   master, new_feature (*)
````

```bash
$ git log --oneline
af68936 (HEAD -> new_feature, master) Revert "added go_to_hell"
389d0ea added go_to_hell
a8ba071 improved greet.py and added test script
2c9fcec first version of greet.py
```

To list the existing branches:

```bash
$ git branch
  master
* new_feature
```

---
class:middle

## Branching continued

Now we add a new function

````python
def greet(name):
    return "hi %s" % name

def say_hello(name):
    print(greet(name))

def greet_two(name_1, name_2):
    return "hi %s and %s" % (name_1, name_2)
````

and commit it to our new branch

````bash
$ git add greet.py
$ git commit -m "added function greet_two"
````

---
class:middle

## Branching continued

This is now the history of the active branch `new_feature`:

````bash
$ git log --oneline
3732022 (HEAD -> new_feature) added function greet_two
af68936 (master) Revert "added go_to_hell"
389d0ea added go_to_hell
a8ba071 improved greet.py and added test script
2c9fcec first version of greet.py
````

````bash
A --> B --> C --> D      master
                   \
                    E    new_feature (*)
````

---
class:middle

## Branching continued

To switch to the master branch we use `git checkout BRANCH_NAME`:

````bash
$ git checkout master
````

This diagram shows the current state of our repository:

````bash
A --> B --> C --> D      master (*)
                   \
                    E    new_feature
````

````bash
$ git log --oneline
af68936 (HEAD -> master) Revert "added go_to_hell"
389d0ea added go_to_hell
a8ba071 improved greet.py and added test script
2c9fcec first version of greet.py
````

and the files in our repository reflect the `master` branch:

````bash
$ cat greet.py
def greet(name):
    return "hi %s" % name

def say_hello(name):
    print(greet(name))
````

---
class:middle

## Branching continued

And back ...

````bash
$ git checkout new_feature
$ cat greet.py
def greet(name):
    return "hi %s" % name

def say_hello(name):
    print(greet(name))

def greet_two(name_1, name_2):
    return "hi %s and %s" % (name_1, name_2)
````

---
class:middle

## Branching continued

We implement a "bug fix" in our master branch:

````bash
$ git checkout master
````

And change `"hi %s"` to `"hello %s"`:

````python
def greet(name):
    return "hello %s" % name

def say_hello(name):
    print(greet(name))
````

````bash
$ git add greet.py
$ git commit -m "fixed spelling"
````

---
class:middle

## Branching continued

We decide that the new feature is ready and want to introduce it to the master branch.
This is the current state of the commits and branches:

````bash
A --> B --> C --> D --> F   master (*)
                   \
                    E       new_feature
````

To merge our commit(s) from the `new_feature` branch into the currently active branch (here `master`) we enter

````bash
$ git merge new_feature
````

---
class:middle

## Branching continued

This created a "merge commit" G:

````bash
A --> B --> C --> D --> F --> G   master (*)
                   \         /
                    E -------     new_feature
````

````bash
$ git log --oneline --graph
*   9abfcea (HEAD -> master) Merge branch 'new_feature'
|\
| * 32c3d76 (new_feature) added function greet_two
* | 27e558a fixed spelling
|/
* 03f9fbb Revert "added go_to_hell"
* bbf296e added go_to_hell
* 4b79572 improved greet.py and added test script
* 8c637a8 first version of greet.py
````

---

## Branching continued: use case "freeze publication"

Let's assume this is the state of your repository at the time of publication:

````bash
A --> B --> C --> D   master (*)
````

````bash
$ git checkout -b publication
$ git checkout master
````

`publication` and `master` point at the same commit `D`:

````bash
A --> B --> C --> D   master (*)
                  ^
                  publication
````

---

## Branching continued: use case "freeze publication"

Now you continue to work on master, and commit more changes:

````bash
A --> B --> C --> D --> E --> F   master (*)
                  ^
                  publication
````

To reset your project into the state of publication:

````bash
$ git checkout publication
````

Which results in:

````bash
A --> B --> C --> D --> E --> F   master
                  ^
                  publication (*)
````

Note: `git` also supports so-called "tags".

---
class:middle

## Branching: merge conflicts

If a patch cannot be applied we get a "merge conflict".

http://githowto.com/resolving_conflicts

---
class:middle

## Good practices

* commits should be small
* commit messages should be understandable and descriptive
* use `git status`, `git diff --cached` before you commit.
* only track plain source and text files (e.g. put `*.pyc` into `.gitignore`)

---
class:middle,inverse,center

# **Introduction to github / gitlab / ...**

---
class:middle

## About remote repositories

* Working with remote repositories is about synchronization of repositories:
  1. push commits and branches to another repository
  2. fetch commits and branches from another repository and merge them

## Why a remote repository?

* You can share your code
* You can use it as a backup
* You can use it as a "hub" to synchronize your work if you work on different computers

---
class:middle

## Typical workflow for one or a few developers:

1. I update my local repository by fetching and merging all new commits from a remote repository with `git pull`
2. I work on a new feature / fix a bug / ... (`git add`, `git commit`, ...)
3. I push the new commits and branches to the remote repository with `git push`
4. Continue with 1.

---
class:middle,center

### 1. Please create a user account on https://gitlab.com if you don't have one yet.

### 2. Please open https://gitlab.com/uweschmitt/git-handson.git in your browser.

---
class:middle

## Fetch (clone) a remote repository

Before you can work with a remote repository, you first have to `clone` it. First look at https://gitlab.com/uweschmitt/git-handson.git before you continue with the examples below.

````bash
$ cd
$ mkdir gitlab_demo
$ cd gitlab_demo
$ git clone https://gitlab.com/uweschmitt/git-handson.git
Cloning into 'git-handson'...
....
Unpacking objects: 100% (20/20), done.
$ cd git-handson
$ ls -al
total 16
drwxr-xr-x   5 uweschmitt  staff   160 Feb 12 17:32 .
drwxr-xr-x  47 uweschmitt  staff  1504 Feb 12 17:32 ..
drwxr-xr-x  13 uweschmitt  staff   416 Feb 12 17:33 .git
-rw-r--r--   1 uweschmitt  staff   252 Feb 12 17:32 greet.py
-rw-r--r--   1 uweschmitt  staff   123 Feb 12 17:32 run.py
````

---
class:middle

## Fetch (clone) a remote repository continued

When we ran `git clone` we automatically registered the remote repository as `origin`:

````bash
$ git remote -v
origin  https://gitlab.com/uweschmitt/git-handson.git (fetch)
origin  https://gitlab.com/uweschmitt/git-handson.git (push)
````

Our default branch is `master`:

````bash
$ git log --oneline --graph
*   2f16683 (HEAD -> master, origin/master, origin/HEAD) Merge branch 'new_feature'
|\
| * 2ab650b (origin/new_feature, new_feature) added function greet_two
* | 381ebd0 fixed spelling
|/
* ba63be2 Revert "added go_to_hell"
* c1f84d2 added go_to_hell
* 69e5167 improved greet.py and added test script
* 16cf3f2 first version of greet.py
````

---
class:middle

## Fetch (clone) a remote repository continued

We can now check out an existing branch:

````bash
$ git checkout new_feature
$ git log --oneline --graph
* 2ab650b (HEAD -> new_feature, origin/new_feature) added function greet_two
* ba63be2 Revert "added go_to_hell"
* c1f84d2 added go_to_hell
* 69e5167 improved greet.py and added test script
* 16cf3f2 first version of greet.py
````

And also list existing branches:

````bash
$ git branch
  master
* new_feature
````

````bash
$ git checkout master
````

---
class:middle

## A new commit

1. Make sure that your active branch is `master`.
2. Please create a text file with a unique name (e.g. choose your name) and some random content.
3. Commit this file.
4. Now push these changes using arguments `REMOTE_NAME` and `BRANCH_NAME`

   ````bash
   $ git push origin master
   ````

5. When all your colleagues are done, or when the `push` fails because the remote repository changed, you have to update your local repository:

   ````bash
   $ git pull origin master
   ````

6. Now check what happened:

   ````bash
   $ ls -al
   .....
   ````

---
class:middle,center

## `git pull` caveat

`git pull origin BRANCH` fetches and merges the remote `BRANCH` into the currently active branch on your machine!

---
class:middle, center

## What next?

## Similar git introduction

http://swcarpentry.github.io/git-novice/

## Advanced and deeper tutorials about git

https://eev.ee/blog/2015/04/24/just-enough-git-to-be-less-dangerous/

https://www.learnenough.com/git-tutorial

https://about.gitlab.com/images/press/git-cheat-sheet.pdf

---
class:middle, center, inverse

# **Exercise!**

---
class: center, middle, inverse

This slide show was created with http://remarkjs.com/
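---
class: middle

## Appendix: creating a patch with Python

The "Example patch" slide shows two versions of `print_squares.py` and the patch between them. As a rough sketch of the same idea outside of `git`, Python's standard `difflib` module can compute a unified diff of the two versions. The helper name `make_patch` is made up for this illustration. `difflib` is not what `git` uses internally, so the hunk it prints may pair up the lines differently than the patch shown on the slide, although it describes the same change.

```python
import difflib

# The two versions of print_squares.py from the "Example patch" slide.
old = '''for i in range(10):
    # display i and its square
    print("i is", i)
    print("i^2 is", i ** 2)
'''.splitlines(keepends=True)

new = '''for i in range(10):
    # this is the magic computation
    print("i^2 is", i ** 2)
    print("i is", i)
'''.splitlines(keepends=True)


def make_patch(old_lines, new_lines, name="print_squares.py"):
    """Return a unified diff (a "patch") that turns old_lines into new_lines."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="a/" + name, tofile="b/" + name
        )
    )


patch = make_patch(old, new)
print("".join(patch), end="")
```

Lines starting with `-` are removed, lines starting with `+` are added, and lines starting with a space are unchanged context, just as in the output of `git diff`.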