o-o o   o  o-o   o-o                o  
 /    |\ /| o     |     O          o  |  
O     | O | |  -o  o-o        o--o   -o- 
 \    |   | o   |     | O     |  | |  |  
  o-o o   o  o-o  o--o        o--O |  o  
                                 |       
                              o--o       

Presented for McGill’s Condensed Matter Graduate Seminar on October 23, 2017

Learning git from the bottom up.

Outline:

  • What/why is git?
  • git quick start
  • From the bottom: piping
  • GitHub collaborations basics

What is git?

git is a local version control system for tracking changes in files.

Created by Linus Torvalds (inventor of Linux) in 2005

Wanted something fast, free and decentralized

git is distributed: everyone has a full independent history and copy of the files.


Fun facts about git

git is British slang for ‘unpleasant person’

“I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’.” - Linus Torvalds

git is now the most widely used VCS in the world. - Google - Netflix - Facebook - Twitter - Android


git vs GitHub

Git is an offline tool for version control.

  • GitHub is an online service for hosting and managing git repositories.


Initializing your repository

Create a directory where you will store your project.

$ mkdir Git_Tutorial $ cd Git_Tutorial

Initialize git for your directory.

$ git init

Creates a hidden folder .git where all content and history of your repository is stored.

All git commands follow the word git


Track some files

Create some files to track

$ echo "condensed matter matters" > cmgs.txt $ echo "not tracking" > hi.txt

We have to tell git which files to track.

$ git add cmgs.txt

Now git is tracking changes to cmgs.txt


Check the status of your repo

$ git status

Prints information on the repo.

  • Changes to tracked files
  • New files added
  • Untracked files, etc.

Useful to run regularly.


Writing to history

Adding a file is something we want to go down in history.

Writing the current state of the repo is called committing

$ git commit -m "first commit."

Commits require a message describing the change.


Make a change to track

Let’s add a line to the end of our file.

$ echo "actually no" >> cmgs.txt $ git status

We should see that git noticed a change.


See the changes

$ git diff

Displays differences between last commit and current state.


Stage changes

If we like the changes we can tell git to track them.

$ git add cmgs.txt

Let’s commit the new version to history.

$ git commit -m "new line"


Viewing repo history

$ git log

Displays all the commits, who made them and when.


We can go back in time

$ git checkout [commit ID]

Brings the repo to the state at the given commit.

Once you’re done looking around you can go back to the present.

$ git checkout master


Branching

Let’s say you want to test a different version without affecting the rest of the repo.

You can create a new independent copy of the repo with the same history as the original.

$ git branch test

Creates a new branch called test

You can checkout a branch.

$ git checkout test

You can commit changes to test and they will stay in test.

$ echo "hi from test" >> cmgs.txt" $ git checkout master


Merging

If you liked how things worked in the test branch you can merge them to the main, aka master branch.

$ git merge test


Okay now some plumbing

You can get by quite nicely with these commands.

Git calls these the porcelain commands.

To really understand what’s happening we have to look at the ugly plumbing.


Git Objects

Git stores data in 3 major object types:

  1. blob
  2. tree
  3. commit

All the commands we saw mainpulate there 3 object types.

Most of the info I am presenting on git internals is taken from the Pro Git book.


blobs

blobs are the way git stores file contents

Every version of a file is a different blob.

git is a content addressable file-system.

  • Data in a repo is accessed based on its content.

making a blob

Let’s make a new repo to play around with the plumbing.

$ cd ..
$ git init Plumbing

Git computes the SHA1 hash of the file’s contents and an automatically generated header.

We can hash content from stdin:

$ echo "condensed matter matters" | git hash-object --stdin 
b1a1b80ca6f24ccdd220d8db24af08db4e096970

Or from a file:

$ echo "condensed matter matters" > cmgs.txt
$ git hash-object cmgs.txt
b1a1b80ca6f24ccdd220d8db24af08db4e096970

Storing blobs

If we want git to store the result use -w

$ git hash-object -w cmgs.txt

This creates a compressed file in .git/objects

$ find .git/objects -type f
.git/objects/b1/a1b80ca6f24ccdd220d8db24af08db4e096970

Low-level version control

Let’s make a new version of the file.

$ echo "condensed matter doesn't matter" > cmgs.txt
$ git hash-object -w cmgs.txt

Now we have two objects stored

$ find .git/objects -type f
.git/objects/3a/918db0eb98806e55545a8457d5fa3675f5270c
.git/objects/b1/a1b80ca6f24ccdd220d8db24af08db4e096970

Reading blobs

Since the hash depends on the content, we have a content-addressable filesystem.

First version

$ git cat-file -p b1a1b80ca6f24ccdd220d8db24af08db4e096970
condensed matter matters

Second version

$ git cat-file -p 3a918db0eb98806e55545a8457d5fa3675f5270c
condensed matter doesn't matter

We can revert back to the original version.

$ git cat-file -p b1a1b80ca6f24ccdd220d8db24af08db4e096970 > cmgs.txt

So now we pretty much have a version control system!

However, not very user-friendly.


trees: remembering directories

It is annoying to have to know the hash of each file by heart. We also haven’t stored the filename.

This is where trees come in.

Trees are objects which contain pointers to:

  • blobs
  • other trees

You can think of them as directories. Directories contain files and other directories.


Creating trees

When you add files to the staging area, git add, git creates an index of files and folders to track.

Git automatically creates trees and blobs based on the index.

To make our own tree we have to make an index.

The index needs to know:

  • mode (10064 means regular file)
  • hash
  • name
$ git update-index --add --cacheinfo 100644 \
b1a1b80ca6f24ccdd220d8db24af08db4e096970 cmgs.txt

--add hashes file, stores to .git/objects and adds to index.

--cacheinfo tells git to use the info already in .git/objects for this content.


Creating trees (pt. 2)

Now we can make a tree.

$ git write-tree
9e51f861e7d8976a04cfbeb45003255a59bca9bd

This creates a tree based on current state of the index and gives us its hash.

$ git cat-file -t 9e51f861e7d8976a04cfbeb45003255a59bca9bd
tree
$ git cat-file -p 9e51f861e7d8976a04cfbeb45003255a59bca9bd
100644 blob b1a1b80ca6f24ccdd220d8db24af08db4e096970 cmgs.txt

Now we can use the tree to get a filenames and hashes.

		+--------------+
		|     tree     |
		|--------------|
		|     9e51     |
		|              |
		+------+-------+
		       |
		       |cmgs.txt
		       v
		+--------------+
		|    blob      |
		|--------------|
		|    b1a1      |
		|              |
		+--------------+

More fun with trees

Let’s make a new tree with the other version of cmgs.txt

$ git update-index --cacheinfo 100644 \
 3a918db0eb98806e55545a8457d5fa3675f5270c cmgs.txt

And just for fun let’s add a new file.

$ echo "new file" > new.txt
$ git update-index --add new.txt

Finally, make the tree..

$ git write-tree
$ git cat-file -p 34bfdc1c8a3d1d2bc487d79d9208650ef28415bc 
100644 blob 3a918db0eb98806e55545a8457d5fa3675f5270c    cmgs.txt
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt


Trees of trees

We can add trees to our current tree. This produces a new tree.

$ git read-tree --prefix=v1 9e51f861e7d8976a04cfbeb45003255a59bca9bd 
$ git write-tree
c1b987fd9e44054318fc1786953b1a06ba0bfd5c

$ git cat-file -p c1b987fd9e44054318fc1786953b1a06ba0bfd5c 
100644 blob 3a918db0eb98806e55545a8457d5fa3675f5270c    cmgs.txt
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    new.txt
040000 tree 9e51f861e7d8976a04cfbeb45003255a59bca9bd    v1 

Our trees

                            +--------------+
                            |     tree     |
                            |--------------|
                            |     c1b9     |
                            |              |
                            +-----++-------+
                              ++--+++-----------------+
                              ++                      |
                  +-----------++----+                 |v1
         cmgs.txt |                 | new.txt         |
                  +                 +                 +
          +--------------+   +--------------+  +--------------+
          |    blob      |   |    blob      |  |    tree      |
          |--------------|   |--------------|  |--------------|
          |    3a91      |   |    fa49      |  |    9e51      |
          |              |   |              |  |              |
          +--------------+   +--------------+  +------+-------+
                                                      |
                                                      |
                                                      v
                                               +--------------+
                                               |    blob      |
                                               |--------------|
                                               |    b1a1      |
                                               |              |
                                               +--------------+

Commit objects

Finally, we have the commit object.

Trees helped us group files and filenames together.

But we still have to remember the tree hash.

This is where the commit object comes in.


Commit objects

Commit objects store:

  1. Tree pointer
  2. Committer
  3. Commit message
  4. Timestamp

Creating commits

We currently have 3 trees:

  • 9e51 -- cmgs.txt (original)
  • 34bf -- cmgs.txt (modified), new.txt
  • c1b9 -- cmgs.txt (modified), new.txt, v1 (tree)

The command git commit-tree creates a commit from a tree and optionally a parent commit.

$ echo 'first commit' | git commit-tree 9e51
5f2c3ae3513b23273dc775cc8c9117c870b92933

This hash value depends on the data pointed too as well as the author, message, and timestamp.


Viewing commit info

Let’s look at what’s inside our commit.

$ git cat-file -p 5f2c3ae3513b23273dc775cc8c9117c870b92933 
tree 9e51f861e7d8976a04cfbeb45003255a59bca9bd
author Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com> 1508759875 -0400
committer Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com> 1508759875 -0400

first commit

User info pulled from file ~/.gitconfig


Creating a history

We can chain commits together by specifying a preceding commit at commit creation using the -p flag.

$ echo 'second commit' | git commit-tree 34bf -p 5f2c
64cd23d2a8da3ff0ea24daab7dc33ca40dd91adf

$ echo 'third commit' | git commit-tree c1b0 -p c1b9
06cdfb14114061185c292b5b5952ed13b9306855

We can now view our history using git log

commit 06cdfb14114061185c292b5b5952ed13b9306855
Author: Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com>
Date:   Mon Oct 23 08:05:13 2017 -0400

    third commit

 v1/cmgs.txt | 1 +
 1 file changed, 1 insertion(+)

commit 64cd23d2a8da3ff0ea24daab7dc33ca40dd91adf
Author: Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com>
Date:   Mon Oct 23 08:04:09 2017 -0400

    second commit

 cmgs.txt | 2 +-
 new.txt  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

commit 5f2c3ae3513b23273dc775cc8c9117c870b92933
Author: Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com>
Date:   Mon Oct 23 07:57:55 2017 -0400

    first commit

 cmgs.txt | 1 +
 1 file changed, 1 insertion(+)

  +--------------+    +--------------+    +--------------+
  |    commit    |    |     tree     |    |     blob     |
  |--------------|    |--------------|    |--------------|
  |    06cd      +--->|    c1b9      +--->|  'doesn't'   +
  |'third  comm.'|    |              |    |              |
  +------+-------+    +------+-------+    +------+-------+
         |                v1 |
         |                   +----------------------------------+
         v                                                      |
  +--------------+    +--------------+                          |
  |    commit    |    |     tree     |                          |
  |--------------|    |--------------|                          |
  |    64cd      +--->|    34bf      ++---->                    |
  |'second com..'|    |              |                          |
  +------+-------+    +------+-------+                          |
         |                                                      |
         |                                                      |
         v                                                      |
  +--------------+    +--------------+      +--------------+    |
  |    commit    |    |     tree     |      |     blob     |    |
  |--------------|    |--------------|cmgs  |--------------|    |
  |    5f2c      +--->|    9e51      +----->|    9e51      +    |
  |'first commit'|    |              |      |'cm matters'  |    |
  +--------------+    +------+-------+      +------+-------+    |
                             ^                                  |
                             +---------------------------------++


References

Okay it’s time to finally get rid of those clunky SHA1 digests.

References are like nicknames for commit hash values.

References are stored in .git/refs

Let’s create a reference to our latest commit and call it ‘master’.

$ echo 06cdfb14114061185c292b5b5952ed13b9306855 > .git/refs/heads/master

Now git log master displays the history from our last commit.

We can add a reference to a different state of our repo.

echo 64cd23d2a8da3ff0ea24daab7dc33ca40dd91adf > .git/refs/heads/test

Now we will only see changes from the second commit down.


The HEAD

To make our lives easier, git automatically stores a reference to our latest commit in a ref called HEAD.

This reference is stored in .git/HEAD

Running git commit:

  1. Makes a tree from your current index.
  2. Uses the current value of HEAD as the predecessor
  3. Sets HEAD to the new commit hash.

The HEAD and branching

Now we have everything wee need to understand branches.

When you say git branch test

git creates a new reference to the latest commit, in this case called test.

                                    +---------+
                                    |  HEAD   |
                                    +----+----+
                                         v
                                    +---------+
                                    |  master |
                                    +----+----+
                                         v
 +---------+      +---------+       +---------+
 |         |      |         |       |         |
 | 5f2c    |<-----+   64cd  |<------+ 06cd    |
 +---------+      +---------+       +---------+
                                         ^
                                    +----+----+
                                    |  test   |
                                    +----+----+


Checkout moves HEAD

$ git checkout test
                                    +---------+
                                    |  master |
                                    +----+----+
                                         v
 +---------+      +---------+       +---------+
 |         |      |         |       |         |
 | 5f2c    |<-----+   64cd  |<------+ 06cd    |
 +---------+      +---------+       +---------+
                                         ^
                                    +----+----+
                                    |  test   |
                                    +----+----+
                                         ^
                                    +----+----+
                                    |  HEAD   |
                                    +----+----+

Committing to a new branch

Committing on a new branch moves head forward but master stays the same.


                                  +---------+
                                  |  master |
                                  +----+----+
                                       v
 +---------+      +---------+     +---------+     +---------+
 |         |      |         |     |         |     |         |
 | 5f2c    |<-----+   64cd  |<----+ 06cd    |<----+ 2jk2    |
 +---------+      +---------+     +---------+     +---------+
                                                       ^
                                                  +----+----+
                                                  |  test   |
                                                  +----+----+
                                                       ^
                                                  +----+----+
                                                  |  HEAD   |
                                                  +----+----+

Compared to other VCS branching in git is super cheap.

All it takes is writing a 40 character string to a file.

In other VCS, branching involves making full copies of directories and can take minutes to run.

Branching and merging can get kind of involved so we leave that for Part III.


Remote branches

Git can also let you commit to branches that are not on your computer.

These are called remote branches.

There are many services for hosting remote repositories.

The most popular one is GitHub

Make an account on GitHub (note: you can get free unlimited private repositories by proving you are a student.


Setting up your ssh key

GitHub needs to authenticate anyone trying to commit to the remote.

The easiest way is to tell GitHub your public key so it can authenticate you without using a password.

If you don’t already have a key-pair:

$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com" 

Then:

#copy public key to clipboard
$ pbcopy < ~/.ssh/id_rsa.pub

Now you have to give GitHub your public key so they can verify your commits are truly coming from the holder of the private key.

GitHub > Settings > New SSH key or Add SSH key > Paste the key


Adding a remote

In GitHub, click ‘New Repository’

Let’s create a new repository from our current local repo.

git remote add origin git@github.com:cgoliver/test.git

This gives us a reference called origin (you can name it whatever you like) to the remote repository.

Now we can access branches on the remote using the syntax [remote]/[branchname]

For example, the master branch on our remote will be called origin/master.


Cloning a repository

Let’s say we all want to collaborate on the same project.

You can download (aka clone) it and commit your own changes to it.

git clone git@github.com:cgoliver/CMGS.git

Give me your username and I’ll give you access to modify the repo.


Basic remote workflow

Once your repo is cloned, and you have full permission you can work and commit normally.

I. Make sure you are up to date with the remote

$ git fetch origin
$ git merge origin/master

fetch gets the latest status from the remote.

If you like it you can merge it to your local master.

Shorthand:

$ git pull

II. Commit your changes locally

$ git commit -am "my changes"

The -am option adds the changed state of the repo and commits at the same time.

III. Publish your changes.

$ git push origin master

Pushes your master branch to the origin remote.