Intro to git
o-o o o o-o o-o o
/ |\ /| o | O o |
O | O | | -o o-o o--o -o-
\ | | o | | O | | | |
o-o o o o-o o--o o--O | o
|
o--o
Presented for McGill’s Condensed Matter Graduate Seminar on October 23, 2017
Learning git from the bottom up.
Outline:
- What/why is git?
- git quick start
- From the bottom: piping
- GitHub collaborations basics
What is git?
git is a local version control system for tracking changes in files.
Created by Linus Torvalds (inventor of Linux) in 2005
Wanted something fast, free and decentralized
git is distributed: everyone has a full independent history and copy of the files.
Fun facts about git
git is British slang for ‘unpleasant person’
“I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’.” - Linus Torvalds
git is now the most widely used VCS in the world. - Google - Netflix - Facebook - Twitter - Android
git vs GitHub
Git is an offline tool for version control.
-
GitHub is an online service for hosting and managing git repositories.
- You can set up your own git server
Initializing your repository
Create a directory where you will store your project.
$ mkdir Git_Tutorial
$ cd Git_Tutorial
Initialize git for your directory.
$ git init
Creates a hidden folder .git
where all content and history of your repository is stored.
All git
commands follow the word git
Track some files
Create some files to track
$ echo "condensed matter matters" > cmgs.txt
$ echo "not tracking" > hi.txt
We have to tell git which files to track.
$ git add cmgs.txt
Now git is tracking changes to cmgs.txt
Check the status of your repo
$ git status
Prints information on the repo.
- Changes to tracked files
- New files added
- Untracked files, etc.
Useful to run regularly.
Writing to history
Adding a file is something we want to go down in history.
Writing the current state of the repo is called committing
$ git commit -m "first commit."
Commits require a message describing the change.
Make a change to track
Let’s add a line to the end of our file.
$ echo "actually no" >> cmgs.txt
$ git status
We should see that git noticed a change.
See the changes
$ git diff
Displays differences between last commit and current state.
Stage changes
If we like the changes we can tell git to track them.
$ git add cmgs.txt
Let’s commit the new version to history.
$ git commit -m "new line"
Viewing repo history
$ git log
Displays all the commits, who made them and when.
We can go back in time
$ git checkout [commit ID]
Brings the repo to the state at the given commit.
Once you’re done looking around you can go back to the present.
$ git checkout master
Branching
Let’s say you want to test a different version without affecting the rest of the repo.
You can create a new independent copy of the repo with the same history as the original.
$ git branch test
Creates a new branch called test
You can checkout a branch.
$ git checkout test
You can commit changes to test
and they will stay
in test
.
$ echo "hi from test" >> cmgs.txt"
$ git checkout master
Merging
If you liked how things worked in the test branch you can merge them to the main, aka master branch.
$ git merge test
Okay now some plumbing
You can get by quite nicely with these commands.
Git calls these the porcelain commands.
To really understand what’s happening we have to look at the ugly plumbing.
Git Objects
Git stores data in 3 major object types:
- blob
- tree
- commit
All the commands we saw mainpulate there 3 object types.
Most of the info I am presenting on git internals is taken from the Pro Git book.
blobs
blobs are the way git stores file contents
Every version of a file is a different blob.
git is a content addressable file-system.
- Data in a repo is accessed based on its content.
making a blob
Let’s make a new repo to play around with the plumbing.
$ cd ..
$ git init Plumbing
Git computes the SHA1 hash of the file’s contents and an automatically generated header.
We can hash content from stdin:
$ echo "condensed matter matters" | git hash-object --stdin
b1a1b80ca6f24ccdd220d8db24af08db4e096970
Or from a file:
$ echo "condensed matter matters" > cmgs.txt
$ git hash-object cmgs.txt
b1a1b80ca6f24ccdd220d8db24af08db4e096970
Storing blobs
If we want git to store the result use -w
$ git hash-object -w cmgs.txt
This creates a compressed file in .git/objects
$ find .git/objects -type f
.git/objects/b1/a1b80ca6f24ccdd220d8db24af08db4e096970
Low-level version control
Let’s make a new version of the file.
$ echo "condensed matter doesn't matter" > cmgs.txt
$ git hash-object -w cmgs.txt
Now we have two objects stored
$ find .git/objects -type f
.git/objects/3a/918db0eb98806e55545a8457d5fa3675f5270c
.git/objects/b1/a1b80ca6f24ccdd220d8db24af08db4e096970
Reading blobs
Since the hash depends on the content, we have a content-addressable filesystem.
First version
$ git cat-file -p b1a1b80ca6f24ccdd220d8db24af08db4e096970
condensed matter matters
Second version
$ git cat-file -p 3a918db0eb98806e55545a8457d5fa3675f5270c
condensed matter doesn't matter
We can revert back to the original version.
$ git cat-file -p b1a1b80ca6f24ccdd220d8db24af08db4e096970 > cmgs.txt
So now we pretty much have a version control system!
However, not very user-friendly.
trees: remembering directories
It is annoying to have to know the hash of each file by heart. We also haven’t stored the filename.
This is where trees come in.
Trees are objects which contain pointers to:
- blobs
- other trees
You can think of them as directories. Directories contain files and other directories.
Creating trees
When you add files to the staging area, git add
,
git creates an index of files and folders to track.
Git automatically creates trees and blobs based on the index.
To make our own tree we have to make an index.
The index needs to know:
- mode (10064 means regular file)
- hash
- name
$ git update-index --add --cacheinfo 100644 \
b1a1b80ca6f24ccdd220d8db24af08db4e096970 cmgs.txt
--add
hashes file, stores to .git/objects
and adds to index.
--cacheinfo
tells git to use the info already in .git/objects
for this content.
Creating trees (pt. 2)
Now we can make a tree.
$ git write-tree
9e51f861e7d8976a04cfbeb45003255a59bca9bd
This creates a tree based on current state of the index and gives us its hash.
$ git cat-file -t 9e51f861e7d8976a04cfbeb45003255a59bca9bd
tree
$ git cat-file -p 9e51f861e7d8976a04cfbeb45003255a59bca9bd
100644 blob b1a1b80ca6f24ccdd220d8db24af08db4e096970 cmgs.txt
Now we can use the tree to get a filenames and hashes.
+--------------+
| tree |
|--------------|
| 9e51 |
| |
+------+-------+
|
|cmgs.txt
v
+--------------+
| blob |
|--------------|
| b1a1 |
| |
+--------------+
More fun with trees
Let’s make a new tree with the other version of cmgs.txt
$ git update-index --cacheinfo 100644 \
3a918db0eb98806e55545a8457d5fa3675f5270c cmgs.txt
And just for fun let’s add a new file.
$ echo "new file" > new.txt
$ git update-index --add new.txt
Finally, make the tree..
$ git write-tree
$ git cat-file -p 34bfdc1c8a3d1d2bc487d79d9208650ef28415bc
100644 blob 3a918db0eb98806e55545a8457d5fa3675f5270c cmgs.txt
100644 blob fa49b077972391ad58037050f2a75f74e3671e92 new.txt
Trees of trees
We can add trees to our current tree. This produces a new tree.
$ git read-tree --prefix=v1 9e51f861e7d8976a04cfbeb45003255a59bca9bd
$ git write-tree
c1b987fd9e44054318fc1786953b1a06ba0bfd5c
$ git cat-file -p c1b987fd9e44054318fc1786953b1a06ba0bfd5c
100644 blob 3a918db0eb98806e55545a8457d5fa3675f5270c cmgs.txt
100644 blob fa49b077972391ad58037050f2a75f74e3671e92 new.txt
040000 tree 9e51f861e7d8976a04cfbeb45003255a59bca9bd v1
Our trees
+--------------+
| tree |
|--------------|
| c1b9 |
| |
+-----++-------+
++--+++-----------------+
++ |
+-----------++----+ |v1
cmgs.txt | | new.txt |
+ + +
+--------------+ +--------------+ +--------------+
| blob | | blob | | tree |
|--------------| |--------------| |--------------|
| 3a91 | | fa49 | | 9e51 |
| | | | | |
+--------------+ +--------------+ +------+-------+
|
|
v
+--------------+
| blob |
|--------------|
| b1a1 |
| |
+--------------+
Commit objects
Finally, we have the commit object.
Trees helped us group files and filenames together.
But we still have to remember the tree hash.
This is where the commit object comes in.
Commit objects
Commit objects store:
- Tree pointer
- Committer
- Commit message
- Timestamp
Creating commits
We currently have 3 trees:
9e51 -- cmgs.txt
(original)34bf -- cmgs.txt
(modified), new.txtc1b9 -- cmgs.txt
(modified), new.txt, v1 (tree)
The command git commit-tree
creates a commit from a tree and
optionally a parent commit.
$ echo 'first commit' | git commit-tree 9e51
5f2c3ae3513b23273dc775cc8c9117c870b92933
This hash value depends on the data pointed too as well as the author, message, and timestamp.
Viewing commit info
Let’s look at what’s inside our commit.
$ git cat-file -p 5f2c3ae3513b23273dc775cc8c9117c870b92933
tree 9e51f861e7d8976a04cfbeb45003255a59bca9bd
author Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com> 1508759875 -0400
committer Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com> 1508759875 -0400
first commit
User info pulled from file ~/.gitconfig
Creating a history
We can chain commits together by specifying a preceding commit
at commit creation using the -p
flag.
$ echo 'second commit' | git commit-tree 34bf -p 5f2c
64cd23d2a8da3ff0ea24daab7dc33ca40dd91adf
$ echo 'third commit' | git commit-tree c1b0 -p c1b9
06cdfb14114061185c292b5b5952ed13b9306855
We can now view our history using git log
commit 06cdfb14114061185c292b5b5952ed13b9306855
Author: Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com>
Date: Mon Oct 23 08:05:13 2017 -0400
third commit
v1/cmgs.txt | 1 +
1 file changed, 1 insertion(+)
commit 64cd23d2a8da3ff0ea24daab7dc33ca40dd91adf
Author: Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com>
Date: Mon Oct 23 08:04:09 2017 -0400
second commit
cmgs.txt | 2 +-
new.txt | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
commit 5f2c3ae3513b23273dc775cc8c9117c870b92933
Author: Carlos G. Oliver <carlos.gonzalez.oliver@gmail.com>
Date: Mon Oct 23 07:57:55 2017 -0400
first commit
cmgs.txt | 1 +
1 file changed, 1 insertion(+)
+--------------+ +--------------+ +--------------+
| commit | | tree | | blob |
|--------------| |--------------| |--------------|
| 06cd +--->| c1b9 +--->| 'doesn't' +
|'third comm.'| | | | |
+------+-------+ +------+-------+ +------+-------+
| v1 |
| +----------------------------------+
v |
+--------------+ +--------------+ |
| commit | | tree | |
|--------------| |--------------| |
| 64cd +--->| 34bf ++----> |
|'second com..'| | | |
+------+-------+ +------+-------+ |
| |
| |
v |
+--------------+ +--------------+ +--------------+ |
| commit | | tree | | blob | |
|--------------| |--------------|cmgs |--------------| |
| 5f2c +--->| 9e51 +----->| 9e51 + |
|'first commit'| | | |'cm matters' | |
+--------------+ +------+-------+ +------+-------+ |
^ |
+---------------------------------++
References
Okay it’s time to finally get rid of those clunky SHA1 digests.
References are like nicknames for commit hash values.
References are stored in .git/refs
Let’s create a reference to our latest commit and call it ‘master’.
$ echo 06cdfb14114061185c292b5b5952ed13b9306855 > .git/refs/heads/master
Now git log master
displays the history from our last commit.
We can add a reference to a different state of our repo.
echo 64cd23d2a8da3ff0ea24daab7dc33ca40dd91adf > .git/refs/heads/test
Now we will only see changes from the second commit down.
The HEAD
To make our lives easier, git automatically stores a reference to our latest commit in a ref called HEAD.
This reference is stored in .git/HEAD
Running git commit
:
- Makes a tree from your current index.
- Uses the current value of
HEAD
as the predecessor - Sets
HEAD
to the new commit hash.
The HEAD and branching
Now we have everything wee need to understand branches.
When you say git branch test
git creates a new reference to the latest commit, in this case called test
.
+---------+
| HEAD |
+----+----+
v
+---------+
| master |
+----+----+
v
+---------+ +---------+ +---------+
| | | | | |
| 5f2c |<-----+ 64cd |<------+ 06cd |
+---------+ +---------+ +---------+
^
+----+----+
| test |
+----+----+
Checkout moves HEAD
$ git checkout test
+---------+
| master |
+----+----+
v
+---------+ +---------+ +---------+
| | | | | |
| 5f2c |<-----+ 64cd |<------+ 06cd |
+---------+ +---------+ +---------+
^
+----+----+
| test |
+----+----+
^
+----+----+
| HEAD |
+----+----+
Committing to a new branch
Committing on a new branch moves head forward but master stays the same.
+---------+
| master |
+----+----+
v
+---------+ +---------+ +---------+ +---------+
| | | | | | | |
| 5f2c |<-----+ 64cd |<----+ 06cd |<----+ 2jk2 |
+---------+ +---------+ +---------+ +---------+
^
+----+----+
| test |
+----+----+
^
+----+----+
| HEAD |
+----+----+
Compared to other VCS branching in git is super cheap.
All it takes is writing a 40 character string to a file.
In other VCS, branching involves making full copies of directories and can take minutes to run.
Branching and merging can get kind of involved so we leave that for Part III.
Remote branches
Git can also let you commit to branches that are not on your computer.
These are called remote branches.
There are many services for hosting remote repositories.
The most popular one is GitHub
Make an account on GitHub (note: you can get free unlimited private repositories by proving you are a student.
Setting up your ssh key
GitHub needs to authenticate anyone trying to commit to the remote.
The easiest way is to tell GitHub your public key so it can authenticate you without using a password.
If you don’t already have a key-pair:
$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
Then:
#copy public key to clipboard
$ pbcopy < ~/.ssh/id_rsa.pub
Now you have to give GitHub your public key so they can verify your commits are truly coming from the holder of the private key.
GitHub > Settings > New SSH key or Add SSH key > Paste the key
Adding a remote
In GitHub, click ‘New Repository’
Let’s create a new repository from our current local repo.
git remote add origin git@github.com:cgoliver/test.git
This gives us a reference called origin
(you can name it whatever you like) to the remote repository.
Now we can access branches on the remote using the syntax [remote]/[branchname]
For example, the master branch on our remote will be called origin/master
.
Cloning a repository
Let’s say we all want to collaborate on the same project.
You can download (aka clone) it and commit your own changes to it.
git clone git@github.com:cgoliver/CMGS.git
Give me your username and I’ll give you access to modify the repo.
Basic remote workflow
Once your repo is cloned, and you have full permission you can work and commit normally.
I. Make sure you are up to date with the remote
$ git fetch origin
$ git merge origin/master
fetch
gets the latest status from the remote.
If you like it you can merge it to your local master.
Shorthand:
$ git pull
II. Commit your changes locally
$ git commit -am "my changes"
The -am
option adds the changed state of the repo and commits at the same time.
III. Publish your changes.
$ git push origin master
Pushes your master branch to the origin
remote.