new views on your history with git replace
DESCRIPTION
Git has become the most popular version control system in the open source world, and more and more companies are also using it. The source code history managed by Git is supposed to be immutable, because Git uses a content-addressed database: Git objects are indexed by their SHA-1 hash. When mistakes have been made, though, or to make some history-based features more useful or more reliable, it can be worthwhile to transform the Git source code history. A good way to do that is to use git replace.
TRANSCRIPT
New Views on your History with git replace
Christian Couder, [email protected]
OSDC.fr 2013
October 5, 2013
About Git
A Distributed Version Control System (DVCS):
● created by Linus Torvalds
● maintained by Junio Hamano
● since 2005
● preferred VCS among open source developers
Git Design
Git is made of these things:
● “Objects”
● “Refs”
● config, indexes, logs, hooks, grafts, packs, ...
Only “Objects” and “Refs” are transferred from one repository to another.
Git Objects
● Blob: content of a file
● Tree: content of a directory
● Commit: state of the whole source code
● Tag: stamp on an object
Git Objects Storage
● Git Objects are stored in a content addressable database.
● The key to retrieve each Object is the SHA-1 of the Object’s content.
● A SHA-1 is a 160-bit / 40-hex / 20-byte hash value which is considered unique.
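As a rough illustration of content addressing, the sketch below recomputes the key of a blob by hand and compares it with what Git reports. It assumes the usual object framing (the string "blob", the size in bytes, a NUL byte, then the content), a SHA-1 object format, and that `sha1sum` from GNU coreutils is installed; the variable names are only illustrative.

```shell
# Content of the blob; the trailing newline added by printf is part of it.
content='Whatever...'

# Size in bytes of the content, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>\0<content>" by hand (assumes sha1sum is available).
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the key of the same content, without storing anything.
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $via_git"
```

If the assumptions hold, both lines show the same 40-hex value, which is why identical content always ends up under the same key.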
Blob
blob = content of a file
[Diagram: a blob object with header "blob size" and SHA1 e8455..., containing:]
/* content of this blob, it can be anything like an image, a video, ... but most of the time it is source code like: */
#include <stdio.h>
int main(void)
{
    printf("Hello world!\n");
    return 0;
}
Example of storing and retrieving a blob
# echo "Whatever..." | git hash-object -w --stdin
aa02989467eea6d8e0bc68f3663de51767a9f5b1
# git cat-file -p aa02989467
Whatever...
Tree
tree = content of a directory
It can point to blobs and other trees.
[Diagram: a tree object with header "tree size" and SHA1 0de24..., containing these entries:]
blob e8455... hello.c
tree 10af9... lib
Example of storing and retrieving a tree
# BLOB=aa02989467eea6d8e0bc68f3663de51767a9f5b1
# (printf "100644 whatever.txt\0"; echo $BLOB | xxd -r -p) | git hash-object -t tree -w --stdin
0625da548ef0a7038c44b480f10d5550b2f2f962
# git cat-file -p 0625da548e
100644 blob aa02989467...    whatever.txt
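A tree can also be built without assembling the binary entry by hand, using `git mktree`, which reads lines in the `git ls-tree` output format. The following is a minimal sketch in a throwaway repository created with `mktemp`; the directory and variable names are illustrative and not part of the original example.

```shell
# Work in a scratch repository so nothing else is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q

# Store the blob, as in the blob example.
blob=$(echo "Whatever..." | git hash-object -w --stdin)

# mktree expects "<mode> SP <type> SP <object> TAB <name>" lines on stdin.
tree=$(printf '100644 blob %s\twhatever.txt\n' "$blob" | git mktree)

# Show the entries of the new tree.
git cat-file -p "$tree"
```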
Commit
commit = information about some changes
It points to one tree and 0 or more parents.
[Diagram: a commit object with header "commit size" and SHA1 98ca9..., containing:]
tree 0de24...
parents ()
author Christian <timestamp>
committer Christian <timestamp>

My commit message
Example of storing and retrieving a commit (1)
# TREE=0625da548ef0a7038c44b480f10d5550b2f2f962
# ME="Christian Couder <[email protected]>"
# DATE=$(date "+%s %z")
# (echo -e "tree $TREE\nauthor $ME $DATE";
   echo -e "committer $ME $DATE\n\nfirst commit") | git hash-object -t commit -w --stdin
37449e955443883a0a888ee100cfd0a7ba7927b3
Example of storing and retrieving a commit (2)
# git cat-file -p 37449e9554
tree 0625da548ef0a7038c44b480f10d5550b2f2f962
author Christian Couder <[email protected]> 1380447450 +0200
committer Christian Couder <[email protected]> 1380447450 +0200

first commit
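For comparison, `git commit-tree` writes the same kind of commit object without formatting the header lines by hand. This is only a sketch in a scratch repository; the identity set through the environment variables is a placeholder, and the resulting SHA-1 will differ from the one shown above because the author and timestamps differ.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Placeholder identity, so the example does not depend on local configuration.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Recreate the blob and the tree from the previous examples.
blob=$(echo "Whatever..." | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\twhatever.txt\n' "$blob" | git mktree)

# Create a parentless commit pointing to that tree.
commit=$(git commit-tree "$tree" -m "first commit")

git cat-file -p "$commit"
```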
Git Objects Relations
[Diagram: two commits and the trees and blobs they point to]
Commit e84c7...: tree 29c43..., parents (), author Christian, committer Christian, message "Initial commit"
Commit 98ca9...: tree 5c11f..., parents (e84c7...), author Arnaud, committer Arnaud, message "Change hello.c"
Tree 29c43...: blob 0de24... hello.c, tree 98ca9... doc
Tree 5c11f...: blob bc789... hello.c, tree 98ca9... doc
Tree 98ca9...: blob 677f4... readme, blob 23ae9... install
Blob 0de24...: int main() { ... }
Blob bc789...: int main(void) { ... }
Git Refs
● Head: branch, .git/refs/heads/
● Tag: lightweight tag, .git/refs/tags/
● Remote: distant repository, .git/refs/remotes/
● Note: note attached to an object, .git/refs/notes/
● Replace: replacement of an object, .git/refs/replace/
Example of storing and retrieving a branch
# git update-ref refs/heads/master 37449e9554
# git rev-parse master
37449e955443883a0a888ee100cfd0a7ba7927b3
# git reset --hard master
HEAD is now at 37449e9 first commit
# cat whatever.txt
Whatever...
Result from previous examples
master → commit 37449e9554 → tree 0625da548e → blob aa02989467
Commits in Git form a DAG (Directed Acyclic Graph)
● history direction is from left to right
● new commits point to their parents
git bisect
● B introduces a bad behavior called "bug" or "regression"
● red commits are called "bad"
● blue commits are called "good"
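To make the good/bad vocabulary concrete, here is a small sketch of an automated bisection with `git bisect run`. The five-commit history, the file names and the placeholder identity are invented for the illustration; the "regression" is simply the word "bad" appearing in `state.txt` from the third commit on.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Five commits; the regression appears in commit 3 and stays afterwards.
for i in 1 2 3 4 5; do
  if [ "$i" -ge 3 ]; then echo bad > state.txt; else echo good > state.txt; fi
  echo "$i" > counter.txt
  git add state.txt counter.txt
  git commit -q -m "commit $i"
done

# HEAD is known to be bad, the first commit is known to be good.
git bisect start HEAD HEAD~4 >/dev/null

# The test command exits 0 on good commits and non-zero on bad ones.
git bisect run grep -q good state.txt >/dev/null

# refs/bisect/bad now points to the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$first_bad"    # prints "commit 3"

git bisect reset >/dev/null 2>&1
```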
Problem when bisecting
Sometimes the commit that introduced a bug will be in an untestable area of the graph.
For example:
W -- X -- X1 -- X2 -- X3 -- Y -- Z
Commit X introduced a breakage, later fixed by commit Y.
Possible solutions
Possible solutions to bisect anyway:
● apply a patch before testing and remove it afterwards (can be done using "git cherry-pick"), or
● create a fixed up branch (can be done with "git rebase -i"), for example:
W -- X -- X1 -- X2 -- X3 -- Y -- Z -- Z1
 \
  (X + Y) -- X1' -- X2' -- X3' -- Z'
A good solution
The idea is that we will replace Z with Z' so that we bisect from the beginning using the fixed up branch.
W -- X -- X1 -- X2 -- X3 -- Y -- Z
 \
  (X + Y) -- X1' -- X2' -- X3' -- Z' -- Z1
$ git replace Z Z'
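The effect of that command can be tried on a toy history. The sketch below is a simplified version of the picture (it leaves out X1 to X3), built in a scratch repository with a hypothetical `save` helper and a placeholder identity. After the replacement, walking back from Z1 goes through the fixed up branch.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Hypothetical helper: commit message in $1, file content in $2.
save() { printf '%s\n' "$2" > state.txt; git add state.txt; git commit -q -m "$1"; }

# Original history: W -- X -- Y -- Z -- Z1 (X breaks, Y fixes).
save W  "base";                      w=$(git rev-parse HEAD)
save X  "broken"
save Y  "fixed"
save Z  "fixed, with feature";       z=$(git rev-parse HEAD)
save Z1 "fixed, with more features"; z1=$(git rev-parse HEAD)

# Fixed up branch starting at W: (X + Y) -- Z'
git checkout -q -b fixed-up "$w"
save "X + Y" "fixed"
save "Z'"    "fixed, with feature";  zprime=$(git rev-parse HEAD)
git checkout -q -                    # back to the original branch

git replace "$z" "$zprime"

# History seen from Z1, with and without the replacement.
git log --format=%s "$z1"
git --no-replace-objects log --format=%s "$z1"
```

With the replace ref in place the first log lists Z1, Z', X + Y, W, so a bisection never lands on the broken commit X; the second log still shows the untouched original history.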
Grafts
Created mostly for projects like the Linux kernel that have old repositories.
● “.git/info/grafts” file
● each line describes the parents of a commit
● <commit> <parent> [<parent>]*
● this overrides the content in the commit
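Git versions released after this talk provide `git replace --graft <commit> [<parent>...]`, which overrides the parents of a commit in the same spirit but through a replace ref instead of the grafts file. A minimal sketch, assuming such a Git version and using an invented three-commit history with a placeholder identity:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Linear history A -- B -- C.
for msg in A B C; do
  echo "$msg" > file.txt
  git add file.txt
  git commit -q -m "$msg"
done
b=$(git rev-parse HEAD~1)

# Pretend that B has no parent at all, which cuts A out of the visible history.
git replace --graft "$b"

git log --format=%s                        # C, B
git --no-replace-objects log --format=%s   # C, B, A
```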
Problem with Grafts
They are neither objects nor refs, so they cannot be easily transferred.
We need something that is either:
● an object, or
● a ref
Solution, part 1: replace ref
● It is a ref in .git/refs/replace/
● Its name is the SHA-1 of the object that should be replaced.
● It contains, so it points to, the SHA-1 of the replacement object.
Solution, part 2: git replace
● git replace [ -f ] <object> <replacement>: to create a replace ref
● git replace -d <object>: to delete a replace ref
● git replace [ -l [ pattern ] ]: to list some replace refs
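The three forms above can be exercised in a scratch repository. In this sketch the commit with the misspelled message, the replacement built with `git commit-tree`, and the placeholder identity are all invented for the illustration.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# A commit with a typo in its message.
echo hello > file.txt && git add file.txt && git commit -q -m "orignal mesage"
orig=$(git rev-parse HEAD)

# A replacement commit: same tree, corrected message.
new=$(git commit-tree "$(git rev-parse 'HEAD^{tree}')" -m "original message")

git replace "$orig" "$new"            # create the replace ref
git replace -l                        # lists the SHA-1 of the replaced object
git rev-parse "refs/replace/$orig"    # the ref points to the replacement
git log -1 --format=%s                # original message

git replace -d "$orig"                # delete the replace ref
git log -1 --format=%s                # orignal mesage
```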
Replace ref transfer
● as with heads, tags, notes, remotes
● except that there are no shortcuts and you must be explicit
● refspec: refs/replace/*:refs/replace/*
● refspec can be configured (in .git/config), or used on the command line (after git push/fetch <remote>)
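Here is a sketch of the command-line variant, using three scratch repositories on the local disk. The names `shared.git`, `alice` and `bob`, the branch name `main` and the placeholder identity are assumptions made for the example.

```shell
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

git init -q --bare shared.git
git init -q alice && cd alice

# A commit, plus a replacement with a corrected message.
echo hello > file.txt && git add file.txt && git commit -q -m "orignal mesage"
orig=$(git rev-parse HEAD)
fixed=$(git commit-tree "$(git rev-parse 'HEAD^{tree}')" -m "original message")
git replace "$orig" "$fixed"

# Push a branch and, explicitly, all the replace refs.
git push -q ../shared.git HEAD:refs/heads/main 'refs/replace/*:refs/replace/*'

# Another repository has to ask for the replace refs explicitly as well.
cd .. && git init -q bob && cd bob
git fetch -q ../shared.git 'refs/heads/*:refs/remotes/origin/*' 'refs/replace/*:refs/replace/*'

git log -1 --format=%s origin/main                        # original message
git --no-replace-objects log -1 --format=%s origin/main   # orignal mesage
```

The same refspec could be added as a `fetch` or `push` line of a remote in .git/config so that it does not have to be typed each time.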
Creating replacement objects
When needed, the following commands can help:
● git rebase [ -i ]
● git cherry-pick
● git hash-object
● git filter-branch
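As one possible use of `git hash-object` from that list, the sketch below fixes a wrong author by editing the raw commit text and storing the result as a new commit object. The names and addresses are invented, and `sed` is just one convenient way to edit the header line.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Invented identities: the author is the one to be corrected.
export GIT_AUTHOR_NAME="Wrong Name" GIT_AUTHOR_EMAIL="wrong@example.com"
export GIT_COMMITTER_NAME="Example Committer" GIT_COMMITTER_EMAIL="committer@example.com"

echo hello > file.txt && git add file.txt && git commit -q -m "first commit"
orig=$(git rev-parse HEAD)

# Dump the raw commit, rewrite its author line, store it as a new commit object.
fixed=$(git cat-file commit "$orig" |
  sed 's/^author Wrong Name <wrong@example.com>/author Right Name <right@example.com>/' |
  git hash-object -t commit -w --stdin)

git replace "$orig" "$fixed"

git log -1 --format='%an <%ae>'    # Right Name <right@example.com>
```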
What can it be used for?
Create new views of your history.
Right now only 2 views are possible:
● the view with all the replace refs enabled
● the view with all the replace refs disabled, using --no-replace-objects or the GIT_NO_REPLACE_OBJECTS environment variable
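The two views can be compared side by side. This sketch reuses an invented commit with a misspelled message and a placeholder identity; only the option and the environment variable come from the list above.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

echo hello > file.txt && git add file.txt && git commit -q -m "orignal mesage"
orig=$(git rev-parse HEAD)
new=$(git commit-tree "$(git rev-parse 'HEAD^{tree}')" -m "original message")
git replace "$orig" "$new"

# View 1: replace refs enabled (the default).
enabled=$(git log -1 --format=%s)

# View 2: replace refs disabled, through the option or the environment variable.
by_option=$(git --no-replace-objects log -1 --format=%s)
by_env=$(GIT_NO_REPLACE_OBJECTS=1 git log -1 --format=%s)

echo "enabled:  $enabled"
echo "disabled: $by_option / $by_env"
```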
Why new views?
● split old and new history or merge them
● fix bugs to bisect on a clean history
● fix mistakes in author, committer, timestamps
● remove big files to have something lighter to use, when you don’t need them
● prepare a repo cleanup
● mask/unmask some steps
● ...
Limitations
● everything is still in the repo
● so the repo is still big
● there are probably bugs
● confusing?
● ...
Current and future work
● a script to replace grafts
● fix bugs
● allow subdirectories in .git/refs/replace/
● maybe allow “views” as set of active subdirectories
● ...
Considerations
● best of both worlds: immutability and configurability of history
● no true view
● history is important for freedom
Many thanks to:
● Junio Hamano (comments, help, discussions, reviews, improvements),
● Ingo Molnar,
● Linus Torvalds,
● many other great people in the Git and Linux communities, especially: Andreas Ericsson, Johannes Schindelin, H. Peter Anvin, Daniel Barkalow, Bill Lear, John Hawley, ...
● OSDC/OWF organizers and attendees,
● Murex, the company I am working for.
Questions ?