New Views on your History with git replace

Post on 10-May-2015

DESCRIPTION

Git has become the most popular version control system in the Open Source world, and more and more companies are also using it. Source code history managed by Git is supposed to be immutable, because Git uses a content-addressed database: Git objects are indexed by their SHA-1 hash. When mistakes have been made, though, or to make some history-based features more useful or more reliable, it can be worthwhile to transform the Git source code history. A good way to do that is git replace.

TRANSCRIPT

New Views on your History with git replace

Christian Couder, Murex
chriscool@tuxfamily.org

OSDC.fr 2013

October 5, 2013

About Git

A Distributed Version Control System (DVCS):

● created by Linus Torvalds
● maintained by Junio Hamano
● since 2005
● preferred VCS among open source developers

Git Design

Git is made of these things:

● “Objects”
● “Refs”
● config, indexes, logs, hooks, grafts, packs, ...

Only “Objects” and “Refs” are transferred from one repository to another.

Git Objects

● Blob: content of a file
● Tree: content of a directory
● Commit: state of the whole source code
● Tag: stamp on an object

Git Objects Storage

● Git Objects are stored in a content addressable database.

● The key to retrieve each Object is the SHA-1 of the Object’s content.

● A SHA-1 is a 160-bit / 40-hex / 20-byte hash value which is considered unique.
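To make the key computation concrete, here is a minimal sketch. More precisely than "the SHA-1 of the content", Git hashes a short header (object type, size in bytes, and a NUL byte) followed by the content. The sketch assumes the `sha1sum` tool from GNU coreutils is installed and that Git uses its default SHA-1 object format; the variable names are only illustrative.

```shell
# Content of the future blob: "hello" plus a newline, 6 bytes in total.
CONTENT='hello'

# What Git computes for this blob (nothing is written without -w).
GIT_ID=$(printf '%s\n' "$CONTENT" | git hash-object --stdin)

# The same value by hand: SHA-1 over the header "blob 6\0" plus the content.
MANUAL_ID=$(printf 'blob 6\0%s\n' "$CONTENT" | sha1sum | cut -d' ' -f1)

echo "git hash-object: $GIT_ID"
echo "manual SHA-1:    $MANUAL_ID"
```

Both lines should show the same 40-hex value, which is why two identical files are always stored as a single blob.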

Blob

blob = content of a file

[Diagram: a blob object (SHA1: e8455...) made of the header "blob <size>" followed by its content. The content can be anything, like an image or a video, but most of the time it is source code like:]

#include <stdio.h>

int main(void)
{
    printf("Hello world!\n");
    return 0;
}

Example of storing and retrieving a blob

# echo "Whatever..." | git hash-object -w --stdin
aa02989467eea6d8e0bc68f3663de51767a9f5b1

# git cat-file -p aa02989467
Whatever...

Tree

tree = content of a directory

It can point to blobs and other trees.

[Diagram: a tree object (SHA1: 0de24...) made of the header "tree <size>" followed by two entries:]

blob e8455... hello.c
tree 10af9... lib

Example of storing and retrieving a tree

# BLOB=aa02989467eea6d8e0bc68f3663de51767a9f5b1
# (printf "100644 whatever.txt\0"; echo $BLOB | xxd -r -p) | git hash-object -t tree -w --stdin
0625da548ef0a7038c44b480f10d5550b2f2f962

# git cat-file -p 0625da548e
100644 blob aa02989467... whatever.txt
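The hand-built binary entry above shows what a tree really contains. For everyday use, `git mktree` builds the same kind of object from text in `git ls-tree` format. The sketch below is an alternative to the slide's command, not a transcript of it; it works in a throwaway repository created with `mktemp`, and the variable names are assumptions.

```shell
# Throwaway repository so the sketch does not touch real data.
REPO=$(mktemp -d) && cd "$REPO" && git init -q

# Store a blob, then describe a one-entry directory in ls-tree format:
# "<mode> <type> <sha1><TAB><name>".
BLOB=$(echo "Whatever..." | git hash-object -w --stdin)
TREE=$(printf '100644 blob %s\twhatever.txt\n' "$BLOB" | git mktree)

# Read the tree back, as in the example above.
git cat-file -p "$TREE"
```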

Commit

commit = information about some changes

It points to one tree and 0 or more parents.

[Diagram: a commit object (SHA1: 98ca9...) made of the header "commit <size>" followed by:]

tree 0de24...
parents ()
author Christian <timestamp>
committer Christian <timestamp>

My commit message

Example of storing and retrieving a commit (1)

# TREE=0625da548ef0a7038c44b480f10d5550b2f2f962
# ME="Christian Couder <chriscool@tuxfamily.org>"
# DATE=$(date "+%s %z")
# (echo -e "tree $TREE\nauthor $ME $DATE";
   echo -e "committer $ME $DATE\n\nfirst commit") | git hash-object -t commit -w --stdin
37449e955443883a0a888ee100cfd0a7ba7927b3

Example of storing and retrieving a commit (2)

# git cat-file -p 37449e9554
tree 0625da548ef0a7038c44b480f10d5550b2f2f962
author Christian Couder <chriscool@tuxfamily.org> 1380447450 +0200
committer Christian Couder <chriscool@tuxfamily.org> 1380447450 +0200

first commit
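Writing the commit text by hand shows the raw format. The plumbing command `git commit-tree` produces the same kind of object and fills in the tree, author and committer lines itself. This is a minimal sketch in a throwaway repository; the identity passed through the environment is hypothetical.

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q

# Hypothetical identity, passed through the environment.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Rebuild the blob and the tree, then create a parentless commit on that tree.
BLOB=$(echo "Whatever..." | git hash-object -w --stdin)
TREE=$(printf '100644 blob %s\twhatever.txt\n' "$BLOB" | git mktree)
COMMIT=$(git commit-tree "$TREE" -m "first commit")

# Same layout as above: tree, author, committer, blank line, message.
git cat-file -p "$COMMIT"
```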

Git Objects Relations

[Diagram: two commits with the trees and blobs they point to. Each object starts with its type and size.]

Commit (SHA1: e84c7...)
  tree 29c43...
  parents ()
  author Christian
  committer Christian
  Initial commit

Commit (SHA1: 98ca9...)
  tree 5c11f...
  parents (e84c7...)
  author Arnaud
  committer Arnaud
  Change hello.c

Tree (SHA1: 29c43...)
  blob 0de24... hello.c
  tree 98ca9... doc

Tree (SHA1: 5c11f...)
  blob bc789... hello.c
  tree 98ca9... doc

Tree (SHA1: 98ca9...)
  blob 677f4... readme
  blob 23ae9... install

Blob (SHA1: 0de24...)
  int main() { ... }

Blob (SHA1: bc789...)
  int main(void) { ... }

Git Refs

● Head: branch, .git/refs/heads/

● Tag: lightweight tag, .git/refs/tags/

● Remote: distant repository, .git/refs/remotes/

● Note: note attached to an object, .git/refs/notes/

● Replace: replacement of an object, .git/refs/replace/

Example of storing and retrieving a branch

# git update-ref refs/heads/master 37449e9554

# git rev-parse master
37449e955443883a0a888ee100cfd0a7ba7927b3

# git reset --hard master
HEAD is now at 37449e9 first commit

# cat whatever.txt
Whatever...

Result from previous examples

[Diagram: master → commit 37449e9554 → tree 0625da548e → blob aa02989467]

Commits in Git form a DAG (Directed Acyclic Graph)

● history direction is from left to right
● new commits point to their parents

git bisect

● B introduces a bad behavior called "bug" or "regression"

● red commits are called "bad"
● blue commits are called "good"

Problem when bisecting

Sometimes the commit that introduced a bug will be in an untestable area of the graph.

For example:

W -- X -- X1 -- X2 -- X3 -- Y -- Z

Commit X introduced a breakage, later fixed by commit Y.

Possible solutions

Possible solutions to bisect anyway:

● apply a patch before testing and remove it afterwards (can be done using "git cherry-pick"), or
● create a fixed up branch (can be done with "git rebase -i"), for example:

W -- X -- X1 -- X2 -- X3 -- Y -- Z -- Z1
 \
  X+Y -- X1' -- X2' -- X3' -- Z'

A good solution

The idea is that we will replace Z with Z' so that we bisect from the beginning using the fixed up branch.

W -- X -- X1 -- X2 -- X3 -- Y -- Z
 \
  X+Y -- X1' -- X2' -- X3' -- Z' -- Z1

$ git replace Z Z'
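The effect of that command can be tried out on a toy history. The sketch below is only an illustration under simplifying assumptions: it runs in a throwaway repository, uses a hypothetical `commit` helper, keeps a single commit for the broken range, and recreates the fixed-up commits by hand instead of using "git rebase -i".

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q
git config user.name "A U Thor" && git config user.email "author@example.com"

# Hypothetical helper: record one commit whose message is $1.
commit() { echo "$1" >> file.txt && git add file.txt && git commit -q -m "$1"; }

# Original history: W, X (breakage), Y (fix), Z.
commit W; W=$(git rev-parse HEAD)
commit X; commit Y; commit Z
Z=$(git rev-parse HEAD)

# Fixed-up history: restart from W, one commit standing for X+Y, then Z'.
git checkout -q --detach "$W"
commit "X+Y"; commit "Z'"
Z_FIXED=$(git rev-parse HEAD)

# From now on, every read of Z returns Z' instead.
git replace "$Z" "$Z_FIXED"
git log --format=%s "$Z"    # prints Z', X+Y, W (one per line)
```

Anything that walks history from Z, "git bisect" included, now goes through the fixed-up commits, so the untestable commits are never checked out.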

Grafts

Created mostly for projects with old repositories, like the Linux kernel.

● “.git/info/grafts” file
● each line describes the parents of a commit
● <commit> <parent> [<parent>]*
● this overrides the content in the commit

Problem with Grafts

They are neither objects nor refs, so they cannot be easily transferred.

We need something that is either:

● an object, or
● a ref

Solution, part 1: replace ref

● It is a ref in .git/refs/replace/
● Its name is the SHA-1 of the object that should be replaced.
● It contains, so it points to, the SHA-1 of the replacement object.
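Because a replace ref is an ordinary ref, it can be created with generic ref plumbing. The sketch below does this by hand with `git update-ref` on two blobs in a throwaway repository; the blob contents and variable names are made up for illustration.

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q

OLD=$(echo "old content" | git hash-object -w --stdin)
NEW=$(echo "new content" | git hash-object -w --stdin)

# Name of the ref: full SHA-1 of the replaced object.
# Value of the ref: SHA-1 of the replacement object.
git update-ref "refs/replace/$OLD" "$NEW"

git for-each-ref refs/replace/   # shows <NEW> blob refs/replace/<OLD>
git cat-file -p "$OLD"           # prints "new content"
```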

Solution, part 2: git replace

● git replace [ -f ] <object> <replacement>: to create a replace ref

● git replace -d <object>: to delete a replace ref

● git replace [ -l [ pattern ] ]: to list some replace refs
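The three forms can be exercised in a few lines. This is a minimal sketch on blobs in a throwaway repository; the contents and the variable names are only illustrative.

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q

A=$(echo "version A" | git hash-object -w --stdin)
B=$(echo "version B" | git hash-object -w --stdin)
C=$(echo "version C" | git hash-object -w --stdin)

git replace "$A" "$B"        # create: A now reads as B
git replace -f "$A" "$C"     # -f overwrites the existing replace ref
AFTER_FORCE=$(git cat-file -p "$A")

LISTED=$(git replace -l)     # list: prints the SHA-1 of each replaced object

git replace -d "$A"          # delete: A reads as itself again
AFTER_DELETE=$(git cat-file -p "$A")

echo "$AFTER_FORCE / $AFTER_DELETE"   # prints "version C / version A"
```

Without -f, the second "git replace" would refuse to run because a replace ref for A already exists.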

Replace ref transfer

● as with heads, tags, notes, remotes
● except that there are no shortcuts and you must be explicit
● refspec: refs/replace/*:refs/replace/*
● refspec can be configured (in .git/config), or used on the command line (after git push/fetch <remote>)
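Both ways of giving the refspec can be sketched with local repositories. The repository names (`shared.git`, `alice`, `bob`) are hypothetical, and a bare repository on disk stands in for the remote.

```shell
WORK=$(mktemp -d) && cd "$WORK"
git init -q --bare shared.git          # stands in for the remote repository
git init -q alice && git init -q bob

# Alice creates a replacement and pushes the replace ref explicitly.
cd "$WORK/alice"
OLD=$(echo "old" | git hash-object -w --stdin)
NEW=$(echo "new" | git hash-object -w --stdin)
git replace "$OLD" "$NEW"
git remote add origin "$WORK/shared.git"
git push -q origin 'refs/replace/*:refs/replace/*'

# Bob configures the refspec once; every later fetch brings replace refs too.
cd "$WORK/bob"
git remote add origin "$WORK/shared.git"
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
git fetch -q origin
BOB_REFS=$(git replace -l)
echo "$BOB_REFS"
```

The leading "+" in the configured refspec is a choice made for this sketch: it lets a fetch overwrite a local replace ref that has changed on the remote.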

Creating replacement objects

When it is needed the following commands can help:

● git rebase [ -i ]
● git cherry-pick
● git hash-object
● git filter-branch
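As one possible use of "git hash-object" from this list, the sketch below corrects a misspelled author name: it edits the raw commit text with `sed`, stores the result as a new commit object, and installs it as a replacement. The names "Jon Do" and "John Doe" are made up, and the repository is a throwaway one.

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q

# The name is misspelled on purpose.
git config user.name "Jon Do" && git config user.email "jon@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"
BAD=$(git rev-parse HEAD)

# Rewrite the author and committer lines, store the text as a new commit.
FIXED=$(git cat-file commit "$BAD" | sed 's/Jon Do/John Doe/' |
        git hash-object -t commit -w --stdin)

git replace "$BAD" "$FIXED"
git log -1 --format='%an' HEAD    # prints "John Doe"
```

The original commit is untouched: HEAD still holds the same SHA-1, only the content read for it changes.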

What can it be used for?

Create new views of your history.

Right now only 2 views are possible:

● the view with all the replace refs enabled
● the view with all the replace refs disabled, using --no-replace-objects or the GIT_NO_REPLACE_OBJECTS environment variable
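The two views can be compared on a single object. This is a minimal sketch in a throwaway repository, with made-up blob contents and variable names.

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q

OLD=$(echo "original" | git hash-object -w --stdin)
NEW=$(echo "replacement" | git hash-object -w --stdin)
git replace "$OLD" "$NEW"

VIEW_ON=$(git cat-file -p "$OLD")                            # replace refs enabled
VIEW_OFF=$(git --no-replace-objects cat-file -p "$OLD")      # disabled by option
VIEW_ENV=$(GIT_NO_REPLACE_OBJECTS=1 git cat-file -p "$OLD")  # disabled by environment

echo "$VIEW_ON / $VIEW_OFF / $VIEW_ENV"   # prints "replacement / original / original"
```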

Why new views?

● split old and new history or merge them
● fix bugs to bisect on a clean history
● fix mistakes in author, committer, timestamps
● remove big files to have something lighter to use, when you don’t need them
● prepare a repo cleanup
● mask/unmask some steps
● ...
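The first item, splitting old history from new, can be sketched as follows. This assumes a Git recent enough to provide `git replace --graft`, an option that is not part of the synopsis shown earlier in this talk; the four-commit history is made up.

```shell
REPO=$(mktemp -d) && cd "$REPO" && git init -q
git config user.name "A U Thor" && git config user.email "author@example.com"
for n in 1 2 3 4; do
    echo "$n" > file.txt && git add file.txt && git commit -q -m "commit $n"
done

# Make "commit 3" look like a root commit: --graft with no parent listed
# creates a parentless copy of it and installs the replace ref.
CUT=$(git rev-parse HEAD~1)
git replace --graft "$CUT"

SHORT=$(git rev-list --count HEAD)                      # view with replace refs
FULL=$(git --no-replace-objects rev-list --count HEAD)  # untouched history
echo "$SHORT commits visible, $FULL commits stored"     # prints "2 commits visible, 4 commits stored"
```

Deleting the replace ref with "git replace -d" brings the full history back.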

Limitations

● everything is still in the repo
● so the repo is still big
● there are probably bugs
● confusing?
● ...

Current and future work

● a script to replace grafts
● fix bugs
● allow subdirectories in .git/refs/replace/
● maybe allow “views” as set of active subdirectories
● ...

Considerations

● best of both worlds: immutability and configurability of history

● no true view
● history is important for freedom

Many thanks to:

● Junio Hamano (comments, help, discussions, reviews, improvements),
● Ingo Molnar,
● Linus Torvalds,
● many other great people in the Git and Linux communities, especially: Andreas Ericsson, Johannes Schindelin, H. Peter Anvin, Daniel Barkalow, Bill Lear, John Hawley, ...
● OSDC/OWF organizers and attendees,
● Murex, the company I am working for.

Questions?
