How does 'git log --graph' or 'hg graphlog' work?
I know that the history in Git is stored in a data structure called a DAG. I've heard about DFS and know it's somewhat related.
I'm curious, how do programs such as git log --graph or hg graphlog draw the history? I always thought it's quite complicated to draw the lanes and everything in such a nice way.
Could someone write some pseudo code that demonstrates it?
note: I tried looking around Git or hg's code but it's very hard to follow and get a general idea of what's going on.
First, one obtains a list of commits (as with git rev-list), and parents of each commit. A "column reservation list" is kept in memory.
For each commit then:
- If the commit has no column reserved for it, assign it to a free column. This is how the branch heads will start.
- Print the tree graphics according to the column reservation list, and then the commit message
- The reservation's list entry for the current column/commit is updated with the first parent of the current commit, such that the parent is going to be printed in the same column.
- Other parents get a new free column.
- If this was a merge, the next line will try to link the second parent to a column where the commit is expected (this makes for the loops and the "≡ bridge")
Example showing output of git-forest on aufs2-util with an extra commit to have more than one branch).
With lookahead, one can anticipate how far down the merge point will be and squeeze the wood between two columns to give a more aesthetically pleasing result.
I tried looking around Git or hg's code but it's very hard to follow and get a general idea of what's going on.
For hg, did you try to follow the code in hg itself, or in graphlog?
Because the code of graphlog is pretty short. You can find it in hgext/graphlog.py, and really the important part is the top ~200 lines, the rest is the extension's bootstrapping and finding the revision graph selected. The code generation function is ascii, with its last parameter being the result of a call to asciiedge (the call itself is performed on the last line of generate, the function being provided to generate by graphlog)
This particular problem isn't that hard, compared to graph display in general. Because you want to keep the nodes in the order they were committed the problem gets much simpler.
Also note that the display model is grid based, rows are commits and columns are edges into the past/future.
While I didn't read the git source you probably just walk the list of commits, starting from the newest, and maintain a list of open edges into the past. Following the edges naturally leads to splitting/merging columns and you end up with the kind of tree git/hg display.
When merging edges you want to avoid crossing other edges, so you'll have to try to order your columns ahead of time. This is actally the only part that may not be straightforward. For example one could do a two-pass algorithm, making up a column order for the edges in the first pass and doing the drawing in the second pass.
Note: Git 2.18 (Q2 2018) does now precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking.
That notion of commits graph does change how 'git log --graph' does work.
As mentioned here:
git config --global core.commitGraph true git config --global gc.writeCommitGraph true cd /path/to/repo git commit-graph write
See commit 7547b95, commit 3d5df01, commit 049d51a, commit 177722b, commit 4f2542b, commit 1b70dfd, commit 2a2e32b (10 Apr 2018), and commit f237c8b, commit 08fd81c, commit 4ce58ee, commit ae30d7b, commit b84f767, commit cfe8321, commit f2af9f5 (02 Apr 2018) by Derrick Stolee (derrickstolee). (Merged by Junio C Hamano -- gitster -- in commit b10edb2, 08 May 2018)
You now have the command git commit-graph: Write and verify Git commit graph files.
Write a commit graph file based on the commits found in packfiles. Includes all commits from the existing commit graph file.
The design document states:
Git walks the commit graph for many reasons, including:
- Listing and filtering commit history.
- Computing merge bases.
These operations can become slow as the commit count grows. The merge base calculation shows up in many user-facing commands, such as 'merge-base' or 'status' and can take minutes to compute depending on history shape.
There are two main costs here:
- Decompressing and parsing commits.
- Walking the entire graph to satisfy topological order constraints.
The commit graph file is a supplemental data structure that accelerates commit graph walks. If a user downgrades or disables the 'core.commitGraph' config setting, then the existing ODB is sufficient.
The file is stored as "commit-graph" either in the .git/objects/info directory or in the info directory of an alternate.
The commit graph file stores the commit graph structure along with some extra metadata to speed up graph walks. By listing commit OIDs in lexicographic order, we can identify an integer position for each commit and refer to the parents of a commit using those integer positions. We use binary search to find initial commits and then use the integer positions for fast lookups during the walk.
You can see the test use cases:
git log --oneline $BRANCH git log --topo-order $BRANCH git log --graph $COMPARE..$BRANCH git branch -vv git merge-base -a $BRANCH $COMPARE
This will improve git log performance.
Git 2.19 (Q3 2018) will take care of the lock file:
See commit 33286dc (10 May 2018), commit 1472978, commit 7adf526, commit 04bc8d1, commit d7c1ec3, commit f9b8908, commit 819807b, commit e2838d8, commit 3afc679, commit 3258c66 (01 May 2018), and commit 83073cc, commit 8fb572a (25 Apr 2018) by Derrick Stolee (derrickstolee). Helped-by: Jeff King (peff). (Merged by Junio C Hamano -- gitster -- in commit a856e7d, 25 Jun 2018)
commit-graph: fix UX issue when .lock file exists
We use the lockfile API to avoid multiple Git processes from writing to the commit-graph file in the .git/objects/info directory. In some cases, this directory may not exist, so we check for its existence.
The existing code does the following when acquiring the lock:
- Try to acquire the lock.
- If it fails, try to create the .git/object/info directory.
- Try to acquire the lock, failing if necessary.
The problem is that if the lockfile exists, then the mkdir fails, giving an error that doesn't help the user:
"fatal: cannot mkdir .git/objects/info: File exists"
While technically this honors the lockfile, it does not help the user.
Instead, do the following:
- Check for existence of .git/objects/info; create if necessary.
- Try to acquire the lock, failing if necessary.
The new output looks like:
fatal: Unable to create '<dir>/.git/objects/info/commit-graph.lock': File exists.
Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue.
Note: The commit-graph facility did not work when in-core objects that are promoted from unknown type to commit (e.g. a commit that is accessed via a tag that refers to it) were involved, which has been corrected with Git 2.21 (Feb. 2019)