How to add a git repo as a submodule of itself? (Or: How to generate GitHub Pages programmatically?)
I want to start using GitHub Pages for my project's website. This simply requires a branch (subtree) named gh-pages in the repo, and serves up its content. The problem is that part of the website (manual, changelog, download page...) is auto-generated by the build system, so I want to find the best way to commit these changes to the gh-pages branch while the main repo remains on master (or wherever).
To commit to the gh-pages branch, I could write a script that clones the repo into a temporary directory, makes the modifications, commits them, and then pushes them back to the main repo. But this sounds like an error-prone process, so I'm hoping there is an easier way.
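For reference, the clone-into-a-temporary-directory approach might look roughly like the sketch below. It first builds a throwaway repository so that it can be run as-is. The directory names, the generated page content and the commit identity are placeholders I made up for illustration, not part of any real project.

```shell
#!/bin/sh
# Sketch: update gh-pages from a temporary clone, then push it back.
set -eu

# Placeholder identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway "main" repository with master and an orphan gh-pages branch.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
touch main.txt
git add main.txt
git commit -q -m 'Initial commit in main branch.'
git checkout -q --orphan gh-pages
git rm -rfq .
touch index.html
git add index.html
git commit -q -m 'Initial commit in website branch.'
git checkout -q master

# The publishing step: clone only gh-pages into a temporary directory.
git clone -q --branch gh-pages --single-branch "$work/main" "$work/pages"
cd "$work/pages"
# Stand-in for whatever the build system generates.
echo '<h1>Generated manual</h1>' > index.html
git add -A
git commit -q -m 'Regenerate website.'
# The push is accepted because the main repository has master checked
# out, not gh-pages.
git push -q origin gh-pages

cd "$work/main"
pages_subject=$(git log -1 --format=%s gh-pages)
current_branch=$(git symbolic-ref --short HEAD)
echo "gh-pages tip: $pages_subject (main repo still on $current_branch)"
```

The main working tree is never touched, which is the attraction of this approach. The cost is the extra clone on every run.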
A friend suggested that I might add the gh-pages branch as a submodule to the main repository. I ran a little experiment, but it doesn't quite work:
    $ git init main
    Initialized empty Git repository in /tmp/main/.git/
    $ cd main
    $ touch main.txt
    $ git add .
    $ git commit -m'Initial commit in main branch.'
    [master (root-commit) 1c52a4e] Initial commit in main branch.
     0 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 main.txt
    $ git symbolic-ref HEAD refs/heads/gh-pages
    $ rm .git/index
    $ git clean -fdx
    Removing main.txt
    $ touch index.html
    $ git add .
    $ git commit -m'Initial commit in website branch.'
    [gh-pages (root-commit) 94b10f2] Initial commit in website branch.
     0 files changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 index.html
    $ git checkout master
    Switched to branch 'master'
    $ git submodule add -b gh-pages . gh-pages
    repo URL: '.' must be absolute or begin with ./|../
    $ git submodule add -b gh-pages ./ gh-pages
    remote (origin) does not have a url defined in .git/config
I'm new to submodules; have done some reading, of course, but I don't understand this behaviour. Why does it need an origin remote? Ideally, I want the submodule to always reference the repo that it resides in, so it should not reference origin or any other remotes. If somebody clones the repo and runs git submodule init ; git submodule update, it should ideally pull from the newly cloned repo.
Is it possible to add a repo as a submodule of itself? Is it desirable? Are there any pitfalls that I need to be aware of? Is there a better way to achieve what I want?
In this case, the behaviour seems to be that git is trying to set the origin of the original repository to be the origin of the submodule. This is confirmed by the git submodule man page, which says [my emphasis]:
<repository> is the URL of the new submodule’s origin repository. This may be either an absolute URL, or (if it begins with ./ or ../), the location relative to the superproject’s origin repository.
A workaround that seems fine for me is to do the following:
    # Define origin to be the absolute path to this repository - we'll remove
    # this later:
    $ cd /tmp/main/
    $ git remote add origin /tmp/main/

    # Now add the submodule:
    $ git submodule add -b gh-pages ./ gh-pages
    Initialized empty Git repository in /tmp/main/gh-pages/.git/
    Branch gh-pages set up to track remote branch gh-pages from origin.

    # Now .gitmodules looks sensible:
    $ cat .gitmodules
    [submodule "gh-pages"]
        path = gh-pages
        url = ./

    # However, the origin for the submodule isn't what we want:
    $ cd gh-pages
    $ git remote -v
    origin  /tmp/main/ (fetch)
    origin  /tmp/main/ (push)

    # So remove it and add the right origin (just ".."):
    $ git remote rm origin
    $ git remote add origin ..

    # Change back to the main repository and commit:
    $ cd ..
    $ git commit -m "Added the gh-pages branch as a submodule of this repository"
    [master 6849d53] Added the gh-pages branch as a submodule of this repository
     2 files changed, 4 insertions(+), 0 deletions(-)
     create mode 100644 .gitmodules
     create mode 160000 gh-pages
This seems to work OK - if I change into another directory and do:
    $ cd /var/tmp
    $ git clone --recursive /tmp/main/
... the submodule is updated and initialized correctly. (Update: although as you point out in a comment below, origin in the submodule will be set to the URL you cloned from rather than ..)
As for whether this is a good idea or not: I've worked on a project which used a similar setup in the past and which subsequently abandoned it. The reasons for this, however, were (a) that the alternative branches in the main repository were huge and bloated the repository even for people who didn't need the submodule and (b) that it caused confusion for people who weren't sure what was going on.
For your use case, however, I think it's a rather neat solution :)
An alternative to using Git Submodules to generate GitHub Pages is to use Git Subtree Merge Strategy. There are many sites that show how to do this and that argue the pros and cons of Submodules vs Subtree-Merge. There is even a newish git-subtree command that may or may not be installed with your version of Git. IMO the only things you really need to know are these two points.
The subtree merge strategy matches the trees (git's notion of a directory tree) of two repositories/branches when merging, so that extraneous files & folders are not merged, only the relevant trees. This is exactly what you want for Github Pages: because it lives in an orphan branch, it has a completely different tree from your master branch.
In general, the subtree merge has a simpler workflow and less chance of losing revisions than submodules do.
Here's how to use subtree merge strategy with Github Pages:
If you don't have a branch called gh-pages in either local or remote repos, then create one using the --orphan flag so that it will be empty. Github has instructions for creating Github pages manually. If you used the Automatic Page Generation then you can skip this step; in that case either replace the local branch gh-pages with the remote branch origin/gh-pages everywhere else in this post, or fetch the remote branch into a local one. NOTE: You can skip creating the .nojekyll file, but you must remove all files from the orphan branch and commit it or it will not be created.
    . $ (master) git checkout --orphan gh-pages
    . $ (gh-pages) git rm -rf .
    . $ (gh-pages) echo >> .nojekyll
    . $ (gh-pages) git add .nojekyll
    . $ (gh-pages) git commit -m "create github pages, ignore jekyll"
If you already have documentation in a subtree of your main branch, you could pull it in and commit it right now using git-read-tree, but you would need to know the tree-ish. Presumably you could first use git-write-tree, which outputs the SHA-1 of the tree named by the --prefix flag in the current index. Then use the -u flag to update the gh-pages branch with changes from the main branch and commit the changes.
    . $ (master) git write-tree --prefix=docs/_build/html
    abcdefghijklmnopqrstuvwxyz1234567890abcd
    . $ (master) git checkout gh-pages
    . $ (gh-pages) git read-tree abcdefghijklmnopqrstuvwxyz1234567890abcd
    . $ (gh-pages) git commit -m "update gh-pages html from master docs"
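To make the write-tree/read-tree round trip concrete, here is a sketch that runs against a throwaway repository. The file contents and commit identity are placeholders. Two choices are mine rather than anything established above: write-tree is called without a branch argument (it operates on the current index), and read-tree is given `-m -u` so that the working tree follows the index. Note that reading a tree this way replaces the whole content of gh-pages, including .nojekyll.

```shell
#!/bin/sh
# Sketch: copy a docs subtree from master onto gh-pages with plumbing commands.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# master: a placeholder generated page under docs/_build/html
mkdir -p docs/_build/html
echo 'manual v1' > docs/_build/html/index.html
git add docs
git commit -q -m 'Add generated docs.'

# gh-pages: orphan branch that initially holds only .nojekyll
git checkout -q --orphan gh-pages
git rm -rfq .
: > .nojekyll
git add .nojekyll
git commit -q -m 'create github pages, ignore jekyll'
git checkout -q master

# write-tree prints the id of the tree found at the prefix in the index.
tree=$(git write-tree --prefix=docs/_build/html/)
echo "subtree id: $tree"

# Replace the index and working tree of gh-pages with that tree, then commit.
git checkout -q gh-pages
git read-tree -m -u "$tree"
git commit -q -m 'update gh-pages html from master docs'
published=$(git show gh-pages:index.html)
git checkout -q master
```

The id printed by write-tree is the same one that `git rev-parse master:docs/_build/html` reports once the docs are committed, so either command can supply the tree-ish.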
Checkout master and use git-read-tree to copy the working copy of the gh-pages branch to some path in master, EG: ./docs/_build/html. The -u flag updates files in the working copy if merge is successful. This step may be unnecessary if there are no files in gh-pages branch that you want to merge back with master, but if there are, it may help the subtree merge strategy to figure out in what tree your files are. As usual, Git won't let you merge over files that already exist or if there are uncommitted changes in your repo. Use this step if you want to merge the pages you created using Automatic Page Generation back into a different tree, EG: docs in your master branch. Don't forget to commit the new files to your master branch.
    . $ (gh-pages) git checkout master
    . $ (master) git read-tree --prefix=docs/_build/html -u gh-pages
    . $ (master) git commit -m "read gh-pages tree into master at ./docs/_build/html"
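If you want to try this step in isolation, the following sketch sets up a throwaway repository with a master branch and an orphan gh-pages branch, then reads the gh-pages tree into master under docs/_build/html. The file names, page content and commit identity are placeholders chosen for the demo.

```shell
#!/bin/sh
# Sketch: read the gh-pages tree into master at docs/_build/html.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
touch main.txt
git add main.txt
git commit -q -m 'Initial commit in main branch.'

# Orphan website branch with a single placeholder page.
git checkout -q --orphan gh-pages
git rm -rfq .
echo 'hello pages' > index.html
git add index.html
git commit -q -m 'Initial commit in website branch.'
git checkout -q master

# Stage the whole gh-pages tree under the prefix; -u also writes the files
# into the working tree.
git read-tree --prefix=docs/_build/html/ -u gh-pages
git commit -q -m 'read gh-pages tree into master at ./docs/_build/html'

# List what master tracks now.
git ls-files
imported=$(git show master:docs/_build/html/index.html)
```

read-tree refuses to run if the prefix already exists in the index, which is the "won't let you merge over files that already exist" behaviour mentioned above.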
Make changes to your documentation and generate some html by whatever means you prefer. EG: Jekyll, Pelican or Sphinx. NOTE: If you are not using Jekyll, and will need underscore folders/files, EG: for *.css or *.js files, then be sure to add a file called .nojekyll to your html directory.
    ./docs $ (master) sphinx-quickstart
    ...
    ./docs $ (master) make html
    ./docs/_build/html $ (master) echo >> .nojekyll
Update your gh-pages branch using the subtree merge strategy (-s subtree), squash all of the commits so your Github pages history isn't polluted (--squash) and wait until after the merge to commit so you can review (--no-commit). NOTE: When you check out your gh-pages branch, files & folders from master will probably remain as Untracked; just ignore them and concentrate on what is actually in the index. NOTE: Git will not check out gh-pages if there are any uncommitted or unstashed modifications in master.
    . $ (master) git checkout gh-pages
    . $ (gh-pages) git merge --no-commit --squash -s subtree master
Git makes its best guess as to which trees you want to merge using the subtree merge strategy. However, if there isn't much to go on, you may be better off telling Git explicitly which tree to merge.
. $ (gh-pages) git merge --no-commit --squash -s recursive -Xsubtree=docs/_build/html/ master
Review your changes and commit. Merge generates a message for you containing the short log of all of the commits being merged.
. $ (gh-pages) git commit
Pushing your gh-pages branch deploys your GitHub Pages website.
. $ (gh-pages) git push origin gh-pages
Return to master.
. $ (gh-pages) git checkout master
If you need to pull changes from your gh-pages for whatever reason, use the subtree merge strategy in the opposite direction. EG git merge --squash --no-commit -s subtree gh-pages
To do a diff of the two branches matching the trees, use diff-tree:
. $ git diff-tree master gh-pages
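As far as I know, diff-tree compares exactly the two trees it is given, so to compare only the docs subtree with the published site you can name that subtree explicitly as `master:docs/_build/html`. The sketch below shows this on a throwaway repository. The `-r --name-status` options, file contents and commit identity are my own choices for the demo.

```shell
#!/bin/sh
# Sketch: diff the docs subtree on master against the root tree of gh-pages.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# master: placeholder docs output, newer than what is published
mkdir -p docs/_build/html
echo 'manual v2' > docs/_build/html/index.html
git add docs
git commit -q -m 'Regenerate docs.'

# gh-pages: orphan branch still holding the older page
git checkout -q --orphan gh-pages
git rm -rfq .
echo 'manual v1' > index.html
git add index.html
git commit -q -m 'Publish v1.'
git checkout -q master

# <rev>:<path> names the tree at that path, so both sides have the same shape.
changed=$(git diff-tree -r --name-status master:docs/_build/html gh-pages)
echo "$changed"
```

An empty result means the published site already matches the generated docs, which is a cheap check to run before deciding whether a merge and push are needed.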
Put this into a script or a post-commit hook that runs whenever you edit your documents or add it to the Makefile you use to generate html and voila! programmatically generated GitHub pages.