What does a Git SHA depend on?
I was wondering which parameters a Git SHA depends on. I am guessing that, besides the content of the commit, there are other parameters such as a timestamp that go into constructing the SHA.
I am interested in all the parameters it depends on. I am also interested in the situation where all of those parameters are the same, or are forced to be the same, so that two commits made by two different people anywhere on this planet end up with exactly the same Git SHA.
- The tree ID, which covers all the files and directories and is made up of:
  - The content of every file (the full content, not a diff), each stored as a blob.
  - The directory tree (names of files and directories and how they're organized).
  - The mode of each file and directory (regular file, executable file, symlink, or directory), which is all Git records of the filesystem permissions.
- The parent commit ID(s).
- The log message.
- The committer name, email, and date.
- The author name, email, and date.
If you change just about anything about the commit, the commit ID changes.
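To make the list above concrete, here is a rough Python sketch of how a commit ID can be reproduced by hand. It assumes the classic SHA-1 object format and a simplified commit body (no signature or encoding headers). The names `git_object_id` and `commit_id` and the Alice identity are made up for illustration; they are not part of Git.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the object body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def commit_id(tree, parents, author, committer, message):
    # Simplified commit body: no gpgsig, encoding, or other optional headers.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode())

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # well-known empty tree ID
alice = "Alice <alice@example.com> 1700000000 +0000"      # hypothetical identity
alice_later = "Alice <alice@example.com> 1700000001 +0000"

first = commit_id(EMPTY_TREE, [], alice, alice, "Initial commit\n")
same = commit_id(EMPTY_TREE, [], alice, alice, "Initial commit\n")
later = commit_id(EMPTY_TREE, [], alice_later, alice_later, "Initial commit\n")
child = commit_id(EMPTY_TREE, [first], alice, alice, "Initial commit\n")

print(first == same)    # identical inputs, identical ID
print(first == later)   # one second apart, different ID
print(first == child)   # different parent, different ID
```

So two people get the same commit ID only if the tree, the parents, both identity lines (including the timestamps and timezones), and the message all match byte for byte.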
Including the parent commit IDs is very important. It means two commits with exactly the same content, but built on different parents, will still have different IDs. Why would you do that? It means that if the IDs of two commits are the same, you know their entire history is the same. This makes it very efficient to compare and update Git repositories. "I have branch foo at commit ABC123, you do too? Great, we're in sync!"
When comparing Git to other version control systems, remember that in many popular "reliable" systems, like Subversion or CVS, anyone with the file permissions can go in and undetectably change history in the central repository. With Git, such tampering will be immediately detected because it changes all the downstream commit IDs. And if the attacker brute-forced content to match the existing IDs, that content would be complete nonsense.
The possibility of a SHA-1 collision has already been considered. Long story short, in a conflict the existing object wins.
The probability of a SHA1 collision happening accidentally is so vanishingly small, I hope your asteroid, cosmic ray, and wolf attack insurances are paid up.
If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (3.6 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
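The figure in that quote can be sanity-checked with the usual birthday-bound approximation, n ≈ sqrt(2 · ln 2 · 2^160) objects for a 50% chance of a collision. This is only back-of-the-envelope arithmetic using the numbers from the quote, and the variable names are my own.

```python
import math

HASH_BITS = 160                         # SHA-1 output size
people = 6.5e9                          # from the quote
objects_per_person_per_second = 3.6e6   # one Linux kernel history per second

# Birthday approximation: objects needed for a 50% chance of any collision.
objects_for_50_percent = math.sqrt(2 * math.log(2) * 2.0 ** HASH_BITS)

seconds = objects_for_50_percent / (people * objects_per_person_per_second)
years = seconds / (365.25 * 24 * 3600)

print(f"{objects_for_50_percent:.2e} objects, about {years:.1f} years")
```

This comes out a little under two years, which agrees with the "roughly 2 years" above.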
Seriously, there are better things to worry about, like the 1 in 100 chance of a drive failure. How are your backups?
There are several different types of objects stored in a Git repository. A blob object stores the raw data of a file, and a tree object stores, for each entry, the file mode (e.g. whether it is executable), the object type, the object ID, and the name.
You can find more details in the Git Community Book.
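As a rough illustration of those two object types, the sketch below hashes a blob and a one-entry tree in Python. It assumes the SHA-1 object format, and the sort is simplified (real Git orders directory entries as if their names ended in "/"). The helper names `blob_id` and `tree_id` are invented for this example.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the object body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def blob_id(content: bytes) -> str:
    # A blob is just the file content: no name, no mode.
    return git_object_id("blob", content)

def tree_id(entries) -> str:
    # entries: list of (mode, name, hex_object_id) tuples.
    # Each entry is "<mode> <name>\0" plus the 20 raw bytes of the object ID.
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return git_object_id("tree", body)

hello = blob_id(b"hello world\n")
plain = tree_id([("100644", "hello.txt", hello)])
executable = tree_id([("100755", "hello.txt", hello)])
renamed = tree_id([("100644", "greeting.txt", hello)])

print(hello)
print(plain != executable)  # mode lives in the tree, so the tree ID changes
print(plain != renamed)     # so does the name; the blob ID stays the same
```

Because names and modes live in the tree rather than the blob, renaming a file or flipping its executable bit changes the tree ID (and therefore the commit ID) without touching the blob.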
There are so many hash values that the chances of accidental collision are vanishingly small.
However, truly identical content will have an identical hash: so if two people independently make identical changes to a file then the two (identical) blob objects will have the same hash; the commit objects will be different and will have different hashes, but both commits will refer to the same blob hash. If those two commits are later merged, only one copy of the blob will remain (which is fine because the content is identical).
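Here is a toy model of that behaviour: a content-addressed store sketched in Python, where the key is the object hash. It is not how Git is implemented on disk. `ObjectStore`, the file name, and the Alice and Bob identities are all hypothetical, and the commit bodies are simplified.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store keyed by the Git-style SHA-1 object ID."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type: str, body: bytes) -> str:
        header = f"{obj_type} {len(body)}".encode() + b"\x00"
        oid = hashlib.sha1(header + body).hexdigest()
        # If the ID is already present, the existing object is kept.
        self.objects.setdefault(oid, (obj_type, body))
        return oid

store = ObjectStore()

# Two people independently add a file with identical content.
blob_a = store.put("blob", b"hello world\n")
blob_b = store.put("blob", b"hello world\n")
tree = store.put("tree", b"100644 hello.txt\x00" + bytes.fromhex(blob_a))

def commit(author: str) -> str:
    # Simplified commit body; the author doubles as committer here.
    body = f"tree {tree}\nauthor {author}\ncommitter {author}\n\nAdd hello.txt\n"
    return store.put("commit", body.encode())

commit_a = commit("Alice <alice@example.com> 1700000000 +0000")
commit_b = commit("Bob <bob@example.com> 1700000000 +0000")

print(blob_a == blob_b)      # same content, same blob ID
print(commit_a == commit_b)  # different author line, different commit ID
print(len(store.objects))    # one blob, one tree, two commits
```

Both commits point at the same tree and blob, so the file content is stored only once even though there are two distinct commits.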