# Non colliding hash algorithm for strings up to 255 characters

I am looking for a hash-algorithm, to create as close to a unique hash of a string (max len = 255) as possible, that produces a long integer (DWORD).

I realize that 26^255 >> 2^32, but also know that the number of words in the English language is far less than 2^32.

The strings I need to 'hash' would be mostly single words or some simple construct using two or three words.

**The answer**:

One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs. (Answered by Arachnid)

## Answers

See here for a previous iteration of this question (and the answer).

One technique is to use a well-known hash algorithm (say, MD5 or SHA-1) and use only the first 32 bits of the result.

Be aware that the risk of hash collisions increases faster than you might expect. For information on this, read about the Birthday Paradox.

Ronny Pfannschmidt did a test with common english words yesterday and hasn't encountered any collisions for the 10000 words he tested in the Python string hash function. I haven't tested it myself, but that algorithm is very simple and fast, and seems to be optimized for common words.

Here the implementation:

static long string_hash(PyStringObject *a) { register Py_ssize_t len; register unsigned char *p; register long x; if (a->ob_shash != -1) return a->ob_shash; len = Py_SIZE(a); p = (unsigned char *) a->ob_sval; x = *p << 7; while (--len >= 0) x = (1000003*x) ^ *p++; x ^= Py_SIZE(a); if (x == -1) x = -2; a->ob_shash = x; return x; }

H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1))] % hashsize

Java's String.hash() can be easily viewed here, its algorithm is

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]