Non colliding hash algorithm for strings up to 255 characters

I am looking for a hash-algorithm, to create as close to a unique hash of a string (max len = 255) as possible, that produces a long integer (DWORD).

I realize that 26^255 >> 2^32, but also know that the number of words in the English language is far less than 2^32.

The strings I need to 'hash' would be mostly single words or some simple construct using two or three words.


The answer:

One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs. (Answered by Arachnid)


Answers


See here for a previous iteration of this question (and the answer).


One technique is to use a well-known hash algorithm (say, MD5 or SHA-1) and use only the first 32 bits of the result.

Be aware that the risk of hash collisions increases faster than you might expect. For information on this, read about the Birthday Paradox.


Ronny Pfannschmidt did a test with common english words yesterday and hasn't encountered any collisions for the 10000 words he tested in the Python string hash function. I haven't tested it myself, but that algorithm is very simple and fast, and seems to be optimized for common words.

Here the implementation:

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}

H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1))] % hashsize

MSDN article on HashCodes


Java's String.hash() can be easily viewed here, its algorithm is

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Need Your Help

TFS Alert Based On A Tag

tfs2013 tagging

I'm stuck and can't find info anywhere - does anyone know if there is a way to create a TFS Alert based on a tag placed on Workitem? Or should I just build an SSRS report based on the tables?

Camera Rotation in SceneKit

ios iphone camera rotation scenekit

I have posted a similar question to this