Probability of collision when using a 32-bit hash

I have a 10-character string key field in a database. I've used CRC32 to hash this field, but I'm worried about duplicates. Could somebody show me the probability of a collision in this situation?

P.S. My string field is unique in the database. If there are 1 million strings, what is the probability of a collision?

Answers


Duplicate of Expected collisions for perfect 32bit crc

The answer referenced this article: http://arstechnica.com/civis/viewtopic.php?f=20&t=149670

See also the collision-probability chart at: http://preshing.com/20110504/hash-collision-probabilities


In the case you cite, at least one collision is essentially guaranteed. The probability of at least one collision is about $1 - 3 \times 10^{-51}$. The average number of collisions you would expect is about 116.
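The figure of about 116 can be checked empirically. The following is a minimal sketch, assuming the keys behave like random 10-character alphanumeric strings (the question does not say what the real keys look like). The function name `count_crc32_collisions` and its parameters are made up for illustration. It counts keys whose CRC32 value was already produced by an earlier key.

```python
import random
import string
import zlib

def count_crc32_collisions(num_keys: int = 1_000_000, seed: int = 0) -> int:
    """Hash num_keys distinct random 10-character keys with CRC32 and
    count how many land on a hash value that is already taken."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    alphabet = string.ascii_letters + string.digits

    # Build a set of distinct keys (assumption: random alphanumeric keys).
    keys = set()
    while len(keys) < num_keys:
        keys.add("".join(rng.choices(alphabet, k=10)))

    seen_hashes = set()
    collisions = 0
    for key in keys:
        h = zlib.crc32(key.encode("ascii"))        # unsigned 32-bit value
        if h in seen_hashes:
            collisions += 1
        else:
            seen_hashes.add(h)
    return collisions
```

Calling `count_crc32_collisions()` takes a few seconds and should return a count in the neighborhood of 116, varying with the seed. Highly structured keys (for example, sequential counters) can behave differently from random ones, so this is only a sanity check.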

In general, the average number of collisions in k samples, each a random choice among n possible values, is:

$$k - n + n\left(1 - \frac{1}{n}\right)^{k}$$

The probability of at least one collision is:

$$1 - \frac{n!}{(n-k)!\,n^{k}}$$

In your case, $n = 2^{32}$ and $k = 10^{6}$.
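These numbers can be reproduced with a short script. This is a sketch that assumes the standard birthday-problem expressions: the expected number of collisions is k - n + n(1 - 1/n)^k, and the probability of no collision is the product of (1 - i/n) for i from 0 to k - 1. The function names are illustrative. The computation works in logarithms because the no-collision probability is far too small to obtain by subtracting from 1 in floating point.

```python
import math

def expected_collisions(k: int, n: int) -> float:
    """Expected number of samples that hit an already-used value:
    k - n + n * (1 - 1/n)**k, evaluated with log1p/expm1 to avoid
    catastrophic cancellation when n is large."""
    return k + n * math.expm1(k * math.log1p(-1.0 / n))

def log_prob_no_collision(k: int, n: int) -> float:
    """Natural log of the probability that all k samples are distinct,
    i.e. log of prod_{i=0}^{k-1} (1 - i/n)."""
    return math.fsum(math.log1p(-i / n) for i in range(k))

if __name__ == "__main__":
    n = 2**32      # number of possible CRC32 values
    k = 10**6      # number of keys
    print("expected collisions:", expected_collisions(k, n))
    log_p = log_prob_no_collision(k, n)
    # P(at least one collision) = 1 - exp(log_p); exp(log_p) is tiny.
    print("P(no collision) = 10^%.2f" % (log_p / math.log(10)))
```

The first value comes out near 116, and the no-collision probability is on the order of 10^-51, in line with the figures quoted above.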

The probability of a three-way collision in your case is about 0.01. See the Birthday Problem.
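The three-way figure can be estimated in the same spirit. This sketch uses an approximation, not an exact formula: the expected number of triples sharing a hash value is C(k, 3) / n^2, and treating the number of such triples as roughly Poisson gives the probability of at least one as 1 - exp(-expected). The function name is illustrative.

```python
import math

def prob_three_way_collision(k: int, n: int) -> float:
    """Approximate probability that some three of k random samples
    share the same value among n possibilities (Poisson approximation)."""
    # Each of the C(k, 3) triples collides with probability 1 / n**2.
    expected_triples = math.comb(k, 3) / n**2
    # P(at least one) ~= 1 - exp(-expected_triples)
    return -math.expm1(-expected_triples)

if __name__ == "__main__":
    print(prob_three_way_collision(10**6, 2**32))
```

For k = 10^6 and n = 2^32 this gives a value a little under 0.01, consistent with the estimate above.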

