generate 16M unique random numbers
I'm trying to generate 16 000 000 unique random numbers (10-digits: range 1 000 000 000 - 9 999 999 999) and insert them into an empty table (or fill this table up if not empty).
CREATE TABLE `codes` ( `code_id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT, `code` bigint(20) UNSIGNED NOT NULL, `is_used` tinyint(1) NOT NULL DEFAULT '0', PRIMARY KEY (`code_id`), UNIQUE KEY `code` (`code`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
...and the function:
DELIMITER $$ CREATE DEFINER=`root`@`localhost` FUNCTION `codes`(`minRange` BIGINT UNSIGNED, `maxRange` BIGINT UNSIGNED, `_amount` BIGINT UNSIGNED) RETURNS tinyint(1) MODIFIES SQL DATA BEGIN DECLARE pick BIGINT; while (SELECT count(*) FROM codes) < _amount do begin SET pick = minRange + FLOOR(RAND() * (maxRange - minRange + 1)); INSERT IGNORE INTO codes (code) VALUES (pick); end; end while; RETURN 1; END$$ DELIMITER ; -- call: SELECT codes(1000000000,9999999999,16000000);
The function is extremaly slow: generating 20k rows takes 2.5 min., so 16M - about 33 hours... Is there any way to optimise it?
What you are asking for is an oxymoron. If the 16M numbers are unique then they aren't random. Think about it: a truly 10 digit random number has a prob of 1/9E9 being any given number. Yet your 16M'th number has 0 prob of being one of 15,999,999 numbers and a 1/983,000,001 prob of being one of the remaining ones. You should always expect duplicates. There are roughly 1% for 16M out of 1B.
So my suggestion is to generate (16M + a bit) random numbers, do a unique sort; truncate to 16M then random sort them. My Q&D to do this:
php -r 'for( $i=0;$i<16500000;$i++) echo mt_rand(100000000, 999999999),"\n";'\ | sort -u | head -16000000 | sort -R > /tmp/loadfile.lst
took just over 7 mins on my 4 year old laptop. It will be a lot faster on a modern quad-core desktop. I used PHP-CLI because I have it on my laptop and the mt_rand routine is a good one. You could do this as a 3 liner-C prog or any language to hand. As Zercms says, the load will be quick as long as disable indexes before the load and reenable after. You also only need to load the column code.
The fastest way I can think of is to generate plain text file with unique numbers and use LOAD DATA INFILE
Also you need to disable keys using ALTER TABLE before you start loading the data, and enable after that
You are running a SELECT count(*) for each random number inserted. A simple optimization would be to just try to cram 16000000 random numbers without checking, then see how many were actually inserted, repeat from the start with the number of remaining numbers to add.
Maybe you should try to generate your numbers not via MySQL but via PHP if you have it installed. The PHP mt_rand() does generate much better random values and you can fill your values anto an array and directly check if it is already in ther via in_array().
Also, you MySQL is slowed down a lot by having to update the table index every time, maybe you should disable it for the time of the number generation if you need to do it in MySQL.