How do I handle duplicate entries in a single-column table?
A user may enter keywords into a text field, separating them with commas. So the input might be: bananas, apple, orange, pineapple.
In my database, I have a table called keyword, and it has only one column, keyword, which is also the primary key.
I add the keywords to the database by splitting the input with $myArray = explode(',', $keywords). Then I loop through the array and run an `INSERT INTO myTable` for each keyword.
Now, if a keyword already exists, I get a duplicate-key error.
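The PHP/MySQL code isn't shown in full, so here is a minimal Python sketch of the same split-and-loop approach. It uses the standard library's sqlite3 module with an in-memory database as a stand-in for MySQL, and the function name insert_keywords_naive is hypothetical. The strip() call matters: splitting "bananas, apple" on ',' leaves a leading space on every keyword after the first, just as PHP's explode() does.

```python
import sqlite3


def insert_keywords_naive(conn, raw):
    """Split comma-separated input and INSERT each keyword with its own statement."""
    # Split on commas (like PHP's explode) and trim the surrounding whitespace.
    keywords = [k.strip() for k in raw.split(",")]
    for kw in keywords:
        # Raises sqlite3.IntegrityError if kw is already a primary key value.
        conn.execute("INSERT INTO keyword (keyword) VALUES (?)", (kw,))
    conn.commit()


# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (keyword TEXT PRIMARY KEY)")
insert_keywords_naive(conn, "bananas, apple, orange, pineapple")

try:
    # "apple" is already stored, so the first INSERT of this batch fails.
    insert_keywords_naive(conn, "apple, kiwi")
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    # Undo whatever part of the batch ran before the failure.
    conn.rollback()
    duplicate_error = exc

print(duplicate_error)
```

Because the loop stops at the first failure, "kiwi" never gets inserted either.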
I can avoid the error by using the INSERT IGNORE INTO statement. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it without raising an error (MySQL reports a warning instead).
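MySQL spells this INSERT IGNORE, and SQLite's equivalent is INSERT OR IGNORE. Here's a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (the function name insert_keywords_ignore is hypothetical). It shows that a second batch containing an existing keyword goes through without an exception, and only the new keyword is added.

```python
import sqlite3


def insert_keywords_ignore(conn, raw):
    """Insert comma-separated keywords, silently skipping ones already stored."""
    keywords = [k.strip() for k in raw.split(",")]
    for kw in keywords:
        # SQLite's spelling of MySQL's INSERT IGNORE: a duplicate key is skipped.
        conn.execute("INSERT OR IGNORE INTO keyword (keyword) VALUES (?)", (kw,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (keyword TEXT PRIMARY KEY)")
insert_keywords_ignore(conn, "bananas, apple, orange, pineapple")
insert_keywords_ignore(conn, "apple, kiwi")  # "apple" is skipped, "kiwi" is added

rows = [r[0] for r in conn.execute("SELECT keyword FROM keyword ORDER BY keyword")]
print(rows)  # ['apple', 'bananas', 'kiwi', 'orange', 'pineapple']
```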
My question is: is this a good way of doing it, or should I first check whether the keyword exists? I'm weighing two queries against one. And will either approach affect server load?
INSERT IGNORE is great: it kills two birds with one stone, checking for the duplicate and inserting in a single statement.
INSERT IGNORE is technically non-standard SQL, but it's still very useful. If you ever need to move to another database system, you can deal with minor differences like this if and when that happens; there's no need to jump through hoops to pre-port your code in this case.
Although I wouldn't discourage using INSERT IGNORE on principle in this case, I think there are a few practical considerations here.
I think it will be faster if you deduplicate the array before sending it to the database. It looks like you're using PHP, so you could use array_unique() to deduplicate. (See http://nl2.php.net/manual/en/function.array-unique.php)
The benefits of this are:
- No need for INSERT IGNORE to handle duplicates within the input itself, so that part of your solution will work on other databases as well (keywords already in the table can still collide).
- Fewer statements to generate, fewer to send, and fewer for MySQL to parse, so it's faster.
- For typical web apps, database operations are much slower than PHP processing, so the extra deduplication step in PHP doesn't hurt either.
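A rough Python analogue of array_unique() is dict.fromkeys(), which keeps the first occurrence of each key in insertion order. This sketch (the helper name unique_keywords is hypothetical) also trims whitespace and drops empty entries, such as those left by a trailing comma. It only removes duplicates within one submission; a keyword that is already in the table will still collide.

```python
def unique_keywords(raw):
    """Return the comma-separated keywords in raw, trimmed, without duplicates."""
    # Trim each piece so " apple" and "apple" count as the same keyword.
    cleaned = [k.strip() for k in raw.split(",")]
    # dict keys are unique and keep insertion order, so the first occurrence wins.
    return list(dict.fromkeys(k for k in cleaned if k))


print(unique_keywords("apple, kiwi, apple, , kiwi"))  # ['apple', 'kiwi']
```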
Another thing I would recommend is to generate one big INSERT statement like so:
$sql = "INSERT INTO tab (keyword) VALUES ('word1'), ('word2'), ...";
Contrary to what many people think, this multi-row syntax is standard SQL. The benefit is that you have only one round trip, with far less overhead.
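A minimal Python sketch of building that single statement, again with sqlite3 standing in for MySQL and the question's keyword table in place of tab. It generates one (?) placeholder per keyword and passes the values separately instead of pasting user input into the SQL string. The helper name build_multi_insert is hypothetical, and very long lists may need to be sent in batches, since databases cap statement size or parameter count.

```python
import sqlite3


def build_multi_insert(count):
    """Build one INSERT with `count` single-column row placeholders."""
    if count < 1:
        raise ValueError("need at least one keyword")
    # One "(?)" group per row: VALUES (?), (?), (?), ...
    placeholders = ", ".join(["(?)"] * count)
    return f"INSERT INTO keyword (keyword) VALUES {placeholders}"


keywords = ["bananas", "apple", "orange", "pineapple"]
sql = build_multi_insert(len(keywords))
print(sql)  # INSERT INTO keyword (keyword) VALUES (?), (?), (?), (?)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (keyword TEXT PRIMARY KEY)")
conn.execute(sql, keywords)  # all rows go over in a single statement
conn.commit()
```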
INSERT IGNORE is perfectly fine. The only other syntax I know of is REPLACE INTO, but it is meant for a different use case: if the row already exists, it deletes it and inserts it again, which is probably slower.
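To see the delete-and-insert behavior of REPLACE INTO, here's a small sketch using Python's sqlite3 module as a stand-in for MySQL (SQLite accepts REPLACE INTO as an alias for INSERT OR REPLACE). It watches SQLite's hidden rowid: INSERT OR IGNORE, SQLite's form of INSERT IGNORE, leaves the existing row alone, while REPLACE deletes it and inserts a new row, which gets a new rowid. The helper rowid_of is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (keyword TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO keyword (keyword) VALUES (?)", [("apple",), ("kiwi",)])


def rowid_of(word):
    """Return SQLite's internal rowid for the row holding `word`."""
    return conn.execute("SELECT rowid FROM keyword WHERE keyword = ?", (word,)).fetchone()[0]


before = rowid_of("apple")
conn.execute("INSERT OR IGNORE INTO keyword (keyword) VALUES ('apple')")
after_ignore = rowid_of("apple")   # unchanged: the existing row was left alone
conn.execute("REPLACE INTO keyword (keyword) VALUES ('apple')")
after_replace = rowid_of("apple")  # changed: the old row was deleted and a new one inserted
conn.commit()
print(before, after_ignore, after_replace)
```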
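Using two queries is a bad idea: the SELECT is an extra statement (and round trip) just to find the row you are looking for, and another connection could insert the same keyword between your SELECT and your INSERT, so you would still have to handle the duplicate-key error.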
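For comparison, here is the check-then-insert pattern from the question as a minimal Python sketch (sqlite3 again standing in for MySQL; insert_if_missing is a hypothetical name). It needs two statements for every new keyword, and because nothing stops another connection from adding the key between the SELECT and the INSERT, the INSERT can still fail.

```python
import sqlite3


def insert_if_missing(conn, kw):
    """Check for kw with a SELECT, then INSERT it if absent. Returns True if inserted."""
    # Statement 1: primary-key lookup.
    found = conn.execute("SELECT 1 FROM keyword WHERE keyword = ?", (kw,)).fetchone()
    if found is not None:
        return False
    # Statement 2: another connection may have inserted kw since the SELECT,
    # so this INSERT can still raise sqlite3.IntegrityError.
    conn.execute("INSERT INTO keyword (keyword) VALUES (?)", (kw,))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (keyword TEXT PRIMARY KEY)")
first = insert_if_missing(conn, "apple")   # not there yet: SELECT + INSERT
second = insert_if_missing(conn, "apple")  # already there: SELECT only
conn.commit()
print(first, second)  # True False
```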
Just taking a blind shot at this; you might want to look at the other responses too.
When a DBMS provides an option like the IGNORE you mentioned, it has most probably been optimized to perform better than the "check and insert" alternative. That is the only alternative approach IGNORE has to compete with, and if it were no better than "check and insert", there would be no point in using it. From my experience, when a feature is provided by the DBMS, it is almost always better to use it than a manual equivalent. At the very least, it minimizes the latency of your transaction, and at best, it is often heavily optimized internally for better performance. But again, things can't be generalized so broadly and have to be examined on a case-by-case basis; take this only as a guideline.