MySQL Performance Question - Essentially about normalization efficiency
Just a quick question about database performance. I'll outline my site purpose below as background.
I'm creating a dictionary site that saves the words users define to a database. What I'm wondering is whether to create a separate words table for each user or to keep one massive words table. This site will be used by entire schools, so a single words table would be massive!
The database structure is as follows:
A user table with:
- User_ID PRIMARY KEY
And one word table with:
- User_ID (a foreign key referencing the user table)
So what I'm asking is, performance-wise: should I create a new table for each user when they join the site (each user could accumulate hundreds or thousands of words over time), or is it better to have one massive table with thousands and thousands of records and filter by User_ID? I don't think I'll perform many table joins.
My gut feeling is to create a new table for each user, but I thought I'd ask for expert advice! Thanks in advance.
I think you should go with one table for all users, keyed by user_id.
There are not that many words in any language. By "many" I mean over a couple of million. Databases work just fine with 1-2 million records, and you won't reach that level soon, considering that all the words in English number only around 170,000.
With very large data sets, you may get better performance by storing the dictionary words in a separate table for each user.
However, if you wanted to run queries against all the words, for example, for statistical analysis, it's going to be tough to write a query to access everyone's words.
You could store all of the words in the same table, then if performance becomes a problem, you could always partition the table, hashing on the user id. Look up "partitioning" for MySQL. It basically stores the data in separate files for you, but lets you keep all the data in the same logical table, so it remains easy to query against and stays in normal form.
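A rough sketch of what that could look like (the table and column names here are illustrative, not from the question; note that MySQL requires the partitioning column to appear in every unique key, including the primary key, so id alone can no longer serve as the primary key):

```sql
CREATE TABLE words (
  id      INT NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  word    VARCHAR(64) NOT NULL,
  PRIMARY KEY (id, user_id)  -- must include user_id to allow HASH partitioning on it
)
PARTITION BY HASH (user_id)
PARTITIONS 8;                -- rows are spread across 8 physical partitions
```

Queries still target the single logical table (SELECT ... FROM words WHERE user_id = ?), and MySQL can prune down to the one partition that holds that user's rows.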
As long as you index the words on user_id, performance isn't likely to degrade for quite some time and your application may never reach that threshold.
From a development perspective, you will save hours of time by keeping things simple and storing all of the words in the same table. Since you have a workaround for the future, in case of performance problems, keep it simple and get the project completed with minimal effort.
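One way to sanity-check the indexing advice above, assuming a single words table with a user_id column (the table and index names here are made up for illustration):

```sql
-- An index on user_id keeps per-user lookups fast even as the table grows.
CREATE INDEX idx_words_user ON words (user_id);

-- EXPLAIN shows whether MySQL will use the index rather than scan the whole table.
EXPLAIN SELECT * FROM words WHERE user_id = 123;
```

If the EXPLAIN output names idx_words_user in its key column, a lookup for one user touches only that user's rows, regardless of how large the table becomes.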
Performance-wise, rely on indices. If some set of columns prefix the key of an index, the index can generally be used to get the rows without scanning the table. Some queries won't use the index (such as if a column appears in only some branches of an AND), but those queries don't include simply finding words for a given user; additionally, those queries will be much harder with a table for each user.
CREATE TABLE users (
  id INT PRIMARY KEY AUTO_INCREMENT,
  username VARCHAR(32) UNIQUE,
  first VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  last VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  ...
) ENGINE=InnoDB;

-- table of english words
CREATE TABLE vocabulary (
  id INT PRIMARY KEY AUTO_INCREMENT,
  user_id INT,
  word VARCHAR(45),
  ...
  -- searches for words of a given user should use `user_word`
  UNIQUE INDEX user_word (user_id, word),
  INDEX (word),
  FOREIGN KEY user (user_id) REFERENCES users (id)
    ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Rather than first and last columns, we could have surname, given_name and middle_names columns, since not every culture puts the given name first. Of course, we'd then need to record the order of names to be printed. Another option is to have columns for the full name and given name.
The word column is 45 characters to allow for the longest word in English, the constructed word "pneumonoultramicroscopicsilicovolcanoconiosis". For German words, we'd want at least 63 chars: "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz" was actually used, and not simply an attempt at the longest word. Given the nature of German, trying to find a longest word length is an exercise in futility; better to pick a limit arbitrarily. The limits on key sizes (3072 bytes in MySQL 5.0.17 and later, 1023 in MySQL 5.0.15 and earlier) put an upper limit of 3066 (1018 in 5.0.15) bytes on the size of word, which is 3066 (1018) characters in the latin1_german1_ci (dictionary) collation and 1022 (339) characters in UTF-8.
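To sketch how the schema above gets queried (table and column names are taken from the CREATE TABLE statements; the specific user id is made up for illustration):

```sql
-- Fetch one user's words: the WHERE and ORDER BY match the leftmost
-- columns of the `user_word` index (user_id, then word), so InnoDB
-- can satisfy the query from the index without scanning the table.
SELECT word
FROM vocabulary
WHERE user_id = 42
ORDER BY word;

-- Cross-user statistics stay easy with a single table; with one
-- table per user this would require a separate query per table.
SELECT word, COUNT(*) AS times_defined
FROM vocabulary
GROUP BY word
ORDER BY times_defined DESC
LIMIT 20;
```

Prefix either statement with EXPLAIN to confirm which index MySQL actually chooses.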