How do I choose the best SQL query if there are different ways of accomplishing the same task?
I'm learning SQL (using SQLite 3 and its sqlite3 command-line tool) and I've noticed that I can do some things in several ways, and sometimes it is not clear which one is better. Here are three queries which do the same thing, one executed through intersect, another through inner join and distinct, the last one similar to the second one but it incorporates filtering through where. (The first one was written by the author of the book I'm reading, and the others I wrote myself.)
The question is, which of these queries is better and why? And, more generally, how can I know when one query is better than another? Are there some guidelines I missed or perhaps I should learn SQLite internals despite the declarative nature of SQL?
(In the following example, there are tables that describe food names that are mentioned in some TV series. Foods_episodes is many-to-many linking table while others describe food names and episode names together with season number. Note that all-time ten top foods (based on the count of their appearances in all series) are being looked for, not just top foods in seasons 3..5)
-- task -- find the all-time top ten foods that appear in seasons 3 through 5 -- schema -- CREATE TABLE episodes ( -- id integer primary key, -- season int, -- name text ); -- CREATE TABLE foods( -- id integer primary key, -- name text ); -- CREATE TABLE foods_episodes( -- food_id integer, -- episode_id integer ); select f.* from foods f inner join (select food_id, count(food_id) as count from foods_episodes group by food_id order by count(food_id) desc limit 10) top_foods on f.id=top_foods.food_id intersect select f.* from foods f inner join foods_episodes fe on f.id = fe.food_id inner join episodes e on fe.episode_id = e.id where e.season between 3 and 5 order by f.name; select distinct f.* from foods_episodes as fe inner join episodes as e on e.id = fe.episode_id inner join foods as f on fe.food_id = f.id inner join (select food_id from foods_episodes group by food_id order by count(*) desc limit 10) as lol on lol.food_id = fe.food_id where e.season between 3 and 5 order by f.name; select distinct f.* from foods_episodes as fe inner join episodes as e on e.id = fe.episode_id inner join foods as f on fe.food_id = f.id where fe.food_id in (select food_id from foods_episodes group by food_id order by count(*) desc limit 10) and e.season between 3 and 5 order by f.name; -- output (same for these thee): -- id name -- ---------- ---------- -- 4 Bear Claws -- 146 Decaf Capp -- 153 Hennigen's -- 55 Kasha -- 94 Ketchup -- 164 Naya Water -- 317 Pizza -- CPU Time: user 0.000000 sys 0.000000
Similar to MySQL, it looks like SQLlite has an EXPLAIN command. Prepend your select with the EXPLAIN keyword and it will return information about the query, including the number of rows scanned, and the indexes used.
By running EXPLAIN on various selects you can determine which queries (and sub-queries) are more efficient than others.
And here is a general overview of SQLlite's query planner and optimization: http://sqlite.org/optoverview.html
SQLlite3 also supports a callback function to trace queries. You have to implement it though: http://www.sqlite.org/c3ref/profile.html
Generally, there is more than 1 way to solve a problem. If you are getting correct answers, the only other question is whether the process/script/statement needs to be improved, or if it works well now.
In SQL generally, there may be a "best' way, but it's usually not the goal to find a canonical best way to do something - you want a way that efficiently blances your needs from the program, and your time. You can spend months optimizing a process, but if the process is used only weekly, and it only takes 5 minutes now, reducing it to 4 minutes isn't much help.
it's weird to transition from a context where there are correct answers (like school,) to a context where the goal is to get something done well, and works well enough trumps perfect, because there are time constraints. It's something that took me a while to appreciate, but I'm not sure there is a better answer. Hope the perspective helps a bit!