How do I choose the best SQL query if there are different ways of accomplishing the same task?

I'm learning SQL (using SQLite 3 and its sqlite3 command-line tool) and I've noticed that I can do some things in several ways, and sometimes it is not clear which one is better. Here are three queries which do the same thing, one executed through intersect, another through inner join and distinct, the last one similar to the second one but it incorporates filtering through where. (The first one was written by the author of the book I'm reading, and the others I wrote myself.)

The question is, which of these queries is better and why? And, more generally, how can I know when one query is better than another? Are there some guidelines I missed or perhaps I should learn SQLite internals despite the declarative nature of SQL?

(In the following example, there are tables that describe food names that are mentioned in some TV series. Foods_episodes is many-to-many linking table while others describe food names and episode names together with season number. Note that all-time ten top foods (based on the count of their appearances in all series) are being looked for, not just top foods in seasons 3..5)

-- task
--     find the all-time top ten foods that appear in seasons 3 through 5
-- schema
--     CREATE TABLE episodes (
--       id integer primary key,
--       season int,
--       name text );
--     CREATE TABLE foods(
--       id integer primary key,
--       name text );
--     CREATE TABLE foods_episodes(
--       food_id integer,
--       episode_id integer );



select f.* from foods f 
inner join 
    (select food_id, count(food_id) as count 
        from foods_episodes 
        group by food_id 
        order by count(food_id) desc limit 10) top_foods 
    on f.id=top_foods.food_id 
intersect 
select f.* from foods f 
    inner join foods_episodes fe on f.id = fe.food_id 
    inner join episodes e on fe.episode_id = e.id 
where
    e.season between 3 and 5 
order by
    f.name; 



select
    distinct f.*
from
    foods_episodes as fe
    inner join episodes as e on e.id = fe.episode_id
    inner join foods as f on fe.food_id = f.id
    inner join (select food_id from foods_episodes
        group by food_id order by count(*) desc limit 10) as lol
        on lol.food_id = fe.food_id
where
    e.season between 3 and 5
order by 
    f.name;



select
    distinct f.*
from
    foods_episodes as fe
    inner join episodes as e on e.id = fe.episode_id
    inner join foods as f on fe.food_id = f.id
where
    fe.food_id in (select food_id from foods_episodes
        group by food_id order by count(*) desc limit 10)
    and e.season between 3 and 5
order by 
    f.name;

-- output (same for these thee):

-- id          name      
-- ----------  ----------
-- 4           Bear Claws
-- 146         Decaf Capp
-- 153         Hennigen's
-- 55          Kasha     
-- 94          Ketchup   
-- 164         Naya Water
-- 317         Pizza     
-- CPU Time: user 0.000000 sys 0.000000

Answers


Similar to MySQL, it looks like SQLlite has an EXPLAIN command. Prepend your select with the EXPLAIN keyword and it will return information about the query, including the number of rows scanned, and the indexes used.

http://www.sqlite.org/lang_explain.html

By running EXPLAIN on various selects you can determine which queries (and sub-queries) are more efficient than others.

And here is a general overview of SQLlite's query planner and optimization: http://sqlite.org/optoverview.html

SQLlite3 also supports a callback function to trace queries. You have to implement it though: http://www.sqlite.org/c3ref/profile.html


Generally, there is more than 1 way to solve a problem. If you are getting correct answers, the only other question is whether the process/script/statement needs to be improved, or if it works well now.

In SQL generally, there may be a "best' way, but it's usually not the goal to find a canonical best way to do something - you want a way that efficiently blances your needs from the program, and your time. You can spend months optimizing a process, but if the process is used only weekly, and it only takes 5 minutes now, reducing it to 4 minutes isn't much help.

it's weird to transition from a context where there are correct answers (like school,) to a context where the goal is to get something done well, and works well enough trumps perfect, because there are time constraints. It's something that took me a while to appreciate, but I'm not sure there is a better answer. Hope the perspective helps a bit!


Need Your Help

CodeIgniter form_dropdown selected value

php codeigniter

I have a form using CI's form_dropdown that pulls an array from DB....

Migrating Eclipse Android Junit Tests into Android Studio Gradle

eclipse junit gradle android-studio

We are trying to migrate our Eclipse projects using ant builds into Android Studio using gradle. So far all is good except for our JUnit tests that use external json files. We have a ton of these...