Database Design Pattern for Multiple Large Lists

Consider a trip itinerary with 20 possible stops. A standard tour visits stops 1 through 20 in order. However, each user may create their own tour of 5 or more stops in any order, with repeats allowed. What is the most efficient way to model this in a database?

If we use a join table (user_id, stop_id, order), we would have millions of records very quickly, but we could easily pull the stop and user attributes in queries.

If we stored the stops as an array (user_id, stop_id_array_in_order), we would have a much smaller, non-normalized table, but we could not easily access the stop attributes.

Are there other options that allow access to the parent attributes while minimizing table size?

Answers


You're thinking that saving some space will help you. It won't. It's also arguable how much space you'd actually save.

You'd also be storing the stops in a structure the database cannot relate to other records, and that's something you don't want. You want a structured table whose rows can relate to other records. That is exactly why we normalize tables: so we can derive all kinds of data without altering how it is physically stored. The other benefit is that such tables can be indexed, which reduces the time spent finding records. The tradeoff is the extra space needed to keep the index records.
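As a rough illustration of the indexing point, here is a minimal sketch using Python's built-in sqlite3 module. The table name user_stops, the column name position (used instead of order, which is an SQL keyword), and the index name idx_user_stops_user are all assumptions made for this example, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized table: one row per stop in a user's tour.
conn.execute(
    "CREATE TABLE user_stops (user_id INTEGER, stop_id INTEGER, position INTEGER)"
)
# The index costs extra space but lets the engine find one user's rows
# without scanning the whole table.
conn.execute(
    "CREATE INDEX idx_user_stops_user ON user_stops (user_id, position)"
)

# 1000 users with 5 stops each (made-up data).
rows = [(u, (u + p) % 20 + 1, p) for u in range(1, 1001) for p in range(1, 6)]
conn.executemany("INSERT INTO user_stops VALUES (?, ?, ?)", rows)

query = "SELECT stop_id FROM user_stops WHERE user_id = ? ORDER BY position"

# Ask SQLite how it plans to run the query; the last column describes each step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
plan_text = " ".join(row[3] for row in plan)

stops_for_user = [row[0] for row in conn.execute(query, (42,))]
```

Here plan_text should mention idx_user_stops_user, which indicates the lookup goes through the index rather than a full table scan.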

That said, millions, billions, even trillions of rows are fine. Just imagine how difficult it would be to query a structure where an array is saved as a comma-separated list in a column (or in multiple columns). It would be a nightmare to write the query, and performance would degrade linearly as the number of records grows.
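To make the comma-separated problem concrete, the following sketch compares the two layouts for the question "which users visit stop 7?". It uses Python's sqlite3 module; the table names tours_csv and tour_stops, the column names, and the sample itineraries are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tours_csv (user_id INTEGER PRIMARY KEY, stop_ids TEXT);
CREATE TABLE tour_stops (
    user_id  INTEGER,
    position INTEGER,
    stop_id  INTEGER,
    PRIMARY KEY (user_id, position)
);
""")

# Made-up itineraries; note that user 2 visits stop 17 but never stop 7.
tours = {1: [3, 7, 17, 7, 1], 2: [17, 2, 5, 9, 11], 3: [1, 2, 3, 4, 7]}
for user_id, stops in tours.items():
    conn.execute(
        "INSERT INTO tours_csv VALUES (?, ?)",
        (user_id, ",".join(map(str, stops))),
    )
    conn.executemany(
        "INSERT INTO tour_stops VALUES (?, ?, ?)",
        [(user_id, pos, stop) for pos, stop in enumerate(stops, start=1)],
    )

def user_ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Naive substring match: wrongly includes user 2 because '17' contains '7'.
naive = user_ids(
    "SELECT user_id FROM tours_csv WHERE stop_ids LIKE '%7%' ORDER BY user_id"
)

# Correct, but relies on a delimiter trick and has to inspect every row's text.
csv_users = user_ids(
    "SELECT user_id FROM tours_csv "
    "WHERE ',' || stop_ids || ',' LIKE '%,7,%' ORDER BY user_id"
)

# Normalized: a plain equality test that an index on stop_id could serve.
normalized_users = user_ids(
    "SELECT DISTINCT user_id FROM tour_stops WHERE stop_id = 7 ORDER BY user_id"
)
```

The delimiter-wrapped LIKE and the normalized query agree, but only the normalized form stays simple once the query also needs stop attributes or positions.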

TL;DR: keep it normalized.


I would define the entities and create tables for them with the relations between them in separate tables as you described in the first example:

users table
tours table
stops table
tours_users table (a User can go to a Tour more than once)
stops_order table: stop_id, order, tours_users_id

To query the tables: to check a given user's tours, use the tours_users table. If the stops need to be retrieved, you can easily join the tours_users table with the stops_order table on tours_users_id.

If the tables are indexed correctly, there should be no problem with performance, and you will be using the relational database engine as you are supposed to.
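The table layout above could be sketched as follows with Python's sqlite3 module. This is only one possible reading of the answer: the column types, the name columns, the sample data, and the rename of the order column to stop_order (order is an SQL keyword) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tours (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stops (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per time a user takes a tour, so the same pair may repeat.
CREATE TABLE tours_users (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    tour_id INTEGER NOT NULL REFERENCES tours(id)
);

-- One row per stop in that tour; repeated stops are allowed because
-- the key is (tours_users_id, stop_order), not the stop itself.
CREATE TABLE stops_order (
    tours_users_id INTEGER NOT NULL REFERENCES tours_users(id),
    stop_order     INTEGER NOT NULL,
    stop_id        INTEGER NOT NULL REFERENCES stops(id),
    PRIMARY KEY (tours_users_id, stop_order)
);
""")

# Made-up sample data: one user, one custom tour, 20 stops.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO tours VALUES (1, 'custom tour')")
conn.executemany(
    "INSERT INTO stops VALUES (?, ?)",
    [(i, f"Stop {i}") for i in range(1, 21)],
)
conn.execute("INSERT INTO tours_users VALUES (1, 1, 1)")

itinerary = [5, 2, 9, 2, 14]  # stop 2 appears twice
conn.executemany(
    "INSERT INTO stops_order VALUES (1, ?, ?)",
    list(enumerate(itinerary, start=1)),  # (stop_order, stop_id) pairs
)

# Join through tours_users_id to read stop attributes in visiting order.
stop_names = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.name
        FROM tours_users AS tu
        JOIN stops_order AS so ON so.tours_users_id = tu.id
        JOIN stops AS s ON s.id = so.stop_id
        WHERE tu.user_id = ?
        ORDER BY so.stop_order
        """,
        (1,),
    )
]
```

The composite primary key on stops_order doubles as the index for the join, so retrieving one tour's stops in order does not require a separate index in this sketch.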

