Database Design Pattern for Multiple Large Lists
Consider a trip itinerary. There are 20 possible stops on a tour. A standard tour visits stops 1 through 20 in order. However, each user may create their own tour consisting of 5 or more stops in any order, with repeats allowed. What is the most efficient way to model this in a database?
If we use a join table (user_id, stop_id, order), we would have millions of records very quickly, but we could easily pull the stop and user attributes in queries.
If we stored the stops as an array (user_id, stop_id_array_in_order), we would have a much smaller, non-normalized table, but we could not easily access the stop attributes.
Are there other options that allow access to the parent attributes while minimizing table size?
You're thinking that saving some space will help you. It won't. It's also arguable how much space you'd actually save.
You'd also be using an unordered data structure, which is something you don't want. You want an ordered structure (a table) whose rows can relate to other records. That is exactly why we normalize tables: so we can derive all kinds of data without altering where it is physically stored. The other benefit is that ordered structures can be indexed, which reduces the time spent finding records. The tradeoff is spending space to keep the index records.
Millions, billions, even trillions of rows are OK. Just imagine how difficult it would be to query a structure where an array is saved as a comma-separated list in a column (or in multiple columns). It would be a nightmare to write the query, and performance would degrade linearly as the number of records grows.
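To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (user_stops, user_stops_csv), the position column, and the sample tours are assumptions made up for illustration; they are not from the question. It asks "which users visit stop 3?" against both layouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Normalized layout: one row per (user, position in tour).
    CREATE TABLE user_stops (
        user_id  INTEGER NOT NULL,
        position INTEGER NOT NULL,
        stop_id  INTEGER NOT NULL REFERENCES stops(id),
        PRIMARY KEY (user_id, position)
    );
    -- Denormalized layout: the whole tour as a comma-separated string.
    CREATE TABLE user_stops_csv (
        user_id  INTEGER PRIMARY KEY,
        stop_ids TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO stops VALUES (?, ?)",
                 [(i, f"Stop {i}") for i in range(1, 21)])

# Hypothetical sample tours; user 1 repeats stop 3, user 2 never visits it.
tours = {1: [3, 13, 3, 7, 20], 2: [13, 1, 2, 4, 5]}
for user_id, stop_ids in tours.items():
    conn.executemany(
        "INSERT INTO user_stops VALUES (?, ?, ?)",
        [(user_id, pos, sid) for pos, sid in enumerate(stop_ids, start=1)],
    )
    conn.execute("INSERT INTO user_stops_csv VALUES (?, ?)",
                 (user_id, ",".join(map(str, stop_ids))))

# Normalized: a plain equality filter that an index on stop_id could serve.
normalized = [r[0] for r in conn.execute(
    "SELECT DISTINCT user_id FROM user_stops WHERE stop_id = 3 ORDER BY user_id")]

# Comma-separated, naive pattern: also matches stop 13.
naive_csv = [r[0] for r in conn.execute(
    "SELECT user_id FROM user_stops_csv WHERE stop_ids LIKE '%3%' ORDER BY user_id")]

# Comma-separated, careful pattern: wrap in delimiters; still scans every row.
careful_csv = [r[0] for r in conn.execute(
    "SELECT user_id FROM user_stops_csv "
    "WHERE ',' || stop_ids || ',' LIKE '%,3,%' ORDER BY user_id")]

# Normalized: stop names in tour order come from an ordinary join.
names = [r[0] for r in conn.execute(
    "SELECT s.name FROM user_stops us JOIN stops s ON s.id = us.stop_id "
    "WHERE us.user_id = 1 ORDER BY us.position")]
```

The naive pattern wrongly includes user 2 because "13" contains "3". The delimiter trick fixes that, but a pattern with a leading wildcard generally cannot use an ordinary index, so every row has to be scanned.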
TL;DR: keep it normalized.
I would define the entities and create a table for each, with the relations between them in separate tables, as you described in your first example:
users table
tours table
stops table
tours_users table (a User can go to a Tour more than once)
stops_order table: stop_id, order, tours_users_id
To check the tours of any given user, query the tours_users table. If the stops need to be retrieved as well, join tours_users with stops_order through tours_users_id.
If the tables are indexed correctly, there should be no problem with performance, and you will be using the relational database engine the way it is supposed to be used.
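Here is a rough sketch of that layout, again with Python's sqlite3 module. The column names beyond stop_id and tours_users_id, the index name, the helper itinerary_for_user, and the sample data are all assumptions for illustration. In particular, the order column is called stop_order here because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tours (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE stops (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row each time a user takes a tour (the same pair may repeat).
    CREATE TABLE tours_users (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        tour_id INTEGER NOT NULL REFERENCES tours(id)
    );
    -- One row per stop in that user's tour; the composite primary key
    -- doubles as the index for "stops of this tour, in order".
    CREATE TABLE stops_order (
        tours_users_id INTEGER NOT NULL REFERENCES tours_users(id),
        stop_order     INTEGER NOT NULL,
        stop_id        INTEGER NOT NULL REFERENCES stops(id),
        PRIMARY KEY (tours_users_id, stop_order)
    );
    -- Assumed index so that lookups by user do not scan tours_users.
    CREATE INDEX idx_tours_users_user ON tours_users (user_id);
""")

# Hypothetical sample data: one user, one tour, a custom five-stop route.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO tours VALUES (1, 'City tour')")
conn.executemany("INSERT INTO stops VALUES (?, ?)",
                 [(i, f"Stop {i}") for i in range(1, 21)])
conn.execute("INSERT INTO tours_users VALUES (1, 1, 1)")
conn.executemany(
    "INSERT INTO stops_order VALUES (1, ?, ?)",
    list(enumerate([5, 2, 5, 9, 14], start=1)),  # (stop_order, stop_id)
)


def itinerary_for_user(conn, user_id):
    """Return (stop_order, stop name) rows for every tour the user took."""
    return conn.execute(
        """
        SELECT so.stop_order, s.name
        FROM tours_users AS tu
        JOIN stops_order AS so ON so.tours_users_id = tu.id
        JOIN stops       AS s  ON s.id = so.stop_id
        WHERE tu.user_id = ?
        ORDER BY tu.id, so.stop_order
        """,
        (user_id,),
    ).fetchall()


itinerary = itinerary_for_user(conn, 1)
```

Repeated stops are not a problem here: stop 5 appears twice because uniqueness is enforced on (tours_users_id, stop_order), not on stop_id.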