Database: flat vs nested data

I would like to store data in a database that can be laid out nested like

[
{
  id: 'deadbeef',
  url: 'https://lol.cat/1234',
  revisions: [
    {
      id: '1',
      title: 'foo',
      authors: ['lol', 'cat'],
      content: 'yadda yadda',
      // ...
    },
    {
      id: '2',
      title: 'foo',
      authors: ['lol', 'cat'],
      content: 'yadda yadda bla',
      // ...
    },
    // ...   
  ]
},
// ...
]

(One can imagine more levels here.)

Alternatively, the same data could be organized flat like

[
{
  documentId: 'deadbeef',
  url: 'https://lol.cat/1234',
  id: '1',
  title: 'foo',
  authors: ['lol', 'cat'],
  content: 'yadda yadda',
  // ...
},
{
  documentId: 'deadbeef',
  url: 'https://lol.cat/1234',
  id: '2',
  title: 'foo',
  authors: ['lol', 'cat'],
  content: 'yadda yadda bla',
  // ...
},
// ...
]

with basically only the leaves of the approach above stored, along with all the information belonging to them.

Typical requests would be:

  • Give all revisions of document deadbeef.
  • Give me revision 6 of document caffee.

Is either one of the approaches obviously better? What are advantages/disadvantages of either approach?

Answers


Your second schema is a denormalized version of the first. It might be useful to compare a more relational approach:

{
    documents: [
        {
            id: 'deadbeef',
            url: 'https://lol.cat/1234',
            // ...
        },
        // ...
    ],
    revisions: [
        {
            id: '1',
            documentId: 'deadbeef'
            title: 'foo',
            authors: ['lol', 'cat'],
            content: 'yadda yadda',
            // ...
        },
        {
            id: '2',
            documentId: 'deadbeef',
            title: 'foo',
            authors: ['lol', 'cat'],
            content: 'yadda yadda bla',
            // ...
        },
        // ...
    ]
}

The nested approach suffers from a problem called access path dependence. Basically, by assuming a preferred hierarchy, it makes queries that require a different hierarchy more difficult.

The denormalized version can suffer from update anomalies, which means partial updates can put the database into an inconsistent state.

The relational approach, on the other hand, doesn't favor any hierarchy, thereby supporting ad-hoc querying, and normalization helps to eliminate update anomalies. RDBMSs also incorporate numerous integrity checks and constraints to ensure the validity of data.


Need Your Help

minimal bounding box of a clipped point cloud

algorithm language-agnostic geometry 2d bounding-box

I am trying to find the minimal bounding box of a 2d point cloud, where only a part of the point cloud is visible.

Automatic mysql backup on xampp windows 8?

mysql xampp backup

I recently got in problem, my hard disk got crashed!