Google BigQuery Underlying Architecture
So I just started messing around with Google BigQuery about 10 minutes ago, and I was wondering if anyone is aware of the underlying architecture that they're using to store the data? For example, is this just the next generation of their own BigTable infrastructure?
Also, is it clear what sorts of strategies they're using for indexes, index rebuilds, etc.? I'm just trying to work out whether this is mature enough that you can be 100% sure of what's going on with your data end-to-end, or whether there is a bit of a black box area where "things just work".
There are no indexes: every query is a full table scan. The query architecture is described here. Your data is stored in a proprietary columnar format called ColumnIO on Colossus (a successor to GFS). Colossus replicates the data within a datacenter, and your data is also replicated to other geographic regions so that it stays available even if a Google datacenter goes offline.
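To make "columnar, no indexes, every query is a table scan" concrete, here is a toy sketch in Python. It is not ColumnIO and says nothing about how BigQuery is actually implemented; the `ColumnarTable` class and its methods are hypothetical names invented for illustration. The point is only that each column is stored separately, so a query reads the whole of every column it references and nothing else.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class ColumnarTable:
    """Toy column-oriented table (hypothetical): one list per column, no indexes."""

    def __init__(self, column_names: Sequence[str]) -> None:
        self.columns: Dict[str, List[Any]] = {name: [] for name in column_names}

    def append(self, row: Dict[str, Any]) -> None:
        # A row is split up so that each value lands in its own column.
        for name, values in self.columns.items():
            values.append(row[name])

    def scan(
        self,
        select: Sequence[str],
        where_col: str,
        predicate: Callable[[Any], bool],
    ) -> List[Tuple[Any, ...]]:
        # No index to consult: every value of the filter column is examined.
        matching = [
            i for i, value in enumerate(self.columns[where_col]) if predicate(value)
        ]
        # Only the selected columns are read; other columns are never touched.
        return [tuple(self.columns[c][i] for c in select) for i in matching]


table = ColumnarTable(["user", "country", "bytes"])
table.append({"user": "a", "country": "US", "bytes": 10})
table.append({"user": "b", "country": "DE", "bytes": 25})
table.append({"user": "c", "country": "US", "bytes": 40})

us_rows = table.scan(select=["user", "bytes"], where_col="country",
                     predicate=lambda c: c == "US")
```

In this sketch the cost of a query depends on how many columns it touches and how long they are, not on whether a suitable index exists.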
To answer your specific questions:
- While data may be temporarily stored in Bigtable, all data is stored long-term in Colossus (for now!).
- New data added to BigQuery is encrypted at rest (that is, whenever it is written out to permanent storage). It is also encrypted when sent over the network.
- As mentioned, no indexes, so there are no strategies for rebuilding the index. Depending on how you add data to your table, your table may be coalesced, which means rewriting the underlying files in a more efficient manner.
- Colossus underlies a massive amount of Google data across a wide range of services, and ColumnIO is a standard throughout Google. I would call both of these technologies mature.
- That said, you should also consider it a black box. All of the details here may change as storage systems at Google mature or architectures change. It should always "just work", though (within SLA caveats, of course).
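As a rough picture of what "coalescing" means in the list above (rewriting the underlying files in a more efficient manner), here is a minimal Python sketch. It assumes the simplest possible model: a table made of many small chunks of rows that get repacked into fewer, fuller ones. The `coalesce` function and the `target_rows` parameter are hypothetical; how BigQuery actually decides when and how to coalesce is not described here.

```python
from typing import Any, List


def coalesce(chunks: List[List[Any]], target_rows: int) -> List[List[Any]]:
    """Repack many small chunks into chunks of up to target_rows rows each.

    Row order is preserved; only the chunk boundaries change.
    """
    if target_rows < 1:
        raise ValueError("target_rows must be at least 1")

    merged: List[List[Any]] = []
    current: List[Any] = []
    for chunk in chunks:
        for row in chunk:
            current.append(row)
            if len(current) == target_rows:
                # The current chunk is full, so start a new one.
                merged.append(current)
                current = []
    if current:
        # Keep any leftover rows as a final, partially filled chunk.
        merged.append(current)
    return merged


# Many tiny appends, e.g. rows that were added a few at a time.
small_chunks = [[1], [2, 3], [4], [5]]
repacked = coalesce(small_chunks, target_rows=2)
```

The data is the same before and after; a scan simply has fewer, larger pieces to read.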
If you're interested in more details about how BigQuery works under the covers or how to use it effectively, here is a shameless plug for our book on the subject which is due out in June.