How large can an HBase table actually grow?
Would there be any reason to split an HBase table into smaller entities, or can it grow forever (assuming available disk space)?
We have real-time data (measurements), up to, let's say, 500,000/s, each consisting essentially of timestamp, value, and flags. If we distributed the values across different tables, we would also have to insert each entry individually, which is a performance killer. Inserting in bulk is much faster. The question is: are there any downsides to having an HBase table of extreme size?
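To illustrate what is meant by bulk inserts here, this is a minimal sketch of grouping incoming measurements into fixed-size batches before writing them. The `Measurement` tuple layout and the `batches` helper are hypothetical names used only for illustration; the actual write call to HBase is deliberately left out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (timestamp_ms, value, flags).
Measurement = Tuple[int, float, int]


def batches(measurements: Iterable[Measurement],
            batch_size: int) -> Iterator[List[Measurement]]:
    """Yield successive lists of at most batch_size measurements."""
    it = iter(measurements)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Each yielded batch would be sent to a single table in one client call,
# instead of issuing one insert per measurement.
sample = [(i, float(i), 0) for i in range(10)]
batch_sizes = [len(b) for b in batches(sample, 4)]
```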
I don't see the point in manually splitting an HBase table. HBase does this on its own, and does it extremely well (the resulting pieces are called regions).
HBase was built to handle extremely large data sets, so I like to believe that the limit depends only on your hardware (of course, some configuration settings, such as automatic major compaction, can affect performance).
There can be a strong reason to split a table: avoiding RegionServer hotspotting by distributing the load across multiple RegionServers. By design, HBase stores rows sorted by row key, so rows with similar keys land in the same region and therefore on the same server (time-series data, for example). This makes range queries efficient. However, it becomes a bottleneck once your data grows large, even if your disks still have space.
In cases like the one above, data continues to go to the same RegionServer, leading to hotspotting. So we split tables manually to distribute the data uniformly across the cluster.
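One common way to get that uniform distribution is to prefix ("salt") the row key with a bucket number and pre-split the table at the bucket boundaries. The sketch below only shows the key construction, under stated assumptions: `sensor_id` is a hypothetical field (the question mentions only timestamp, value, and flags), and `NUM_BUCKETS`, `salted_row_key`, and `split_points` are illustrative names, not part of any HBase API.

```python
import hashlib
import struct
from typing import List

NUM_BUCKETS = 16  # assumption: one bucket per pre-split region


def salted_row_key(sensor_id: str, timestamp_ms: int,
                   num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build a row key of the form <bucket><sensor_id>\\x00<timestamp>."""
    sensor = sensor_id.encode("utf-8")
    # Derive the bucket from a hash of the (hypothetical) sensor id so that
    # all rows of one sensor stay contiguous and remain range-scannable.
    bucket = hashlib.md5(sensor).digest()[0] % num_buckets
    # Big-endian timestamp keeps byte order equal to chronological order.
    return bytes([bucket]) + sensor + b"\x00" + struct.pack(">Q", timestamp_ms)


def split_points(num_buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Region boundaries that place each bucket prefix in its own region."""
    return [bytes([b]) for b in range(1, num_buckets)]
```

The trade-off is that a time-range scan across all sensors now has to query every bucket, so this fits workloads that mostly read one series at a time.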