Priority queue which can be stored on disk?
I need to implement a application which has a priority queue which has more than 100M records. My problem is I am not able to keep all this data in memory, so I need to store it on disk. Is there any cache solution where I can store all this information to disk?
I think that you can solve this by using a B-tree with a few minor modifications.
B-trees are specifically designed to store sorted elements on-disk in a way that minimizes the number of disk reads necessary to locate any element. Because they store their elements in sorted order, you can use them as priority queues by performing insertions as normal and finding the minimum element by taking the leftmost element in the tree (i.e. the first element of the leftmost leaf node).
In a B-tree of order d, you can find the minimum element using O(logd n) disk reads and writes, where n is the total number of elements. Insertions and deletions will also only require O(logd n) disk reads and writes.
However, you can significantly speed this up by storing a pointer to the leftmost leaf node in the B-tree. This node will store the minimum key plus other keys close to the minimum. If you have this pointer, you can look up the minimum value in a single disk read by taking the first element in the node. This also speeds up the extract-min operation: you can delete the key directly from that node without having to search for it. There may be some B-tree rebalancing operations necessary to make this work, though you can show that these operations happen so infrequently that the amortized work to do a deletion is only O(1).
In other words, using a B-tree with a pointer to the leftmost leaf has the following time complexities in terms of disk reads and writes:
- find-min: O(1)
- insert: O(logd n)
- extract-min: O(1) amortized
Hope this helps!