Search tree with canonical layout

I'm looking for a search tree that can be organized in a canonical form.

Please excuse me if I'm using the term "canonical" wrong here. What I mean is that, given a set of items to be stored in the tree, the resulting node structure is the same no matter which sequence of operations put those items there.

I don't need a structure that always provides this property; a "make it canonical, please" operation would be sufficient.

I've currently created a basic AVL tree implementation in C#.

When I add seven items (1-7) to it, by appending them, I end up with the following nodes:

              4
           /     \
        2          6
      /   \      /   \
    1      3    5     7

If I instead first add the items 1, 2, 0, 3, 4, 5, 6, 0, 7, by appending them, and then remove the two zeroes, I end up with the following nodes instead:

           3
         /   \
        2     5
       /    /   \
      1    4     7      
                /
               6

Enumerating the items in those two trees yields the same expected result, but the node structure differs, which is what I'd like to avoid.

I understand that I could implement the "please make it canonical" feature by just creating a new tree from scratch, but that doesn't scale, since it touches every node each time.
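For reference, the from-scratch rebuild can be sketched in a few lines. The sketch is in Python rather than C# for brevity, and `Node` and `build_canonical` are hypothetical names, not part of the AVL implementation mentioned above. It always picks the middle element of the sorted items as the root, so the layout depends only on the items and not on the edit history. It visits every item, which is the linear cost that makes this approach unattractive for large trees.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    """Immutable tree node; dataclass equality compares structure and keys."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_canonical(keys: Sequence[int], lo: int = 0,
                    hi: Optional[int] = None) -> Optional[Node]:
    """Build a balanced tree from keys that are already in sorted order.

    The root of every subtree is the middle element of its range, so the
    resulting shape is a function of the key sequence alone.
    """
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None  # empty range, empty subtree
    mid = (lo + hi) // 2
    return Node(keys[mid],
                build_canonical(keys, lo, mid),
                build_canonical(keys, mid + 1, hi))


def in_order(node: Optional[Node]) -> list:
    """Enumerate the keys of a tree from left to right."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)
```

For the items 1 to 7 this produces the first tree shown above, with 4 at the root.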

I'd also be happy to trade the canonical property for speed in cases where reaching the canonical form would require too much effort.


Why "canonical"?

I'm pursuing an idea where I'm going to chunk up the tree using a content-based chunking algorithm (TTTD) and then store those chunks in an immutable blob store.

With this approach some child nodes would be inline: stored in the same chunk as the parent. Other nodes would be external: referenced by a content-based address (SHA-1 hash).

When two subtrees have the same content (including structure), they have the same address. This property is useful for several reasons, including:

  • efficiently computing the difference between two subtrees.
  • caching.
  • synchronization.
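To illustrate why the structure matters for addressing, here is a minimal sketch in Python (not the C# implementation). It makes simplifying assumptions: every node gets its own SHA-1 address, there is no TTTD chunking and no inline nodes, and a tree is a plain `(key, left, right)` tuple. The names `address` and `EMPTY` are hypothetical. The address of a node is derived from its key and the addresses of its children, so equal subtrees share an address while the same items in a different layout do not.

```python
import hashlib

# Address of the empty subtree (an arbitrary fixed marker for this sketch).
EMPTY = hashlib.sha1(b"empty").digest()


def address(node) -> bytes:
    """Content-based address of a subtree given as (key, left, right) or None."""
    if node is None:
        return EMPTY
    key, left, right = node
    h = hashlib.sha1()
    h.update(str(key).encode("ascii"))
    h.update(b"|")              # separates the key from the child addresses
    h.update(address(left))
    h.update(address(right))
    return h.digest()


def leaf(key):
    """Helper: a node without children."""
    return (key, None, None)


# The two trees from the question: same items, different node structure.
balanced = (4, (2, leaf(1), leaf(3)), (6, leaf(5), leaf(7)))
skewed = (3, (2, leaf(1), None), (5, leaf(4), (7, leaf(6), None)))

same_layout = address(balanced) == address((4, (2, leaf(1), leaf(3)),
                                               (6, leaf(5), leaf(7))))
different_layout = address(balanced) != address(skewed)
```

Both `same_layout` and `different_layout` evaluate to `True`: identical subtrees can be deduplicated and diffed by address, but only if equal content always ends up in the same layout.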

Answers


You can use a trie, whose worst-case update and search time is proportional to the bit length of your keys, or you can use the structure from A. Andersson and Th. Ottmann, "Faster Uniquely Represented Dictionaries", whose worst-case update and search time is Θ(n^(1/3)).
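A minimal sketch of the trie option, in Python and under stated assumptions: keys are non-negative integers of a fixed width (`BITS`, chosen here for illustration), the structure stores a set (so inserting a duplicate is a no-op), and the function names are hypothetical. Each key follows the path spelled by its bits, and removal prunes nodes that are left without children, so the node structure depends only on which keys are present.

```python
BITS = 4  # assumed fixed key width: keys must satisfy 0 <= key < 2**BITS


def insert(node, key, bit=BITS - 1):
    """Return the trie with key added; a node is a two-element list of children."""
    if node is None:
        node = [None, None]
    if bit >= 0:
        b = (key >> bit) & 1  # next bit, most significant first
        node[b] = insert(node[b], key, bit - 1)
    return node


def remove(node, key, bit=BITS - 1):
    """Return the trie with key removed, pruning branches that become empty."""
    if node is None:
        return None          # key was not present
    if bit < 0:
        return None          # drop the leaf that represents the key
    b = (key >> bit) & 1
    node[b] = remove(node[b], key, bit - 1)
    if node[0] is None and node[1] is None:
        return None          # interior node without children is pruned
    return node


def shape(node):
    """Nested-tuple snapshot of the node structure, for comparison."""
    if node is None:
        return None
    return (shape(node[0]), shape(node[1]))


def keys(node, bits=BITS, prefix=0):
    """Enumerate the stored keys in ascending order."""
    if node is None:
        return []
    if bits == 0:
        return [prefix]
    return (keys(node[0], bits - 1, prefix << 1)
            + keys(node[1], bits - 1, (prefix << 1) | 1))


# Two edit histories that end with the same key set.
a = None
for k in [1, 2, 3, 4, 5, 6, 7]:
    a = insert(a, k)

b = None
for k in [1, 2, 0, 3, 4, 5, 6, 0, 7]:
    b = insert(b, k)
b = remove(b, 0)
```

After these two histories, `shape(a) == shape(b)` holds, and every insert, remove, or lookup walks at most `BITS` levels.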

