Best Collection for Fast String Lookup

I need a list of strings and a way to quickly determine if a string is contained within that list.

To enhance lookup speed, I considered SortedList and Dictionary; however, both work with KeyValuePairs when all I need is a single string.

I know I could use a KeyValuePair and simply ignore the Value portion. But I do prefer to be efficient and am just wondering if there is a collection better suited to my requirements.

Answers


If you're on .NET 3.5 or higher, use HashSet<String>.

Failing that, a Dictionary<string, byte> (or whatever type you want for the TValue type parameter) would be faster than a SortedList if you have a lot of entries - the latter will use a binary search, so it'll be O(log n) lookup, instead of O(1).


If you just want to know if a string is in the set use HashSet<string>


This sounds like a job for

 var keys = new HashSet<string>();

Per MSDN: The Contains function has O(1) complexity.

But you should be aware that it does not give an error for duplicates when adding.


HashSet<string> is like a Dictionary, but with only keys.


If you feel like rolling your own data structure, use a Trie. http://en.wikipedia.org/wiki/Trie

worst-case is if the string is present: O(length of string)


I know this answer is a bit late to this party, but I was running into an issue where our systems were running slow. After profiling we found out there was a LOT of string lookups happening with the way we had our data structures structured.

So we did some research, came across these benchmarks, did our own tests, and have switched over to using SortedList now.

if (sortedlist.ContainsKey(thekey))
{   
//found it.
}

Even though a Dictionary proved to be faster, it was less code we had to refactor, and the performance increase was good enough for us.

Anyway, wanted to share the website in case other people are running into similar issues. They do comparisons between data structures where the string you're looking for is a "key" (like HashTable, Dictionary, etc) or in a "value" (List, Array, or in a Dictionary, etc) which is where ours are stored.


I know the question is old as hell, but I just had to solve the same problem, only for a very small set of strings(between 2 and 4).

In my case, I actually used manual lookup over an array of strings which turned up to be much faster than HashSet<string>(I benchmarked it).

for (int i = 0; i < this.propertiesToIgnore.Length; i++)
{
    if (this.propertiesToIgnore[i].Equals(propertyName))
    {
        return true;
    }
}

Note, that it is better than hash set for only for tiny arrays!

EDIT: works only with a manual for loop, do not use LINQ, details in comments


Need Your Help

How to get the day of the week from a Unix timestamp (PHP)?

php unix-timestamp weekday

How do I get the day (1-7) from a Unix timestamp in PHP? I also need the day date (1-31) and month (1-12).

How to pass a class type as a function parameter

swift generics type-inference

I have a generic function that calls a web service and serialize the JSON response back to an object.