Books on string algorithms
There have been numerous posts on string algorithms:
- Algorithm to find articles with similar text,
- Similar String algorithm,
- Efficient string matching algorithm
However, no general literature was mentioned.
Could anyone recommend a book(s) that would thoroughly explore various string algorithms? The topic which is of special interest is approximate string matching [things like google-offered corrected search string variants :) ].
Thanks a lot for advice.
I'm surprised no-one has mentioned Dan Gusfield's excellent book Algorithms on Strings, Trees and Sequences which covers string algorithms in more detail than anyone would probably need. It served me very well for a project on protein sequencing that I was working on a few years ago. After reading this book you will learn:
- Naive String Matching
- Preprocessor Based algorithms (Boyer Moore, Knuth-Morris-Pratt)
- Regex matching algorithms
- Karp-Rabin and similar methods
- Suffix tree methods (Ukkonen's method, etc)
- Sequence alignment (Levenshtein distance and string similarity, and multiple sequence alignment)
- Applications to DNA sequencing, gene prediction and other areas.
This is not a book recommendation, but this library and site is a library that offers plenty of efficient string matching algorithm implementations:
It also provides links to further learning for each and where each is best applicable.
CLR has some string processing algorithms, but it's not specific to them.
TRE is an open-source library that implements approximate matching. The About page has some interesting hints about how it works, although I'm not sure it provides the sort of in-depth analysis you're looking for. The source code is probably more enlightening from that perspective.