Which Data Mining Algorithm is the best?
Long time listener, first time caller.
I'm a full-time SE during the day and a full-time data mining student at night. I've taken the courses and heard what our professors think. Now I come to you, the stackoverflowers, to bring out the real truth.
What is your favorite data mining algorithm and why? Are there any special techniques you've used that have helped you to be successful in the past?
Most of my professional experience has involved last-minute feature additions like, "Hey, we should add a recommendation system to this e-commerce site." The solution was usually a quick and dirty nearest neighbor search: brute force, Euclidean distance, doomed to fail if the site ever became popular. But hey, premature optimization and all that...
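For anyone who hasn't seen it, the quick and dirty approach looks roughly like the sketch below. The catalog, the feature vectors, and the function names are all made up for illustration; the point is only that every query scans every item, which is why it stops scaling.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, items, k=3):
    """Brute force: score every item against the query, keep the k closest.

    `items` is a dict of item_id -> feature vector (hypothetical layout).
    Cost is O(n) distance computations per query, plus the sort.
    """
    ranked = sorted(items.items(), key=lambda kv: euclidean(query, kv[1]))
    return [item_id for item_id, _ in ranked[:k]]

# Toy catalog with invented feature vectors.
catalog = {
    "book_a": [1.0, 0.0, 0.0],
    "book_b": [0.9, 0.1, 0.0],
    "dvd_c": [0.0, 1.0, 0.5],
    "toy_d": [0.0, 0.0, 1.0],
}

recommended = nearest_neighbors([1.0, 0.0, 0.0], catalog, k=2)
print(recommended)
```

A spatial index or an approximate nearest neighbor method would be the usual next step once the catalog grows, but that is exactly the optimization that never got scheduled.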
I do enjoy the idea that data mining can be elegant and wonderful. I've followed the Netflix Prize and played with its dataset. In particular, I like the fact that imagination and experimentation have played such a large part in developing the top ten entries:
- Acmehill blog
- Acmehill New York Times article
- Just a guy in a garage blog
- Just a guy in a garage Wired article
So mostly, like a lot of software dev, I think the best algorithm is an open mind and some creativity.
There are a lot of data mining algorithms for different tasks, so I find it a little hard to choose.
I would say that my favorite data mining algorithm is Apriori, because it has inspired hundreds of other algorithms and has several applications. The Apriori algorithm itself is quite simple, but it has laid the basis for many other algorithms (FPGrowth, PrefixSpan, etc.) that use the so-called "Apriori property".
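To show how simple it is, here is a minimal sketch of Apriori for frequent itemset mining. The "Apriori property" is the observation that every subset of a frequent itemset must itself be frequent, so a candidate of size k can be discarded without counting if any of its (k-1)-subsets is infrequent. The function name, the use of an absolute count for `min_support`, and the toy transactions are my own assumptions for illustration, not a reference implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    `min_support` is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent

# Toy market-basket data, invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

result = apriori(baskets, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The candidate generation here is the naive pairwise join; real implementations are smarter about it, and algorithms like FPGrowth avoid explicit candidate generation altogether while still relying on the same property.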