Why use cross validation?

I am entering several Kaggle machine learning competitions at the moment and I have a quick question. Why do we use cross validation to assess our algorithm's effectiveness in these competitions?

Surely in these competitions your score on the public leaderboard, where your algorithm is tested against actual live data, would give you a more accurate representation of your algorithm's efficacy?

Answers


Cross-validation is a necessary step in model construction. If cross-validation gives you poor results, there is no point in even trying the model on live data. The set on which you are training and validating is also live data, isn't it? So the results should be similar. Without validating your model, you have no insight into its performance whatsoever. A model that gives 100% accuracy on the training set could give random results on the validation set.

Let me reiterate: cross-validation is not a replacement for a test on live data; it is part of the model construction process.
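
To make the point about training accuracy concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption made for illustration: the data is synthetic noise with coin-flip labels, the model is a simple 1-nearest-neighbour classifier, and the helper names (`k_fold_indices`, `predict_1nn`, `cross_val_accuracy`) are hypothetical, not taken from any library. The model memorises the training set perfectly, yet k-fold cross-validation shows it does no better than guessing.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]

def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def cross_val_accuracy(X, y, k=5, seed=0):
    """Average held-out accuracy over k folds; each fold is used once for validation."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        scores.append(accuracy(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in fold], [y[i] for i in fold],
        ))
    return sum(scores) / k

# Synthetic data (an assumption for this sketch): features are pure noise and
# labels are coin flips, so there is nothing real for a model to learn.
rng = random.Random(42)
X = [[rng.random() for _ in range(5)] for _ in range(200)]
y = [rng.randint(0, 1) for _ in range(200)]

train_acc = accuracy(X, y, X, y)         # evaluated on the data it was fit on
cv_acc = cross_val_accuracy(X, y, k=5)   # evaluated on held-out folds

print(f"training accuracy:        {train_acc:.2f}")
print(f"cross-validated accuracy: {cv_acc:.2f}")
```

The training accuracy is perfect because every point is its own nearest neighbour, while the cross-validated accuracy should land close to chance (around 0.5 for two balanced classes). That gap is exactly the insight you get before ever submitting anything to a leaderboard.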

