running mahout RecommenderJob on EMR
I'm trying to run a RecommenderJob on amazon EMR. I have a jar called SmartJukebox.jar (not runnable) and it contains a class main.TrackRecommander (and that's it).
I created a job flow with the jar:
main.TrackRecommander --input s3n://smartjukebox/ratings.csv --output s3n://smartjukebox/output --usersFile s3n://smartjukebox/user.txt.
The class TrackRecommander uses the class RecommenderJob.
I run the job flow and i get this in the error log -
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/cf/taste/hadoop/item/RecommenderJob at main.TrackRecommander.main(TrackRecommander.java:136) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ... 6 more
now i see that the JVM can't find RecommenderJob and i didn't put RecommenderJob in my jar. I thought EMR would have mahout jars built in, but i can't find anything about that.
what is the solution here?
You're problem is exactly what you say: "I didn't put RecommenderJob in my jar." Unless you put those classes in your JAR, of course it can't be found. Why would EMR have this built in? Add the Mahout ".job" file classes to your JAR first.
You will need to create a job jar which contains all the classes required by the code to run which includes the mahout classes too. Take a look at https://github.com/tdunning/MiA
Check how to create a job jar using maven assembly plugin in pom.xml and the job.xml in the src/main/resources directory. IF you exclude the hadoop classes then you can run it on any hadoop instance.