Spark job deployment failure on Cloudera
I am using Guice for dependency injection while developing my Spark Streaming program. It runs in Eclipse without any errors. However, after compiling it and deploying it with the spark-submit command, it returns an error:
After some googling, I noticed that this error only appears when using Guice 3.0, but I am using Guice 4.0. My Spark version is 1.5.2, and my Cloudera version is 5.3.2. Is there any workaround for this error?
Unfortunately for you, Spark v1.5.2 depends on com.google.inject:guice:3.0.
So I suspect that what is happening is that your project is pulling in both:
- Guice 4.0 (a direct dependency declared in your build file, e.g. pom.xml or build.sbt); and
- Guice 3.0 (a transitive dependency pulled in by Spark v1.5.2)
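As an illustrative sketch, a pom.xml fragment like the following reproduces that situation (the Guice and Spark coordinates are the ones from the question; the `provided` scope is an assumption about how Spark is usually declared for spark-submit deployments):

```xml
<!-- Illustrative fragment: these two dependencies together put Guice 4.0
     and Guice 3.0 (via Spark's transitive dependencies) on the classpath -->
<dependencies>
  <!-- Direct dependency: Guice 4.0 -->
  <dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.0</version>
  </dependency>
  <!-- Spark 1.5.2, which transitively pulls in Guice 3.0
       via hadoop-yarn-common -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Running `mvn dependency:tree -Dincludes=com.google.inject` against such a project should show where each Guice version is coming from.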
Basically, your classpath ends up being a mess, and depending on how the classloader loads classes at runtime, you will (or will not) hit errors like this one.
You will have to either use the version of Guice already provided by Spark, or start juggling with classloaders.
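A third option, if you really need Guice 4.0 for your own code, is to shade and relocate your copy of Guice so it can no longer collide with the 3.0 classes that Spark provides. A minimal sketch with the maven-shade-plugin (the relocation prefix `shaded.` is an arbitrary example, not a required name):

```xml
<!-- Sketch: relocate the Guice packages bundled in your fat jar so they
     do not clash with the Guice 3.0 that Spark puts on the classpath -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- "shaded." is a hypothetical prefix; pick your own -->
              <relocation>
                <pattern>com.google.inject</pattern>
                <shadedPattern>shaded.com.google.inject</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, your compiled classes reference the relocated `shaded.com.google.inject` packages inside the fat jar, while Spark keeps using its own Guice 3.0 untouched.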
Indeed, org.apache.spark:spark-core_2.10:1.5.2 pulls in com.google.inject:guice:3.0:
```
+-org.apache.spark:spark-core_2.10:1.5.2 [S]
  + ...
  ...
  +-org.apache.hadoop:hadoop-client:2.2.0
  | +-org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0
  | | +-com.google.protobuf:protobuf-java:2.5.0
  | | +-org.apache.hadoop:hadoop-mapreduce-client-common:2.2.0
  | | | +-com.google.protobuf:protobuf-java:2.5.0
  | | | +-org.apache.hadoop:hadoop-mapreduce-client-core:2.2.0
  | | | | +-com.google.protobuf:protobuf-java:2.5.0
  | | | | +-org.apache.hadoop:hadoop-yarn-common:2.2.0 (VIA PARENT org.apache.hadoop:hadoop-yarn:2.2.0 and then VIA ITS PARENT org.apache.hadoop:hadoop-project:2.2.0)
  | | | | | +-com.google.inject:guice:3.0
  ...
```
The spark-core pom.xml is here.
The hadoop-yarn-common pom.xml is here.
The hadoop-yarn pom.xml is here.
The hadoop-project pom.xml is here.