
I have a Spark Scala program which loads a jar I wrote in Java. From that jar a static function is called, which tries to read a serialized object from a file but throws a java.lang.ClassNotFoundException for the Pattern class. Running the Spark program locally works, but on the cluster workers it doesn't. It's especially weird because before I try to read from the file, I instantiate a Pattern object and there are no problems.

I am sure that the Pattern objects I wrote to the file are the same as the Pattern objects I am trying to read.

I've checked the jar on the slave machine and the Pattern class is there.

Does anyone have any idea what the problem might be? I can add more detail if needed.

This is the Pattern class:

public class Pattern implements Serializable {

    private static final long serialVersionUID = 588249593084959064L;

    public static enum RelationPatternType {NONE, LEFT, RIGHT, BOTH};

    RelationPatternType type;
    String entity;
    String pattern;
    List<Token> tokens;
    Relation relation = null;

    public Pattern(RelationPatternType type, String entity, List<Token> tokens, Relation relation) {
        this.type = type;
        this.entity = entity;
        this.tokens = tokens;
        this.relation = relation;
        if (this.tokens != null)
            this.pattern = StringUtils.join(" ", this.tokens.toString());
    }
}

I am reading the file from S3 the following way:

AmazonS3 s3Client = new AmazonS3Client(credentials);
S3Object confidentPatternsObject = s3Client.getObject(new GetObjectRequest("xxx", "confidentPatterns"));
InputStream objectData = confidentPatternsObject.getObjectContent();
ObjectInputStream ois = new ObjectInputStream(objectData);
Map<Pattern, Tuple2<Integer, Integer>> confidentPatterns = (Map<Pattern, Tuple2<Integer, Integer>>) ois.readObject();
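
The write side is not shown above; it was done with a plain ObjectOutputStream so that the ObjectInputStream above can read it back. A simplified sketch of roughly what it looks like (variable names are placeholders, not the exact code; needs java.io.* and com.amazonaws.services.s3.model.ObjectMetadata):

// Simplified write-side sketch: serialize the map in memory, then upload it to S3.
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bytes);
oos.writeObject(confidentPatterns); // Map<Pattern, Tuple2<Integer, Integer>>
oos.close();

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.size());
s3Client.putObject("xxx", "confidentPatterns",
        new ByteArrayInputStream(bytes.toByteArray()), metadata);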

Later edit: I checked the classpath at runtime and the path to the jar was not there. I added it for the executors, but I still have the same problem. I don't think that was the issue, though, as the Pattern class is inside the same jar that calls readObject.

  • Do you have your class on the classpath when running on the cluster? Commented May 7, 2016 at 18:41
  • Pattern is your own class, right? It is not the one from the JDK? Commented May 7, 2016 at 18:42
  • It's in the jar that is run on the cluster, so I presume it should have access to it. Commented May 7, 2016 at 18:42
  • And yeah, Pattern is my own class. Commented May 7, 2016 at 18:42
  • Maybe you could show us more, for example your Pattern class and how you serialize and deserialize it. Commented May 7, 2016 at 18:44

1 Answer


I would suggest adding this kind of method to print the classpath resources before the call, to make sure that everything is fine from the caller's point of view:

public static void printClassPathResources() {
    // List every URL visible to the system classloader (the cast assumes a
    // Java 8-style URLClassLoader system classloader).
    final ClassLoader cl = ClassLoader.getSystemClassLoader();
    final URL[] urls = ((URLClassLoader) cl).getURLs();
    LOG.info("Print all classpath resources under the currently running class");
    for (final URL url : urls) {
        LOG.info(url.getFile());
    }
}
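
For example, you can call it once on the driver and once inside a task to compare what the executors actually see. A sketch using the Java API (since the helper above is Java), assuming printClassPathResources() is accessible as a static method; the executor-side output lands in the executor logs:

// Sketch: compare the driver classpath with what the executors report.
SparkConf conf = new SparkConf().setAppName("classpath-check");
JavaSparkContext jsc = new JavaSparkContext(conf);

printClassPathResources();                      // driver side

jsc.parallelize(Arrays.asList(1, 2, 3, 4), 4)
   .foreach(i -> printClassPathResources());    // executor side

jsc.stop();

From a Scala driver the equivalent is sc.parallelize(1 to 4, 4).foreach(_ => printClassPathResources()).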
  • This is a sample configuration for Spark 1.5:

--conf "spark.driver.extraLibrayPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \ --conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \ --conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')

  • As described in this troubleshooting guide (Class Not Found: Classpath Issues): Another common issue is seeing "class not defined" when compiling Spark programs. This is a slightly confusing topic because Spark is actually running several JVMs when it executes your process, and the path must be correct for each of them. Usually this comes down to correctly passing around dependencies to the executors. Make sure that when running you include a fat jar containing all of your dependencies (I recommend using sbt assembly) in the SparkConf object used to make your SparkContext. You should end up writing a line like this in your Spark application:

val conf = new SparkConf().setAppName(appName).setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))

This should fix the vast majority of class not found problems. Another option is to place your dependencies on the default classpath on all of the worker nodes in the cluster. This way you won’t have to pass around a large jar.

The only other major source of class-not-found problems stems from different versions of the libraries in use. For example, if you don't use identical versions of the common libraries in your application and on the Spark server, you will end up with classpath issues. This can occur when you compile against one version of a library (like Spark 1.1.0) and then attempt to run against a cluster with a different or out-of-date version (like Spark 0.9.2). Make sure that you are matching your library versions to whatever is being loaded onto the executor classpaths. A common example of this would be compiling against an alpha build of the Spark Cassandra Connector and then attempting to run using classpath references to an older version.
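
For completeness, a spark-submit invocation along these lines (class name, master and jar paths are placeholders) ships a separate dependency jar to the executors and adds it to both the driver and executor classpaths via --jars, which is one way to make sure the jar containing Pattern is visible everywhere:

spark-submit \
  --class your.main.Class \
  --master <your-cluster-master> \
  --jars /path/to/jar-containing-Pattern.jar \
  /path/to/your-spark-application.jar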


Comments

I did output the classpath using this. The path to the jar was not there. I added it, and I still have the same problem.
Can you paste your spark-submit command here? What classpath options are you using, like driverClassPath, executorClassPath, etc.?
~/spark/bin/spark-submit --jars /root/work/project-1.0-SNAPSHOT.jar --class peoplegraph.Main --driver-memory 50g pipeline-1.0.jar and I set the classpath like this: conf.set("spark.executor.extraClassPath", "./")
Please see the troubleshooting guide added in the answer, which may help you.
@Tomy: Any progress on this? Was it helpful?