I'm trying to write the result of multiple operations into an AWS Aurora PostgreSQL cluster. All the calculations run correctly, but when I try to write the result to the database I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o12179.jdbc.
: java.lang.StackOverflowError
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)

I have already tried increasing the cluster size (15 r4.2xlarge machines), repartitioning the data into 120 partitions, and setting executor and driver memory to 4 GB each, but I still get the same error.

The current SparkSession configuration is:

    import pyspark.sql

    spark = pyspark.sql.SparkSession\
                       .builder\
                       .appName("profile")\
                       .config("spark.sql.shuffle.partitions", 120)\
                       .config("spark.executor.memory", "4g")\
                       .config("spark.driver.memory", "4g")\
                       .getOrCreate()

I don't know whether this is a Spark configuration problem or a problem in my code.

1 Answer

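I finally found the problem.

The CSV files were being read from S3 iteratively, which created a really big DAG. The StackOverflowError in TreeNode.transformDown comes from Spark traversing that plan recursively, so adding memory or machines did not help. I changed the way I read the CSV files from S3 so that all of them are loaded in a single call: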

    df = spark.read\
              .format('csv')\
              .option('header', 'true')\
              .option('delimiter', ';')\
              .option('mode', 'DROPMALFORMED')\
              .option('inferSchema', 'true')\
              .load(list_paths)

Here, list_paths is a precomputed list of paths to the S3 objects.
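To make the difference concrete, here is a minimal sketch contrasting the two approaches. The original iterative code is not shown above, so `read_iteratively` is only an assumption about what such a loop typically looks like (one read per file, chained with `union`); both function names are hypothetical. `read_at_once` mirrors the fix from this answer.

```python
from functools import reduce


def read_iteratively(spark, list_paths):
    """Assumed anti-pattern: one read per file, chained with union.

    Every union adds another level to the logical plan, so a long list of
    files produces a very deep plan for Spark to traverse recursively.
    """
    frames = [
        spark.read.format('csv')
             .option('header', 'true')
             .option('delimiter', ';')
             .load(path)
        for path in list_paths
    ]
    return reduce(lambda left, right: left.union(right), frames)


def read_at_once(spark, list_paths):
    """Fix from this answer: pass the whole list of paths to a single load()."""
    return (spark.read.format('csv')
                 .option('header', 'true')
                 .option('delimiter', ';')
                 .option('mode', 'DROPMALFORMED')
                 .option('inferSchema', 'true')
                 .load(list_paths))
```

Both functions take an existing SparkSession as `spark`. With the second one the plan depth no longer grows with the number of files, which is what avoided the StackOverflowError in my case.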
