java.lang.StackOverflowError when writing a DataFrame to PostgreSQL using JDBC

Question

I'm trying to write the result of several operations to an AWS Aurora PostgreSQL cluster. All the calculations run correctly, but when I try to write the result to the database I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o12179.jdbc.
: java.lang.StackOverflowError
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)

I have already tried increasing the cluster size (15 r4.2xlarge machines), repartitioning the data into 120 partitions, and setting the executor and driver memory to 4 GB each, but I get the same result.

The current SparkSession configuration is as follows:

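    from pyspark.sql import SparkSession

    # Session used for the job: 120 shuffle partitions, 4 GB for executors and driver.
    spark = SparkSession\
                .builder\
                .appName("profile")\
                .config("spark.sql.shuffle.partitions", 120)\
                .config("spark.executor.memory", "4g")\
                .config("spark.driver.memory", "4g")\
                .getOrCreate()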

I don't know whether this is a Spark configuration problem or a programming problem.

Álvaro Paniagua Tena · Accepted Answer · 2019-09-30 07:52:37Z

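I finally found the problem.

It was caused by reading from S3 iteratively, which created a very large DAG. The stack trace is consistent with this: it shows Catalyst recursing through the plan tree (TreeNode.transformDown) until the stack overflows. I changed the way I read the CSV files from S3 to the following: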

    df = spark.read\
              .format('csv')\
              .option('header', 'true')\
              .option('delimiter', ';')\
              .option('mode', 'DROPMALFORMED')\
              .option('inferSchema', 'true')\
              .load(list_paths)

Here, list_paths is a precomputed list of paths to S3 objects.
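
For illustration, here is a minimal sketch of how the two approaches might compare. The original loop is not shown in this answer, so `read_one_by_one` is only an assumed shape of it (one read per file, chained with `union()`). The helper `build_paths`, the bucket name, the object keys, and the `s3a` scheme are all hypothetical; on EMR the scheme is typically `s3`. The functions take an existing `spark` session as an argument.

```python
from functools import reduce

# Reader options copied from the snippet above.
CSV_OPTIONS = {
    "header": "true",
    "delimiter": ";",
    "mode": "DROPMALFORMED",
    "inferSchema": "true",
}


def build_paths(bucket, keys, scheme="s3a"):
    """Hypothetical helper: turn object keys into full paths, keeping only CSV files."""
    return [
        f"{scheme}://{bucket}/{key.lstrip('/')}"
        for key in keys
        if key.endswith(".csv")
    ]


def read_one_by_one(spark, paths):
    """Assumed shape of the problematic approach (not shown in the answer).

    One read per file, chained with union(): every union() nests the
    previous plan one level deeper, so the plan depth grows with the
    number of files.
    """
    frames = [
        spark.read.format("csv").options(**CSV_OPTIONS).load(path)
        for path in paths
    ]
    return reduce(lambda left, right: left.union(right), frames)


def read_all_at_once(spark, paths):
    """Approach from this answer: a single load() over the whole list of paths."""
    return spark.read.format("csv").options(**CSV_OPTIONS).load(list(paths))


# Made-up bucket and keys, only to show what list_paths could look like.
list_paths = build_paths("my-bucket", ["2019/01.csv", "/2019/02.csv", "README.txt"])
print(list_paths)
```

With a real session, `df = read_all_at_once(spark, list_paths)` is equivalent to the snippet above. Because all paths go through one `load()` call, the plan contains a single file-source relation instead of one per file.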
