I have a PySpark DataFrame with two numeric columns (integers, negative and positive) and when I try to select a random sample of it, it generates an error. This is the code I'm trying to run:

from pyspark.sql import functions as F

df_renamed_combined_df_clean_prueba = df_renamed_combined_df_clean.drop('inputdays_diff_2024', 'inputdays_diff_2023')

n = 50000
sample_df_prueba = df_renamed_combined_df_clean.orderBy(F.rand(seed=42)).limit(n)
sample_df_prueba = sample_df_prueba.toPandas()
display(sample_df_prueba.head(50))

And the error is:

DateTimeException: [CANNOT_PARSE_TIMESTAMP] Text '0' could not be parsed at index 0. Use `try_to_date` to tolerate invalid input string and return NULL instead. SQLSTATE: 22007

But I'm sure I don't have any datetime data in the PySpark DataFrame and I'm not trying to convert to it.

Thanks for your answers and help.

I have tried converting the integer columns to double or decimal, but that didn't help.

  • What is the schema (df.printSchema())? If you read the data without an explicit schema, PySpark may wrongly infer a column's type. So maybe you can tell PySpark how to read your data when loading it: spark.apache.org/docs/latest/api/python/reference/pyspark.sql/… Commented Jul 4 at 9:05
  • Thanks, Philipp Steiner, but that isn't the problem: I've already printed the schema and there is no timestamp data. I was careful to leave the columns as decimal, and I also tried integer and double, and none of them worked. Yes, I'm reading a file through pandas with dates to calculate durations in days with the main PySpark DataFrame, but as soon as I calculate them I drop those columns, and when I then try to select the sample it shows me that error. Anyway, let me know if you have other ideas. Thanks. Commented Jul 4 at 22:29
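
The schema check suggested in the comments can be narrowed down a little. Spark evaluates transformations lazily, so an error like this is normally raised at the first action (here `toPandas()`), even if the parse that fails was defined in an earlier step. A minimal sketch, assuming you only want to list which columns are temporal or string typed: a PySpark DataFrame exposes `dtypes` as a list of `(name, type)` string pairs, and the helper below just filters that list. The helper name `columns_of_type` and the example pairs are hypothetical, not taken from the question.

```python
# Hypothetical helper (not from the original post): filter the (name, type)
# pairs that a PySpark DataFrame exposes through its `dtypes` attribute.
TEMPORAL_TYPES = {"date", "timestamp", "timestamp_ntz"}


def columns_of_type(dtypes, wanted=TEMPORAL_TYPES):
    """Return the names of columns whose Spark type string is in `wanted`."""
    return [name for name, type_name in dtypes if type_name in wanted]


# Stand-in for `df_renamed_combined_df_clean.dtypes`. Apart from the two
# column names mentioned in the question, these pairs are made up.
example_dtypes = [
    ("inputdays_diff_2024", "int"),
    ("inputdays_diff_2023", "int"),
    ("some_text_column", "string"),
    ("some_date_column", "date"),
]

# Columns that already carry a date/timestamp type.
temporal = columns_of_type(example_dtypes)
# String columns, which an upstream step might be parsing as dates.
strings = columns_of_type(example_dtypes, wanted={"string"})

print(temporal)
print(strings)
```

On the real data this would be called as `columns_of_type(df_renamed_combined_df_clean.dtypes)`. If it returns nothing, the string columns are the next place to look, since the text '0' in the error message suggests that a string value is being parsed as a timestamp somewhere upstream.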
