DateTimeException: [CANNOT_PARSE_TIMESTAMP] in PySpark dataframe without timestamp data

I have a PySpark DataFrame with two numeric columns (integers, both negative and positive). When I try to take a random sample from it, I get an error. This is the code I'm running:

from pyspark.sql import functions as F

df_renamed_combined_df_clean_prueba = df_renamed_combined_df_clean.drop('inputdays_diff_2024', 'inputdays_diff_2023')

n = 50000
sample_df_prueba = df_renamed_combined_df_clean.orderBy(F.rand(seed=42)).limit(n)
sample_df_prueba = sample_df_prueba.toPandas()
display(sample_df_prueba.head(50))

And the error is:

DateTimeException: [CANNOT_PARSE_TIMESTAMP] Text '0' could not be parsed at index 0. Use `try_to_date` to tolerate invalid input string and return NULL instead. SQLSTATE: 22007

But I'm sure I don't have any datetime data in the PySpark DataFrame, and I'm not trying to convert anything to a datetime.

Thanks for your answers and help.

I have tried converting the integer columns to double or decimal, but it doesn't help.

edited Jul 4 at 8:36

Programmer.zip


asked Jul 3 at 16:03

Carlos Andrés Rodríguez

What is the schema (df.printSchema())? If you read the data without an explicit schema, PySpark may wrongly infer a column's type. So maybe you can tell PySpark how to read your data when loading it: spark.apache.org/docs/latest/api/python/reference/pyspark.sql/…

– Philipp Steiner
Commented 2025-07-04 09:05:24 +00:00
Thanks, Philipp Steiner, but that isn't the problem: I've already printed the schema and there is no timestamp data. I was careful to leave the columns as decimal, and I also tried integer and double, but none of them worked. I do read a file with dates through pandas, to calculate durations in days against the main PySpark DataFrame, but I drop those columns as soon as I have calculated them, and the error appears when I then try to select the sample. Anyway, let me know if you have other ideas. Thanks.

– Carlos Andrés Rodríguez
Commented 2025-07-04 22:29:49 +00:00
