I have a PySpark DataFrame with two numeric columns (integers, both negative and positive). When I try to take a random sample of it, I get an error. This is the code I'm running:
df_renamed_combined_df_clean_prueba = df_renamed_combined_df_clean.drop('inputdays_diff_2024', 'inputdays_diff_2023')
n = 50000
sample_df_prueba = df_renamed_combined_df_clean.orderBy(F.rand(seed=42)).limit(n)
sample_df_prueba = sample_df_prueba.toPandas()
display(sample_df_prueba.head(50))
And the error is:
DateTimeException: [CANNOT_PARSE_TIMESTAMP] Text '0' could not be parsed at index 0. Use `try_to_date` to tolerate invalid input string and return NULL instead. SQLSTATE: 22007
But I'm sure the PySpark DataFrame contains no datetime data, and I'm not trying to convert anything to a datetime type.
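One way to back up the claim about the column types is to inspect what `DataFrame.dtypes` reports, which is a list of `(column_name, type_string)` pairs. The following is a minimal sketch of such a check; `temporal_columns` is a hypothetical helper (not part of PySpark), and the example column names are made up for illustration.

```python
from typing import List, Tuple

# Spark SQL type names that represent date/time data.
TEMPORAL_TYPES = ("date", "timestamp", "timestamp_ntz")


def temporal_columns(dtypes: List[Tuple[str, str]]) -> List[str]:
    """Return the names of columns whose Spark type is a date or timestamp.

    `dtypes` is expected to have the shape of `DataFrame.dtypes`:
    a list of (column_name, type_string) pairs.
    """
    return [name for name, dtype in dtypes if dtype.lower() in TEMPORAL_TYPES]


# Hypothetical input shaped like the output of `df.dtypes`;
# these column names are illustrative, not taken from the real DataFrame.
example_dtypes = [("col_a", "int"), ("col_b", "bigint"), ("created_at", "timestamp")]
print(temporal_columns(example_dtypes))  # prints ['created_at']
```

With the DataFrame from the question, the call would be `temporal_columns(df_renamed_combined_df_clean.dtypes)`; an empty list would mean no column is typed as a date or timestamp.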
Thanks for your answers and help.
I have tried converting the integer columns to double or decimal, but the error persists.