I would like to convert a column of datetime strings to timestamps in dask_cudf and then sort the DataFrame by that column.
Example:
import cudf
import dask_cudf as ddf
# Sample data (replace with your actual data)
cdf = cudf.DataFrame({
    'city': ['Dallas', 'Bogota', 'Chicago', 'Juarez'],
    'timestamp': ['2019-12-29 14:15:08 UTC', '2019-12-30 10:30:15 UTC', '2019-12-31 18:45:30 UTC', '2020-01-01 03:20:45 UTC']
})
# Create a Dask-cuDF DataFrame
dask_df = ddf.from_cudf(cdf, npartitions=2)
def to_timestamp(x):
    import time
    import datetime
    element = datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S UTC")
    return datetime.datetime.timestamp(element)
dask_df['timestamp'] = dask_df['timestamp'].map_partitions(to_timestamp, meta=("timestamp", "str"))
dask_df.head()
I got this error:
TypeError: strptime() argument 1 must be str, not Series
How can I do this for a large DataFrame in dask_cudf?
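For reference, the per-string result I am after looks roughly like this in plain Python. This is only a sketch of the intended conversion for a single value, not a dask_cudf solution: the helper name to_epoch_seconds is made up for illustration, and I am assuming the trailing "UTC" means each value should be interpreted as UTC rather than local time.

```python
from datetime import datetime, timezone


def to_epoch_seconds(text: str) -> float:
    """Parse 'YYYY-MM-DD HH:MM:SS UTC' and return seconds since the Unix epoch."""
    # The literal "UTC" in the format only matches the suffix; it does not
    # attach a timezone, so the parsed datetime is naive at this point.
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S UTC")
    # Assumption: the value is in UTC, so mark it as such before converting.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


samples = [
    "2019-12-29 14:15:08 UTC",
    "2020-01-01 03:20:45 UTC",
]
# Works here because each call receives one string, not a whole Series.
epochs = [to_epoch_seconds(s) for s in samples]
print(epochs)
```

What I need is the equivalent of this applied to the whole column on the GPU, without looping over individual strings.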
========== Update ==========
I have tried this:
dask_df["timestamp"] = dask_df["timestamp"].map_partitions(to_timestamp, meta=("timestamp", "str"))
and got the same error:
TypeError: strptime() argument 1 must be str, not Series