I am trying to read data from Greenplum (GP) and ingest it into HDFS using Spark. I need an integer column to partition the data I read from the GP table. The problem is that I don't have a primary key or any column with unique values. In this scenario, the column I can rely on most is a timestamp column, which I can convert to an Integer/Long.
The data in the timestamp column looks like this:
select max(last_updated_timestamp) from schema.tablename => 2018-12-13 13:29:55
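For context, this is the kind of conversion I have in mind on the GP side. It is only a sketch: I am assuming Greenplum supports PostgreSQL's extract(epoch from ...) syntax (GP is Postgres-based), and the ::bigint cast and the * 1000 for milliseconds are my own additions:

select (extract(epoch from max(last_updated_timestamp)) * 1000)::bigint
from schema.tablename;
-- => 1544707795000 (epoch in milliseconds; the last three digits would be
--    non-zero if the column actually stores sub-second precision)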
Could anyone let me know how I can cast the timestamp column, including its milliseconds, to an epoch value (Long) that I can use in my Spark code?
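For reference, here is a minimal sketch of what I am trying to set up in Spark (Scala). The JDBC URL, credentials, table/column names, HDFS path, and bounds are all placeholders, and I am assuming the epoch column can be derived inside a dbtable subquery and then used as the numeric partitionColumn:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gp-to-hdfs").getOrCreate()

// Derive an epoch-in-milliseconds column inside the pushed-down subquery so
// Spark has a numeric column to partition on (all names here are placeholders).
val src =
  """(select t.*,
    |        (extract(epoch from t.last_updated_timestamp) * 1000)::bigint as epoch_ms
    |   from schema.tablename t) as src""".stripMargin

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://gp-host:5432/dbname")   // placeholder URL
  .option("dbtable", src)
  .option("user", "username")
  .option("password", "password")
  .option("partitionColumn", "epoch_ms")   // must be numeric, date, or timestamp
  .option("lowerBound", "1514764800000")   // e.g. min(epoch_ms) queried beforehand
  .option("upperBound", "1544707795000")   // e.g. max(epoch_ms) queried beforehand
  .option("numPartitions", "8")
  .load()

df.write.mode("overwrite").parquet("hdfs:///data/tablename")   // placeholder path

The lowerBound/upperBound values would come from min/max queries like the one shown above. Is this derived-column approach reasonable, or is there a better way to produce the epoch value for partitioning?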