I have two columns with date in the YYMMDD format and a time in the HHMMSS format, they are strings like 150103 132244. These are close to a quarter of a billion records. What would be the best way to sanitize the data prior to importing to PostgreSQL? Is there a way to do this while importing, for instance?
1 Answer
Your data can be converted to timestamp with time zone using the function to_timestamp():
with example(d, t) as (
    values ('150103', '132244')
)
select d, t, to_timestamp(concat(d, t), 'yymmddhh24miss')
from example;
d | t | to_timestamp
--------+--------+------------------------
150103 | 132244 | 2015-01-03 13:22:44+01
(1 row)
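If you want to sanity-check the format outside the database first, the same template can be expressed with Python's strptime (a quick sketch; `'yymmddhh24miss'` in Postgres corresponds to `'%y%m%d%H%M%S'` here):

```python
from datetime import datetime

# Parse the concatenated date + time strings from the question.
d, t = "150103", "132244"
ts = datetime.strptime(d + t, "%y%m%d%H%M%S")
print(ts.isoformat())  # 2015-01-03T13:22:44
```

Note this produces a naive timestamp; the time zone attached by `to_timestamp()` comes from the server's `TimeZone` setting.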
You can import a file into a table with temporary columns (d, t):
create table example(d text, t text);
copy example from ....
Then add a timestamp with time zone column, convert the data, and drop the redundant text columns:
alter table example add tstamp_column timestamptz;
update example
set tstamp_column = to_timestamp(concat(d, t), 'yymmddhh24miss');
alter table example drop d, drop t;
In short: COPY FROM into text columns and then re-format the fields once in the DB. But there are any number of scripting languages that could prepare the file first, if you preferred.
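If you go the pre-processing route, a minimal Python sketch might look like this. The file names and column layout are assumptions (a CSV whose first two columns are the date and time strings); it merges them into a single ISO-8601 timestamp that COPY can load straight into a timestamptz column:

```python
import csv
from datetime import datetime

def sanitize(in_path="raw.csv", out_path="clean.csv"):
    """Replace the leading YYMMDD and HHMMSS columns with one ISO timestamp.

    Streams row by row, so memory use stays flat even for a quarter of a
    billion records.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for d, t, *rest in reader:
            ts = datetime.strptime(d + t, "%y%m%d%H%M%S")
            writer.writerow([ts.isoformat(sep=" "), *rest])
```

This trades one pass over the file for a simpler, single-column COPY and no UPDATE of the whole table afterwards.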