I have two columns: a date in YYMMDD format and a time in HHMMSS format, stored as strings like 150103 and 132244. There are close to a quarter of a billion records. What would be the best way to sanitize the data prior to importing into PostgreSQL? Is there a way to do this while importing, for instance?

  • It depends what else you need to do with the data, I guess. If it's otherwise in a suitable CSV format, I'd probably use COPY FROM into text columns and then re-format the fields once in the DB. But there's any number of scripting languages that could prepare the file first if you preferred. Commented Feb 1, 2017 at 16:26
  • @IMSoP: Yes, this is a raw CSV file. Sorry, do you mean pre-process before importing with, say, Python? Or do you mean doing something after importing? Commented Feb 1, 2017 at 16:29

1 Answer

Your data can be converted to timestamp with time zone using the function to_timestamp():

with example(d, t) as (
    values ('150103', '132244')
)

select d, t, to_timestamp(concat(d, t), 'yymmddhh24miss')
from example;

   d    |   t    |      to_timestamp      
--------+--------+------------------------
 150103 | 132244 | 2015-01-03 13:22:44+01
(1 row)

You can import a file into a table with temporary columns (d, t):

create table example(d text, t text);
copy example from ....
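The exact COPY options depend on the file's layout; as a minimal sketch, assuming a comma-separated CSV with a header row and a hypothetical path:

-- hypothetical server-side path; from psql, \copy reads a client-side file instead
copy example (d, t)
from '/path/to/data.csv'
with (format csv, header);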

Then add a timestamp with time zone column, convert the data, and drop the redundant text columns:

alter table example add tstamp_column timestamptz;

update example
set tstamp_column = to_timestamp(concat(d, t), 'yymmddhh24miss');

alter table example drop d, drop t;
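Since the question also asks about converting while importing: COPY itself does not apply expressions to incoming data, but a variation on the same idea is to COPY into a staging table and convert while inserting into the final table, which avoids running a large UPDATE over a quarter of a billion rows. A sketch, with hypothetical table, column, and file names:

create table staging(d text, t text);
copy staging from '/path/to/data.csv' with (format csv, header);

-- convert while moving the rows into their final table
create table events(tstamp timestamptz);
insert into events (tstamp)
select to_timestamp(concat(d, t), 'yymmddhh24miss')
from staging;

drop table staging;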