
I am trying to upload part of a text file into a database table. The text file is around 12 GB. I am parsing the text file line by line and inserting each row into the table.

The following is the code that I am using to upload the data:

import psycopg2 as pg
import os
import datetime

sub_column_list = ['', 'SUB', 'GIS', 'MO', 'DA', 'YR', 'AREAkm2', 'PRECIPmm', 'SNOMELTmm', 'PETmm', 'ETmm', 'SWmm', 'PERCmm',
          'SURQmm', 'GW_Qmm', 'WYLDmm', 'SYLDt/ha', 'ORGNkg/ha', 'ORGPkg/ha', 'NSURQkg/ha', 'SOLPkg/ha',
          'SEDPkg/ha', 'LATQmm', 'LATNO3kg/ha', 'GWNO3kg/ha', 'CHOLAmic/L', 'CBODUmg/L', 'DOXQmg/L', 'TNO3kg/ha']

sub_vars = ['PRECIPmm', 'PETmm', 'ETmm', 'SWmm', 'SURQmm']

conn = pg.connect('dbname=swat_db user=admin password=pass host=localhost port=5435')

cur = conn.cursor()

watershed_id = 1
if file.endswith('.sub'):  # 'file' and 'output_path' are defined earlier in the script

    sub_path = os.path.join(output_path, file)
    f = open(sub_path)
    # advance past the header; the column-header row contains 'AREAkm2'
    for skip_line in f:
        if 'AREAkm2' in skip_line:
            break

    for line in f:
        columns = line.split()
        for item in sub_vars:
            sub = int(columns[1])
            dt = datetime.date(int(columns[5]), int(columns[3]), int(columns[4]))
            val = float(columns[sub_column_list.index(item)])
            # one INSERT statement per variable per data line
            cur.execute("""INSERT INTO output_sub (watershed_id, month_day_year, sub_id, var_name, val)
                         VALUES ({0}, '{1}', {2}, '{3}', {4})""".format(watershed_id, dt, sub, item, val))

        conn.commit()
    conn.close()

The sub_column_list is the list of all the columns in the text file, and sub_vars is the list of the variables that I would like to put into the database. This approach is taking a very long time. What would be a good way to improve the speed at which the values are inserted?

1 Answer

The first thing I notice is that you loop over the file in two stages: once to find the AREAkm2 header line, and then again from that point on to insert into your database. Maybe this is what you wanted, but you can also merge the two loops into a single pass:

if file.endswith('.sub'):

    sub_path = os.path.join(output_path, file)
    f = open(sub_path)
    header_seen = False
    for line in f:
        # skip everything up to and including the column-header line,
        # then parse the data rows in the same pass
        if not header_seen:
            if 'AREAkm2' in line:
                header_seen = True
            continue
        columns = line.split()
        sub = int(columns[1])
        dt = datetime.date(int(columns[5]), int(columns[3]), int(columns[4]))
        for item in sub_vars:
            val = float(columns[sub_column_list.index(item)])
            cur.execute("""INSERT INTO output_sub (watershed_id, month_day_year, sub_id, var_name, val)
                         VALUES ({0}, '{1}', {2}, '{3}', {4})""".format(watershed_id, dt, sub, item, val))
    conn.commit()
    conn.close()
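
Beyond the single pass, the dominant cost is likely one INSERT round trip per value plus a commit per input line. A common psycopg2 approach is to batch rows with psycopg2.extras.execute_values and a parameterized statement (which also avoids building SQL with str.format). A minimal sketch, assuming the same conn, cur, f (already positioned past the header line), watershed_id, sub_vars, and sub_column_list as above; the 10,000-row batch size is an arbitrary starting point:

from psycopg2.extras import execute_values

INSERT_SQL = """INSERT INTO output_sub (watershed_id, month_day_year, sub_id, var_name, val)
                VALUES %s"""

batch = []
for line in f:
    columns = line.split()
    sub = int(columns[1])
    dt = datetime.date(int(columns[5]), int(columns[3]), int(columns[4]))
    for item in sub_vars:
        val = float(columns[sub_column_list.index(item)])
        batch.append((watershed_id, dt, sub, item, val))
    if len(batch) >= 10000:
        # send 10,000 rows in one statement instead of 10,000 statements
        execute_values(cur, INSERT_SQL, batch)
        batch = []
if batch:
    execute_values(cur, INSERT_SQL, batch)  # flush the remainder
conn.commit()  # one commit for the whole file

For a 12 GB input, PostgreSQL's COPY protocol is usually faster still. A sketch under the same assumptions, using cur.copy_expert with an in-memory buffer; for a file this size you would flush the buffer to the database in chunks rather than accumulate everything:

import io

buf = io.StringIO()
for line in f:
    columns = line.split()
    sub = int(columns[1])
    dt = datetime.date(int(columns[5]), int(columns[3]), int(columns[4]))
    for item in sub_vars:
        val = float(columns[sub_column_list.index(item)])
        # tab-separated row in COPY's default text format
        buf.write('{0}\t{1}\t{2}\t{3}\t{4}\n'.format(watershed_id, dt, sub, item, val))
buf.seek(0)
cur.copy_expert(
    "COPY output_sub (watershed_id, month_day_year, sub_id, var_name, val) FROM STDIN",
    buf)
conn.commit()

Dropping or deferring indexes on output_sub during the load, and recreating them afterwards, can also help at this scale.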