import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine

# credentials
username = "user"
password = "password"
connectStr = "ip:port/service_name"

df = pd.read_csv("data.csv")

# connection
dsn = cx_Oracle.makedsn('my_ip', 1521, service_name='my_service_name')  # makedsn requires a port; 1521 is the Oracle default

engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (username, password, dsn))

# upload dataframe to ORCLDB
df.to_sql(name="test",con=engine, if_exists='append', index=False)

How can I speed up the .to_sql function in pandas? It's taking 20 minutes to write a 120 KB file with 1,000 rows, loaded as a DataFrame, into the database. The column types are all VARCHAR2(256).

Database columns: https://i.sstatic.net/17k8B.jpg


1 Answer


What is happening here is that every row is inserted as its own statement, and each insert has to finish before the next one can start. The workaround is to do a "bulk insert" using a CSV file that is loaded into memory. I know how this is done with Postgres (which is what I am using), but I am not sure how to do it for Oracle. Here is the code I am using for Postgres; perhaps it will be of some help.

import io

def bulk_insert_sql_replace(engine, df, table, if_exists='replace', sep='\t', encoding='utf8'):

    # Create the (empty) target table from the DataFrame's schema
    df[:0].to_sql(table, engine, if_exists=if_exists, index=False)

    # Write the DataFrame to an in-memory CSV buffer
    output = io.StringIO()
    df.to_csv(output, sep=sep, index=False, header=False, encoding=encoding)
    output.seek(0)

    # Load the buffer into the table in a single COPY (psycopg2's copy_from)
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()
    connection.close()
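
For example, you would call it like this (the connection string and table name here are just placeholders for illustration):

# hypothetical Postgres connection string and table name
pg_engine = create_engine('postgresql+psycopg2://user:password@host:5432/dbname')
bulk_insert_sql_replace(pg_engine, df, 'my_table', if_exists='replace')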

Here is a link to another thread that has tons of great information regarding this issue: Bulk Insert A Pandas DataFrame Using SQLAlchemy
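
For Oracle specifically, a rough equivalent (just a sketch, not something I have tested) is to keep .to_sql for creating the table but push the rows with cx_Oracle's executemany(), which binds all rows and sends them in one round trip instead of one INSERT per row. This assumes the DataFrame column names match the table's column names; the table name "test" is taken from the question.

def bulk_insert_oracle(connection, df, table):
    # Build an INSERT with one numbered bind variable per DataFrame column
    cols = ", ".join(df.columns)
    binds = ", ".join(":%d" % (i + 1) for i in range(len(df.columns)))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, binds)

    cursor = connection.cursor()
    # executemany binds every row at once instead of issuing one statement per row
    cursor.executemany(sql, list(df.itertuples(index=False, name=None)))
    connection.commit()
    cursor.close()

# usage with the engine and DataFrame from the question
raw_conn = engine.raw_connection()
bulk_insert_oracle(raw_conn, df, "test")
raw_conn.close()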
