
I am using SQLAlchemy for the first time to export around 6 million records to MySQL. Following is the error I receive:

OperationalError: (mysql.connector.errors.OperationalError) 2055: Lost connection to MySQL server at '127.0.0.1:3306', system error: 10053 An established connection was aborted by the software in your host machine

Code:

import pandas as pd
import sqlalchemy

df = pd.read_excel(r"C:\Users\mazin\1-601.xlsx")

database_username = 'root'
database_password = 'aUtO1115'
database_ip       = '127.0.0.1'
database_name     = 'patenting in psis'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username, database_password, database_ip, database_name
    ),
    pool_recycle=1, pool_timeout=30
).connect()

df.to_sql(con=database_connection, name='sample', if_exists='replace')
database_connection.close()

Note: I do not get the error if I export around 100 records. After referring to similar posts, I have added the pool_recycle and pool_timeout parameters but the error still persists.

  • If you're inserting 6 million rows, you are certainly exceeding the 30-second timeout. Have you tried inserting in chunks instead of all at once? to_sql has an optional chunksize parameter you can use. Commented Feb 9, 2018 at 21:20
  • @PerunSS - I got the same error even when I used a timeout of 57600 seconds. Also, when I use the chunksize parameter, I get ProgrammingError 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%(Maintenance Status (US))s, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'Ma' at line 1 (a possible cause and workaround are sketched after these comments). Commented Feb 10, 2018 at 4:01
  • @PerunSS - The use of the chunksize parameter and setting appropriate values for pool_recycle and pool_timeout made the code work. Do you want to post it as an answer? Commented Feb 23, 2018 at 15:26
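The ProgrammingError 1064 in the comments above most likely comes from the column header Maintenance Status (US): mysql-connector substitutes parameters using %(name)s placeholders, and the parentheses inside the name can break that substitution, leaving the raw placeholder in the SQL. A minimal sketch of one possible workaround (an assumption, not something confirmed in the thread) is to sanitize the DataFrame's column names before calling to_sql:

import re

# Hypothetical workaround: replace characters that clash with the
# %(name)s placeholder syntax, e.g. "Maintenance Status (US)" becomes
# "Maintenance_Status_US".
df.columns = [re.sub(r'\W+', '_', str(col)).strip('_') for col in df.columns]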

1 Answer


The problem is that you're trying to import all 6 million rows as one chunk, and that takes time. With your current config, pool_recycle is set to 1 second, meaning connections older than 1 second are recycled, which is certainly not enough time to insert 6 million rows. My suggestion is the following:

database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username,
        database_password,
        database_ip,
        database_name
    ),
    pool_recycle=3600,  # recycle connections after an hour, not 1 second
    pool_size=5
).connect()
df.to_sql(
    con=database_connection,
    name='sample',
    if_exists='replace',
    chunksize=1000      # insert 1000 rows per statement
)

This sets up a pool of 5 connections with a recycle time of one hour, and to_sql now inserts 1,000 rows per statement instead of all the rows at once. (Note that pool_timeout only controls how long to wait when checking a connection out of the pool, not how long a query may run, which is why raising it alone did not help.) You can experiment with the values to achieve the best performance.
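For completeness, here is an end-to-end version of the same fix (a sketch using the question's own path and credentials). Passing the Engine itself to to_sql, instead of a checked-out Connection, lets the pool hand out and recycle connections as pandas needs them:

import pandas as pd
import sqlalchemy

df = pd.read_excel(r"C:\Users\mazin\1-601.xlsx")

engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        'root', 'aUtO1115', '127.0.0.1', 'patenting in psis'
    ),
    pool_recycle=3600,  # replace connections older than an hour
    pool_size=5
)

# pandas checks connections out of the pool as needed; each INSERT
# covers at most 1000 rows instead of all 6 million.
df.to_sql(con=engine, name='sample', if_exists='replace', chunksize=1000)

engine.dispose()  # close all pooled connections when finished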


1 Comment

This was super useful
