
I want to write a dataframe to an existing SQLite (or MySQL) table, and sometimes the dataframe contains a new column that is not yet present in the database. What do I need to do to avoid this throwing an error? Is there a way to tell pandas or SQLAlchemy to automatically expand the database table with the new columns?

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) table match_exact_both has no column named ....

3 Answers


Here is my solution using MySQL and SQLAlchemy. The basic idea is that, if possible, I append to the SQL table instead of rewriting the whole thing; but if there is a new column, I combine the data in pandas and then overwrite the existing table.

import pymysql
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.exc import OperationalError

cnx = create_engine('mysql+pymysql://username:password@hostname/database_name')

try:
    # This will fail if the dataframe contains a column the table lacks.
    df.to_sql(name='sql_table', con=cnx, if_exists='append', index=False)
except OperationalError:
    # Pull the existing rows, combine them with the new data, and
    # rewrite the table so it picks up the new columns.
    data = pd.read_sql('SELECT * FROM sql_table', cnx)
    df2 = pd.concat([data, df])
    df2.to_sql(name='sql_table', con=cnx, if_exists='replace', index=False)

6 Comments

WARNING: THIS WOULD DELETE THE EXISTING TABLE
How? The entire table is saved in pandas before being re-written. I’ve used this technique thousands of times in automated scripts and no table has been deleted.
If to_sql fails after deleting the table, you will lose all your data; that sounds like a bad idea to me.
I've done this 100s of times with no issue (knock on wood). If you want to play it safe you can always create a redundant database and halt the program if there is a problem.
One more thing to note: if_exists='replace' will drop the table and reset the data types. The right way to do this is to read the table's schema and create the new columns.

If there are extra columns in your dataframe, you need to manually add those columns to the database table before df.to_sql() will work.
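
A minimal sketch of that manual step, assuming a SQLAlchemy engine and a SQLite database; the database URL, the add_missing_columns helper, and the crude TEXT/REAL type heuristic are illustrative, not part of the original answer:

import pandas as pd
from sqlalchemy import create_engine, inspect, text

def add_missing_columns(engine, table_name, df):
    # Compare the dataframe's columns against the table's current schema.
    existing = {col["name"] for col in inspect(engine).get_columns(table_name)}
    new_cols = [c for c in df.columns if c not in existing]
    with engine.begin() as conn:
        for col in new_cols:
            # Crude dtype mapping for SQLite; adjust to your data.
            sql_type = "REAL" if pd.api.types.is_numeric_dtype(df[col]) else "TEXT"
            conn.execute(text(f'ALTER TABLE {table_name} ADD COLUMN "{col}" {sql_type}'))

engine = create_engine("sqlite:///example.db")  # hypothetical database
add_missing_columns(engine, "sql_table", df)    # df is your dataframe
df.to_sql("sql_table", con=engine, if_exists="append", index=False)

Unlike if_exists='replace', this keeps the existing rows and column types intact.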



You might also consider something like this:

  • open a transaction
  • rename the old table to a temporary backup table
  • let df.to_sql create the new table (with if_exists="fail")
  • insert the old table's data into the new table
  • drop the temporary backup table
  • commit the transaction

Note that this approach will fail when you remove columns, whereas the pd.concat option will merge the schemas.

Also, this will probably only work in databases that support transactional DDL: https://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis

dependency: https://github.com/rvkulikov/pg-deps-management

import sqlalchemy
from sqlalchemy import text, sql

# engine, table_name and df are assumed to be defined already.
try:
    # Fast path: append in place when the table already has every column.
    df.to_sql(
        table_name,
        con=engine,
        if_exists="append",
        index=False,
        chunksize=70,
        method="multi",
    )
except Exception:
    # Schema mismatch (e.g. a new column): rebuild the table in a transaction.
    auto_add_new_columns(engine, table_name, df)
finally:
    print("Finished updating db")


def auto_add_new_columns(engine, table_name, df, schema="public"):
    with engine.connect() as conn:
        # conn.begin() commits on successful exit and rolls back on error,
        # so the rename/copy/drop sequence is all-or-nothing.
        with conn.begin():
            # Reflect the existing table so we know its current columns.
            md = sqlalchemy.MetaData()
            table = sqlalchemy.Table(table_name, md, autoload_with=conn)

            conn.execute(
                text(f"select deps_save_and_drop_dependencies('{schema}', '{table_name}')")
            )
            # Move the old table out of the way.
            conn.execute(
                text(
                    "alter table "
                    + sql.quoted_name(table_name, quote=False)
                    + " rename to "
                    + sql.quoted_name(table_name + "_backup", quote=False)
                )
            )

            # Let pandas create a fresh table matching the dataframe's schema.
            df.to_sql(
                table_name,
                con=conn,
                if_exists="fail",
                index=False,
                chunksize=70,
                method="multi",
            )

            # Copy the old rows across, restricted to the old table's columns.
            cols_list = [column.key for column in table.columns]
            conn.execute(
                text(
                    "insert into "
                    + sql.quoted_name(table_name, quote=False)
                    + f" ({','.join(cols_list)}) "
                    + " select "
                    + f" {','.join(cols_list)} "
                    + "from "
                    + sql.quoted_name(table_name + "_backup", quote=False)
                )
            )

            conn.execute(
                text("drop table " + sql.quoted_name(table_name + "_backup", quote=False))
            )

            conn.execute(
                text(f"select deps_restore_dependencies('{schema}', '{table_name}')")
            )

Or perhaps a better way would be to build a map between Postgres and pandas/SQLAlchemy types, zip the new columns together with their mapped types, and run the ALTER TABLE DDL commands directly.
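
A rough sketch of that type-map idea, assuming Postgres and a SQLAlchemy engine; PG_TYPES and add_new_columns_via_ddl are hypothetical names, and the map is a starting point rather than a complete one:

from sqlalchemy import inspect, text

# Rough pandas-dtype -> Postgres-type map; extend for your own data.
PG_TYPES = {
    "int64": "bigint",
    "float64": "double precision",
    "bool": "boolean",
    "datetime64[ns]": "timestamp",
}

def add_new_columns_via_ddl(engine, table_name, df, schema="public"):
    existing = {c["name"] for c in inspect(engine).get_columns(table_name, schema=schema)}
    with engine.begin() as conn:
        # Zip each new column with its mapped type and run the DDL.
        for col in df.columns:
            if col not in existing:
                pg_type = PG_TYPES.get(str(df[col].dtype), "text")
                conn.execute(text(
                    f'ALTER TABLE {schema}.{table_name} ADD COLUMN "{col}" {pg_type}'
                ))

Once the new columns exist, a plain if_exists="append" write succeeds without replacing the table, so the existing rows and types are untouched.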

