
I want to write a dataframe to an existing SQLite (or MySQL) table, and sometimes the dataframe contains a new column that is not yet present in the database. What do I need to do to avoid this throwing an error? Is there a way to tell pandas or SQLAlchemy to automatically expand the database table with the new columns?

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) table match_exact_both has no column named ....

3 Answers


Here is my solution using MySQL and SQLAlchemy. The basic idea is that, if possible, I append to the SQL table instead of rewriting the whole thing; but if there is a new column, I combine the data in pandas and then overwrite the existing table.

import pymysql
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.exc import OperationalError

cnx = create_engine('mysql+pymysql://username:password@hostname/database_name')

try:
    # This will fail if the dataframe contains a column the table lacks.
    df.to_sql(name='sql_table', con=cnx, if_exists='append', index=False)
except OperationalError:
    # Pull the existing rows, combine them with the new data, and
    # rewrite the table so it picks up the new columns.
    data = pd.read_sql('SELECT * FROM sql_table', cnx)
    df2 = pd.concat([data, df])
    df2.to_sql(name='sql_table', con=cnx, if_exists='replace', index=False)

6 Comments

WARNING: THIS WOULD DELETE THE EXISTING TABLE
How? The entire table is saved in pandas before being re-written. I’ve used this technique thousands of times in automated scripts and no table has been deleted.
If to_sql fails after deleting the table, you will lose all your data; that sounds like a bad idea to me.
I've done this 100s of times with no issue (knock on wood). If you want to play it safe you can always create a redundant database and halt the program if there is a problem.
One more thing to note: if_exists='replace' will drop the table and reset the data types. The right way to do this is to read the table's schema and create the new columns.

If there are extra columns in your dataframe, you need to manually add those columns to the database table before df.to_sql() will work.
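
A minimal sketch of that manual step, assuming a SQLAlchemy engine and a SQLite database; the database URL, the add_missing_columns helper, and the crude TEXT/REAL type heuristic are illustrative, not part of the original answer:

import pandas as pd
from sqlalchemy import create_engine, inspect, text

def add_missing_columns(engine, table_name, df):
    # Compare the dataframe's columns against the table's current schema.
    existing = {col["name"] for col in inspect(engine).get_columns(table_name)}
    new_cols = [c for c in df.columns if c not in existing]
    with engine.begin() as conn:
        for col in new_cols:
            # Crude dtype mapping for SQLite; adjust to your data.
            sql_type = "REAL" if pd.api.types.is_numeric_dtype(df[col]) else "TEXT"
            conn.execute(text(f'ALTER TABLE {table_name} ADD COLUMN "{col}" {sql_type}'))

engine = create_engine("sqlite:///example.db")  # hypothetical database
add_missing_columns(engine, "sql_table", df)    # df is your dataframe
df.to_sql("sql_table", con=engine, if_exists="append", index=False)

Unlike if_exists='replace', this keeps the existing rows and column types intact.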



You might also consider something like this:

  • open a transaction
  • rename the old table to a temporary backup table
  • let df.to_sql create the new table (with if_exists="fail")
  • insert the old table's data into the new table
  • drop the temporary backup table
  • commit the transaction

Note that this approach will fail when you remove columns, whereas the pd.concat option will merge the schemas.

Also, this will probably only work in databases that support transactional DDL: https://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis

dependency: https://github.com/rvkulikov/pg-deps-management

import sqlalchemy
from sqlalchemy import text, sql

# engine, table_name and df are assumed to be defined already.
try:
    # Fast path: append in place when the table already has every column.
    df.to_sql(
        table_name,
        con=engine,
        if_exists="append",
        index=False,
        chunksize=70,
        method="multi",
    )
except Exception:
    # Schema mismatch (e.g. a new column): rebuild the table in a transaction.
    auto_add_new_columns(engine, table_name, df)
finally:
    print("Finished updating db")


def auto_add_new_columns(engine, table_name, df, schema="public"):
    with engine.connect() as conn:
        # conn.begin() commits on successful exit and rolls back on error,
        # so the rename/copy/drop sequence is all-or-nothing.
        with conn.begin():
            # Reflect the existing table so we know its current columns.
            md = sqlalchemy.MetaData()
            table = sqlalchemy.Table(table_name, md, autoload_with=conn)

            conn.execute(
                text(f"select deps_save_and_drop_dependencies('{schema}', '{table_name}')")
            )
            # Move the old table out of the way.
            conn.execute(
                text(
                    "alter table "
                    + sql.quoted_name(table_name, quote=False)
                    + " rename to "
                    + sql.quoted_name(table_name + "_backup", quote=False)
                )
            )

            # Let pandas create a fresh table matching the dataframe's schema.
            df.to_sql(
                table_name,
                con=conn,
                if_exists="fail",
                index=False,
                chunksize=70,
                method="multi",
            )

            # Copy the old rows across, restricted to the old table's columns.
            cols_list = [column.key for column in table.columns]
            conn.execute(
                text(
                    "insert into "
                    + sql.quoted_name(table_name, quote=False)
                    + f" ({','.join(cols_list)}) "
                    + " select "
                    + f" {','.join(cols_list)} "
                    + "from "
                    + sql.quoted_name(table_name + "_backup", quote=False)
                )
            )

            conn.execute(
                text("drop table " + sql.quoted_name(table_name + "_backup", quote=False))
            )

            conn.execute(
                text(f"select deps_restore_dependencies('{schema}', '{table_name}')")
            )

Or perhaps a better way would be to build a map between Postgres and pandas/SQLAlchemy types, zip the new columns together with their mapped types, and run the ALTER TABLE DDL commands directly.
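
A rough sketch of that type-map idea, assuming Postgres and a SQLAlchemy engine; PG_TYPES and add_new_columns_via_ddl are hypothetical names, and the map is a starting point rather than a complete one:

from sqlalchemy import inspect, text

# Rough pandas-dtype -> Postgres-type map; extend for your own data.
PG_TYPES = {
    "int64": "bigint",
    "float64": "double precision",
    "bool": "boolean",
    "datetime64[ns]": "timestamp",
}

def add_new_columns_via_ddl(engine, table_name, df, schema="public"):
    existing = {c["name"] for c in inspect(engine).get_columns(table_name, schema=schema)}
    with engine.begin() as conn:
        # Zip each new column with its mapped type and run the DDL.
        for col in df.columns:
            if col not in existing:
                pg_type = PG_TYPES.get(str(df[col].dtype), "text")
                conn.execute(text(
                    f'ALTER TABLE {schema}.{table_name} ADD COLUMN "{col}" {pg_type}'
                ))

Once the new columns exist, a plain if_exists="append" write succeeds without replacing the table, so the existing rows and types are untouched.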

