I've been developing a product that centers on the daily execution of a data analysis script written in Python 3.7.0. Every day at midnight it will process a huge amount of data and then export the result to two MySQL tables. The first one will only contain the data for the current day, while the other will contain the concatenated data of all executions.
To illustrate what I currently have, see the code below, where df stands for the final DataFrame generated by the data analysis:
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine(r"mysql+pymysql://user:psswd@localhost/pathToMyDB")
df = pd.DataFrame({'Something':['a','b','c']})
df.to_sql('DReg', engine, index=True, if_exists='replace')  # daily table
df.to_sql('AReg', engine, index=False, if_exists='append')  # annual table
As you can see from the parameters of my second to_sql call, I am not writing an index to the annual table. However, my manager asked me to do so, with an index that follows a simple rule: it should be an auto-incrementing numeric index that automatically assigns each saved row a number corresponding to its position in the table.
So basically, the first time I saved df, the database should look like:
index Something
0 a
1 b
2 c
And in my second execution:
index Something
0 a
1 b
2 c
3 a
4 b
5 c
However, when I set my index to True in the second df.to_sql command (turning it into df.to_sql('AReg', engine, index = True, if_exists='append')), after two executions my database ends up looking like:
index Something
0 a
1 b
2 c
0 a
1 b
2 c
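For anyone who wants to reproduce this without a MySQL server, here is a minimal sketch. It assumes an in-memory SQLite connection as a stand-in for my real engine (that substitution is only for illustration; my actual setup uses the MySQL engine shown above), and it appends the same DataFrame twice with index=True:

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the real MySQL engine.
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({'Something': ['a', 'b', 'c']})

# Simulate two daily executions appending to the annual table.
for _ in range(2):
    df.to_sql('AReg', conn, index=True, if_exists='append')

# Read the rows back in insertion order.
result = pd.read_sql_query('SELECT * FROM AReg ORDER BY rowid', conn)
print(result)
```

The index column simply restarts at 0 on every append, because to_sql writes whatever index the DataFrame carries.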
I did some research, but could not find a way to make the index auto-increment. I considered reading the annual table at every execution and then adapting my DataFrame's index to it, but the table can easily get REALLY huge, which would make the execution absurdly slow (and would also prevent me from running the same data analysis on two computers simultaneously without compromising my index).
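To make that idea concrete, here is a rough sketch of the workaround. The helper name append_with_offset is hypothetical, and an in-memory SQLite connection is assumed as a stand-in for my MySQL engine. This version only queries the current maximum of the index column rather than reading the whole table, and it still has the concurrency problem: two machines could read the same maximum before either one writes.

```python
import sqlite3

import pandas as pd


def append_with_offset(df, conn, table='AReg'):
    """Append df to `table`, continuing the index after the stored maximum.

    Hypothetical helper for illustration; not safe for concurrent writers.
    """
    try:
        current_max = conn.execute(
            f'SELECT MAX("index") FROM "{table}"'
        ).fetchone()[0]
    except sqlite3.OperationalError:
        # The table does not exist yet (first execution).
        current_max = None

    start = 0 if current_max is None else current_max + 1

    shifted = df.copy()
    shifted.index = range(start, start + len(shifted))
    shifted.to_sql(table, conn, index=True, index_label='index',
                   if_exists='append')


# Assumption: in-memory SQLite stands in for the real MySQL engine.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({'Something': ['a', 'b', 'c']})

append_with_offset(df, conn)  # first execution
append_with_offset(df, conn)  # second execution

result = pd.read_sql_query('SELECT * FROM AReg ORDER BY rowid', conn)
print(result)
```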
So what is the best solution to make this index work? What am I missing here?