I have a dataframe and want to update it or create a new dataframe based on some input from an SQL table. Dataframe A has two columns (ID and Added_Date).
The SQL table has a few more columns, including ID, Transaction_Date, Year, Month and Day. My idea is to merge the contents of dataframe A with the SQL table and, after the merge, pick all records whose Transaction_Date falls within 30 days after the Added_Date. In summary, I want a dataframe with all transactions (from the SQL table) that happened within 30 days after the Added_Date in df A. The SQL table is quite large and is partitioned by Year, Month and Day. How can I optimize this process?
I understand the join can happen once the dataframe is converted to a tuple or maybe a dictionary, but I haven't gotten past that. Sample code is below:
import sqlite3
import pandas as pd
# create df
data = {'ID': [1, 2, 3], 'Added_Date': ['2023-02-01', '2023-04-15', '2023-03-17']}
df_A = pd.DataFrame(data)
Below is the code to create a sample transactions table in an in-memory SQLite database:
# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
c = conn.cursor()
# Create the transactions table
c.execute('''CREATE TABLE transactions
(ID INTEGER, transaction_date DATE)''')
# Insert sample data into the transactions table
c.execute('''INSERT INTO transactions VALUES
(1, '2023-01-15'), (1, '2023-02-10'), (1, '2023-03-01'),
(2, '2023-04-01'), (2, '2023-04-20'), (2, '2023-05-05'),
(3, '2023-03-10'), (3, '2023-03-25'), (3, '2023-04-02')''')
conn.commit()
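For the small in-memory sample above, here is roughly the result I'm after, expressed in pandas (just a sketch; it assumes the window means Added_Date <= transaction_date <= Added_Date + 30 days, and that the whole table fits in memory, which won't hold for the real partitioned table):
# Pull the sample transactions back into pandas and apply the 30-day window
transactions = pd.read_sql('SELECT ID, transaction_date FROM transactions', conn,
                           parse_dates=['transaction_date'])
df_A['Added_Date'] = pd.to_datetime(df_A['Added_Date'])

merged = transactions.merge(df_A, on='ID')
within_window = (merged['transaction_date'] >= merged['Added_Date']) & \
                (merged['transaction_date'] <= merged['Added_Date'] + pd.Timedelta(days=30))
result = merged.loc[within_window, ['ID', 'transaction_date']]
print(result)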
Expected outcome should be something like this:
ID transaction_date
1 2023-02-10
1 2023-03-01
2 2023-04-20
2 2023-05-05
3 2023-03-10
3 2023-03-25
3 2023-04-02
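On the optimization side, since the real table is partitioned by Year, Month and Day, one direction I can think of (just a sketch; the exact SQL depends on the actual engine, and the Year/Month/Day columns are the ones described above, not present in the SQLite sample) is to derive date bounds from df_A and push them onto the partition columns so only the relevant partitions are scanned, then do the ID join and the exact 30-day filter afterwards:
# Bounds covering every possible match: earliest Added_Date up to the
# latest Added_Date plus 30 days (Added_Date already converted to datetime above)
low = df_A['Added_Date'].min()
high = df_A['Added_Date'].max() + pd.Timedelta(days=30)

# Hypothetical partition-pruning predicate on Year/Month/Day for the real table
query = f"""
    SELECT ID, Transaction_Date
    FROM transactions
    WHERE (Year > {low.year} OR (Year = {low.year} AND (Month > {low.month}
           OR (Month = {low.month} AND Day >= {low.day}))))
      AND (Year < {high.year} OR (Year = {high.year} AND (Month < {high.month}
           OR (Month = {high.month} AND Day <= {high.day}))))
"""
print(query)
Filtering on the partition columns rather than on Transaction_Date is what would let the engine prune partitions; restricting the query to the specific IDs in df_A (e.g. via a temporary table joined in the query) could cut the scan down further.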
I hope that's more clear.
Comments:
SQLite (the sqlite tag) and Microsoft SQL Server (the sql-server tag) are not even remotely the same thing. Which database system are you actually using? (Please correct your tags.)
'2023-03-10' for ID == 3. That seems incorrect, given '2023-03-17' in df_A. No?