how to insert multiple python parameter into sql sting to create a dataframe

Question

I've been trying to create a dataframe based on a SQL string, but I want to query the same things (3 counts) over different periods (in this example, monthly).

For context, I've been working in a python notebook on Civis.

I came up with this :

START_DATE_LIST = ["2021-01-01","2021-02-01","2021-03-01","2021-04-01","2021-05-01","2021-06-01","2021-07-01","2021-08-01","2021-09-01","2021-10-01","2021-11-01","2021-12-01"]
END_DATE_LIST = ["2021-01-31","2021-02-28","2021-03-31","2021-04-30","2021-05-31","2021-06-30","2021-07-31","2021-08-31","2021-09-31","2021-10-31","2021-11-30","2021-12-31"]

for start_date, end_date in zip(START_DATE_LIST,END_DATE_LIST) :
    SQL = f"select count (distinct case when (CON.firstdonationdate__c< {start_date} and CON.last_gift_date__c>  {start_date} ) then CON.id else null end) as Donors_START_DATE, \
count (distinct case when (CON.firstdonationdate__c< {end_date} and CON.c_last_gift_date__c> {end_date}) then CON.id else null end) as Donors_END_DATE, \
count (distinct case when (CON.firstdonationdate__c> {start_date} and CON.firstdonationdate__c<{end_date}) then CON.id else null end) as New_Donors \
from staging.contact CON;"
    df2 =civis.io.read_civis_sql(SQL, "database", use_pandas=True)
    df2['START_DATE']=start_date
    df2['END_DATE']= end_date

It runs but then the output is only :

 donors_start_date  donors_end_date new_donors  START_DATE  END_DATE
0      47458                  0           0       2021-12-01     2021-12-31

I'm thinking I have two problems : 1/ it reruns the df each time and, I need to find a way to stack up the outputs for each month. 2/ why doesn't it compute the last two counts for the last month.

Any feedback is greatly appreciated!

Anton Yang-Wälder · Accepted Answer · 2022-02-03 12:40:07Z

1

I think you have correctly identified the problem yourself:

In each iteration, you perform an SQL query and assign the result to DataFrame object called df2 (thus overriding its previous value)
Instead, you want create a DataFrame object outside the loop, then append data to it:

import pandas as pd

START_DATE_LIST = ...
END_DATE_LIST = ...


df = pd.DataFrame()

for start_date, end_date in zip(START_DATE_LIST, END_DATE_LIST) :
    SQL = ...
    row = civis.io.read_civis_sql(SQL, "database", use_pandas=True)
    row['START_DATE'] = start_date
    row['END_DATE'] = end_date

    df = df.append(row)

answered Feb 3, 2022 at 12:40

Anton Yang-Wälder

9646 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

user11713467 Over a year ago

Thanks ! it's running fine now, but the counts aren't working. Probably something to do with the data type not matching. Anyway, much thanks for your time :)

Anton Yang-Wälder Over a year ago

Are you referring to the count statements in the SQL query? I'm afraid that's where my knowledge ends. Might have to wait for an SQL guru :-)

Collectives™ on Stack Overflow

how to insert multiple python parameter into sql sting to create a dataframe

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related