0

I've been trying to create a dataframe based on a SQL string, but I want to query the same things (3 counts) over different periods (in this example, monthly).

For context, I've been working in a python notebook on Civis.

I came up with this :

START_DATE_LIST = ["2021-01-01","2021-02-01","2021-03-01","2021-04-01","2021-05-01","2021-06-01","2021-07-01","2021-08-01","2021-09-01","2021-10-01","2021-11-01","2021-12-01"]
END_DATE_LIST = ["2021-01-31","2021-02-28","2021-03-31","2021-04-30","2021-05-31","2021-06-30","2021-07-31","2021-08-31","2021-09-31","2021-10-31","2021-11-30","2021-12-31"]

for start_date, end_date in zip(START_DATE_LIST,END_DATE_LIST) :
    SQL = f"select count (distinct case when (CON.firstdonationdate__c< {start_date} and CON.last_gift_date__c>  {start_date} ) then CON.id else null end) as Donors_START_DATE, \
count (distinct case when (CON.firstdonationdate__c< {end_date} and CON.c_last_gift_date__c> {end_date}) then CON.id else null end) as Donors_END_DATE, \
count (distinct case when (CON.firstdonationdate__c> {start_date} and CON.firstdonationdate__c<{end_date}) then CON.id else null end) as New_Donors \
from staging.contact CON;"
    df2 =civis.io.read_civis_sql(SQL, "database", use_pandas=True)
    df2['START_DATE']=start_date
    df2['END_DATE']= end_date

It runs but then the output is only :

 donors_start_date  donors_end_date new_donors  START_DATE  END_DATE
0      47458                  0           0       2021-12-01     2021-12-31

I'm thinking I have two problems : 1/ it reruns the df each time and, I need to find a way to stack up the outputs for each month. 2/ why doesn't it compute the last two counts for the last month.

Any feedback is greatly appreciated!

1 Answer 1

1

I think you have correctly identified the problem yourself:

  • In each iteration, you perform an SQL query and assign the result to DataFrame object called df2 (thus overriding its previous value)
  • Instead, you want create a DataFrame object outside the loop, then append data to it:
import pandas as pd

START_DATE_LIST = ...
END_DATE_LIST = ...


df = pd.DataFrame()

for start_date, end_date in zip(START_DATE_LIST, END_DATE_LIST) :
    SQL = ...
    row = civis.io.read_civis_sql(SQL, "database", use_pandas=True)
    row['START_DATE'] = start_date
    row['END_DATE'] = end_date

    df = df.append(row)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks ! it's running fine now, but the counts aren't working. Probably something to do with the data type not matching. Anyway, much thanks for your time :)
Are you referring to the count statements in the SQL query? I'm afraid that's where my knowledge ends. Might have to wait for an SQL guru :-)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.