
Is there an idiomatic equivalent to SQL's window functions in Pandas? For example, what's the most compact way to write the equivalent of this in Pandas?

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 

Or this?

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name
  • Can you provide a sample data set and desired data set? Commented Jan 10, 2017 at 16:18
  • @JackManey, AFAIK it's not quite the same - at least for mentioned SQLs... Commented Jan 10, 2017 at 16:18
  • @JackManey the window functions in the Pandas docs are a subset of the functionality that SQL window functions have. Basically what I want to do is compute aggregates without reducing the data frame. Commented Jan 10, 2017 at 16:39

2 Answers

For the first SQL:

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 

Pandas:

df.assign(national_population=df.state_population.sum()).sort_values('state_name')

For the second SQL:

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Pandas:

df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
  .sort_values('state_name')

DEMO:

In [238]: df
Out[238]:
   region state_name  state_population
0       1        aaa               100
1       1        bbb               110
2       2        ccc               200
3       2        ddd               100
4       2        eee               100
5       3        xxx                55

national_population:

In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
   region state_name  state_population  national_population
0       1        aaa               100                  665
1       1        bbb               110                  665
2       2        ccc               200                  665
3       2        ddd               100                  665
4       2        eee               100                  665
5       3        xxx                55                  665

regional_population:

In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
     ...:   .sort_values('state_name')
Out[239]:
   region state_name  state_population  regional_population
0       1        aaa               100                  210
1       1        bbb               110                  210
2       2        ccc               200                  400
3       2        ddd               100                  400
4       2        eee               100                  400
5       3        xxx                55                   55
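
To reproduce the demo end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the values printed above, and the variable names `national` and `regional` are introduced only for illustration; they are not part of the original session.

```python
import pandas as pd

# Rebuild the demo frame from the values shown in the output above.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

# SUM(...) OVER (): a single scalar broadcast to every row.
national = df.assign(
    national_population=df["state_population"].sum()
).sort_values("state_name")

# SUM(...) OVER (PARTITION BY region): transform returns a result
# aligned to the original rows instead of one row per group.
regional = df.assign(
    regional_population=df.groupby("region")["state_population"].transform("sum")
).sort_values("state_name")

print(national)
print(regional)
```

The key difference from a plain `groupby(...).sum()` is that `transform` keeps the original number of rows, which is what a window function does in SQL.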

2 Comments

Thanks, this is what I was looking for. Didn't know about transform.
@2daaa, you are welcome. You may want to read Pandas: comparison with SQL

Another common window is OVER(ORDER BY ...). For example:

SELECT *
    ,SUM(values) OVER(ORDER BY date) AS cum_sum
FROM df;

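The pandas equivalent is cumsum(), applied after sorting the frame by date. Sort the DataFrame rather than the Series, since Series.sort_values() has no by argument:

df['cum_sum'] = df.sort_values(by='date')['values'].cumsum()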
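
As a rough illustration, the sketch below runs a running total on a small made-up frame and also shows one way to mimic OVER(PARTITION BY ... ORDER BY ...). The data and the `group` and `cum_sum_by_group` column names are assumptions for the example only. Note that SQL's default frame treats rows with equal `date` as peers, so results may differ from `cumsum()` when dates are duplicated.

```python
import pandas as pd

# Made-up data, deliberately not in date order.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-03", "2021-01-01", "2021-01-02"]),
    "group": ["a", "a", "b"],
    "values": [30, 10, 20],
})

# SUM(values) OVER (ORDER BY date): sort, accumulate, then let index
# alignment put each running total back on its original row.
df["cum_sum"] = df.sort_values(by="date")["values"].cumsum()

# SUM(values) OVER (PARTITION BY "group" ORDER BY date):
# the same idea, with the accumulation restarted for each group.
df["cum_sum_by_group"] = (
    df.sort_values(by="date").groupby("group")["values"].cumsum()
)

print(df)
```

The assignment works without re-sorting `df` because pandas aligns the result on the index, not on position.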

Another common window function is ROW_NUMBER().

SELECT *
    ,ROW_NUMBER() OVER () AS row_number
FROM df;

Its pandas equivalent can be built with range():

df['row_number'] = range(1, len(df)+1)
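
When the row number should restart per partition, as in ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), one option is `groupby(...).cumcount()`. This is a sketch on made-up data; the column name `row_number_in_region` is hypothetical, and the populations are chosen without ties so the ordering is unambiguous.

```python
import pandas as pd

# Made-up data with no tied populations inside a region.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2],
    "state_population": [100, 110, 200, 150, 120],
})

# ROW_NUMBER() OVER (): simple 1-based positional numbering.
df["row_number"] = range(1, len(df) + 1)

# ROW_NUMBER() OVER (PARTITION BY region ORDER BY state_population DESC):
# cumcount() numbers rows from 0 within each group, in their current order.
df["row_number_in_region"] = (
    df.sort_values("state_population", ascending=False)
      .groupby("region")
      .cumcount() + 1
)

print(df)
```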

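There is also a module, pandasql, built on pandas, that lets you run SQL queries against local DataFrames. If you are comfortable with SQL, you can run a query directly on a DataFrame. pandasql uses SQLite syntax, so window functions require a SQLite build that supports them (3.25 or later).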

# !pip install pandasql
from pandasql import sqldf

# The table name in FROM must match a DataFrame variable in scope (here, `population`).
df = sqldf("""
SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name 
""")
