Create dataframe in a loop

Question

I would like to create several dataframes in a loop and then use them in a second loop. I tried the eval() function, but it didn't work.

For example :

for i in range(5):
    df_i = df[(df.age == i)]

Here I would like to create df_0, df_1, etc., and then concatenate these new dataframes after some calculations:

final_df = pd.concat([df_0, df_1])

for i in range(2, 5):
    final_df = pd.concat([final_df, df_i])

perl · Accepted Answer · 2019-03-21 09:58:42Z

You can create a dict of DataFrames, x, and use the values of i as the dict keys:

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})

x = {}
for i in range(5):
    x[i] = df[df['age']==i]

final = pd.concat(x.values())
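
As a side note on the concatenation step, pd.concat also accepts the dict itself rather than only its values; the keys then become the outer level of the resulting index, so each row stays labelled with the i it came from. A minimal sketch, assuming a small made-up frame in place of the random one (the names labelled and from_one are only for illustration):

```python
import pandas as pd

# Made-up data for illustration; any frame with an 'age' column would do
df = pd.DataFrame({'age': [0, 1, 2, 3, 4, 1, 2, 3, 0, 4]})

# Same dict of per-age DataFrames as above, written as a comprehension
x = {i: df[df['age'] == i] for i in range(5)}

# Passing the dict itself: its keys become the outer index level
labelled = pd.concat(x)

# Rows that came from x[1], recovered through the outer index level
from_one = labelled.loc[1]
```

This keeps the origin of each row without creating a separate variable per value of i.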

Then you can refer to individual DataFrames as:

x[1]

And concatenate all of them with:

pd.concat(x.values())

edited Mar 21, 2019 at 9:58

answered Mar 21, 2019 at 9:53

2 Comments

Babouch Over a year ago

Thank you for your help. Is it possible to give each dataframe a name that depends on i? In fact, I will be creating the dataframes in a double loop...

perl Over a year ago

Technically, yes, you can create variables with exec like exec(f"df_{i} = df[df['age']==i]"), but it's normally not recommended. See for example stackoverflow.com/questions/5036700/…
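
For the double-loop case raised in the comment, one option that avoids exec is to key the dict by a tuple of both loop variables. This is only a sketch under assumptions: the second column 'sex', its values, and the name frames are hypothetical and not from the question.

```python
import pandas as pd

# Made-up data; 'sex' is a hypothetical second column for the inner loop
df = pd.DataFrame({
    'age': [0, 0, 1, 1, 2, 2],
    'sex': ['f', 'm', 'f', 'm', 'f', 'm'],
})

frames = {}
for i in range(3):
    for s in ['f', 'm']:
        # The tuple (i, s) plays the role of the name "df_i_s"
        frames[(i, s)] = df[(df['age'] == i) & (df['sex'] == s)]

# Look up one piece by its pair of loop values
one = frames[(1, 'm')]

# Concatenate all pieces back together
final_df = pd.concat(frames.values())
```

The tuple key carries the same information a generated variable name would, but it can be looped over and looked up programmatically.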

Liuhonwun · Accepted Answer · 2019-03-22 02:32:49Z

This approach is unusual and not recommended, but it can be done.

Answer

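for i in range(5):
    # f-string so that {i} is replaced by the loop value
    exec(f"df_{i} = df[df['age'] == {i}]")

def UDF(dfi):
    # do something in user-defined function
    return dfi

for i in range(5):
    exec(f"df_{i} = UDF(df_{i})")

final_df = pd.concat([df_0, df_1])

for i in range(2, 5):
    # look the generated name up at module level
    final_df = pd.concat([final_df, globals()[f"df_{i}"]])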

Better Way 1

Storing the dataframes in a list or a dict is a better way, since you can then access each dataframe by an index or a key.

Since another answer (by @perl) already shows the dict approach, I will show the list approach.

def UDF(dfi):
    # do something in user-defined function
    return dfi

dfs = [df[df['age'] == i] for i in range(5)]
final_df = pd.concat(map(UDF, dfs))

Better Way 2

Since you are using pandas.DataFrame, the groupby function is the 'pandas' way to do what you want (probably; I am guessing, because I don't know exactly what you want to do).

def UDF(dfi):
    # do something in user-defined function
    return dfi

final_df = df.groupby('age').apply(UDF)

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
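
To make the groupby idea concrete, here is a small sketch that iterates over the groups directly, which yields (key, sub-DataFrame) pairs and so replaces the manual filtering loop. The 'score' column, the data, and the body of UDF are made up for illustration; the real calculation is whatever you need per group.

```python
import pandas as pd

# Made-up data; 'score' is a hypothetical column to calculate on
df = pd.DataFrame({'age': [0, 1, 2, 0, 1, 2],
                   'score': [10, 20, 30, 30, 40, 50]})

def UDF(dfi):
    # Hypothetical calculation: centre the score within each age group
    out = dfi.copy()
    out['score_centered'] = out['score'] - out['score'].mean()
    return out

# Iterating over a groupby yields one (age, sub-DataFrame) pair per group
pieces = [UDF(dfi) for age, dfi in df.groupby('age')]

# Put the processed pieces back together in the original row order
final_df = pd.concat(pieces).sort_index()
```

Iterating this way gives the same per-group pieces as df[df['age'] == i] for each i, without having to know the set of ages in advance.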
