I would like to create DataFrames in a loop and then use them in a later loop. I tried the eval() function, but it didn't work.

For example:

for i in range(5):
    df_i = df[(df.age == i)]

Here I would like to create df_0, df_1, etc., and then concatenate these new DataFrames after some calculations:

final_df = pd.concat(df_0,df_1)

for i in range(2:5):
    final_df = pd.concat(final_df, df_i)

2 Answers

You can create a dict of DataFrames, x, and use i as the dict keys:

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})

x = {}
for i in range(5):
    x[i] = df[df['age']==i]

final = pd.concat(x.values())

Then you can refer to individual DataFrames as:

x[1]

Output:

    age
5     1
13    1
15    1

And concatenate all of them with:

pd.concat(x.values())

Output:

    age
18    0
5     1
13    1
15    1
2     2
6     2
...
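
Since the question mentions doing some calculations before concatenating, the same dict can be processed entry by entry first. This is only a sketch: the n_rows column is a hypothetical stand-in for whatever calculation is actually needed.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})

# one DataFrame per age value, keyed by i
x = {i: df[df['age'] == i] for i in range(5)}

processed = {}
for i, sub in x.items():
    # hypothetical calculation: record how many rows share this age
    processed[i] = sub.assign(n_rows=len(sub))

# concatenate the processed pieces back into one DataFrame
final = pd.concat(processed.values())
```

Each piece stays reachable as processed[i], so no generated variable names are needed.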

2 Comments

Thank you for your help. Is it possible to name each DataFrame depending on i? In fact, I will be creating DataFrames in a double loop...
Technically, yes, you can create variables with exec like exec(f"df_{i} = df[df['age']==i]"), but it's normally not recommended. See for example stackoverflow.com/questions/5036700/…
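
For the double-loop case raised in the comment above, one alternative to generated variable names is to key the dict by a tuple. A minimal sketch, assuming a second column called group (hypothetical, not part of the original question):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
# 'group' is a hypothetical second column, added only for illustration
df = pd.DataFrame({
    'age': np.random.randint(0, 5, 20),
    'group': np.random.randint(0, 2, 20),
})

frames = {}
for i in range(5):
    for j in range(2):
        # key each subset by the (i, j) pair instead of naming a variable df_i_j
        frames[(i, j)] = df[(df['age'] == i) & (df['group'] == j)]

# every row belongs to exactly one (i, j) pair, so nothing is lost or duplicated
final = pd.concat(frames.values())
```

An individual subset is then available as frames[(1, 0)], and the keys can be looped over in any later step.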

This approach is unusual and not recommended, but it can be done.

Answer

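for i in range(5):
    # an f-string is needed so that {i} is substituted into the code string
    exec(f"df_{i} = df[df['age'] == {i}]")

def UDF(dfi):
    # do something in user-defined function
    return dfi

for i in range(5):
    exec(f"df_{i} = UDF(df_{i})")

# pd.concat takes a list of DataFrames, not separate arguments
final_df = pd.concat([df_0, df_1])

for i in range(2, 5):
    # eval looks up the variable df_i by its generated name
    final_df = pd.concat([final_df, eval(f"df_{i}")])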

Better Way 1

Storing the DataFrames in a list or a dict is a better approach, since you can access each DataFrame by index or by key.

Since another answer (@perl) already shows the dict approach, I will show the list approach.

def UDF(dfi):
    # do something in user-defined function
    return dfi

# range(5) covers ages 0 to 4, as in the question
dfs = [df[df['age'] == i] for i in range(5)]
final_df = pd.concat(map(UDF, dfs))

Better Way 2

Since you are using pandas.DataFrame, groupby is probably the idiomatic pandas way to do what you want. (I am guessing here, since I don't know exactly what you want to do.)

def UDF(dfi):
    # do something in user-defined function
    return dfi

final_df = df.groupby('age').apply(UDF)

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
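
To make the groupby idea concrete, here is a small sketch that iterates over the groups directly, which gives the same per-age pieces as the loops above. The score column and the add_group_mean function are hypothetical, chosen only so there is something to compute.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})
# 'score' is a hypothetical column, added only for illustration
df['score'] = np.arange(20)

def add_group_mean(dfi):
    # hypothetical per-group calculation: attach the group's mean score
    dfi = dfi.copy()
    dfi['group_mean'] = dfi['score'].mean()
    return dfi

# iterating over a groupby yields (key, sub-DataFrame) pairs, one per age value
parts = [add_group_mean(group) for _, group in df.groupby('age')]
final_df = pd.concat(parts)
```

Only age values that actually occur in the data produce a group, so there is no need to know the range of i in advance.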
