Adding values to existing columns in pandas

Question

I loop over the CSV files in a directory and read each one with pandas. Each CSV file has a category and a marketplace, and I need to get the category ID and the marketplace ID from the database; these IDs apply to the whole CSV file.

finalDf is a DataFrame containing all the products from all the CSV files, and I need to append the data from the current CSV to it.

The products of the current CSV are retrieved using:

df['PRODUCT']

I need to append them to finalDf, so I used:

finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'], ignore_index=True)
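The `Series.append` method used above was deprecated in pandas 1.4 and removed in pandas 2.0. A minimal sketch of the equivalent call with `pd.concat`, using made-up product codes in place of the CSV data (the `existing` and `new` names are hypothetical stand-ins for `finalDf['PRODUCT']` and `df['PRODUCT']`):

```python
import pandas as pd

# Hypothetical stand-ins for finalDf['PRODUCT'] and df['PRODUCT'].
existing = pd.Series(['ABC', 'ABB'])
new = pd.Series(['ABE', 'DCB', 'EFT'])

# Same result as existing.append(new, ignore_index=True) on pandas < 2.0:
# the values are stacked and the index is renumbered 0..n-1.
combined = pd.concat([existing, new], ignore_index=True)

print(combined.tolist())        # ['ABC', 'ABB', 'ABE', 'DCB', 'EFT']
print(combined.index.tolist())  # [0, 1, 2, 3, 4]
```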

This seems to work fine. Next, I have to insert catid and marketid into the corresponding columns of finalDf. Because catid and marketid are constant across the current CSV file, I just need to add them as many times as there are rows in the df dataframe. This is what I'm trying to accomplish in the code below:

import pandas as pd

finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')

df = pd.read_csv(filename, header=None, sep='\t',
                 names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                        'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'])

finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'], ignore_index=True)
# Here I have a single value to add n times, n being the number of rows in the dataframe df
catid = 2113
marketid = 13
catids = pd.Series([catid] * len(df.index))
marketids = pd.Series([marketid] * len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)

print(finalDf.head())

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       NaN    NaN
    1    ABB       NaN    NaN
    2    ABE       NaN    NaN
    3    DCB       NaN    NaN
    4    EFT       NaN    NaN

As you can see, I only get NaN values instead of the actual values. Expected output:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13

finalDf containing several CSV files would look like:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13
    5    SDD       2114    13
    6    ERT       2114    13
    7    GHJ       2114    13
    8    MOD       2114    13
    9    GTR       2114    13
   10    WLY       2114    13
   11    WLO       2115    13
   12    KOP       2115    13

Any idea?

Thanks

Can you clarify what you mean by "add values to existing columns"? Do you mean add 2113 to a numeric column, or add the string "2113" to the end of each item?
– jpp, Commented Apr 27, 2018 at 16:36
Appending the numeric value 2113 n times to the end of an existing column, n being the number of rows read from the CSV file.
– Cyrille MODIANO, Commented Apr 27, 2018 at 16:45
Would you mind editing your question with the expected output? I'm still a bit confused as to what you need (as the existing answer probably is too).
– jpp, Commented Apr 27, 2018 at 16:47
Added the expected output. Keep in mind that catid and marketid will be different each time, so I really need to append new values to the existing column, not replace all the values in that column with the new value.
– Cyrille MODIANO, Commented Apr 27, 2018 at 17:10
What would finalDf.tail() look like? Would it also have the same values for CAT_ID and MARKET_ID? Where do you get catid and marketid from? I still don't see the whole picture of your task.
– Cedric Zoppolo, Commented Apr 27, 2018 at 17:30

Cyrille MODIANO · Accepted Answer · 2018-04-27 20:02:24Z

13

I finally found a solution, though I don't know why the other one didn't work. This one is simpler:

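tempDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']  # one row per product in the current CSV
tempDf['CAT_ID'] = catid           # scalar, repeated on every row
tempDf['MARKET_ID'] = marketid

# ignore_index=True renumbers the rows 0..n-1, as in the expected output
finalDf = pd.concat([finalDf, tempDf], ignore_index=True)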
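The same idea can be extended to the whole loop. A minimal sketch, assuming each CSV has already been read and its IDs looked up; the `files` list and its values are hypothetical stand-ins for the CSV files and the database lookup. It builds one small frame per file, collects the frames in a list, and calls `pd.concat` once with `ignore_index=True`:

```python
import pandas as pd

# Hypothetical stand-ins: (products read from one CSV, catid, marketid).
files = [
    (pd.Series(['ABC', 'ABB', 'ABE']), 2113, 13),
    (pd.Series(['SDD', 'ERT']), 2114, 13),
    (pd.Series(['WLO', 'KOP']), 2115, 13),
]

frames = []
for products, catid, marketid in files:
    # One frame per CSV; the scalar IDs are broadcast to every row.
    frames.append(pd.DataFrame({
        'PRODUCT': products.values,
        'CAT_ID': catid,
        'MARKET_ID': marketid,
    }))

# A single concat at the end; ignore_index renumbers the rows 0..n-1.
finalDf = pd.concat(frames, ignore_index=True)
print(finalDf)
```

Concatenating once at the end avoids copying the growing `finalDf` on every iteration, which a `pd.concat` call inside the loop does.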

answered Apr 27, 2018 at 20:02
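As for why the approach in the question produced NaN: assigning a Series to a DataFrame column aligns it on the index. A minimal reproduction with made-up products, using `pd.concat` in place of `Series.append` (removed in pandas 2.0). The first assignment to the empty `finalDf` gives it the index 0..n-1 and fills the other columns with NaN; the "appended" `CAT_ID` Series then has 2n entries (n NaN followed by n copies of `catid`), and only its first n labels, the NaN ones, match the frame's index:

```python
import pandas as pd

# Hypothetical products from one CSV file.
df = pd.DataFrame({'PRODUCT': ['ABC', 'ABB', 'ABE']})
catid = 2113

finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])

# finalDf is empty, so this assignment gives it the index 0..2;
# CAT_ID and MARKET_ID are filled with NaN for those rows.
finalDf['PRODUCT'] = df['PRODUCT']

# The "appended" Series has 6 entries: labels 0..2 hold the NaN values,
# labels 3..5 hold catid.
appended = pd.concat([finalDf['CAT_ID'], pd.Series([catid] * len(df.index))],
                     ignore_index=True)

# Column assignment aligns on the index: only labels 0..2 are kept.
finalDf['CAT_ID'] = appended

print(len(appended))                   # 6
print(finalDf['CAT_ID'].isna().all())  # True
```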


Paul-Darius · Accepted Answer · 2018-04-27 16:48:01Z

0

You actually do not need catids and marketids:

finalDf['CAT_ID'] = catid
finalDf['MARKET_ID'] = marketid

This will work.
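A minimal sketch of what the scalar assignment does, with made-up values: the scalar is broadcast to every existing row. Applied to the whole accumulated `finalDf`, each new assignment therefore overwrites the IDs of earlier files, which is the problem raised in the comments below; applied to a per-file frame before concatenating, it gives the intended result.

```python
import pandas as pd

# Hypothetical accumulated frame: rows that came from two different CSV files.
finalDf = pd.DataFrame({'PRODUCT': ['ABC', 'ABB', 'SDD']})

# A scalar is broadcast to every row of the frame.
finalDf['CAT_ID'] = 2113
print(finalDf['CAT_ID'].tolist())  # [2113, 2113, 2113]

# Assigning the next file's catid the same way overwrites all rows,
# including those that belong to the first file.
finalDf['CAT_ID'] = 2114
print(finalDf['CAT_ID'].tolist())  # [2114, 2114, 2114]
```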

For the rest of the script, I would probably have simplified things this way:

finalDf = pd.DataFrame()
finalDf['PRODUCT'] = df['PRODUCT'].reset_index(drop=True)

This assumes you are not interested in df's original index, as your code implied.
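On a Series, `reset_index()` returns a DataFrame that keeps the old index as an extra `index` column, while `reset_index(drop=True)` discards it and returns a Series, which is what a single-column assignment needs. A quick check with made-up data:

```python
import pandas as pd

# Hypothetical product column with a non-default index.
s = pd.Series(['ABC', 'ABB'], index=[10, 11], name='PRODUCT')

with_old_index = s.reset_index()       # DataFrame: columns 'index' and 'PRODUCT'
renumbered = s.reset_index(drop=True)  # Series with index 0, 1

print(type(with_old_index).__name__, list(with_old_index.columns))  # DataFrame ['index', 'PRODUCT']
print(type(renumbered).__name__, renumbered.index.tolist())         # Series [0, 1]
```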

edited Apr 27, 2018 at 16:48

answered Apr 27, 2018 at 16:37


3 Comments

Cyrille MODIANO Over a year ago

I simplified the code, but catid changes on each iteration of the loop, so it won't work.

Paul-Darius Over a year ago

Then I do not understand your question.

Cyrille MODIANO Over a year ago

I will edit the question; I may have oversimplified it.
