
I loop over the CSV files in a directory and read each one with pandas. Each CSV file has a category and a marketplace, and I need to get the category id and the marketplace id from the database; those ids apply to every row of that file.

finalDf is a DataFrame containing the products from all the CSV files, and I need to append the data from the current CSV to it.

The products of the current CSV are retrieved with:

df['PRODUCT']

I need to append them to finalDf, so I used:

finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'], ignore_index=True)

This seems to work fine. Now I have to insert catid and marketid into the corresponding columns of finalDf. Because catid and marketid are constant across the current CSV file, I just need to add each of them as many times as there are rows in df; this is what I'm trying to accomplish in the code below.

import pandas as pd

finalDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
finalDf['PRODUCT'] = finalDf.PRODUCT.astype('category')

df = pd.read_csv(filename, header=None,
                 names=['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME', 'SNAPDATE',
                        'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL'], sep='\t')

finalDf['PRODUCT'] = finalDf['PRODUCT'].append(df['PRODUCT'], ignore_index=True)
# Here I have a single value to add n times, n being the number of rows in df
catid = 2113
marketid = 13
catids = pd.Series([catid] * len(df.index))
marketids = pd.Series([marketid] * len(df.index))
finalDf['CAT_ID'] = finalDf['CAT_ID'].append(catids, ignore_index=True)
finalDf['MARKET_ID'] = finalDf['MARKET_ID'].append(marketids, ignore_index=True)

print(finalDf.head())

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       NaN    NaN
    1    ABB       NaN    NaN
    2    ABE       NaN    NaN
    3    DCB       NaN    NaN
    4    EFT       NaN    NaN

As you can see, I only get NaN values instead of the actual ids. Expected output:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13

With several CSV files appended, finalDf would look like:

        PRODUCT  CAT_ID  MARKET_ID
    0    ABC       2113    13
    1    ABB       2113    13
    2    ABE       2113    13
    3    DCB       2113    13
    4    EFT       2113    13
    5    SDD       2114    13
    6    ERT       2114    13
    7    GHJ       2114    13
    8    MOD       2114    13
    9    GTR       2114    13
   10    WLY       2114    13
   11    WLO       2115    13
   12    KOP       2115    13

Any idea?

Thanks

  • Can you clarify what you mean by "add values to existing columns"? Do you mean add 2113 to a numeric column, or add "2113" string to the end of each item? Commented Apr 27, 2018 at 16:36
  • Appending the numeric value 2113 n times to the end of an existing column, n being the number of rows read from the CSV file. Commented Apr 27, 2018 at 16:45
  • Would you mind editing your question with expected output? Still a bit confused as to what you need (like probably the existing answer). Commented Apr 27, 2018 at 16:47
  • Added the expected output. You need to keep in mind that the catid and marketid will be different each time so I really need to append new values to the existing column and not to replace all values in that column by the new value Commented Apr 27, 2018 at 17:10
  • What would finalDf.tail() look like? Would it also have the same values for CAT_ID and MARKET_ID? Where do you get catid and marketid from? I still don't get the whole picture of your task. Commented Apr 27, 2018 at 17:30

2 Answers


I finally found a solution. I don't know why the other approach didn't work, but this one is simpler:

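import pandas as pd

# Frame for the current CSV only: PRODUCT comes from df, and the
# scalar ids are broadcast to every row of that file.
tempDf = pd.DataFrame(columns=['PRODUCT', 'CAT_ID', 'MARKET_ID'])
tempDf['PRODUCT'] = df['PRODUCT']
tempDf['CAT_ID'] = catid
tempDf['MARKET_ID'] = marketid

# Stack it under the rows collected so far; ignore_index renumbers rows 0..n-1.
finalDf = pd.concat([finalDf, tempDf], ignore_index=True)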
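As for why the code in the question produced NaN: pandas aligns on index labels when a Series is assigned to a column. Assigning to an empty frame adopts the Series' index, so after the first PRODUCT assignment finalDf has labels 0..n-1. Appending catids with ignore_index=True then yields labels 0..2n-1 with the new ids at the end, and assigning that back keeps only labels 0..n-1, which are the original NaNs. The sketch below reproduces this with three made-up products; since Series.append was removed in pandas 2.0, it uses pd.concat(..., ignore_index=True) as the equivalent.

```python
import pandas as pd

# Empty frame, as in the question.
final_df = pd.DataFrame(columns=["PRODUCT", "CAT_ID", "MARKET_ID"])

# Assigning a Series to an empty frame adopts its index (labels 0..2);
# the other columns are filled with NaN.
final_df["PRODUCT"] = pd.Series(["ABC", "ABB", "ABE"])

# Equivalent of Series.append(..., ignore_index=True): 3 NaNs followed by 3 ids.
longer = pd.concat([final_df["CAT_ID"], pd.Series([2113] * 3)], ignore_index=True)
print(longer.index.tolist())  # [0, 1, 2, 3, 4, 5]

# Column assignment aligns on labels, so only labels 0..2 survive: all NaN.
final_df["CAT_ID"] = longer
print(final_df)
```

The same alignment also means the PRODUCT column can never grow past the length of the first file, which is why building a separate frame per file and concatenating rows works where column-wise appending does not.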

You actually do not need catids and marketids:

finalDf['CAT_ID'] = catid
finalDf['MARKET_ID'] = marketid

Will work.
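A quick check of the broadcasting behaviour, with made-up products: a scalar on the right-hand side fills every row currently in the frame, and a second assignment with a different id replaces the values in all rows rather than appending new ones.

```python
import pandas as pd

final_df = pd.DataFrame({"PRODUCT": ["ABC", "ABB", "SDD"]})

# A scalar is repeated for every existing row.
final_df["CAT_ID"] = 2113
print(final_df["CAT_ID"].tolist())  # [2113, 2113, 2113]

# Assigning again overwrites every row; nothing is appended.
final_df["CAT_ID"] = 2114
print(final_df["CAT_ID"].tolist())  # [2114, 2114, 2114]
```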

For the rest of the script, I would probably have simplified things this way:

finalDf = pd.DataFrame()
finalDf['PRODUCT'] = df['PRODUCT'].reset_index(drop=True)

This assumes you are not interested in df's original index, as your code implies.
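On the reset_index point: called on a Series without drop=True, reset_index() returns a DataFrame that keeps the old labels in an 'index' column, while reset_index(drop=True) returns a Series renumbered from 0. A small illustration with made-up data, where filtering has left a non-contiguous index:

```python
import pandas as pd

# Hypothetical frame whose index is no longer 0..n-1 (labels 1 and 3 remain).
df = pd.DataFrame({"PRODUCT": ["ABC", "ABB", "ABE", "DCB"]}).iloc[[1, 3]]

with_old_index = df["PRODUCT"].reset_index()   # DataFrame with 'index' and 'PRODUCT'
clean = df["PRODUCT"].reset_index(drop=True)   # Series relabelled 0..n-1

print(list(with_old_index.columns))  # ['index', 'PRODUCT']
print(clean.tolist())                # ['ABB', 'DCB']
print(clean.index.tolist())          # [0, 1]
```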

3 Comments

I simplified the code, but catid will change on each iteration of the loop, so it won't work.
Then I do not understand your question.
I will edit the question; I may have oversimplified it.
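Since catid and marketid change from one file to the next, one way to handle the whole loop is to broadcast the ids within each file's own frame and concatenate everything once at the end. This is a sketch under assumptions, not the asker's code: it assumes the CATEGORY and MARKETPLACE values are constant within a file, and lookup_ids is a hypothetical stand-in for the database query, taking (category, marketplace) and returning (catid, marketid).

```python
import glob
import os

import pandas as pd

# Column names taken from the question's read_csv call.
COLUMNS = ['PRODUCT', 'URL_PRODUCT', 'RANK', 'URL_IMAGE', 'STARS', 'PRICE', 'NAME',
           'SNAPDATE', 'CATEGORY', 'MARKETPLACE', 'PARENTCAT', 'LISTTYPE', 'VERSION', 'LEVEL']


def build_final_df(directory, lookup_ids):
    """Read every *.csv file in `directory` and stack PRODUCT, CAT_ID, MARKET_ID.

    `lookup_ids` is a hypothetical callable (category, marketplace) -> (catid, marketid)
    standing in for the database lookup.
    """
    parts = []
    for filename in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        df = pd.read_csv(filename, header=None, names=COLUMNS, sep="\t")
        # Assumption: category and marketplace are the same on every row of a file.
        catid, marketid = lookup_ids(df["CATEGORY"].iloc[0], df["MARKETPLACE"].iloc[0])
        # The scalars are broadcast to this file's rows only.
        parts.append(df[["PRODUCT"]].assign(CAT_ID=catid, MARKET_ID=marketid))
    if not parts:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame(columns=["PRODUCT", "CAT_ID", "MARKET_ID"])
    # One concat at the end; ignore_index renumbers rows 0..n-1 across all files.
    return pd.concat(parts, ignore_index=True)
```

Collecting the per-file frames in a list and calling pd.concat once avoids re-copying finalDf on every iteration, which is what concatenating inside the loop does.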
