1

I am trying to create Pandas dataframe out of XML. The XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Products>
    <Info>
        <Msg>Shop items.</Msg>
    </Info>
    <shop shopNr="01">
        <ItemNr>1001</ItemNr>
        <ItemNr>1002</ItemNr>
        <ItemNr>1003</ItemNr>
        <ItemNr>1004</ItemNr>
                <ItemNr>1010</ItemNr>
    </shop>
    <shop shopNr="02">
        <ItemNr>1002</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
    </shop>
    <shop shopNr="03">
        <ItemNr>1009</ItemNr>
        <ItemNr>1006</ItemNr>
        <ItemNr>1005</ItemNr>
        <ItemNr>1002</ItemNr>
    </shop>
</Products>

I have tried using XML Etree as the code below. I have two problems.

First, I can not get the values of ItemNr as children of the root. Instead of getting the value, ie. 1001, I get

<Element 'ItemNr' at 0x000001E2D6C41B38>.

The second problem is when I am creating a dataframe out of the lists. I end up having a list of lists in items. Although the result is empty now since I could not get the values above, I want to end up with a flattened list.

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()

shops = []
items = []
for node in root.iter('shop'):
    shops.append(node.attrib.get('shopNr'))
    items.append(list(node))

d = {'shops': shops, 'items': items}
df = pd.DataFrame(d)

The DataFrame produced.

 shops                 items
0    01  [[], [], [], [], []]
1    02          [[], [], []]
2    03      [[], [], [], []]

desired output is:


 shops                 items
0    01  [1001, 1002, 1003, 1004, 1010]
1    02          [1002, 1006, 1005]
2    03      [1009, 1006, 1005, 1002]

1

2 Answers 2

2

You want to append the text values from the ItemNr elements which are under the shop element to the items list and not the xml Element python object which is what you were doing.

The following code was working for me:

items.append([item_nr_element.text for item_nr_element in node.getchildren()])
Sign up to request clarification or add additional context in comments.

Comments

2

I hope this is the expected output:

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
shops_items = []
all_shops_items = []
for ashop in root.iter('shop'):
    items = []
    shop_Nr = ashop.attrib.get('shopNr')
    for anitem in ashop.iter('ItemNr'):
        items.append(anitem.text)
    shops_items = [shop_Nr,items]
    all_shops_items.append(shops_items)
df = pd.DataFrame(all_shops_items,columns=['SHOP_NUMBER','ITEM_NUMBER'])        
print(df)

Output:

  SHOP_NUMBER                     ITEM_NUMBER
0          01  [1001, 1002, 1003, 1004, 1010]
1          02              [1002, 1006, 1005]
2          03        [1009, 1006, 1005, 1002]

If you want shops with individual items :

import xml.etree.ElementTree as ET
import pandas as pd
data = 'example_shops.xml'
tree = ET.parse(data)
root = tree.getroot()
shops_items = []
all_shops_items = []
for ashop in root.iter('shop'):
    shop_Nr = ashop.attrib.get('shopNr')
    for anitem in ashop.iter('ItemNr'):
        item_Nr = anitem.text
        shops_items = [shop_Nr,item_Nr]
        all_shops_items.append(shops_items)
df = pd.DataFrame(all_shops_items,columns=['SHOP_NUMBER','ITEM_NUMBER'])        
print(df)

output:

   SHOP_NUMBER ITEM_NUMBER
0           01        1001
1           01        1002
2           01        1003
3           01        1004
4           01        1010
5           02        1002
6           02        1006
7           02        1005
8           03        1009
9           03        1006
10          03        1005
11          03        1002

1 Comment

Yes the first option is exactly the output. The second option also might help me later. Thanks

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.