How do I parse an XML file with a namespace?

Question

I wrote the following code, but I don't know why it produces an empty DataFrame.

     <Report xmlns="urn:crystal-reports:schemas:report-detail"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:crystal-reports:schemas:report-detail http://www.businessobjects.com/products/xml/CR2008Schema.xsd">
        <Details Level="1">
        <Field Name='ReportNo'><Value>90</Value>

ns = {"urn:crystal-reports:schemas:report-detail#"}


def test(xml_file, df_cols):
    global df
    xtree = et.parse(xml_file)
    xroot = xtree.getroot()
    out_xml = pd.DataFrame(columns=df_cols)

    for node in xroot.findall("urn：Group[1]/Details/Field", ns):
        name = node.attrib.get("Name")
        value = node.find("Value").text

You will need to show us a bit of the data too, especially the namespace declarations and some of those tags.
– AKX, commented Nov 13, 2019 at 14:10

AKX · Accepted Answer · 2019-11-13 14:25:54Z

1

The XML snippet you pasted does not match the query you have: it's missing the <Group> element you're looking for.

Either way, you'll need to

have a correct namespace map (a dict of alias to URI); you currently have a set with one entry, and its URI has a trailing # that the namespace declared in the XML does not
separate the namespace alias from the tag name with a real colon :, not a fullwidth colon ：
put the namespace on each element of the query, as well as on the Value subnode query.

I chose r (short for "report") as the alias for urn:crystal-reports:schemas:report-detail here. If you don't want to use aliases, you can also use the longhand syntax {urn:crystal-reports:schemas:report-detail}Group, etc., in which case you don't need the namespace map.
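As a rough sketch of the longhand form mentioned above, the namespace URI can be written in braces in front of every tag name, with no map passed to findall. The trimmed-down sample document and the names NS, path and pairs are only illustrative, not part of the original question.

```python
import xml.etree.ElementTree as et

# Trimmed-down sample document (an assumption for illustration).
data = """<Report xmlns="urn:crystal-reports:schemas:report-detail">
  <Group>
    <Details Level="1">
      <Field Name="ReportNo"><Value>90</Value></Field>
    </Details>
  </Group>
</Report>"""

# Longhand prefix: the namespace URI wrapped in braces.
NS = "{urn:crystal-reports:schemas:report-detail}"

xroot = et.XML(data)

# Every step of the path carries the full {uri} prefix, so no map is needed.
path = f"{NS}Group/{NS}Details/{NS}Field"
pairs = [
    (node.get("Name"), node.find(f"{NS}Value").text)
    for node in xroot.findall(path)
]
print(pairs)  # [('ReportNo', '90')]
```

This gets verbose quickly, which is why the alias map is usually more pleasant for longer queries.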

All that fixed, we get something like

import xml.etree.ElementTree as et

data = """<?xml version="1.0"?>
<Report xmlns="urn:crystal-reports:schemas:report-detail" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:crystal-reports:schemas:report-detail http://www.businessobjects.com/products/xml/CR2008Schema.xsd">
  <Group>
      <Details Level="1">
        <Field Name="ReportNo"><Value>90</Value></Field>
        <Field Name="Other"><Value>644</Value></Field>
      </Details>
  </Group>
</Report>
"""

nsmap = {"r": "urn:crystal-reports:schemas:report-detail"}
xroot = et.XML(data)  # could read from file here

for node in xroot.findall("r:Group/r:Details/r:Field", nsmap):
    name = node.attrib.get("Name")
    value = node.find("r:Value", nsmap).text
    print(name, value)

The output here is

ReportNo 90
Other 644

– plugging it into a dataframe is left as an exercise to the reader.
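For anyone who wants a starting point for that exercise, here is one possible sketch: collect each field as a dict and hand the list to pandas. The one-row-per-field layout and the column names Name and Value are assumptions; the question does not say what shape the DataFrame should have.

```python
import xml.etree.ElementTree as et

import pandas as pd

data = """<?xml version="1.0"?>
<Report xmlns="urn:crystal-reports:schemas:report-detail">
  <Group>
    <Details Level="1">
      <Field Name="ReportNo"><Value>90</Value></Field>
      <Field Name="Other"><Value>644</Value></Field>
    </Details>
  </Group>
</Report>
"""

nsmap = {"r": "urn:crystal-reports:schemas:report-detail"}
xroot = et.XML(data)

# One dict per <Field>; "Name" and "Value" are assumed column names.
rows = []
for node in xroot.findall("r:Group/r:Details/r:Field", nsmap):
    rows.append({
        "Name": node.get("Name"),
        "Value": node.find("r:Value", nsmap).text,  # text stays a string
    })

# Build the frame once at the end instead of appending row by row.
df = pd.DataFrame(rows, columns=["Name", "Value"])
print(df)
```

The values come out as strings; something like pd.to_numeric(df["Value"]) could convert them if every field is numeric, which is not guaranteed for a real report.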


3 Comments

kjhughes Over a year ago

Nice of you to add a custom answer here (+1), but note that this question has been asked and answered many times in the past: See this Python-specific XPath and namespace q/a or this general namespaces in XPath q/a as two examples.

Jovis ch Over a year ago

Thanks a lot, it works well. I will try to find a solution by searching Stack Overflow next time.

Jovis ch Over a year ago

Could you help me with this one: "How do I solve IndexError: single positional indexer is out-of-bounds?" at stackoverflow.com/questions/58848561/…?
