I have written the following code, but I don't know why it produces an empty DataFrame.

     <Report xmlns="urn:crystal-reports:schemas:report-detail"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:crystal-reports:schemas:report-detail http://www.businessobjects.com/products/xml/CR2008Schema.xsd">
        <Details Level="1">
        <Field Name='ReportNo'><Value>90</Value>

ns = {"urn:crystal-reports:schemas:report-detail#"}


def test(xml_file, df_cols):
    global df
    xtree = et.parse(xml_file)
    xroot = xtree.getroot()
    out_xml = pd.DataFrame(columns=df_cols)

    for node in xroot.findall("urn:Group[1]/Details/Field", ns):
        name = node.attrib.get("Name")
        value = node.find("Value").text

  • You will need to show us a bit of the data too, especially the namespace declarations and some of those tags. Commented Nov 13, 2019 at 14:10
  • Updated, thanks for response. Commented Nov 13, 2019 at 14:16

1 Answer

The XML snippet you pasted does not match your query: it is missing the <Group> element the query looks for.

Either way, you'll need to

  • pass a correct namespace map (a dict mapping alias to URI) – you currently have a set with one entry
  • separate the namespace alias from the tag name with a real colon :, not a fullwidth colon
  • put the namespace on each element of the query, including the Value subnode query.

I chose r (short for "report") as the alias for urn:crystal-reports:schemas:report-detail here. If you don't want to use aliases, you can also use the longhand syntax {urn:crystal-reports:schemas:report-detail}Group, etc., in which case you don't need the namespace map.
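As a minimal sketch of the longhand form mentioned above, the query can be written without any namespace map. The sample document and the names NS and pairs are assumptions made for illustration, modelled on the snippet in the question.

```python
import xml.etree.ElementTree as et

# Assumed sample document, modelled on the snippet in the question.
data = """<Report xmlns="urn:crystal-reports:schemas:report-detail">
  <Group>
    <Details Level="1">
      <Field Name="ReportNo"><Value>90</Value></Field>
    </Details>
  </Group>
</Report>"""

# Hypothetical helper constant: the namespace URI wrapped in braces.
NS = "{urn:crystal-reports:schemas:report-detail}"

xroot = et.XML(data)

pairs = []
# Every step of the path carries the full namespace, so no alias map is needed.
for node in xroot.findall(f"{NS}Group/{NS}Details/{NS}Field"):
    value = node.find(f"{NS}Value")
    pairs.append((node.get("Name"), value.text))

print(pairs)
```

The longhand form is more verbose, but it avoids having to keep an alias and a map in sync.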

With all that fixed, we get something like this:

import xml.etree.ElementTree as et

data = """<?xml version="1.0"?>
<Report xmlns="urn:crystal-reports:schemas:report-detail" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:crystal-reports:schemas:report-detail http://www.businessobjects.com/products/xml/CR2008Schema.xsd">
  <Group>
      <Details Level="1">
        <Field Name="ReportNo"><Value>90</Value></Field>
        <Field Name="Other"><Value>644</Value></Field>
      </Details>
  </Group>
</Report>
"""

nsmap = {"r": "urn:crystal-reports:schemas:report-detail"}
xroot = et.XML(data)  # could read from file here

for node in xroot.findall("r:Group/r:Details/r:Field", nsmap):
    name = node.attrib.get("Name")
    value = node.find("r:Value", nsmap).text
    print(name, value)

The output here is

ReportNo 90
Other 644

Plugging it into a dataframe is left as an exercise for the reader.
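For readers who would rather not do the exercise, here is one possible sketch. It assumes that each Details element should become one row and that each Field name should become a column; the original question does not confirm that layout. The helper name details_to_rows is made up for this example, and pandas is imported optionally so the parsing part still runs without it.

```python
import xml.etree.ElementTree as et

try:
    import pandas as pd  # optional: only needed for the final DataFrame step
except ImportError:
    pd = None

data = """<?xml version="1.0"?>
<Report xmlns="urn:crystal-reports:schemas:report-detail">
  <Group>
      <Details Level="1">
        <Field Name="ReportNo"><Value>90</Value></Field>
        <Field Name="Other"><Value>644</Value></Field>
      </Details>
  </Group>
</Report>
"""

nsmap = {"r": "urn:crystal-reports:schemas:report-detail"}


def details_to_rows(xroot):
    """Return one dict per Details element (assumed row layout)."""
    rows = []
    for details in xroot.findall("r:Group/r:Details", nsmap):
        row = {}
        for field in details.findall("r:Field", nsmap):
            value = field.find("r:Value", nsmap)
            # A Field without a Value child is stored as None.
            row[field.get("Name")] = value.text if value is not None else None
        rows.append(row)
    return rows


rows = details_to_rows(et.XML(data))
print(rows)

if pd is not None:
    # A list of dicts maps directly onto rows and columns.
    df = pd.DataFrame(rows)
    print(df)
```

The values stay as strings here; converting them to numbers, if needed, would be a separate step.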


3 Comments

Nice of you to add a custom answer here (+1), but note that this question has been asked and answered many times in the past: See this Python-specific XPath and namespace q/a or this general namespaces in XPath q/a as two examples.
Thanks a lot, it works well. Next time I will search Stack Overflow for a solution first.
Could you help me with this question, "How do solve IndexError: single positional indexer is out-of-bounds?", at stackoverflow.com/questions/58848561/…?
