9

I need to extract the value of an attribute in an XML document using Python.

For example, if I have an XML document like this:

<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>

How would I get the text 'smallHuman' or 'largeHuman' and store it in a variable?

Edit: I'm very new to Python and may require a lot of assistance.

This is what I've tried so far:

#! /usr/bin/python

import xml.etree.ElementTree as ET


def walkTree(node):
    print node.tag
    print node.keys()
    print node.attributes[]
    for cn in list(node):
        walkTree(cn)

treeOne = ET.parse('tm1.xml')
treeTwo = ET.parse('tm3.xml')

walkTree(treeOne.getroot())

Due to the way this script will be used, I cannot hard-code the XML into the .py file.

3
  • 1
    The first link on Google should help you. Commented Feb 12, 2018 at 12:27
  • 1
    I've updated the question with the code written so far @James Commented Feb 12, 2018 at 12:29
  • Read more on the ElementTree module. You will find a solution there. Commented Feb 12, 2018 at 12:36

4 Answers

8

To get an attribute value from an XML document, you can do the following:

import xml.etree.ElementTree as ET

xml_data = """<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>"""

# This is like ET.parse(), but for strings
root = ET.fromstring(xml_data)

for child in root:
    print(child.tag, child.attrib)

You can find more details and examples on the link below: https://docs.python.org/3.5/library/xml.etree.elementtree.html
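
Since the question says the XML cannot be hard-coded, here is a minimal sketch of the same idea reading from a file and storing the values in variables. The helper name `read_types` and the temporary file (standing in for something like `tm1.xml`) are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_types(path):
    """Return a dict mapping each direct child's tag to its 'type' attribute."""
    root = ET.parse(path).getroot()
    # Element.get() returns None when the attribute is absent
    return {child.tag: child.get("type") for child in root}


# Demo only: write the sample XML to a temporary file so the sketch is runnable
sample = """<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>"""
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(sample)
    demo_path = handle.name

types = read_types(demo_path)
os.remove(demo_path)

child_type = types["child"]
adult_type = types["adult"]
print(child_type)
print(adult_type)
```

In a real script, the path passed to `read_types` would presumably come from a command-line argument or configuration rather than a temporary file.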


6

With ElementTree, you can use the find method together with the attrib dictionary.

Example:

import xml.etree.ElementTree as ET

z = """<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>"""


treeOne = ET.fromstring(z)
print(treeOne.find('./child').attrib['type'])
print(treeOne.find('./adult').attrib['type'])

Output:

smallHuman
largeHuman
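
One caveat with this approach: `find` returns `None` when no element matches, and indexing `attrib` with a missing key raises `KeyError`. A minimal sketch of a more defensive lookup follows; the helper `attr_or_default` and the `teen` element are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>""")


def attr_or_default(parent, path, name, default=None):
    """Return attribute `name` of the first element matching `path`, or `default`."""
    element = parent.find(path)
    if element is None:
        # No matching element at all
        return default
    # Element.get() falls back to `default` when the attribute is missing
    return element.get(name, default)


child_type = attr_or_default(root, "./child", "type")
missing_type = attr_or_default(root, "./teen", "type", default="unknown")
print(child_type)
print(missing_type)
```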


0

Another example, using the lxml library:

xml = '''<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>'''

from lxml import etree as et

root = et.fromstring(xml)

# find attribute using xpath
child_type = root.xpath('//xml/child/@type')[0]
print(child_type)

adult_type = root.xpath('//xml/adult/@type')[0]
print(adult_type)

# combination of find / get
child_type = root.find('child').get('type')
adult_type = root.find('adult').get('type')

print(child_type)
print(adult_type)
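
The lookups above rely on knowing the element names in advance. When they are not known, one option is to walk the whole tree and collect every element that carries the attribute. The sketch below uses the standard library's ElementTree rather than lxml so that it runs without extra dependencies; lxml elements offer an `iter()` method as well. The variable name `found` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

xml = '''<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>'''

root = ET.fromstring(xml)

# iter() visits the root and all of its descendants in document order;
# keep only the elements that actually have a 'type' attribute
found = [(el.tag, el.get('type')) for el in root.iter() if 'type' in el.attrib]
print(found)
```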


0

Another example, using the SimplifiedDoc library:

from simplified_scrapy import SimplifiedDoc, utils
xml = '''<xml>
    <child type = "smallHuman"/>
    <adult type = "largeHuman"/>
</xml>'''
doc = SimplifiedDoc(xml).select('xml')

# first
child_type = doc.child['type']
print(child_type)

adult_type = doc.adult['type']
print(adult_type)

# second
child_type = doc.select('child').get('type')
adult_type = doc.select('adult').get('type')

print(child_type)
print(adult_type)

# third
child_type = doc.select('child>type()')
adult_type = doc.select('adult>type()')

print(child_type)
print(adult_type)

# fourth
nodes = doc.selects('child|adult>type()')
print(nodes)
# fifth
nodes = doc.children
print([node['type'] for node in nodes])
