Python parsing XML

Question

I have created the root like this:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

And here's a sample of my XML:

<?xml version="1.0" encoding="UTF-8"?>
<feed gd:etag="&quot;Rn84fzVSLyt7I2A9XRVbFkwOQAE.&quot;" xmlns="http://www.w3.org/2005/Atom" xmlns:batch="http://schemas.google.com/gdata/batch" xmlns:gContact="http://schemas.google.com/contact/2008" xmlns:gd="http://schemas.google.com/g/2005" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/">
 <id>moha****[email protected]</id>
 <updated>2015-08-03T15:12:37.137Z</updated>
 <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#contact"/>
 <title>Mohammad Amin's Contacts</title>
 <link rel="alternate" type="text/html" href="https://www.google.com/"/>
 <link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/mohamma***ee%40gmail.com/full"/>
 <link rel="http://schemas.google.com/g/2005#post" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/mohamm***aee%40gmail.com/full"/>
 <link rel="http://schemas.google.com/g/2005#batch" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moha****ee%40gmail.com/full/batch"/>
 <link rel="self" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moham***ee%40gmail.com/full?max-results=25"/>
 <link rel="next" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moha****aee%40gmail.com/full?max-results=25&amp;start-index=26"/>
 <author>
  <name>Mohammad Amin</name>
  <email>moha****[email protected]</email>
 </author>
 <generator version="1.0" uri="http://www.google.com/m8/feeds">Contacts</generator>
 <openSearch:totalResults>131</openSearch:totalResults>
 <openSearch:startIndex>1</openSearch:startIndex>
 <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
 <entry gd:etag="&quot;SXc5cTNQJit7I2A9XRRbGEsPQQY.&quot;">
  <id>http://www.google.com/m8/feeds/contacts/moh***ee%40gmail.com/base/15281000e768a31</id>
  <updated>2015-04-12T19:07:08.929Z</updated>
  <app:edited xmlns:app="http://www.w3.org/2007/app">2015-04-12T19:07:08.929Z</app:edited>
  <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#contact"/>
  <title>Sina Ghazi</title>
  <link rel="http://schemas.google.com/contacts/2008/rel#photo" type="image/*" href="https://www.google.com/m8/feeds/photos/media/moh***aee%40gmail.com/15****a31" gd:etag="&quot;WR1-e34pSit7I2BlWW4TbChNHHg6LF88WhE.&quot;"/>
  <link rel="self" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moham****aee%40gmail.com/full/1528****8a31"/>
  <link rel="edit" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/mohamm***ee%40gmail.com/full/15***a31"/>
  <gd:name>
   <gd:fullName>Si***i</gd:fullName>
   <gd:givenName>Si***a</gd:givenName>
   <gd:familyName>G***zi</gd:familyName>
  </gd:name>
  <gd:email rel="http://schemas.google.com/g/2005#home" address="si***[email protected]" primary="true"/>
  <gContact:website href="http://www.google.com/profiles/1167****31" rel="profile"/>
 </entry>
.....

I'm using XPath and I can extract the address attribute quite easily.

for item in root.findall('.//{http://schemas.google.com/g/2005}email'):
    email = item.get('address')

But when I want to get the title attribute it returns None. Any ideas?

Show the code you used to extract the address tag. By the way, I can't find any address tag; did you mean attribute?
– har07, Commented Aug 5, 2015 at 7:40
There is no "title" attribute in the XML. But there is a {http://www.w3.org/2005/Atom}title element in two places.
– mzjn, Commented Aug 5, 2015 at 7:59

tobifasc · Accepted Answer · 2015-08-05 08:15:08Z

3

There is a section in the Python documentation about parsing XML with namespaces.

You could either use har07's way, which works perfectly well, or, if you don't want to type the whole namespace multiple times, pass a prefix-to-URI dictionary like this:

ns = {'ns': 'http://www.w3.org/2005/Atom'}

for element in root.findall('.//ns:title', ns):
    title = element.text
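
As a self-contained sketch of the same idea, the snippet below parses a small inline feed (a made-up stand-in for the asker's file, with placeholder names and addresses) and shows why asking for a "title" attribute gives None while reading the element's text works. The prefixes in the ns dictionary are arbitrary local aliases chosen for illustration; only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature feed, trimmed from the structure shown in the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
 <title>Example Contacts</title>
 <entry>
  <title>Alice Example</title>
  <gd:email address="alice@example.com" primary="true"/>
 </entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# Local prefix -> namespace URI; the prefixes need not match those in the file.
ns = {
    "atom": "http://www.w3.org/2005/Atom",
    "gd": "http://schemas.google.com/g/2005",
}

feed_title = root.find("atom:title", ns)
attr_value = feed_title.get("title")  # there is no such attribute, so this is None
text_value = feed_title.text          # the title is the element's text content

# './/' matches at any depth, so this picks up the feed title and entry titles.
all_titles = [t.text for t in root.findall(".//atom:title", ns)]

# Restricting the path returns only the titles that belong to entries.
entry_titles = [t.text for t in root.findall("atom:entry/atom:title", ns)]

print(attr_value, text_value, all_titles, entry_titles)
```

Note that the title element appears both at feed level and inside each entry, so the './/' form returns more than the contact names.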


2 Comments

user5182158 Over a year ago

Is there a way I can loop through the file and extract each item in its original order? I don't want to extract all the emails and then all the titles. Can I do both at the same time?

BlackJack Over a year ago

@AminA You can iterate over <entry> nodes and then extract email address and title from each.
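
A minimal sketch of that suggestion, assuming each <entry> holds at most one title and any number of gd:email elements. The sample data and the names SAMPLE, ns and contacts are illustrative and not from the original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same namespaces as the feed in the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
 <title>Example Contacts</title>
 <entry>
  <title>Alice Example</title>
  <gd:email address="alice@example.com" primary="true"/>
 </entry>
 <entry>
  <title>Bob Example</title>
 </entry>
</feed>"""

ns = {
    "atom": "http://www.w3.org/2005/Atom",
    "gd": "http://schemas.google.com/g/2005",
}

root = ET.fromstring(SAMPLE)

contacts = []
# Walk the entries in document order and read both fields from each one.
for entry in root.findall("atom:entry", ns):
    # findtext returns the default when the entry has no title element.
    title = entry.findtext("atom:title", default="", namespaces=ns)
    # An entry may carry zero or more email elements.
    emails = [e.get("address") for e in entry.findall("gd:email", ns)]
    contacts.append((title, emails))

for title, emails in contacts:
    print(title, emails)
```

With a real file, root would come from ET.parse(...).getroot() as in the question; the loop itself stays the same.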

har07 · Accepted Answer · 2015-08-05 07:58:26Z

1

You can try it this way:

for item in root.findall('.//{http://www.w3.org/2005/Atom}title'):
    title = item.text
