
I have created the root like this:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

And here's a sample of my XML:

<?xml version="1.0" encoding="UTF-8"?>
<feed gd:etag="&quot;Rn84fzVSLyt7I2A9XRVbFkwOQAE.&quot;" xmlns="http://www.w3.org/2005/Atom" xmlns:batch="http://schemas.google.com/gdata/batch" xmlns:gContact="http://schemas.google.com/contact/2008" xmlns:gd="http://schemas.google.com/g/2005" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/">
 <id>moha****[email protected]</id>
 <updated>2015-08-03T15:12:37.137Z</updated>
 <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#contact"/>
 <title>Mohammad Amin's Contacts</title>
 <link rel="alternate" type="text/html" href="https://www.google.com/"/>
 <link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/mohamma***ee%40gmail.com/full"/>
 <link rel="http://schemas.google.com/g/2005#post" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/mohamm***aee%40gmail.com/full"/>
 <link rel="http://schemas.google.com/g/2005#batch" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moha****ee%40gmail.com/full/batch"/>
 <link rel="self" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moham***ee%40gmail.com/full?max-results=25"/>
 <link rel="next" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moha****aee%40gmail.com/full?max-results=25&amp;start-index=26"/>
 <author>
  <name>Mohammad Amin</name>
  <email>moha****[email protected]</email>
 </author>
 <generator version="1.0" uri="http://www.google.com/m8/feeds">Contacts</generator>
 <openSearch:totalResults>131</openSearch:totalResults>
 <openSearch:startIndex>1</openSearch:startIndex>
 <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
 <entry gd:etag="&quot;SXc5cTNQJit7I2A9XRRbGEsPQQY.&quot;">
  <id>http://www.google.com/m8/feeds/contacts/moh***ee%40gmail.com/base/15281000e768a31</id>
  <updated>2015-04-12T19:07:08.929Z</updated>
  <app:edited xmlns:app="http://www.w3.org/2007/app">2015-04-12T19:07:08.929Z</app:edited>
  <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#contact"/>
  <title>Sina Ghazi</title>
  <link rel="http://schemas.google.com/contacts/2008/rel#photo" type="image/*" href="https://www.google.com/m8/feeds/photos/media/moh***aee%40gmail.com/15****a31" gd:etag="&quot;WR1-e34pSit7I2BlWW4TbChNHHg6LF88WhE.&quot;"/>
  <link rel="self" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/moham****aee%40gmail.com/full/1528****8a31"/>
  <link rel="edit" type="application/atom+xml" href="https://www.google.com/m8/feeds/contacts/mohamm***ee%40gmail.com/full/15***a31"/>
  <gd:name>
   <gd:fullName>Si***i</gd:fullName>
   <gd:givenName>Si***a</gd:givenName>
   <gd:familyName>G***zi</gd:familyName>
  </gd:name>
  <gd:email rel="http://schemas.google.com/g/2005#home" address="si***[email protected]" primary="true"/>
  <gContact:website href="http://www.google.com/profiles/1167****31" rel="profile"/>
 </entry>
.....

I'm using XPath and I can extract the address attribute quite easily.

for item in root.findall('.//{http://schemas.google.com/g/2005}email'):
    email = item.get('address')

But when I want to get the title attribute it returns None. Any ideas?

  • Show the code you used to extract the address tag. By the way, I can't find any address tag; did you mean attribute? Commented Aug 5, 2015 at 7:40
  • Yes, you're right. It was the address attribute. Commented Aug 5, 2015 at 7:49
  • There is no "title" attribute in the XML. But there is a {http://www.w3.org/2005/Atom}title element in two places. Commented Aug 5, 2015 at 7:59

2 Answers

There is a section in the Python documentation about parsing XML with namespaces.

You could either use har07's way, which works perfectly well, or, if you don't want to type the whole namespace multiple times, you could do it like this:

ns = {'ns': 'http://www.w3.org/2005/Atom'}

for element in root.findall('.//ns:title', ns):
    title = element.text
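
As a rough, self-contained sketch of the same idea, the dictionary can hold one prefix per namespace, so the Atom title elements and the gd:email elements are both reachable with short names. The inline SAMPLE feed and its values are made up for illustration, and the prefixes atom and gd are arbitrary keys chosen here; they do not have to match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the feed in the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
 <title>Example Contacts</title>
 <entry>
  <title>Alice Example</title>
  <gd:email address="alice@example.com" primary="true"/>
 </entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# Keys are local aliases; only the namespace URIs have to match the XML.
ns = {
    'atom': 'http://www.w3.org/2005/Atom',
    'gd': 'http://schemas.google.com/g/2005',
}

# Elements carry their value in .text, attributes are read with .get().
titles = [el.text for el in root.findall('.//atom:title', ns)]
emails = [el.get('address') for el in root.findall('.//gd:email', ns)]

print(titles)
print(emails)
```

Note that './/atom:title' also matches the feed-level title, not only the titles inside each entry.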

2 Comments

Is there a way I can loop through the file and extract each item in its order? I don't want to extract emails and then titles. Can I do both at the same time?
@AminA You can iterate over the <entry> nodes and then extract the email address and title from each.
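
A minimal sketch of that suggestion, assuming each contact is an entry element that is a direct child of the feed, as in the question's sample. The inline SAMPLE data and the contacts list are invented for illustration; the second entry deliberately has no email to show one way of handling a missing element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same shape as the feed in the question.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
 <title>Example Contacts</title>
 <entry>
  <title>Alice Example</title>
  <gd:email address="alice@example.com" primary="true"/>
 </entry>
 <entry>
  <title>Bob Example</title>
 </entry>
</feed>"""

root = ET.fromstring(SAMPLE)
ns = {
    'atom': 'http://www.w3.org/2005/Atom',
    'gd': 'http://schemas.google.com/g/2005',
}

contacts = []
# 'atom:entry' without './/' only matches direct children of the root,
# so the feed-level <title> is never visited.
for entry in root.findall('atom:entry', ns):
    title = entry.findtext('atom:title', default='', namespaces=ns)
    email_el = entry.find('gd:email', ns)
    # find() returns None when the entry has no gd:email child.
    email = email_el.get('address') if email_el is not None else None
    contacts.append((title, email))

print(contacts)
```

Because the lookups run relative to each entry, the title and address stay paired and come out in document order.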

You can try it this way:

for item in root.findall('.//{http://www.w3.org/2005/Atom}title'):
    title = item.text
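
For context on why the original attempt gives None, here is a small sketch on a made-up one-entry feed (the SAMPLE string and variable names are assumptions, not taken from the question's file). The title is a child element in the default Atom namespace, so neither an attribute lookup nor an unqualified tag name finds it.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-entry feed using the Atom default namespace.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
 <entry>
  <title>Alice Example</title>
 </entry>
</feed>"""

root = ET.fromstring(SAMPLE)
entry = root.find('{http://www.w3.org/2005/Atom}entry')

# <entry> has no attribute called "title", so get() returns None.
as_attribute = entry.get('title')

# The default namespace applies to <title>, so the bare name does not match.
unqualified = entry.find('title')

# The fully qualified name matches; the value lives in .text.
qualified = entry.find('{http://www.w3.org/2005/Atom}title')

print(as_attribute, unqualified, qualified.text)
```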
