Fix XML syntax errors in Python

Asked 7 years, 10 months ago

Modified 7 years, 10 months ago

Viewed 788 times

With my code I can extract information from all my sessions. Everything works, but sometimes I get an error: xml.etree.ElementTree.ParseError: mismatched tag: line 22, column 6. I want to edit my code to correct such errors and, if possible, to handle any other error that occurs between the start of the function and the end of each session.

# encoding=utf8
# -*- coding: utf-8 -*-
import random
import xml.etree.ElementTree as ET
import requests
from requests.auth import HTTPBasicAuth
import sys
import csv
reload(sys)
sys.setdefaultencoding('utf-8')
lista = []
number = str(random.random())

cuenta = ['@a.com', '@b.com', '@c','@d', '@e', '@f']
for item in cuenta:
    user = 'user{}'.format(item)
    passwd = 'pass'
    url = 'url'
    login = requests.get(url, auth=HTTPBasicAuth(user, passwd))
    url_sitios = 'url_sitios'
    sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
    sitios2 = sitios.text
    root = ET.fromstring(sitios2)
    for s in root.findall('sitio'):
        id = s.find('sitio_id')
        fa = s.find('fecha_alta')
        i24 = s.find('*/item[@id="imps24ad"]')  # Impresiones Vendidas ultimas 24HS
        estado = s.find('estado')
        url = s.find('url')
        nombre = s.find('nombre')

        a = id.text  # id del Sitio
        b = fa.text  # Fecha de alta
        c = i24.text  # Ultimas 24hs Impresiones
        d = estado.text  # Estado
        e = url.text  # url
        f = nombre.text  # nombre

        sitio = str(a), str(b), str(c), str(d), str(e), str(f)

        sitio_ok = (list(sitio))
        lista.append(sitio_ok)  # list.append() returns None, so its result is not assigned
        print sitio_ok

    requests.get('url_out' + number + '&o=xml', auth=HTTPBasicAuth(user, passwd))


with open("data.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(lista)

edited Jan 30, 2018 at 0:55 by Charles Duffy

asked Jan 30, 2018 at 0:47 by Martin Bouhier

Wouldn't the question be exactly the same if your XML was coming from a file on disk or a variable hardcoded in your program rather than requests? It seems like there's significant room to reduce this question's scope, and thereby to improve its conformance with the minimal reproducible example definition (which calls for posting only the shortest possible complete, standalone code that lets someone reproduce a problem).

– Charles Duffy
Commented Jan 30, 2018 at 0:55
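
A reduced example along the lines this comment asks for might look like the sketch below. It is written for Python 3, the hardcoded `BROKEN_XML` string and the helper name `parse_error_message` are made up for illustration, and the real payload from the question is not known, so the reported line and column will differ from the asker's.

```python
import xml.etree.ElementTree as ET

# Hypothetical malformed payload: <nombre> is closed by </sitio>.
BROKEN_XML = "<sitios><sitio><nombre>demo</sitio></sitios>"


def parse_error_message(xml_text):
    """Return the ParseError message for xml_text, or None if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Prints a "mismatched tag: line ..., column ..." message for the broken input.
print(parse_error_message(BROKEN_XML))
```

No network access, credentials, or CSV output are needed to reproduce the error, which is the point of the reduction.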

...that said, this kind of request is generally problematic, because if your document is malformed, then a program necessarily needs to make guesses about the author's intent; and unlike a human, a machine isn't really equipped to consider context in evaluating actual intent.

– Charles Duffy
Commented Jan 30, 2018 at 0:58

(The case where it's not so problematic is if your content is really HTML, not XML -- in that case it's a matter of choosing a parser that understands the details of how HTML is permitted not to conform to the XML standard; lxml.html is a good place to start, or Beautiful Soup).

– Charles Duffy
Commented Jan 30, 2018 at 0:59
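
To illustrate what a permissive parser does with badly nested markup, here is a rough Python 3 sketch. Note the substitution: it uses the standard library's `html.parser.HTMLParser` instead of the lxml.html or Beautiful Soup suggested above, only to stay dependency-free. The class name `TextByTag` and the sample input are hypothetical, and unlike lxml this parser does not build a tree, it only reports tags and text as it encounters them.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text that follows each opening tag, tolerating bad nesting."""

    def __init__(self):
        super().__init__()
        self.current = None  # name of the most recently opened tag
        self.values = {}     # tag name -> list of text chunks

    def handle_starttag(self, tag, attrs):
        self.current = tag

    def handle_endtag(self, tag):
        # Any closing tag ends the current text run, matched or not.
        self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            self.values.setdefault(self.current, []).append(text)


parser = TextByTag()
# Hypothetical input: <nombre> is never closed, yet no exception is raised.
parser.feed("<sitios><sitio><nombre>demo</sitio></sitios>")
parser.close()
print(parser.values)
```

The parser accepts the mismatched `</sitio>` silently, which is exactly the guessing behaviour the comments warn about: it works when the content is HTML-like, but it can hide real data problems.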

If it's truly XML and you need to parse it with etree then you have to fix the XML syntax before parsing it with etree. All conforming XML parsers are required to reject malformed documents because guessing at the correct form is explicitly out-of-scope in the standard. As @CharlesDuffy said, only relaxed HTML parsers are allowed to guess when presented with malformed documents, and the guesses are restricted by HTML convention and known usages.

– Jim Garrison
Commented Jan 30, 2018 at 1:03
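
Since a conforming parser will always reject the malformed document, one option is to catch the rejection per session so the remaining accounts are still processed. The Python 3 sketch below assumes the responses have already been fetched; `parse_sessions` and the sample payloads are hypothetical names and data, not part of the question's code. It does not repair anything, it only records which sessions failed and where.

```python
import xml.etree.ElementTree as ET


def parse_sessions(payloads):
    """Parse each (account, xml_text) pair; collect failures instead of raising."""
    parsed, failed = {}, {}
    for account, xml_text in payloads:
        try:
            parsed[account] = ET.fromstring(xml_text)
        except ET.ParseError as exc:
            # exc.position is a (line, column) tuple pointing at the problem.
            failed[account] = (exc.position, str(exc))
    return parsed, failed


parsed, failed = parse_sessions([
    ("@a.com", "<sitios><sitio><nombre>ok</nombre></sitio></sitios>"),
    ("@b.com", "<sitios><sitio><nombre>bad</sitio></sitios>"),
])
print(sorted(parsed), sorted(failed))
```

The recorded position can then be used to inspect the offending response and report it to whoever generates the XML.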

You don't need to "convert" anything -- the simple, naive approach is just to try to parse the exact content you already have with a permissive HTML parser rather than an XML parser; two were already linked above. Other than that, the advice you're going to get boils down to "fix the program generating this content to actually output genuine XML".

– Charles Duffy
Commented Jan 30, 2018 at 1:21
