With my code I can extract information from all my sessions. Everything works, but sometimes I get an error: xml.etree.ElementTree.ParseError: mismatched tag: line 22, column 6. I want to edit my code so that it handles such errors and, if possible, any other error raised between the call to the function and the end of each session.

# -*- coding: utf-8 -*-
import random
import xml.etree.ElementTree as ET
import requests
from requests.auth import HTTPBasicAuth
import sys
import csv
reload(sys)
sys.setdefaultencoding('utf-8')
lista = []
number = str(random.random())

cuenta = ['@a.com', '@b.com', '@c','@d', '@e', '@f']
for item in cuenta:
    user = 'user{}'.format(item)
    passwd = 'pass'
    url = 'url'
    login = requests.get(url, auth=HTTPBasicAuth(user, passwd))
    url_sitios = 'url_sitios'
    sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
    sitios2 = sitios.text
    root = ET.fromstring(sitios2)
    for s in root.findall('sitio'):
        sitio_id = s.find('sitio_id').text  # site id
        fecha_alta = s.find('fecha_alta').text  # registration date
        imps24 = s.find('*/item[@id="imps24ad"]').text  # impressions sold in the last 24 hours
        estado = s.find('estado').text  # status
        sitio_url = s.find('url').text  # site URL (renamed so it does not shadow the login url)
        nombre = s.find('nombre').text  # name

        sitio_ok = [str(sitio_id), str(fecha_alta), str(imps24),
                    str(estado), str(sitio_url), str(nombre)]
        lista.append(sitio_ok)  # append() returns None, so print the row itself
        print sitio_ok

    requests.get('url_out' + number + '&o=xml', auth=HTTPBasicAuth(user, passwd))


with open("data.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(lista)
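One way to keep a single malformed response from aborting the whole run is to catch the ParseError around ET.fromstring and skip that account. The following is only a minimal sketch, not a fix for the malformed XML itself: parse_sitios, text_of and the GOOD/BAD sample strings are hypothetical stand-ins for the real sitios.text responses, and only three of the fields are read to keep it short.

```python
import xml.etree.ElementTree as ET


def text_of(parent, path):
    """Return the text of the first match for path, or '' if it is missing."""
    node = parent.find(path)
    return node.text if node is not None and node.text else ''


def parse_sitios(xml_text):
    """Return one row per <sitio>, or None when the XML is malformed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        print('skipping malformed response: {} at {}'.format(err, err.position))
        return None
    rows = []
    for s in root.findall('sitio'):
        rows.append([
            text_of(s, 'sitio_id'),
            text_of(s, 'estado'),
            text_of(s, 'nombre'),
        ])
    return rows


# Hypothetical responses standing in for sitios.text from two accounts.
GOOD = ('<sitios><sitio><sitio_id>1</sitio_id>'
        '<estado>ok</estado><nombre>a</nombre></sitio></sitios>')
BAD = '<sitios><sitio><sitio_id>2</sitio_id><estado>ok</sitio></sitios>'  # <estado> is never closed

lista = []
for body in (GOOD, BAD):
    rows = parse_sitios(body)
    if rows is None:
        continue  # move on to the next account instead of crashing
    lista.extend(rows)
```

In the original loop, the logout request at the end of each session could additionally be placed in a finally clause so that it runs whether or not parsing succeeded.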
  • 1
    Wouldn't the question be exactly the same if your XML was coming from a file on disk or a variable hardcoded in your program rather than requests? It seems like there's significant room to reduce this question's scope, and thereby to improve its conformance with the minimal reproducible example definition (which calls for posting only the shortest possible complete, standalone code that lets someone reproduce a problem). Commented Jan 30, 2018 at 0:55
  • 2
    ...that said, this kind of request is generally problematic, because if your document is malformed, then a program necessarily needs to make guesses about the author's intent; and unlike a human, a machine isn't really equipped to consider context in evaluating actual intent. Commented Jan 30, 2018 at 0:58
  • 2
    (The case where it's not so problematic is if your content is really HTML, not XML -- in that case it's a matter of choosing a parser that understands the details of how HTML is permitted not to conform to the XML standard; lxml.html is a good place to start, or Beautiful Soup). Commented Jan 30, 2018 at 0:59
  • 1
    If it's truly XML and you need to parse it with etree then you have to fix the XML syntax before parsing it with etree. All conforming XML parsers are required to reject malformed documents because guessing at the correct form is explicitly out-of-scope in the standard. As @CharlesDuffy said, only relaxed HTML parsers are allowed to guess when presented with malformed documents, and the guesses are restricted by HTML convention and known usages. Commented Jan 30, 2018 at 1:03
  • 1
    You don't need to "convert" anything -- the simple, naive approach is just to try to parse the exact content you already have with a permissive HTML parser rather than an XML parser; two were already linked above. Other than that, the advice you're going to get boils down to "fix the program generating this content to actually output genuine XML". Commented Jan 30, 2018 at 1:21
