0

https://drive.google.com/open?id=1M66WaMkwfkDoFW41MSwuKG4ZHyuSxFrO

This is my first time working with XML. I've read many posts but still can't handle my data. The link above contains part of my data: 3 <mensaje> entities are included in the file, while the original has about 35,000. From that data I need to create a pandas DataFrame.

Each row of the DataFrame should correspond to one <mensaje>. The first column has to be <numerosolicitud>********</numerosolicitud>, the second column <codigocliente>**********</codigocliente>, and then I need one column for each <cuestionario><pregunta cod=***. I think there are 98 "cod" values, the same in every <mensaje>. I need those "cod" values as column headers and the element text, if any, as the value.

I believe this is a basic task, but after several days of reading tutorials and posts I still need help. Any advice is highly appreciated.

2 Answers

2

I've made a package for a similar use case; it could work here too.

pip install pandas_read_xml

Then you can do something like:

import pandas_read_xml as pdx

df = pdx.read_xml('filename.xml', ['data', 'mensaje'])

To flatten, you could

df = pdx.flatten(df)

or

df = pdx.fully_flatten(df)

0

I found a solution to my problem. Someone could probably write something more efficient, but this code worked for me.

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse(r"C:\path\of\your\file")
root = tree.getroot()
df = pd.DataFrame()
counter = 1
for mensaje in root.iter('mensaje'):
    # findtext() returns the default ("") when the element is missing, instead of raising
    df.loc[counter, 'numerosolicitud'] = mensaje.findtext(".//numerosolicitud", default="")
    df.loc[counter, 'codigocliente'] = mensaje.findtext(".//codigocliente", default="")
    df.loc[counter, 'riesgocb'] = mensaje.findtext(".//riesgocb", default="")
    # one column per <pregunta>, named after its "cod" attribute
    for child in mensaje.findall(".//pregunta"):
        df.loc[counter, str(child.attrib["cod"])] = child.text or ""
    print(counter)  # progress indicator
    counter += 1

df.to_excel("output.xlsx")
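
On the efficiency point: the loop above grows the DataFrame one cell at a time, which tends to get slow at around 35,000 <mensaje> entries. A possible alternative, shown here only as a sketch, is to collect one dict per <mensaje> and build the DataFrame once at the end. The sample XML, its <data> root tag, and the function name mensajes_to_frame are made up for illustration; the real file's nesting may differ.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample; the real file's root tag and nesting may differ.
SAMPLE_XML = """
<data>
  <mensaje>
    <numerosolicitud>1001</numerosolicitud>
    <codigocliente>C-01</codigocliente>
    <cuestionario>
      <pregunta cod="P1">yes</pregunta>
      <pregunta cod="P2"/>
    </cuestionario>
  </mensaje>
  <mensaje>
    <numerosolicitud>1002</numerosolicitud>
    <cuestionario>
      <pregunta cod="P1">no</pregunta>
      <pregunta cod="P2">maybe</pregunta>
    </cuestionario>
  </mensaje>
</data>
""".strip()


def mensajes_to_frame(root):
    """Return one row per <mensaje> and one column per pregunta 'cod'."""
    rows = []
    for mensaje in root.iter("mensaje"):
        # findtext() gives None when the element is absent
        row = {
            "numerosolicitud": mensaje.findtext(".//numerosolicitud"),
            "codigocliente": mensaje.findtext(".//codigocliente"),
        }
        # the "cod" attribute becomes the column name, the text the value
        for pregunta in mensaje.iter("pregunta"):
            row[pregunta.get("cod")] = pregunta.text
        rows.append(row)
    # build the frame in one step instead of cell by cell
    return pd.DataFrame(rows)


df = mensajes_to_frame(ET.fromstring(SAMPLE_XML))
print(df)
```

For a real file, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE_XML). Missing elements and empty <pregunta> tags end up as missing values rather than text.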
