What would be the best way to construct a DF from the below nested XML data?
Each "properties" element has three "property" elements nested containing the "name" and "value" of our data. I tried doing two for loops, pandas read_xml option, and a few other pieces but haven't quite gotten the nested logic figured out. My current approach below is closer, but does not keep the names and values together.
Using Python 3.7+ in Jupyter on windows
Sample XML Data:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"
xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
<env:Header
xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<wsa:Action>RetrieveResponse</wsa:Action>
<wsa:MessageID>urn:uuid:1234</wsa:MessageID>
<wsa:RelatesTo>urn:uuid:1234</wsa:RelatesTo>
<wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To>
<wsse:Security>
<wsu:Timestamp wsu:Id="Timestamp-45333">
<wsu:Created>2022-11-07T17:02:44Z</wsu:Created>
<wsu:Expires>2022-11-07T17:07:44Z</wsu:Expires>
</wsu:Timestamp>
</wsse:Security>
</env:Header>
<soap:Body>
<RetrieveResponseMsg
xmlns="http://exacttarget.com/wsdl/partnerAPI">
<OverallStatus>MoreDataAvailable</OverallStatus>
<RequestID>asdfds455</RequestID>
<Results xsi:type="DataExtensionObject">
<PartnerKey xsi:nil="true" />
<ObjectID xsi:nil="true" />
<Type>DataExtensionObject</Type>
<Properties>
<Property>
<Name>FIELD_NAME</Name>
<Value>asdfdfd12</Value>
</Property>
<Property>
<Name>FIELD_NAME_2</Name>
<Value>asdf</Value>
</Property>
<Property>
<Name>FIELD_NAME_3</Name>
<Value>fasdsa</Value>
</Property>
</Properties>
</Results>
<Results xsi:type="DataExtensionObject">
<PartnerKey xsi:nil="true" />
<ObjectID xsi:nil="true" />
<Type>DataExtensionObject</Type>
<Properties>
<Property>
<Name>FIELD_NAME</Name>
<Value>fasd123</Value>
</Property>
<Property>
<Name>FIELD_NAME_2</Name>
<Value>asdfd</Value>
</Property>
<Property>
<Name>FIELD_NAME_3</Name>
<Value>a0A4f</Value>
</Property>
</Properties>
</Results>
<Results xsi:type="DataExtensionObject">
<PartnerKey xsi:nil="true" />
<ObjectID xsi:nil="true" />
<Type>DataExtensionObject</Type>
<Properties>
<Property>
<Name>FIELD_NAME</Name>
<Value>0034P00</Value>
</Property>
<Property>
<Name>FIELD_NAME_2</Name>
<Value>fasdfs</Value>
</Property>
<Property>
<Name>FIELD_NAME_3</Name>
<Value>a0fasd</Value>
</Property>
</Properties>
</Results>
</RetrieveResponseMsg>
</soap:Body>
</soap:Envelope>
What I've Attempted So Far:
data_output = []
for el in soup_de.find_all('Property'):
dict_ = {el.find('Name').text:el.find('Value').text}
data_output.append(dict_)
print(len(data_output))
# print(data_output)
testing_de_df = pd.DataFrame(data_output)
display(testing_de_df.info())
display(testing_de_df.head(25))
Desired Output:
details = {'FIELD_NAME': ['asdfdfd12', 'fasd123', '0034P00'],
'FIELD_NAME_2': ['asdf', 'asdfd', 'fasdfs'],
'FIELD_NAME_3': ['fasdsa', 'a0A4f', 'a0fasd']}
desired_output = pd.DataFrame(details)
print(desired_output)