
Using Python or XSLT, how can I convert a complex, deeply nested XML file to CSV, including all sub-elements, while hard-coding as few element names as is reasonable?

Below are a simplified XML example and the desired CSV output, to show what I'm trying to achieve.

The actual XML file has many more elements, but the hierarchy and nesting are the same as in the example. <InvoiceRow> and its sub-elements are the only repeating elements in the file. All other elements occur once, and their values are repeated in the output CSV once for each <InvoiceRow> element.

It's the repeating <InvoiceRow> element that is causing me trouble. Elements that don't repeat are easy to convert to CSV without hard-coding any element names.

In short, this is a complex XML scenario: hierarchical data with multiple one-to-many relationships, all stored in a single structured text (XML) file.

Example XML input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Invoice>
    <SellerDetails>
        <Identifier>1234-1</Identifier>
        <SellerAddress>
            <SellerStreet>Street1</SellerStreet>
            <SellerTown>Town1</SellerTown>
        </SellerAddress>
    </SellerDetails>
    <BuyerDetails>
        <BuyerIdentifier>1234-2</BuyerIdentifier>
        <BuyerAddress>
            <BuyerStreet>Street2</BuyerStreet>
            <BuyerTown>Town2</BuyerTown>
        </BuyerAddress>
    </BuyerDetails>
    <BuyerNumber>001234</BuyerNumber>
    <InvoiceDetails>
        <InvoiceNumber>0001</InvoiceNumber>
    </InvoiceDetails>
    <InvoiceRow>
        <ArticleName>Article1</ArticleName>
        <RowText>Product Text1</RowText>
        <RowText>Product Text2</RowText>
        <RowAmount AmountCurrencyIdentifier="EUR">10.00</RowAmount>
    </InvoiceRow>
    <InvoiceRow>
        <ArticleName>Article2</ArticleName>
        <RowText>Product Text11</RowText>
        <RowText>Product Text22</RowText>
        <RowAmount AmountCurrencyIdentifier="EUR">20.00</RowAmount>
    </InvoiceRow>
    <InvoiceRow>
        <ArticleName>Article3</ArticleName>
        <RowText>Product Text111</RowText>
        <RowText>Product Text222</RowText>
        <RowAmount AmountCurrencyIdentifier="EUR">30.00</RowAmount>
    </InvoiceRow>
    <EpiDetails>
        <EpiPartyDetails>
            <EpiBfiPartyDetails>
                <EpiBfiIdentifier IdentificationSchemeName="BIC">XXXXX</EpiBfiIdentifier>
            </EpiBfiPartyDetails>
        </EpiPartyDetails>
    </EpiDetails>
    <InvoiceUrlText>Some text</InvoiceUrlText>
</Invoice>


Example CSV output:

Identifier,SellerStreet,SellerTown,BuyerIdentifier,BuyerStreet,BuyerTown,BuyerNumber,InvoiceNumber,ArticleName,RowText,RowText,RowAmount,EpiBfiIdentifier,InvoiceUrlText
1234-1,Street1,Town1,1234-2,Street2,Town2,1234,1,Article1,Product Text1,Product Text2,10,XXXXX,Some text
1234-1,Street1,Town1,1234-2,Street2,Town2,1234,1,Article2,Product Text11,Product Text22,20,XXXXX,Some text
1234-1,Street1,Town1,1234-2,Street2,Town2,1234,1,Article3,Product Text111,Product Text222,30,XXXXX,Some text

2 Answers

Consider the following example:

XML

<Invoice>
    <SellerDetails>
        <Identifier>1234-1</Identifier>
        <SellerAddress>
            <SellerStreet>Street1</SellerStreet>
            <SellerTown>Town1</SellerTown>
        </SellerAddress>
    </SellerDetails>
    <BuyerDetails>
        <BuyerIdentifier>1234-2</BuyerIdentifier>
        <BuyerAddress>
            <BuyerStreet>Street2</BuyerStreet>
            <BuyerTown>Town2</BuyerTown>
        </BuyerAddress>
    </BuyerDetails>
    <BuyerNumber>001234</BuyerNumber>
    <InvoiceDetails>
        <InvoiceNumber>0001</InvoiceNumber>
    </InvoiceDetails>
    <InvoiceRow>
        <ArticleName>Article1</ArticleName>
        <RowText>Product Text1</RowText>
        <RowText>Product Text2</RowText>
        <RowAmount AmountCurrencyIdentifier="EUR">10.00</RowAmount>
    </InvoiceRow>
    <InvoiceRow>
        <ArticleName>Article2</ArticleName>
        <RowText>Product Text11</RowText>
        <RowText>Product Text22</RowText>
        <RowAmount AmountCurrencyIdentifier="EUR">20.00</RowAmount>
    </InvoiceRow>
    <InvoiceRow>
        <ArticleName>Article3</ArticleName>
        <RowText>Product Text111</RowText>
        <RowText>Product Text222</RowText>
        <RowAmount AmountCurrencyIdentifier="EUR">30.00</RowAmount>
    </InvoiceRow>
    <EpiDetails>
        <EpiPartyDetails>
            <EpiBfiPartyDetails>
                <EpiBfiIdentifier IdentificationSchemeName="BIC">XXXXX</EpiBfiIdentifier>
            </EpiBfiPartyDetails>
        </EpiPartyDetails>
    </EpiDetails>
    <InvoiceUrlText>Some text</InvoiceUrlText>
</Invoice>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="Invoice">
    <xsl:variable name="common-head">
        <xsl:value-of select="SellerDetails/Identifier"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="BuyerDetails/BuyerIdentifier"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="InvoiceDetails/InvoiceNumber"/>
        <xsl:text>,</xsl:text>
        <!-- add more here -->
    </xsl:variable>
    <xsl:variable name="common-tail">
        <xsl:value-of select="EpiDetails/EpiPartyDetails/EpiBfiPartyDetails/EpiBfiIdentifier"/>
        <xsl:text>,</xsl:text>
        <!-- add more here -->
        <xsl:value-of select="InvoiceUrlText"/>
    </xsl:variable>
    <!-- header -->
    <xsl:text>SellerIdentifier,BuyerIdentifier,InvoiceNumber,ArticleName,RowText,RowText,RowAmount,EpiBfiIdentifier,InvoiceUrlText&#10;</xsl:text>
    <!-- data -->
    <xsl:for-each select="InvoiceRow">
        <xsl:copy-of select="$common-head"/>
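        <xsl:value-of select="ArticleName"/>
        <xsl:text>,</xsl:text>
        <!-- each InvoiceRow in the example has two RowText children -->
        <xsl:value-of select="RowText[1]"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="RowText[2]"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="RowAmount"/>
        <xsl:text>,</xsl:text>
        <!-- add more here -->
        <xsl:copy-of select="$common-tail"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Result

SellerIdentifier,BuyerIdentifier,InvoiceNumber,ArticleName,RowText,RowText,RowAmount,EpiBfiIdentifier,InvoiceUrlText
1234-1,1234-2,0001,Article1,Product Text1,Product Text2,10.00,XXXXX,Some text
1234-1,1234-2,0001,Article2,Product Text11,Product Text22,20.00,XXXXX,Some text
1234-1,1234-2,0001,Article3,Product Text111,Product Text222,30.00,XXXXX,Some text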

Added in response to:

Is there a way in XSLT to get the same results using a loop? For example, loop through and output all the elements and sub-elements except the InvoiceRow elements, and then vice versa?

If you prefer, you could try something like:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="Invoice">
    <xsl:variable name="invoice-fields" select="//*[not(*) and not(ancestor::InvoiceRow)]" />
    <xsl:variable name="common-data">
        <xsl:for-each select="$invoice-fields">
            <xsl:value-of select="."/>
            <xsl:text>,</xsl:text>  
        </xsl:for-each> 
    </xsl:variable>
    <!-- header -->
    <xsl:for-each select="$invoice-fields">
        <xsl:value-of select="name()"/>
        <xsl:text>,</xsl:text>  
    </xsl:for-each>
    <xsl:for-each select="InvoiceRow[1]/*">
        <xsl:value-of select="name()"/>
        <xsl:if test="position()!=last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <!-- data -->
    <xsl:for-each select="InvoiceRow">
        <xsl:copy-of select="$common-data"/>
        <xsl:for-each select="*">
            <xsl:value-of select="."/>
            <xsl:if test="position()!=last()">,</xsl:if>
        </xsl:for-each> 
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

The result here would be:

Identifier,SellerStreet,SellerTown,BuyerIdentifier,BuyerStreet,BuyerTown,BuyerNumber,InvoiceNumber,EpiBfiIdentifier,InvoiceUrlText,ArticleName,RowText,RowText,RowAmount
1234-1,Street1,Town1,1234-2,Street2,Town2,001234,0001,XXXXX,Some text,Article1,Product Text1,Product Text2,10.00
1234-1,Street1,Town1,1234-2,Street2,Town2,001234,0001,XXXXX,Some text,Article2,Product Text11,Product Text22,20.00
1234-1,Street1,Town1,1234-2,Street2,Town2,001234,0001,XXXXX,Some text,Article3,Product Text111,Product Text222,30.00

i.e. listing all invoice fields before the row fields.
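
Since the question also allows Python, here is a rough equivalent of the loop approach above, using only the standard library. Treat it as a sketch: the function name invoice_to_csv and the row_tag parameter are made up for illustration, and it assumes (as in the example) that every row has the same children in the same order and that attributes can be ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET


def invoice_to_csv(xml_text, row_tag="InvoiceRow"):
    """Flatten an invoice: shared leaf fields first, then each row's fields."""
    root = ET.fromstring(xml_text)
    rows = root.findall(row_tag)
    # Elements inside a repeating row must not be counted as shared fields.
    in_rows = {id(el) for row in rows for el in row.iter()}
    # Leaf elements (no children) outside the rows, in document order.
    shared = [el for el in root.iter() if len(el) == 0 and id(el) not in in_rows]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header: shared field names, then the child names of the first row.
    first_row = rows[0] if rows else []
    writer.writerow([el.tag for el in shared] + [el.tag for el in first_row])
    # Data: repeat the shared values once per row.
    for row in rows:
        writer.writerow(
            [el.text or "" for el in shared] + [el.text or "" for el in row]
        )
    return out.getvalue()


sample = """<Invoice>
    <SellerDetails><Identifier>1234-1</Identifier></SellerDetails>
    <InvoiceRow><ArticleName>Article1</ArticleName><RowAmount>10.00</RowAmount></InvoiceRow>
    <InvoiceRow><ArticleName>Article2</ArticleName><RowAmount>20.00</RowAmount></InvoiceRow>
    <InvoiceUrlText>Some text</InvoiceUrlText>
</Invoice>"""

print(invoice_to_csv(sample))
```

As with the XSLT version, the shared fields come out before the row fields rather than in the order shown in the question. Using the csv module instead of joining with commas means values containing commas or quotes get escaped.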


3 Comments

Thank you very much for the great example, I really appreciate it. Your XSLT method works well, but I was hoping for fewer hard-coded elements. Is there a way in XSLT to get the same results using a loop? For example, loop through and output all the elements and sub-elements except the InvoiceRow elements, and then vice versa?
I suppose it's possible, but I don't see the advantage. And it would be difficult to maintain the output order you have shown in your question.
I have added an example to my answer - as you can see it's not very elegant and IMHO it will be harder to maintain.

I have handled a case similar to yours. I created a package based on untangle, a package that parses XML into plain Python objects, for example from:

<?xml version="1.0"?>
<root>
    <child name="child1"/>
</root>

to

obj.root.child['name'] # u'child1'

Then you can easily write some code to traverse the object and get what you want. For example, you could write something like get_items_by_tag(InvoiceRow). Hope it helps!
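
What get_items_by_tag would look like isn't shown above, so the following is only a guess at such a traversal helper. It uses the standard library's xml.etree.ElementTree instead of untangle, and the signature and return shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def get_items_by_tag(root, tag):
    """Return one list of (child tag, text) pairs per element named `tag`.

    Hypothetical helper: pairs are used instead of a dict because a row
    can contain the same child tag more than once (e.g. RowText).
    """
    return [
        [(child.tag, (child.text or "").strip()) for child in item]
        for item in root.iter(tag)
    ]


doc = ET.fromstring(
    "<Invoice>"
    "<InvoiceRow><ArticleName>Article1</ArticleName>"
    "<RowText>Product Text1</RowText><RowText>Product Text2</RowText></InvoiceRow>"
    "<InvoiceRow><ArticleName>Article2</ArticleName>"
    "<RowText>Product Text11</RowText><RowText>Product Text22</RowText></InvoiceRow>"
    "</Invoice>"
)

for item in get_items_by_tag(doc, "InvoiceRow"):
    print(item)
```

Each returned list corresponds to one repeating row, so it can be appended to the non-repeating invoice fields to build one CSV line per InvoiceRow.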
