
Here is a sample file. We need to convert its values into a pipe-delimited file:

test.xml

<?xml version="1.0" encoding="UTF-8" ?>
<testjar>

  <testable>
    <trigger>Trigger1</trigger>
    <message>2012-06-14T00:03.54</message>
    <sales-info>
      <san-a>no</san-a>
      <san-b>no</san-b>
      <san-c>no</san-c>
    </sales-info>
  </testable>

  <testable>
    <trigger>Trigger2</trigger>
    <message>2012-06-15T00:03.54</message>
    <sales-info>
      <san-a>yes</san-a>
      <san-b>yes</san-b>
      <san-c>no</san-c>
    </sales-info>
  </testable>

</testjar>

Each record should start on a new line. The expected result (sample.txt) should look like this:

Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no

Note: xmlstarlet is not installed on my server. Is it possible to do this without xmlstarlet?

Comment (Jul 26, 2012 at 8:36): Please fix <message->?

3 Answers


Here is an XSLT stylesheet that does what you want (saved in test.xsl):

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

 <xsl:template match="testable">
   <xsl:value-of select='trigger'/><xsl:text>|</xsl:text>
   <xsl:value-of select='message'/><xsl:text>|</xsl:text>
   <xsl:value-of select='sales-info/san-a'/><xsl:text>|</xsl:text>
   <xsl:value-of select='sales-info/san-b'/><xsl:text>|</xsl:text>
   <xsl:value-of select='sales-info/san-c'/><xsl:text>&#xA;</xsl:text>
 </xsl:template>

</xsl:stylesheet>

Command (here I am assuming that you have libxml2 and libxslt installed; xsltproc is a command line tool that uses these libraries):

xsltproc -o sample.txt test.xsl test.xml

Contents of sample.txt:

Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no

Comment

xmlstarlet is not installed on my server. Is it possible to do this without xmlstarlet?

Here's a shell solution that needs no XML tools (bash plus egrep and sed):

egrep '<trigger>|<message>|<san-.>' test.xml | sed -e 's/<[^>]*>//g' | while read -r line; do [ $((++i % 5)) -ne 0 ] && echo -n "$line|" || echo "$line" ; done

However, it only works on a file formatted like your sample, with each element on its own line and exactly five values per record. It is nowhere near as flexible or reliable as the other answers, which use proper XML parsing or transformation.

It can be improved to some extent, though.
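One possible improvement is to end each record at the </testable> line instead of counting five values. Records with more or fewer fields then still come out one per line. This is an assumption, not part of the original answer. It is sketched in Python for readability. It still assumes one element per line, so it remains just as fragile if the layout changes.

```python
import re
import sys

# A leaf element written on a single line, e.g. "<trigger>Trigger1</trigger>".
# Assumption (same as the egrep/sed answer): every element sits on its own line.
LEAF = re.compile(r"<([^/>\s]+)[^>]*>([^<]*)</\1>")


def line_records(lines, end_tag="</testable>"):
    """Yield one pipe-delimited record per end_tag, however many leaf values precede it."""
    values = []
    for line in lines:
        m = LEAF.search(line)
        if m:
            values.append(m.group(2).strip())
        if end_tag in line:
            yield "|".join(values)
            values = []


if __name__ == "__main__":
    # Usage: python3 script.py test.xml > sample.txt   (script name is hypothetical)
    for filename in sys.argv[1:]:
        with open(filename) as fh:
            for record in line_records(fh):
                print(record)
```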


If you have xmlstarlet installed, you can try:

me@home$ xmlstarlet sel -t -m "//testable" -v trigger -o "|" -v message -o "|" -m sales-info -v san-a -o "|" -v san-b -o "|" -v san-c -n test.xml
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no

Breakdown of the command:

xmlstarlet sel -t 
    -m "//testable"       # match <testable>
      -v trigger -o "|"     # print out value of <trigger> followed by |
      -v message -o "|"     # print out value of <message> followed by | 
      -m sales-info         # match <sales-info>
        -v san-a -o "|"       # print out value of <san-a> followed by |
        -v san-b -o "|"       # print out value of <san-b> followed by | 
        -v san-c              # print out value of <san-c>
    -n                   # print new line
    test.xml             # INPUT XML FILE
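xmlstarlet is not available on the asker's server. Here is a minimal sketch of the same fixed-field extraction using Python's standard-library xml.etree.ElementTree. It assumes Python 3 is installed; ElementTree is not part of very old Python releases. The field paths mirror the -v options above, and the script name in the usage comment is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# Paths relative to each <testable>, in output order (same fields as the xmlstarlet command).
FIELDS = ["trigger", "message", "sales-info/san-a", "sales-info/san-b", "sales-info/san-c"]


def testable_rows(root, fields=FIELDS):
    """Yield one pipe-delimited line per <testable> element under root."""
    for rec in root.iter("testable"):
        # findtext() returns the default ("") when a path matches nothing
        yield "|".join(rec.findtext(path, default="").strip() for path in fields)


if __name__ == "__main__":
    # Usage: python3 xml2pipe.py test.xml > sample.txt   (script name is hypothetical)
    for filename in sys.argv[1:]:
        for line in testable_rows(ET.parse(filename).getroot()):
            print(line)
```

A missing element produces an empty field rather than shifting the remaining columns, which the line-counting approaches cannot guarantee.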

To handle tags that vary within <testable>, you can try the following, which returns the text of all leaf nodes:

me@home$ xmlstarlet sel -t -m "//testable" -m "descendant::*[not(*)]" -v 'text()' -i 'not(position()=last())' -o '|' -b -b -n test.xml
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no

Breakdown of the command:

xmlstarlet sel -t 
    -m "//testable"                         # match <testable>
      -m "descendant::*[not(*)]"              # match all leaf nodes
        -v 'text()'                             # print text
        -i 'not(position()=last())' -o '|'      # print | if not last item
        -b -b                                   # break out of nested matches
    -n                                      # print new line
    test.xml                                # INPUT XML FILE
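The leaf-node idea translates directly to ElementTree, for systems without xmlstarlet. This rough sketch again assumes Python 3. It joins the text of every element inside a record that has no child elements. Like `descendant::*[not(*)]`, it follows document order, so the column order depends on the order of tags in each record.

```python
import sys
import xml.etree.ElementTree as ET


def leaf_texts(record):
    """Text of every descendant of record that has no child elements, in document order."""
    # iter() walks depth-first in document order and yields record itself first, so skip it
    return [(el.text or "").strip() for el in record.iter() if el is not record and len(el) == 0]


def leaf_rows(root, record_tag="testable"):
    """One pipe-delimited line per record element, whatever leaf tags it contains."""
    for rec in root.iter(record_tag):
        yield "|".join(leaf_texts(rec))


if __name__ == "__main__":
    # Usage: python3 leaves.py test.xml > sample.txt   (script name is hypothetical)
    for filename in sys.argv[1:]:
        for line in leaf_rows(ET.parse(filename).getroot()):
            print(line)
```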

If you do not have access to xmlstarlet, check which other tools are available on your system. Other options include xsltproc (see mzjn's answer) and xpath.

If those tools are not available either, I would suggest using a higher-level language (Python, Perl) that gives you access to a proper XML library.

While it is possible to parse the file manually with regular expressions, such a solution is far from ideal, especially when the input is inconsistent. For example, the following (assuming you have gawk and sed) takes your input and produces the expected output:

me@home$ gawk 'match($0, />(.*)</, a){printf("%s|",a[1])} /<\/testable>/{print ""}' test.xml | sed 's/.$//'
Trigger1|2012-06-14T00:03.54|no|no|no
Trigger2|2012-06-15T00:03.54|yes|yes|no

However, this would fail miserably if the input format changes and is therefore not a solution I would generally recommend.
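To illustrate how it breaks, here is a small Python imitation of the gawk logic, not the one-liner itself. It is run on the first sample record collapsed onto a single line, next to a parser-based extraction of the same record. The greedy `>(.*)<` match swallows all of the inner markup as one field. The parser does not care about line layout.

```python
import re
import xml.etree.ElementTree as ET

# The first sample record, with all markup on a single line.
ONE_LINE = ("<testjar><testable><trigger>Trigger1</trigger><message>2012-06-14T00:03.54</message>"
            "<sales-info><san-a>no</san-a><san-b>no</san-b><san-c>no</san-c></sales-info>"
            "</testable></testjar>")


def gawk_like(lines):
    """Imitates the gawk one-liner: greedy >(.*)< per line, record ends at </testable>."""
    records, values = [], []
    for line in lines:
        m = re.search(r">(.*)<", line)
        if m:
            values.append(m.group(1))
        if "</testable>" in line:
            records.append("|".join(values))
            values = []
    return records


def parsed(xml_text):
    """The same extraction done with a real parser: leaf-element text per <testable>."""
    root = ET.fromstring(xml_text)
    return ["|".join((el.text or "").strip() for el in rec.iter() if el is not rec and len(el) == 0)
            for rec in root.iter("testable")]


if __name__ == "__main__":
    print(gawk_like(ONE_LINE.splitlines()))  # a single field holding the raw inner markup
    print(parsed(ONE_LINE))                  # ['Trigger1|2012-06-14T00:03.54|no|no|no']
```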

Comments

The catch is that my file will keep changing (XML tags will be added or removed). Is there a command that can handle this?
Do you mean the tags within <testable> are always different?
Yes, but we can store the tags in another file and fetch that information here; we can manage that. The big issue is that, unfortunately, I don't have xmlstarlet on my server :-( Is this possible without xmlstarlet?
What OS are you using? And are you allowed to install additional tools from the default package manager? (I'm trying to avoid a regex/text-parsing approach here since it can be unreliable, especially if the input format is not consistent.)
Linux version 2.6.9-100.ELsmp. I'm not allowed to install additional tools. Thanks.
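The asker's idea of keeping the list of tags in a separate file fits naturally with a real XML parser. This sketch assumes a Python 3 interpreter with xml.etree.ElementTree is already present on the server, which is not confirmed here. `fields.txt` and `xml2pipe.py` are hypothetical names. Each line of the field file is an ElementTree path relative to <testable>, such as `sales-info/san-a`.

```python
import sys
import xml.etree.ElementTree as ET


def load_fields(lines):
    """Read one path per non-blank line, e.g. 'trigger' or 'sales-info/san-a'."""
    return [ln.strip() for ln in lines if ln.strip()]


def rows(root, fields, record_tag="testable"):
    """One pipe-delimited line per record; a path that matches nothing gives an empty field."""
    for rec in root.iter(record_tag):
        yield "|".join(rec.findtext(path, default="").strip() for path in fields)


if __name__ == "__main__":
    # Usage (hypothetical names): python3 xml2pipe.py fields.txt test.xml > sample.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fh:
            fields = load_fields(fh)
        for line in rows(ET.parse(sys.argv[2]).getroot(), fields):
            print(line)
```

For the sample above, fields.txt would list trigger, message, sales-info/san-a, sales-info/san-b and sales-info/san-c, one per line. Adding or removing a tag then means editing that file rather than the command.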