
I've got an svn log XML file and I want to retrieve the changed files from it.

The <paths> element can contain one or many child elements named <path>.

In this case I want to retrieve /trunk/server/sub/sub1/scripts/fix/filename.sql.

Content of issues.xml (fragment):

<paths>
    <path
        action="A"
        prop-mods="false"
        text-mods="true"
        kind="file">/trunk/server/sub/sub1/scripts/fix/filename.sql</path>
</paths>

To do that, I am using the following bash script:

#!/bin/bash
filenames=($(grep -oP '<path[^>]*>(.+?)<\/path>' "issues.xml"))
echo $filenames

The output of this script is empty, and I have no clue why. I've also tried to output all array elements in a loop, but that didn't work either.

Any advice?

Do not use text processors for parsing XML; use a proper XML parser like xmllint or xmlstarlet. (Commented Jan 30, 2017 at 10:40)

2 Answers


Using standard text processors for parsing XML is generally NOT recommended.

I suggest using a proper XML parser such as xmllint or xmlstarlet. A parser keeps working when the file goes through a formatting change (e.g. new whitespace or line breaks added inside a tag), whereas a regex used for the extraction has to be changed every time that happens.

With xmllint and an XPath expression this is very easy. For your given input file, just run:

xmllint --xpath 'string(//path)' file
/trunk/server/sub/sub1/scripts/fix/filename.sql

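Note that string(//path) returns the text of the first <path> element only. xmllint is part of libxml2 (on Debian and Ubuntu it comes in the libxml2-utils package), so installing it is straightforward.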


2 Comments

Unfortunately, I cannot install additional libraries in my situation. Thank you for your solution anyway!
@Alex: Fair enough, but I am going to keep this answer, if that's fine. The regex solution below works only for the input you provided; if your file's formatting changes, that solution has to be changed again, which is not the case with xmllint.

Maybe you can try this:

grep -oP '([^>]*)(?=</path>)' file
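A minimal sketch of plugging this grep back into the question's array, assuming GNU grep built with PCRE support (-P) and bash 4 or later (for mapfile). The XML fragment from the question is inlined in a variable instead of being read from issues.xml, purely so the example runs on its own. mapfile and printf stand in for the question's filenames=($(...)) assignment and echo $filenames, since the latter prints only the first element.

```shell
#!/bin/bash
# Sketch: feed the answer's grep into an array, one file name per element.
# Assumes GNU grep with PCRE support (-P) and bash 4+ (for mapfile).

# The fragment from the question, inlined so the example is self-contained.
xml='<paths>
    <path
        action="A"
        prop-mods="false"
        text-mods="true"
        kind="file">/trunk/server/sub/sub1/scripts/fix/filename.sql</path>
</paths>'

# grep -o prints each match on its own line; mapfile -t stores each line
# as one array element, so names containing spaces are not split apart.
mapfile -t filenames < <(grep -oP '([^>]*)(?=</path>)' <<< "$xml")

# echo $filenames would expand to element 0 only; this prints every element.
printf '%s\n' "${filenames[@]}"
```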

2 Comments

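Thank you very much! Can you explain why your regex works and mine doesn't?
Your regex has to match across two different lines of input: "<path" is on one line, while the closing ">" of that tag and the "...</path>" part are on another. grep works on one line at a time, so that match never happens. So I used a lookahead, ([^>]*)(?=</path>), which matches only the text between the last ">" and "</path>", and that text sits on a single line.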
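To illustrate that explanation, here is a sketch, again assuming GNU grep with -P and -z support and bash 4+. It first runs the question's pattern line by line, where it finds nothing, and then uses grep -z so that a similar pattern can span the line breaks inside <path ...>. The second <path> entry and its file name are hypothetical, added only to show that several elements are collected. This is still regex-based, so the caveat from the other answer about formatting changes still applies.

```shell
#!/bin/bash
# Sketch: make a pattern close to the question's work across line breaks.
# Assumes GNU grep with PCRE (-P) and -z (--null-data) support, plus bash 4+.

# The question's fragment plus a second, hypothetical <path> entry.
xml='<paths>
    <path
        action="A"
        prop-mods="false"
        text-mods="true"
        kind="file">/trunk/server/sub/sub1/scripts/fix/filename.sql</path>
    <path
        action="M"
        kind="file">/trunk/server/hypothetical/other.sql</path>
</paths>'

# Line by line, no single line holds both "<path...>" and "...</path>",
# so the question's pattern finds nothing and grep exits with status 1.
grep -oP '<path[^>]*>(.+?)</path>' <<< "$xml" || echo "no match (line by line)"

# With -z, grep splits input on NUL bytes instead of newlines. The text has
# none, so it is one record and [^>]* can cross the newlines inside the tag.
# \K drops "<path ...>" from the printed match; the lookahead stops at
# </path>. Matches are printed NUL-terminated, so tr turns NULs into newlines.
mapfile -t filenames < <(
    grep -zoP '<path[^>]*>\K[^<]+(?=</path>)' <<< "$xml" | tr '\0' '\n'
)

printf '%s\n' "${filenames[@]}"
```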
