Extracting from XML using Regex

Question

I need a regular expression that captures everything nested inside a target tag.

<?xml version="1.0" encoding="utf-8"?>
<data>
<target>
"<x id="c400c8394f0a"  pid="NLCaption" name="NLCaption" />Caption"
</target>

<target />

<target><x id="a1e6b03cb682"  pid="NLSheets" name="NLSheets" />Sheets"</target>

</data>

Thanks to Brettz, who helped me write the following regular expression:

$pattern = "@<target(?:\s.*?)?>(.*?)</target\s*>@s";

This regular expression does the job and gets me all the content. The only problem is that it also matches the <target /> tag.

I want to modify the regular expression so that it does not match the unpaired (self-closing) tag, i.e. <target />.

Please help me

Manse · Accepted Answer · 2012-04-20 17:13:16Z

3

Use SimpleXML

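// $xmlstr holds the XML document from the question
$data = new SimpleXMLElement($xmlstr);
// Prints only the text of the first <target>; nested tags are not included
echo $data->target[0];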

Here is an example of using SimpleXML with your XML
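A minimal sketch of this approach (not the linked example itself), assuming the question's XML is held in a string. The names $xmlstr and $texts are illustrative. It loops over every target and skips empty elements such as <target />. Note that casting a SimpleXML element to a string yields only its own text, not the nested tags.

```php
<?php
// The XML from the question, kept in a nowdoc so quotes need no escaping.
$xmlstr = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<data>
<target>
"<x id="c400c8394f0a"  pid="NLCaption" name="NLCaption" />Caption"
</target>

<target />

<target><x id="a1e6b03cb682"  pid="NLSheets" name="NLSheets" />Sheets"</target>

</data>
XML;

$data = new SimpleXMLElement($xmlstr);

$texts = [];
foreach ($data->target as $target) {
    // The string cast returns the element's own text nodes only.
    $text = trim((string) $target);

    // Skip elements with neither text nor child elements, e.g. <target />.
    if ($text === '' && count($target->children()) === 0) {
        continue;
    }
    $texts[] = $text;
}

print_r($texts);
```

The empty <target /> never reaches $texts, so no pattern is needed to exclude it.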

edited Apr 20, 2012 at 17:13

answered Apr 20, 2012 at 17:05

Manse


5 Comments

Shahid Over a year ago

Thanks! You might be right, but right now I need to fix this regular expression. Basically, I want to neutralize the nested tags in a target node.

Manse Over a year ago

@Shahid Parsing XML with an actual XML parser is a lot simpler (as you are experiencing) than using regex to do the same thing ... use the right tool for the job ...

Shahid Over a year ago

I need to create a string like the following, and it also needs to be written back to the actual file (i.e. update its content). I was not able to do this using DOM parsing:

<target> &quot;&lt;x id=&quot;c400c8394f0a&quot;  pid=&quot;NLCaption&quot; name="NLCaption" /&gt;Caption&quot; </target>

Manse Over a year ago

@Shahid Put your required output in your question (formatted, of course). With SimpleXML everything in the XML is accessible; you just need to extract it and output it in the desired format.

Shahid Over a year ago

How can I get the inner content plus the nested tags of a target node in one variable, so that I can then convert the value to its htmlentities equivalent? For example: "<x id="c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"
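One possible way to get the inner content together with the nested tags is PHP's DOM extension: serialize each child node of target and concatenate the results. This is a sketch under assumptions, not something confirmed in the thread. The helper name innerXml and the variables $xmlstr and $results are hypothetical, and the serializer may normalize whitespace inside tags (for example `" />` becoming `"/>`).

```php
<?php
// Hypothetical helper: returns the serialized children (text and nested
// tags) of a DOM element as one string.
function innerXml(DOMElement $element): string
{
    $xml = '';
    foreach ($element->childNodes as $child) {
        $xml .= $element->ownerDocument->saveXML($child);
    }
    return $xml;
}

// Shortened version of the question's XML (assumed input).
$xmlstr = '<data>'
    . '<target>"<x id="c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"</target>'
    . '<target />'
    . '</data>';

$doc = new DOMDocument();
$doc->loadXML($xmlstr);

$results = [];
foreach ($doc->getElementsByTagName('target') as $target) {
    if (!$target->hasChildNodes()) {
        continue; // <target /> has nothing to extract
    }
    $inner = innerXml($target);
    // Escape <, >, & and both quote styles so the markup becomes plain text.
    $results[] = htmlspecialchars($inner, ENT_QUOTES);
}

print_r($results);
```

htmlspecialchars is used here rather than htmlentities because it produces only entities that XML also defines, apart from the numeric one for the single quote.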

Shahid · Accepted Answer · 2012-04-21 06:51:36Z

1

$tagname = 'target';
$pattern = "@<$tagname(?:\s.*?!/)?>(.*?)</$tagname\s*>@s";
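In PCRE the `!/` in the pattern above is matched as two literal characters, not as a negation, so the optional attribute group can only match when the opening tag actually contains `!/`. A hedged alternative, assuming the goal is "allow attributes, but reject a tag that ends in />", is a negative lookbehind placed just before the closing `>`. The variable names $xml and $contents are illustrative, and the sketch assumes attribute values never contain a `>` character.

```php
<?php
// The data from the question (assumed input).
$xml = <<<'XML'
<data>
<target>
"<x id="c400c8394f0a"  pid="NLCaption" name="NLCaption" />Caption"
</target>

<target />

<target><x id="a1e6b03cb682"  pid="NLSheets" name="NLSheets" />Sheets"</target>

</data>
XML;

$tagname = 'target';
$quoted  = preg_quote($tagname, '@');

// (?:\s[^>]*)?  optional attributes inside the opening tag
// (?<!/)        the character before ">" must not be "/", so <target /> fails
// (.*?)         lazily captured content; the s flag lets "." match newlines
$pattern = '@<' . $quoted . '(?:\s[^>]*)?(?<!/)>(.*?)</' . $quoted . '\s*>@s';

preg_match_all($pattern, $xml, $matches);

// Trim the surrounding newlines left over from the source formatting.
$contents = array_map('trim', $matches[1]);

print_r($contents);
```

With this input only the two paired target elements are captured, and the self-closing one no longer swallows the element that follows it.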

answered Apr 21, 2012 at 6:51

Shahid


squarephoenix · Accepted Answer · 2012-04-20 17:01:11Z

0

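// Delimiters (@) are required by preg_*; s lets . span newlines, +? stops at the first </target>
$pattern = "@(?<=<target>).+?(?=</target>)@s";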

answered Apr 20, 2012 at 17:01

squarephoenix


Andrew · Accepted Answer · 2012-04-20 17:22:23Z

0

You can change the opening-tag part of the pattern so that only whitespace, and no other characters, is allowed before the closing >:

<target\s*>(.*?)</target\s*>

An XML parser is almost certainly still the right long-term solution, but this is a quick way to get your code working.

answered Apr 20, 2012 at 17:22

Andrew


3 Comments

Shahid Over a year ago

$tagname = 'target'; $pattern = "<$tagname\s*>(.*?)</$tagname\s*>"; $content = preg_replace_callback($pattern, html_entities, $xml); This gives the error: Unknown modifier '(' in.....???

Shahid Over a year ago

I want the first section to allow only [^/], i.e. / cannot occur in this section. How can I do this?

Andrew Over a year ago

Check the syntax for preg_replace: change your code to $pattern = "#<" . $tagname . "\s*>(.*?)</" . $tagname . "\s*>#is";
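Putting that comment into a complete call, as a sketch only: the pattern needs delimiters (# here), and the callback passed to preg_replace_callback receives the array of matches rather than a plain string, so the escaping function cannot be passed by bare name. The sample $xml string is a shortened, assumed input.

```php
<?php
// Shortened sample input (assumption): one paired and one self-closing tag.
$xml = '<data>'
    . '<target><x id="a1e6b03cb682" pid="NLSheets" name="NLSheets" />Sheets"</target>'
    . '<target />'
    . '</data>';

$tagname = 'target';

// The # characters delimit the pattern; i = case-insensitive, s = "." matches newlines.
$pattern = '#<' . $tagname . '\s*>(.*?)</' . $tagname . '\s*>#is';

$content = preg_replace_callback(
    $pattern,
    function (array $m) use ($tagname) {
        // $m[0] is the whole match, $m[1] the captured inner content.
        return "<$tagname>" . htmlspecialchars($m[1], ENT_QUOTES) . "</$tagname>";
    },
    $xml
);

echo $content;
```

Because `\s*>` allows nothing but whitespace after the tag name, <target /> does not match and is left unchanged in the output.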
