I need a regular expression that captures everything nested inside a target tag.

<?xml version="1.0" encoding="utf-8"?>
<data>
<target>
"<x id="c400c8394f0a"  pid="NLCaption" name="NLCaption" />Caption"
</target>

<target />

<target><x id="a1e6b03cb682"  pid="NLSheets" name="NLSheets" />Sheets"</target>

</data>

Thanks to Brettz, who helped me write the following regular expression:

$pattern = "@<target(?:\s.*?)?>(.*?)</target\s*>@s";

This regular expression does the job and gets me all the content. The only problem is that it also matches the <target /> tag.

I want to modify the regular expression so that it does not match unpaired (self-closing) tags, i.e. <target />.

Please help.

4 Answers

3

Use SimpleXML

$data = new SimpleXMLElement($xmlstr);
echo $data->target[0];

Here is an example of using SimpleXML with your XML
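As a rough sketch of the parser route, the snippet below collects the inner markup of each non-empty target node (nested tags included) and escapes it. It uses DOMDocument rather than SimpleXML, which is a swapped-in choice and not what this answer originally showed; the sample XML is trimmed from the question, and the variable names ($escaped, $inner) are only illustrative.

```php
<?php
// Sample data trimmed from the question (XML declaration omitted).
$xmlstr = <<<'XML'
<data>
<target>
"<x id="c400c8394f0a"  pid="NLCaption" name="NLCaption" />Caption"
</target>
<target />
<target><x id="a1e6b03cb682"  pid="NLSheets" name="NLSheets" />Sheets"</target>
</data>
XML;

$doc = new DOMDocument();
$doc->loadXML($xmlstr);

$escaped = [];
foreach ($doc->getElementsByTagName('target') as $target) {
    // Skip empty elements such as <target />.
    if (!$target->hasChildNodes()) {
        continue;
    }

    // Serialize every child node (text and nested tags) to get the inner XML.
    $inner = '';
    foreach ($target->childNodes as $child) {
        $inner .= $doc->saveXML($child);
    }

    // Convert <, >, & and quotes to their entity equivalents.
    $escaped[] = htmlspecialchars($inner, ENT_QUOTES);
}

print_r($escaped);
```

Note that the parser re-serializes the nested tags, so whitespace inside them may differ slightly from the source file.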

5 Comments

Thanks! You might be right, but right now I need to fix this regular expression. Basically, I want to neutralize the nested tags in a target node.
@Shahid Parsing XML with an actual XML parser is a lot simpler (as you are experiencing) than using regex to do the same thing ... use the right tool for the job ...
I need to create a string like the following, and it also needs to be written back to the actual file (i.e. update its content); I was not able to do this using DOM parsing: <target> &quot;&lt;x id=&quot;c400c8394f0a&quot; pid=&quot;NLCaption&quot; name="NLCaption" /&gt;Caption&quot; </target>
@Shahid Put your required output in your question (formatted, of course). With SimpleXML everything in the XML is accessible; you just need to extract it and output it in the desired format.
How can I get the inner content plus the nested tags of a target node in one variable, so that I can then convert the values to their htmlentities equivalent? "<x id="c400c8394f0a" pid="NLCaption" name="NLCaption" />Caption"
1
$tagname = 'target';
// (?<!/) rejects an opening tag that ends in "/>", i.e. a self-closing <target />
$pattern = "@<$tagname(?:\s[^>]*)?(?<!/)>(.*?)</$tagname\s*>@s";

0
$pattern = "@(?<=<target>).+?(?=</target>)@s";

0

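You can change the part of the pattern that matches the opening tag so that it allows only whitespace, and no other characters, before the closing >. Note that this also stops matching opening tags that carry attributes: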

<target\s*>(.*?)</target\s*>

An XML parser is almost certainly still the right long-term solution, but this is a quick way to get your code working.
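For illustration, here is a minimal sketch of how the pattern above could be used with preg_replace_callback to escape the content of each paired target tag while leaving <target /> alone. The # delimiters, the s modifier, the extra capture groups around the opening and closing tags, and the variable names are assumptions added for this example, not part of the original pattern.

```php
<?php
// Sample data trimmed from the question (XML declaration omitted).
$xml = <<<'XML'
<data>
<target>
"<x id="c400c8394f0a"  pid="NLCaption" name="NLCaption" />Caption"
</target>
<target />
<target><x id="a1e6b03cb682"  pid="NLSheets" name="NLSheets" />Sheets"</target>
</data>
XML;

$tagname = 'target';
$quoted  = preg_quote($tagname, '#');

// PCRE patterns need delimiters (# here); "s" lets . match newlines.
// Groups: 1 = opening tag, 2 = inner content, 3 = closing tag.
$pattern = '#(<' . $quoted . '\s*>)(.*?)(</' . $quoted . '\s*>)#s';

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Keep the tags as they are; escape only what is between them.
        return $m[1] . htmlspecialchars($m[2], ENT_QUOTES) . $m[3];
    },
    $xml
);

echo $result, PHP_EOL;
```

Because the opening tag may contain only whitespace after the name, <target /> never matches and passes through unchanged.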

3 Comments

$tagname = 'target'; $pattern = "<$tagname\s*>(.*?)</$tagname\s*>"; $content = preg_replace_callback($pattern, html_entities, $xml); This gives me: Unknown modifier '(' in.....???
I only want to require that / cannot occur in the first section (something like [^/]). How can I do this?
Check the syntax for preg_replace; the pattern needs delimiters. Change your code to $pattern = "#<" . $tagname . "\s*>(.*?)</" . $tagname . "\s*>#is";
