Add html tag to string in PHP

Question

I would like to add an HTML tag to a string of HTML in PHP. For example:

<h2><b>Hello World</b></h2>
<p>First</p>
Second
<p>Third</p>

Second is not wrapped in any HTML element, so the system should wrap it in a p tag. Expected result:

<h2><b>Hello World</b></h2>
<p>First</p>
<p>Second</p>
<p>Third</p>

I tried PHP Simple HTML DOM Parser but have no clue how to handle this. Here is a sketch of my idea:

function htmlParser($html)
{
    foreach ($html->childNodes() as $node) {
        if ($node->childNodes()) {
            htmlParser($node);
        }
        // Ideally: wrap the node's innertext in a p tag if it is not wrapped in any tag
    }

    return $html;
}

But childNodes() will not loop over Second because it is not wrapped in an element, and regex is not recommended for handling HTML tags. Any ideas?

Much appreciated, thanks.

bcperth · Accepted Answer · 2018-09-06 09:47:30Z

2

This was a cool question because it prompted some thought about the DOM.

I raised a question, "How do HTML Parsers process untagged text", which @sideshowbarker commented on generously. That made me think and improved my knowledge of the DOM, especially about text nodes.

Below is a DOM-based way of finding candidate text nodes and wrapping them in 'p' tags. Many text nodes should be left alone, such as the spaces, carriage returns and line feeds used for formatting (which an "uglifier" may strip out).

<?php

$html = file_get_contents("nodeTest.html"); // read the test file
$dom = new DOMDocument();          // a new DOM object
$dom->loadHTML($html);             // build the DOM
$body = $dom->getElementsByTagName('body')->item(0);  // assuming one <body> node
foreach ($body->childNodes as $child)
{
    // test for a text node that contains more than formatting whitespace
    if ($child->nodeType === XML_TEXT_NODE && ($text = trim($child->nodeValue)) !== '')
    {   // it's a candidate for adding tags
        // nodeValue is entity-decoded, so escape it again before echoing it as HTML
        $newText = "<p>" . htmlspecialchars($text, ENT_NOQUOTES) . "</p>";
        echo str_replace($text, $newText, $child->nodeValue);
    }
    else
    {   // not a candidate for adding tags
        echo $dom->saveHTML($child);
    }
}

nodeTest.html contains this.

<!DOCTYPE HTML> 
<html>
<body>
    <h2><b>Hello World</b></h2>
    <p>First</p>
    Second
    <p>Third</p>
    fourth
    <p>Third</p>
    <!-- comment -->
</body>
</html>

The output is below. I did not bother echoing the outer tags. Notice that comments and formatting whitespace are handled properly.

<h2><b>Hello World</b></h2>
<p>First</p>
<p>Second</p>
<p>Third</p>
<p>fourth</p>
<p>Third</p>
<!-- comment -->
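
To see which children of body the loop actually visits, a small sketch like the following can help. The helper name describeBodyChildren and the sample markup are made up for illustration; the DOM calls are the same ones the code above relies on. The exact whitespace-only text nodes you get may vary with the libxml version, so no output is shown here.

```php
<?php

// Hypothetical helper (not part of the answer): describe each direct child of <body>.
function describeBodyChildren(string $html): array
{
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $body = $dom->getElementsByTagName('body')->item(0);

    $rows = [];
    foreach ($body->childNodes as $child) {
        switch ($child->nodeType) {
            case XML_ELEMENT_NODE:
                $rows[] = 'element <' . $child->nodeName . '>';
                break;
            case XML_TEXT_NODE:
                // json_encode makes whitespace such as line feeds visible
                $rows[] = 'text ' . json_encode($child->nodeValue);
                break;
            case XML_COMMENT_NODE:
                $rows[] = 'comment';
                break;
            default:
                $rows[] = 'other (type ' . $child->nodeType . ')';
        }
    }

    return $rows;
}

// Sample markup invented for this sketch.
print_r(describeBodyChildren("<body><p>First</p>\n    Second\n<!-- note --></body>"));
```

Text nodes show up with their surrounding line feeds and spaces, which is why the answer trims the value before deciding whether it is a candidate.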

Obviously you need to traverse the DOM and repeat the search/replace at each element node if you wish to make this more general. In this example we stop at the body node and process only its direct child nodes.
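
A minimal sketch of that more general traversal, assuming bare text should only be wrapped inside a few block containers. The names wrapLooseText and addParagraphs and the BLOCK_CONTAINERS list are hypothetical, not part of the original answer. Unlike the echo-based code, this version edits the DOM in place and serializes it afterwards, and it drops the formatting whitespace around each wrapped text.

```php
<?php

// Hypothetical list of elements whose bare text children should be wrapped.
const BLOCK_CONTAINERS = ['body', 'div', 'section', 'article'];

// Wrap every non-blank text child of $parent in a <p>, then recurse into
// child elements that are listed in BLOCK_CONTAINERS.
function wrapLooseText(DOMNode $parent): void
{
    $doc = $parent->ownerDocument;
    // Copy the live child list first, because the loop modifies the tree.
    foreach (iterator_to_array($parent->childNodes) as $child) {
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
            $p = $doc->createElement('p');
            $p->appendChild($doc->createTextNode(trim($child->nodeValue)));
            $parent->replaceChild($p, $child);
        } elseif ($child->nodeType === XML_ELEMENT_NODE
            && in_array($child->nodeName, BLOCK_CONTAINERS, true)) {
            wrapLooseText($child);
        }
    }
}

// Parse $html, wrap loose text, and return the serialized children of <body>.
function addParagraphs(string $html): string
{
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;                   // nothing to process
    }
    wrapLooseText($body);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

echo addParagraphs('<body><h2><b>Hello World</b></h2><p>First</p>Second<div>inner</div></body>');
```

Restricting the recursion to a whitelist is a design choice for this sketch: descending into every element would also wrap the text inside b or h2, which is probably not wanted.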

I'm not 100% sure this code is the most efficient possible. I may think about it some more and update the answer if I find a better way.

edited Sep 6, 2018 at 9:47

answered Sep 6, 2018 at 9:23


4 Comments

bcperth Over a year ago

Here is a DOM-based way of tagging untagged text. I think you are right that regex-based search/replace is workable, but we can never be sure all conditions are met, and the DOM itself can change. With a little extra work it should be possible to update the source HTML file directly, but I have only shown echoing here for simplicity.

Momo Over a year ago

Yeah, regex will mess everything up. Thanks for the code.

bcperth Over a year ago

OK, great. If it works, you can consider accepting the answer by clicking the tick under the vote arrows at the top left of the answer. But by all means ask for more help if needed.

bcperth Over a year ago

If you feel like it, you can read this: <stackoverflow.com/questions/52176319/…>

Momo · Accepted Answer · 2018-09-04 09:03:12Z

1

I used a crude way to solve this problem. Here is my code:

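function addPTag($html)
{
    // Split after every closing tag, keeping the closing tags as separate items.
    $contents = preg_split("/(<\/.*?>)/", $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($contents as &$content) {
        // A chunk that does not start with '<' begins with untagged text.
        if (substr($content, 0, 1) != '<') {
            $chars = preg_split("/(<)/", $content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
            // Wrap the leading text, unless it is only whitespace between tags.
            if (trim($chars[0]) !== '') {
                $chars[0] = '<p>' . $chars[0] . '</p>';
            }
            $content = implode($chars);
        }
    }
    unset($content); // break the reference left over from the foreach

    return implode($contents);
}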

I hope there is a more elegant way than this, thanks.

answered Sep 4, 2018 at 9:03


1 Comment

bcperth Over a year ago

Looks as good as any! Detecting HTML tags reliably in every possible case is not so easy, because '<' and '>' characters can appear inside quoted strings, and attributes such as classes can appear before the first closing '>'. So if you just need to solve for restricted situations, whatever works is good!
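
As a small illustration of that caveat, the sketch below runs the closing-tag pattern from addPTag() on a made-up link whose title attribute contains "</b>". The sample markup is an assumption chosen to trigger the problem; a DOM parser reads the same markup as a single a element.

```php
<?php

// A made-up link whose title attribute happens to contain "</b>".
$html = '<a title="a </b> b">x</a>';

// The pattern from addPTag() treats the "</b>" inside the quotes as a closing tag,
// so the markup is cut in the middle of the attribute value.
$pieces = preg_split("/(<\/.*?>)/", $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($pieces);

// A DOM parser keeps the attribute value intact.
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$link = $dom->getElementsByTagName('a')->item(0);
echo $link->getAttribute('title'), "\n";
```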

Marat Badykov · Accepted Answer · 2018-09-04 06:40:59Z

0

You can try Simple HTML Dom Parser

$stringHtml = 'Your received html';

$html = str_get_html($stringHtml);

// Find the necessary element and edit it
$exampleText = $html->find('Your selector here', 0)->last_child()->innertext;

answered Sep 4, 2018 at 6:40


1 Comment

bcperth Over a year ago

May not work as he has no tags, ids or class to select on :-)
