20

Is there an elegant way to trim some text while staying aware of HTML tags?

For example, I have this string:

$data = '<strong>some title text here that could get very long</strong>';

And let's say I need to return/output this string on a page but would like it to be no more than X characters. Let's say 35 for this example.

Then I use:

$output = substr($data, 0, 35);

But now I end up with:

<strong>some title text here that c

As you can see, the closing </strong> tag is cut off, which breaks the HTML on the page.

Is there a way around this? Note that the string may contain multiple tags, for example:

<p>some text here <strong>and here</strong></p>
5
  • Do you need to keep any of the tags? You could use strip_tags() to take the tags out, trim the text and use it; add new <p></p> if they are needed. Commented Jan 19, 2012 at 21:29
  • 2
    I don't know if it's an option, but maybe you can use a browser-side solution like text-overflow: ellipsis or overflow: hidden. Commented Jan 19, 2012 at 21:31
  • How complex is the html going to be? Are you stuffing entire chunks of a DOM tree, or just a tag or two? Commented Jan 19, 2012 at 21:32
  • possible duplicate of Using PHP substr() and strip_tags() while retaining formatting and without breaking HTML Commented Jan 19, 2012 at 21:34
  • Yes, it needs to retain the HTML tags, hence the title; otherwise I could simply use strip_tags. And no, it will not be complicated: just a few tags, possibly nested. Commented Jan 19, 2012 at 21:37

3 Answers

7

A few months ago I wrote a function that solves this problem.

Here is the function:

function substr_close_tags($code, $limit = 300)
{
    if ( strlen($code) <= $limit )
    {
        return $code;
    }

    $html = substr($code, 0, $limit);
    preg_match_all ( "#<([a-zA-Z]+)#", $html, $result );

    foreach($result[1] AS $key => $value)
    {
        if ( strtolower($value) == 'br' )
        {
            unset($result[1][$key]);
        }
    }
    $openedtags = $result[1];

    preg_match_all ( "#</([a-zA-Z]+)>#iU", $html, $result );
    $closedtags = $result[1];

    foreach($closedtags AS $key => $value)
    {
        if ( ($k = array_search($value, $openedtags)) === FALSE )
        {
            continue;
        }
        else
        {
            unset($openedtags[$k]);
        }
    }

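    if ( empty($openedtags) )
    {
        // Every tag is closed: cut at the first space at or after the limit.
        $space = strpos($code, ' ', $limit);
        if ( $space === FALSE )
        {
            // No space after the limit: fall back to a hard cut.
            return $html."...";
        }
        return substr($code, 0, $space)."...";
    }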

    $position = 0;
    $close_tag = '';
    foreach($openedtags AS $key => $value)
    {   
        $p = strpos($code, ('</'.$value.'>'), $limit);

        if ( $p === FALSE )
        {
            $code .= ('</'.$value.'>');
        }
        else if ( $p > $position )
        {
            $close_tag = '</'.$value.'>';
            $position = $p;
        }
    }

    if ( $position == 0 )
    {
        return $code;
    }

    return substr($code, 0, $position).$close_tag."...";
}

Here is a demo: http://sandbox.onlinephpfunctions.com/code/899d8137c15596a8528c871543eb005984ec0201 (click "Execute code" to see how it works).
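For comparison, the same goal can be sketched with a tag stack instead of searching for closing tags after the cut. This is not the function above but a separate, minimal sketch: `truncate_html` is a hypothetical name, the list of void elements is an assumption, and it counts only visible text (tags do not count toward the limit). It also assumes single-byte text and that a literal `<` in the text is written as `&lt;`.

```php
<?php
// Hypothetical helper (not the answer's function): truncate the visible text
// of an HTML fragment to $limit bytes and close any tags left open.
function truncate_html(string $html, int $limit, string $ellipsis = '...'): string
{
    // Assumed list of elements that never take a closing tag.
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];

    // Split into tags and text runs, keeping the tags as tokens.
    $tokens = preg_split(
        '#(<[^>]*>)#',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $output = '';
    $open = [];          // stack of currently open tag names
    $remaining = $limit; // visible bytes still allowed

    foreach ($tokens as $token) {
        // Markup is copied through without counting toward the limit.
        if ($token[0] === '<' && substr($token, -1) === '>') {
            if (preg_match('#^<(/?)([a-zA-Z][a-zA-Z0-9]*)#', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    // Closing tag: pop it if it matches the innermost open tag.
                    if (end($open) === $name) {
                        array_pop($open);
                    }
                } elseif (!in_array($name, $void, true) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $output .= $token;
            continue;
        }

        if (strlen($token) > $remaining) {
            // The cut falls inside this text run: truncate it and stop.
            $output .= substr($token, 0, $remaining) . $ellipsis;
            // Close whatever is still open, innermost first.
            while ($open) {
                $output .= '</' . array_pop($open) . '>';
            }
            return $output;
        }

        $output .= $token;
        $remaining -= strlen($token);
    }

    return $output; // the whole fragment fits within the limit
}

echo truncate_html('<p>some text here <strong>and here</strong></p>', 20);
// prints: <p>some text here <strong>and h...</strong></p>
```

Because the cut only ever happens inside a text run, it cannot land in the middle of a tag. For UTF-8 input, `strlen` and `substr` would need to be swapped for their `mb_` counterparts.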


2 Comments

Upvoted for offering a code solution, rather than a link to a third party website (which was down at the time of commenting).
Text inside <pre> is returned whole. Test:

$string = <<<'EOT'
<pre class="lang-php prettyprint prettyprinted">
jhfhjfghjfghjfghfghjfghjfghfgjhfgjhghjfgjfghjghjf
ghjfghjfgjhfghjfghjfgjhfghjfghjfghjfgjfghjfghjfgh
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
ccccccccccccccccccccccccccccccccccccccccccccccccc
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
fffffffffffffffffffffffffffffffffffffffffffffffff
ggggggggggggggggggggggggggggggggggggggggggggggggg</pre>
EOT;
0

Using @newbieuser's function, I ran into the same issue as @pablo-pazos: the output broke when $limit fell inside an HTML tag (in my case, at the r of <br />).

I fixed it with the following code:

if ( strlen($code) <= $limit ){
    return $code;
}

$html = substr($code, 0, $limit);       

// Cut at a '.', a '>' or a space so that we never end up inside an HTML tag.
// In my case the only tags are <br>.
// If you have more tags, or HTML-formatted text, you need to do a little more,
// and also use something like http://htmlpurifier.org/demo.php

$_find_last_char = strrpos($html, ".")+1;
if($_find_last_char > $limit/3*2){
    $html_break = $_find_last_char;
}else{
    $_find_last_char = strrpos($html, ">")+1;
    if($_find_last_char > $limit/3*2){ 
        $html_break = $_find_last_char;
    }else{
        $html_break = strrpos($html, " ");
    }
}

$html = substr($html, 0, $html_break);
preg_match_all ( "#<([a-zA-Z]+)#", $html, $result );
......
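The check for "are we inside a tag?" can also be made directly, by comparing the positions of the last `<` and the last `>` before the cut. The following is a minimal sketch of that idea, not part of the code above: `safe_cut_position` is a hypothetical helper, and it assumes that any literal `<` in the text is encoded as `&lt;`.

```php
<?php
// Hypothetical helper: move a cut position back so it never lands inside a tag.
function safe_cut_position(string $code, int $limit): int
{
    $head = substr($code, 0, $limit);
    $lastOpen = strrpos($head, '<');
    $lastClose = strrpos($head, '>');

    // A '<' with no '>' after it means the cut falls inside a tag,
    // so cut just before that tag instead.
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen)) {
        return $lastOpen;
    }

    // Otherwise the requested position is safe (capped at the string length).
    return min($limit, strlen($code));
}

$code = 'First line.<br />Second line.';
echo substr($code, 0, safe_cut_position($code, 13));
// prints: First line.
```

The returned position could then replace `$limit` in the `substr()` call at the top of the function, before the opened tags are collected.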


-3

substr(strip_tags($content), 0, 100)
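To make the trade-off concrete, here is a small sketch of what this one-liner does to the example from the question. The variable names and the limit of 9 are only for illustration.

```php
<?php
// Illustration only: strip_tags() removes all markup before the cut,
// so the result is plain text and the <strong> formatting is lost.
$content = '<p>some text here <strong>and here</strong></p>';

$plain = strip_tags($content);   // 'some text here and here'
$short = substr($plain, 0, 9);   // first 9 bytes of the plain text

echo $short;
// prints: some text
```

The output can never contain a broken tag, but it cannot keep any tags either.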

1 Comment

He wants to keep the tags, not remove them.
