PHP preg_replace match HTML attribute

Question

I'm attempting to remove the title attribute from HTML elements.

function remove_title_attributes($input) {
    return remove_html_attribute('title', $input);
}

/**
 * To remove an attribute from an html tag
 * @param string $attr the attribute
 * @param string $str the html
 */
function remove_html_attribute($attr, $str){
    return preg_replace('/\s*'.$attr.'\s*=\s*(["\']).*?\1/', '', $str);
}

However, it can't tell the difference between <img title="something"> and [shortcode title="something"]. How can I target only attributes inside HTML tags (such as <img> or <a href=""></a>)?
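
To make the problem concrete, here is a small reproduction. The function is a renamed copy of the one above (the name `remove_html_attribute_regex` is only for illustration), and the sample input is made up. Because the pattern matches the attribute text wherever it appears, the shortcode loses its title as well:

```php
<?php

// Renamed copy of the regex-based function from the question (name is illustrative).
function remove_html_attribute_regex($attr, $str) {
    return preg_replace('/\s*'.$attr.'\s*=\s*(["\']).*?\1/', '', $str);
}

// Hypothetical sample input: one HTML tag and one shortcode.
$input  = '<img title="something"> and [shortcode title="something"]';
$output = remove_html_attribute_regex('title', $input);

echo $output, "\n"; // <img> and [shortcode]
```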

Don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions. As soon as the HTML changes from your expectations, your code will be broken. See htmlparsing.com/php.html for examples of how to properly parse HTML with PHP modules.
– Andy Lester, Commented Mar 6, 2013 at 16:30
Possible duplicate of How to parse and process HTML/XML with PHP?
– Quentin, Commented Mar 6, 2013 at 16:43
When I use this function, I don't have a fully-formed HTML document, just the body content of a blog post without a root tag. Something like this: stuff <a href="link" title="something">linkme</a>more stuffeven more stuff
– Force Flow, Commented Mar 6, 2013 at 16:49

ozahorulia · Accepted Answer · 2013-03-06 17:02:53Z

4

Do not use regexp; use a DOM parser instead. Go to the official reference page and study it. In your case you need the DOMElement::removeAttribute() method. Here is an example:

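<?php

$html = '<p>stuff <a href="link" title="something">linkme</a></p><p>more stuff</p><p>even more stuff</p>';

// Parse the markup into a DOM tree
$dom = new DOMDocument();
$dom->loadHTML($html);

$domElement = $dom->documentElement;

// item(0) is the first <a> element; it is null when the markup contains no <a>
$a = $domElement->getElementsByTagName('a')->item(0);
$a->removeAttribute('title');

// Serialize the whole document (this includes the doctype, <html> and <body> wrappers)
$result = $dom->saveHTML();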

edited Mar 6, 2013 at 17:02

answered Mar 6, 2013 at 16:41

ozahorulia

5 Comments

Force Flow Over a year ago

This doesn't work when there is no root tag. The HTML code I'm working with is the content for a page or blog post. For example, the HTML code I have is something like this: stuff <a href="link" title="something">linkme</a>more stuffeven more stuff

naomi Over a year ago

You say "you need the DOMNode::removeChild() method" but in your code you used removeAttribute

ozahorulia Over a year ago

@naomi yes, sorry, it was just a mistake.

Force Flow Over a year ago

When the <a> tag doesn't exist, I get this: Fatal error: Call to a member function removeAttribute() on a non-object. Can I target all tags, or do I have to specify tags on an individual basis?

ozahorulia Over a year ago

@ForceFlow The error is shown because you have to check if the title attribute exists in the element using the hasAttribute method. Yes, you can. Write a recursive function that will loop through all HTML elements, or use a DOMXPath query: php.net/manual/en/class.domxpath.php
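
The DOMXPath suggestion in the last comment could look roughly like the sketch below. The function name `strip_attribute_xpath`, the `<div>` wrapper, and the sample input are assumptions made for illustration, not part of the original answer. One note on the fatal error mentioned above: it comes from item(0) returning null when no <a> exists. Looping over a query result avoids that, because an empty result simply means the loop body never runs. Encoding handling for non-ASCII input is left out here.

```php
<?php

// Hypothetical helper: removes $attr from every element in an HTML fragment.
function strip_attribute_xpath(string $attr, string $html): string
{
    // Assumption: $attr is a plain attribute name, since it is embedded in the XPath query.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_-]*$/', $attr)) {
        throw new InvalidArgumentException('Unexpected attribute name');
    }

    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it has a single known root to read back from.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every element, at any depth, that carries the attribute.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@' . $attr . ']') as $element) {
        $element->removeAttribute($attr);
    }

    // Serialize only the children of the wrapper <div>, which leaves out
    // the doctype, <html>, <body> and the wrapper itself.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }

    return $result;
}

// Usage with a made-up fragment that mixes HTML tags and a shortcode.
$clean = strip_attribute_xpath(
    'title',
    'stuff <a href="link" title="something">linkme</a> [shortcode title="x"]'
);
echo $clean, "\n";
```

Since the shortcode is ordinary text to the parser, it is never selected by the query, so its title should be left alone. The `<div>` wrapper is one of several ways to deal with a fragment that has no root tag; unbalanced markup in the input may still be rearranged by the parser.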

Force Flow · Accepted Answer · 2013-03-07 16:03:45Z

I used the code from @Hast as a building block. It looks like this does the trick (unless there's a better way?)

/**
 * To remove an attribute from an html tag
 * @param string $attr the attribute
 * @param string $input the html
 * @return string the html with the attribute removed from the listed tags
 */
function remove_html_attribute($attr, $input){
    //return preg_replace('/\s*'.$attr.'\s*=\s*(["\']).*?\1/', '', $input);

    $result='';

    if(!empty($input)){

        //check if the input text contains tags
        if($input!=strip_tags($input)){
            $dom = new DOMDocument();

            //use mb_convert_encoding to prevent non-ASCII characters from being garbled in the output
            //(note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
            $dom->loadHTML(mb_convert_encoding($input, 'HTML-ENTITIES', 'UTF-8'));

            $domElement = $dom->documentElement;

            $taglist = array('a', 'img', 'span', 'li', 'table', 'td'); //tags to check for specified tag attribute

            foreach($taglist as $target_tag){
                $tags = $domElement->getElementsByTagName($target_tag);

                foreach($tags as $tag){
                    $tag->removeAttribute($attr);
                }
            }

            //$result =  $dom->saveHTML();
            $result = innerHTML( $domElement->firstChild ); //strip doctype/html/body tags
        }
        else{
            $result=$input;
        }
    }

    return $result; 
}

/**
 * removes the doctype/html/body tags
 */
function innerHTML($node){
  $doc = new DOMDocument();
  foreach ($node->childNodes as $child) {
    $doc->appendChild($doc->importNode($child, true));
  }

  return $doc->saveHTML();
}
