I'm attempting to remove the title attribute from HTML elements.

function remove_title_attributes($input) {
    return remove_html_attribute('title', $input);
}

/**
 * To remove an attribute from an html tag
 * @param string $attr the attribute
 * @param string $str the html
 */
function remove_html_attribute($attr, $str){
    return preg_replace('/\s*'.$attr.'\s*=\s*(["\']).*?\1/', '', $str);
}

However, it can't tell the difference between <img title="something"> and [shortcode title="something"]. How can I target only the code inside HTML tags (such as <img> or <a href=""></a>)?
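
To make the problem concrete, here is a minimal reproduction. The helper is the one from above; the sample string is made up for illustration and simply places an HTML tag next to a shortcode.

```php
<?php

// The helper from the question, unchanged.
function remove_html_attribute($attr, $str) {
    return preg_replace('/\s*' . $attr . '\s*=\s*(["\']).*?\1/', '', $str);
}

// Hypothetical sample input: one HTML tag and one shortcode.
$input = '<img title="something"> [shortcode title="something"]';

$output = remove_html_attribute('title', $input);
echo $output, PHP_EOL; // prints: <img> [shortcode]
```

The pattern only looks for `title="..."` and has no notion of angle brackets, so the shortcode loses its attribute as well.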

4 Comments

  • Use an HTML parser for this, not regex functions. Commented Mar 6, 2013 at 16:29
  • Don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions. As soon as the HTML deviates from your expectations, your code will break. See htmlparsing.com/php.html for examples of how to properly parse HTML with PHP modules. Commented Mar 6, 2013 at 16:30
  • Possible duplicate of How to parse and process HTML/XML with PHP? Commented Mar 6, 2013 at 16:43
  • When I use this function, I don't have a fully-formed HTML document. Just the body content of a blog post without a root tag. Something like this: <p>stuff <a href="link" title="something">linkme</a></p><p>more stuff</p><p>even more stuff</p> Commented Mar 6, 2013 at 16:49

2 Answers

Do not use regular expressions; use a DOM parser instead. Go to the official reference page and study it. In your case you need the DOMElement::removeAttribute() method. Here is an example:

<?php

$html = '<p>stuff <a href="link" title="something">linkme</a></p><p>more stuff</p><p>even more stuff</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$domElement = $dom->documentElement;

$a = $domElement->getElementsByTagName('a')->item(0);
$a->removeAttribute('title');

$result =  $dom->saveHTML();

5 Comments

This doesn't work when there is no root tag. The HTML code I'm working with is the content for a page or blog post. For example, the HTML code I have is something like this: <p>stuff <a href="link" title="something">linkme</a></p><p>more stuff</p><p>even more stuff</p>
You say "you need the DOMNode::removeChild() method" but in your code you used removeAttribute
@naomi yes, sorry, it was just a mistake.
When the <a> tag doesn't exist, I get this: Fatal error: Call to a member function removeAttribute() on a non-object. Can I target all tags, or do I have to specify tags on an individual basis?
@ForceFlow The error appears because item(0) returns null when the document contains no <a> element, so check the result before calling removeAttribute(). And yes, you can target all tags: write a recursive function that loops through all HTML elements, or use a DOMXPath query: php.net/manual/en/class.domxpath.php
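
The DOMXPath suggestion could look roughly like the sketch below. The function name remove_attribute_everywhere() and its signature are assumptions made for illustration and do not come from the thread. The query `//*[@title]` selects every element that carries the attribute, so no tag list is needed and nothing is called on a missing node. Character-encoding handling is left out here, so the sketch assumes ASCII or already entity-encoded input.

```php
<?php

// Hypothetical helper (name and signature are not from the thread):
// removes $attr from every element in an HTML fragment.
function remove_attribute_everywhere(string $attr, string $html): string
{
    // Reject names that would not be safe to splice into the XPath query.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_.-]*$/', $attr)) {
        throw new InvalidArgumentException('Invalid attribute name: ' . $attr);
    }

    // Nothing to parse if the input contains no markup at all.
    if (strpos($html, '<') === false) {
        return $html;
    }

    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html); // the parser wraps the fragment in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every element that carries the attribute, whatever its tag name.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@' . $attr . ']') as $element) {
        $element->removeAttribute($attr);
    }

    // Serialize only the children of <body> to drop the added wrappers.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

// Sample input: the shortcode sits in a text node, not in an attribute.
$html = '<p>stuff <a href="link" title="something">linkme</a></p>'
      . '<p>[shortcode title="something"]</p>';
$clean = remove_attribute_everywhere('title', $html);
echo $clean, PHP_EOL;
```

Because the shortcode is plain text as far as the parser is concerned, the XPath query never matches it and only the real title attribute on the link is removed.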

I used the code from @Hast as a building block. It looks like this does the trick (unless there's a better way?)

/**
 * To remove an attribute from an html tag
 * @param string $attr the attribute
 * @param string $input the html
 */
function remove_html_attribute($attr, $input){
    //return preg_replace('/\s*'.$attr.'\s*=\s*(["\']).*?\1/', '', $input);

    $result='';

    if(!empty($input)){

        //check if the input text contains tags
        if($input!=strip_tags($input)){
            $dom = new DOMDocument();

            //use mb_convert_encoding to prevent non-ASCII characters from being garbled in the text
            //(note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
            $dom->loadHTML(mb_convert_encoding($input, 'HTML-ENTITIES', 'UTF-8'));

            $domElement = $dom->documentElement;

            $taglist = array('a', 'img', 'span', 'li', 'table', 'td'); //tags to check for specified tag attribute

            foreach($taglist as $target_tag){
                $tags = $domElement->getElementsByTagName($target_tag);

                foreach($tags as $tag){
                    $tag->removeAttribute($attr);
                }
            }

            //$result =  $dom->saveHTML();
            $result = innerHTML( $domElement->firstChild ); //strip doctype/html/body tags
        }
        else{
            $result=$input;
        }
    }

    return $result; 
}

/**
 * removes the doctype/html/body tags
 */
function innerHTML($node){
  $doc = new DOMDocument();
  foreach ($node->childNodes as $child)
    $doc->appendChild($doc->importNode($child, true));

  return $doc->saveHTML();
}
