PHP Regex exclude comments finding error suppression

Question

I'm trying to write a regex to look through a pre-existing code base that seems to abuse the hell out of PHP's error suppression operator (@) on both variable references and function calls. I want to search the entire code base and build a list of all the usages. The problem is that much of the code also includes PHPDoc comments, and I'm not sure how to exclude those obvious comments.

Most of the PHPDoc lines seem to be preceded by at least whitespace-asterisk-whitespace, e.g.:

  /**
   * @param int $somvar
   */

So they could be matched reasonably consistently with something like /^\s*\*\s+/.

The regex I'm using to find the usages of the error suppression operator (which also grabs the PHPDoc) is:

/(@[\$\w][\w\d]*)/

Its results are satisfactory, except that it also picks up all the PHPDoc.
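
The line-prefix idea can be tried without any lookaround by filtering line by line: skip every line whose first non-blank characters are *, /* or //, and run the pattern above (without its capture group) on the rest. This is a minimal sketch, assuming a hypothetical helper named find_suppressions_by_line and that every PHPDoc line really starts with one of those prefixes. A multiplication that starts a continuation line ("    * $rate;") would be skipped too, and an @ inside a string literal (an e-mail address, say) would still be reported.

```php
<?php
// Hypothetical helper: list @ usages, ignoring lines that look like comments.
// Returns a list of [line number, matched text] pairs.
function find_suppressions_by_line(string $source): array
{
    $results = [];
    // Split on any newline style; array keys are 0-based line indexes.
    foreach (preg_split('/\R/', $source) as $index => $line) {
        // Skip lines whose first non-blank characters are *, /* or //.
        if (preg_match('/^\s*(?:\*|\/\*|\/\/)/', $line)) {
            continue;
        }
        // The question's pattern, applied only to the remaining lines.
        if (preg_match_all('/@[\$\w][\w\d]*/', $line, $matches)) {
            foreach ($matches[0] as $match) {
                $results[] = [$index + 1, $match];
            }
        }
    }
    return $results;
}
```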

I tried looking at some examples of negative lookahead, but nothing I've tried so far skips those PHPDoc comments. One example that doesn't work:

(?!\s*[\*\/])(@[\$\w][\w\d]*)

Any help is appreciated.
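
The lookahead in that attempt cannot help: (?!\s*[\*\/]) is tested at the position where @ starts, and since the next character there is always @ (not whitespace, * or /), the assertion always succeeds and the pattern behaves exactly like the original. The exclusion has to be anchored at the start of the line. One PCRE-specific way to do that in a single pattern (a swapped-in technique, not something from this thread) is the (*SKIP)(*FAIL) idiom: the first alternative matches a whole comment-looking line and then deliberately fails, and (*SKIP) makes the engine resume after that line, so only the second alternative ever produces a match. A sketch, assuming comment lines start with *, /* or //; like any regex approach it still reports an @ inside a string literal.

```php
<?php
// The attempted lookahead: it is checked where "@" starts, so it never rejects anything.
$lookahead = '/(?!\s*[\*\/])(@[\$\w][\w\d]*)/';

// Alternative 1 consumes a line starting with *, /* or // and then fails;
// (*SKIP) resumes the search after that line. Alternative 2 is the
// question's pattern. The m flag makes ^ match at the start of each line.
$skipComments = '/^\s*(?:\*|\/\*|\/\/).*(*SKIP)(*FAIL)|@[\$\w][\w\d]*/m';

$code = "/**\n * @param int \$x\n */\nfunction f(\$x) { return @file(\$x) * 2; }\n\$y = @\$z;\n";

preg_match_all($lookahead, $code, $m1);
preg_match_all($skipComments, $code, $m2);

print_r($m1[0]); // still contains "@param"
print_r($m2[0]); // only "@file" and "@$z"
```

Note that the mid-line " * 2" is not skipped, because the first alternative can only start where ^ matches.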

Unfortunately, some of them are actually used by things such as Doctrine or other interfaces that put their configs in comment blocks with @ symbols.
– Scott, Commented Dec 29, 2014 at 22:08
By the way, I tried both negative lookaheads and negative lookbehinds, but they either didn't skip the quotes or ended up skipping too much. Part of the problem is that I specifically need to look for the whitespace-asterisk-whitespace at the beginning of the line. Comments after the character don't matter, and I don't want to skip lines with multiplication operators either.
– Scott, Commented Dec 29, 2014 at 22:16
At this point I'm just trying to create a list of the uses so we can decide whether it's even worth trying to edit some of them out. (Some of this code base comes from an old version of Joomla, which literally has thousands of them!)
– Scott, Commented Dec 29, 2014 at 22:17
(?<!^\s*[*\/])(@[\$\w][\w\d]*) returns an error in one of my test tools, and elsewhere it runs very slowly and still returns the PHPDoc. I also tried it as (@[\$\w][\w\d]*)(?<!^\s*[*\/]); that ran at normal speed but still returned the PHPDoc.
– Scott, Commented Dec 29, 2014 at 22:21
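
Those lookbehinds have two separate problems. PHP's PCRE engine only accepts lookbehinds of bounded length, so \s* inside (?<!...) makes the pattern fail to compile; test tools built on other regex engines may accept it, which could explain the different behaviour between tools. Even where it compiles, (?<!^\s*[*\/]) needs the * or / to sit directly before the @, while PHPDoc puts a space in between (as in " * @param"), and without the m flag ^ only matches at the very start of the input. Moved after the match, the lookbehind inspects text ending at the last character of @param, which is never * or /, so it excludes nothing. A quick check of the compile failure in PHP:

```php
<?php
// Both lookbehind variants from the comment above. PCRE rejects them because
// \s* gives the lookbehind an unbounded length. preg_match() then emits a
// warning and returns false; the @ here suppresses that warning on purpose.
$patterns = [
    '/(?<!^\s*[*\/])(@[\$\w][\w\d]*)/',
    '/(@[\$\w][\w\d]*)(?<!^\s*[*\/])/',
];

foreach ($patterns as $pattern) {
    $result = @preg_match($pattern, ' * @param int $x');
    var_dump($result); // bool(false): the pattern failed to compile
}
```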

Mike · Accepted Answer · 2015-01-02 04:53:08Z


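You can use PHP's token_get_all() to find all of the @ symbols instead of a regex. This way you're letting PHP's own internal parser parse the file for you, and comments (including PHPDoc blocks and any annotations inside them) come back as single comment tokens, so an @ inside a comment never shows up as a token of its own: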

<?php
$source_file = 'source_file_to_open.php';
$source = file_get_contents($source_file);
$tokens = token_get_all($source);
$count = count($tokens);

// Loop through all the tokens
foreach ($tokens as $i => $token) {
    // The @ operator comes back as a plain one-character string, not an array,
    // so it has no line number of its own. Use the line number (3rd value,
    // index 2) of the next token that *is* an array; the token right after @
    // may itself be a plain string such as "(".
    if ($token === '@') {
        for ($j = $i + 1; $j < $count; $j++) {
            if (is_array($tokens[$j])) {
                echo "@ found in $source_file on line: {$tokens[$j][2]}<br />\n";
                break;
            }
        }
    }
}
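
On PHP 8.0 or later (newer than this thread), PhpToken::tokenize() returns an object for every token, single-character ones such as @ included, and each carries its own line property, so there is no need to borrow the line from a following token. Since the goal was a list of usages across the whole code base, the sketch also walks a directory recursively. The function names find_suppressions and scan_codebase are hypothetical, and only files with a .php extension are scanned (an assumption; extensions such as .inc would need adding).

```php
<?php
// Hypothetical helper: line numbers of every @ operator in a PHP source string.
// Requires PHP 8.0+ for PhpToken.
function find_suppressions(string $source): array
{
    $lines = [];
    foreach (PhpToken::tokenize($source) as $token) {
        // "@" is a one-character token; comments and strings are separate
        // tokens, so an @ inside them never appears here on its own.
        if ($token->text === '@') {
            $lines[] = $token->line;
        }
    }
    return $lines;
}

// Hypothetical helper: print "path:line" for every @ under $dir.
function scan_codebase(string $dir): void
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (strtolower($file->getExtension()) !== 'php') {
            continue;
        }
        $path = $file->getPathname();
        $source = file_get_contents($path);
        if ($source === false) {
            continue; // unreadable file: skip it
        }
        foreach (find_suppressions($source) as $line) {
            echo "$path:$line\n";
        }
    }
}
```

For example, scan_codebase('/path/to/project') (a placeholder path) prints one path:line entry per @ found, which is easy to count or sort when deciding which usages are worth removing.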

