
I'm trying to write a regex to search a pre-existing code base that seems to abuse the hell out of the PHP error suppression character (@) on both variable references and function calls. I want to search the entire code base and build a list of all the usages. The problem is that much of the code also includes PHPDoc, and I'm not sure how to exclude the obvious comments.

Most of the PHPDoc lines seem to be preceded by, at a minimum, whitespace-asterisk-whitespace, e.g.:

  /**
   * @param int $somvar
   */

So those lines could be matched reasonably consistently with something like /^\s*\*\s+/.

The regex I'm using to find the usages of the error suppression character (which also grabs the PHPDoc) is:

/(@[\$\w][\w\d]*)/

Its results are satisfactory, save for picking up all the PHPDoc.

I looked at some examples of negative look-ahead, but nothing I've tried so far skips those PHPDoc comments. One attempt that doesn't work is:

(?!\s*[\*\/])(@[\$\w][\w\d]*)
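
For reference, a minimal reproduction of the problem (the sample line is made up for illustration). My guess is that the look-ahead is being tested at the position of the @ itself rather than at the start of the line, so it never sees the leading asterisk:

```php
<?php
// Sample PHPDoc line, invented for illustration.
$line = " * @param int \$somvar";

// The negative look-ahead is evaluated where the @ sits, so the " * "
// prefix earlier on the line does not prevent the match.
preg_match_all('/(?!\s*[\*\/])(@[\$\w][\w\d]*)/', $line, $m);

print_r($m[1]); // still contains "@param"
```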

Any help is appreciated.

  • Unfortunately, some of them are actually used by things, such as Doctrine or other interfaces that put their configs in comment blocks with @ symbols. Commented Dec 29, 2014 at 22:08
  • Wouldn't ?<! be a "negative look-behind"? Commented Dec 29, 2014 at 22:10
  • By the way, I tried both negative look-aheads and negative look-behinds, but they didn't skip the quotes or they ended up skipping too much. Part of the problem is that I specifically need to look for the whitespace-asterisk-whitespace at the beginning of the line. Comments after the character won't matter, and I don't want to skip lines with multiplication operators either. Commented Dec 29, 2014 at 22:16
  • At this point I am just trying to create a list of the uses so we can decide whether it's even worth trying to edit some of them out. (Some of this code base has code from an old version of Joomla, which literally has thousands of them in it!) Commented Dec 29, 2014 at 22:17
  • (?<!^\s*[*\/])(@[\$\w][\w\d]*) returns an error in one of my test tools; in another it runs very slowly and still returns the PHPDoc. I also tried it as (@[\$\w][\w\d]*)(?<!^\s*[*\/]), which ran at normal speed but still returned the PHPDoc. Commented Dec 29, 2014 at 22:21

1 Answer

You can use PHP's token_get_all() to find all of the @ symbols instead of a regex. This way you're letting PHP's own internal parser parse the file for you. Doc comments come back as T_DOC_COMMENT tokens, so the @ tags inside them are never reported:

$source_file = 'source_file_to_open.php';
$source = file_get_contents($source_file);
$tokens = token_get_all($source);

// Track the current line ourselves: single-character tokens such as @ are
// returned as plain strings and carry no line number.
$line = 1;
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens are [id, text, starting line]; advance past any
        // newlines contained in the token text.
        $line = $token[2] + substr_count($token[1], "\n");
    } elseif ($token === '@') {
        echo "@ found in $source_file on line: $line<br />\n";
    }
}
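
If you would still rather stay with a regex pass, one rough option is to filter out the doc-comment lines first and only then look for @. The sketch below assumes that every PHPDoc line starts with optional whitespace followed by /* or *, as described in the question; find_suppressions is a hypothetical helper name, and the sample source is invented for illustration. It uses a slightly simplified form of the question's pattern (\w already covers digits).

```php
<?php
// Hypothetical helper: returns [line number => matches] for every line that
// is not a doc-comment line. Assumes doc-comment lines begin with optional
// whitespace followed by "/*" or "*".
function find_suppressions(string $source): array
{
    $found = [];
    foreach (preg_split('/\R/', $source) as $index => $line) {
        // Skip lines that open or continue a comment block.
        if (preg_match('/^\s*(?:\/\*|\*)/', $line)) {
            continue;
        }
        // Same idea as the pattern in the question.
        if (preg_match_all('/@[\$\w]\w*/', $line, $matches)) {
            $found[$index + 1] = $matches[0];
        }
    }
    return $found;
}

// Invented sample: one doc block followed by two suppressed expressions.
$sample = "<?php\n/**\n * @param int \$somvar\n */\n\$x = @\$arr['k'];\n\$y = @file('f');\n";
print_r(find_suppressions($sample));
```

Unlike the tokenizer, this will also report an @ inside string literals (an e-mail address, for example) and inside // comments, so treat its output as a rough list rather than an exact one.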