
I'm trying to get a list of all the files included in a PHP script.

I'm reading in the entire file, which contains this:

<?php
    echo 'Hello there';

    include 'some_functions.php';

    echo 'Trying to find some includes.';

    include 'include_me.php';

    echo 'Testtest.';
?>

Then, I run this code on that file:

if (preg_match_all ("/(include.*?;){1}/is", $this->file_contents, $matches))
  {
      print_r($matches);
  }

When I run this match, I get the expected results (the two include statements), but I also get repeats of exactly the same thing, or random chunks of the include statement. Here is an example of the output:

    Array
    (
        [0] => Array
            (
                [0] => include 'some_functions.php';
                [1] => include 'include_me.php';
            )
        [1] => Array
            (
                [0] => include 'some_functions.php';
                [1] => include 'include_me.php';
            )
    )

As you can see, it's nesting arrays with the same result multiple times. I need one item in the array for each include statement: no repeats, no nested arrays.

I'm having some trouble with these regular expressions, so some guidance would be nice. Thank you for your time.

  • why are you extracting from a php file like this at all? Commented Jun 23, 2013 at 21:59
  • I'm going to replace the include statement with the contents of the file it's trying to include. Commented Jun 23, 2013 at 22:09
  • Your regex doesn't make sense; just remove the {1} part so that it becomes include.*?;. Other than that, that's how preg_match_all works. Commented Jun 23, 2013 at 22:30
  • ouch, why not let include do its job Commented Jun 23, 2013 at 23:39
  • HamZa, thanks for the feedback. I know it's senseless; I just kept modifying it trying to get it to do what I wanted. Dagon, there is no good reason. I just want to make a script that grabs all included files and creates a standalone script with the same functionality. It's more of an experiment than anything else. Commented Jun 24, 2013 at 1:38

3 Answers


What about this one?

<?php
  preg_match_all( "/include(_once)?\s*\(?\s*(\"|')(.*?)\.php(\"|')\s*\)?\s*;?/i", $this->file_contents, $matches );
  // for file names
  print_r( $matches[3] );
  // for full lines
  print_r( $matches[0] );
?>

If you want a cleaner and more reliable way, use PHP's tokenizer, token_get_all():

<?php
  $tokens = token_get_all( $this->file_contents );
  $files  = array();
  $index  = 0;
  $found  = false;
  foreach( $tokens as $token ) {
    // in php 5.2+ Line numbers are returned in element 2
    $token  = ( is_string( $token ) ) ? array( -1, $token, 0 ) : $token;
    switch( $token[0] ) {
      case T_INCLUDE:
      case T_INCLUDE_ONCE:
      case T_REQUIRE:
      case T_REQUIRE_ONCE:
        $found  = true;
        if ( isset( $token[2] ) ) {
          $index  = $token[2];
        }
        $files[$index]  = null;
      break;

      case T_COMMENT:
      case T_DOC_COMMENT:
      case T_WHITESPACE:
      break;

      default:
        if ( $found && $token[1] === ";" ) {
          $found  = false;
          if ( !isset( $token[2] ) ) {
            $index++;
          }
        }
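        if ( $found ) {
          // skip the optional parentheses around the path; "continue 2" jumps to the
          // next token, because a bare "continue" inside a switch acts like "break"
          if ( in_array( $token[1], array( "(", ")" ) ) ) {
            continue 2;
          }
          $files[$index]  .=  $token[1];
        }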
      break;
    }
  }
  // if your php version is above 5.2
  // $files index will be line numbers
  print_r( $files );
?>
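To see why the tokenizer is more reliable than a regex, it helps to look at what token_get_all() actually returns. The sketch below tokenizes a made-up one-line snippet and records each token's name and text. Array tokens carry an id (turned into a readable name with token_name()) plus the source text; single characters such as ; come back as plain strings, which is why the loop above normalizes them with is_string().

```php
<?php
// Made-up snippet for illustration only.
$code = "<?php include 'include_me.php';";
$listing = array();

foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Array tokens: [id, text, line]; token_name() gives e.g. "T_INCLUDE".
        $listing[] = array(token_name($token[0]), $token[1]);
    } else {
        // Single-character tokens like ";" are returned as bare strings.
        $listing[] = array('(char)', $token);
    }
}

print_r($listing);
```

The listing should show T_OPEN_TAG, T_INCLUDE, T_WHITESPACE, T_CONSTANT_ENCAPSED_STRING and then the bare ";". Note that the string token's text keeps its surrounding quotes, so the collected file names will include them too.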

5 Comments

This is very nice, thank you. However, I need the entire path between the quotes in the include statement (that's why I was trying to grab everything between the word include and the semicolon). I'll play around and see if I can make that modification. Thank you for your response.
If you need everything between the quotes, remove \.php from the regex so it becomes "/include(_once)?\s*\(?\s*(\"|')(.*?)(\"|')\s*\)?\s*;?/" :)
Thank you. I took what you put and the comment yAnTar made and figured something out that appears to be working. I appreciate the help.
I was expecting you to update the regex to handle constants, so I could give you another example that would fail :) But I'm glad to see you didn't and went with the tokenizer too. Using regular expressions to parse PHP is just silly...
Very cool. I wasn't aware of token_get_all(). That's a very nice approach, and I like it. It's much neater and is working better than the regex. Thank you for the feedback and a better solution than what I was originally thinking.
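A quick check of the modified regex from the comments, run on a made-up sample whose paths contain directories (the paths are hypothetical). With \.php removed, group 3 captures everything between the quotes, and the /i flag from the answer's original regex is kept. As another comment points out, a regex like this still fails when the path is built from constants or concatenation.

```php
<?php
// Hypothetical sample input; the paths are invented for the demo.
$contents = <<<'CODE'
<?php
    include 'lib/some_functions.php';
    include_once("inc/include_me.php");
?>
CODE;

// Group 1: optional "_once", group 2: opening quote,
// group 3: everything between the quotes, group 4: closing quote.
$pattern = "/include(_once)?\s*\(?\s*(\"|')(.*?)(\"|')\s*\)?\s*;?/i";

preg_match_all($pattern, $contents, $matches);

print_r($matches[3]); // the quoted paths, directories included
print_r($matches[0]); // the full include statements
```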

Use get_included_files(), or the built-in tokenizer if the script is not actually being included.
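A minimal sketch of what get_included_files() reports. The temporary file created with tempnam() is only a stand-in for a real include, used so the demo is self-contained. The function lists files the running script has actually loaded, so it cannot analyze the source text of some other file; that limitation is why the tokenizer is the fallback here.

```php
<?php
// Create a throwaway file to include (stand-in for a real include, demo only).
$tmp = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($tmp, '<?php $loaded = true;');

include $tmp; // actually executes the file, defining $loaded in this scope

// Resolved paths of every file loaded so far: the main script (if any) first,
// then included/required files in load order.
$files = get_included_files();
print_r($files);

unlink($tmp); // clean up the temporary file
```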

I'm searching through a string of another file's contents and not the current file.

Then your best bet is the tokenizer. Try this:

$scriptPath = '/full/path/to/your/script.php';
$tokens = token_get_all(file_get_contents($scriptPath));
$matches = array();
$incMode = null;

foreach($tokens as $token){

  // ";" should end include stm.
  if($incMode && ($token === ';')){
    $matches[] = $incMode;
    $incMode = array();
  }

  // keep track of the code if inside include statement
  if($incMode){
    $incMode[1] .= is_array($token) ? $token[1] : $token;
    continue;
  }  

  if(!is_array($token))
    continue;

  // start of include stm.
  if(in_array($token[0], array(T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE)))
    $incMode = array(token_name($token[0]), '');
}

print_r($matches); // array(token name, code)

2 Comments

I don't believe this will work, as I'm searching through a string of another file's contents and not the current file.
Thanks for the updated token approach. I've come to the conclusion that it is a much better solution than the regex I thought I needed to use. Thank you for bringing a better method to my attention.

Please read how preg_match_all works.

The first item in the array contains all the text matched by the whole regular expression. The following items contain the text matched by each parenthesized subpattern (capturing group).

You should use $matches[1].
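A minimal sketch of the difference, using a trimmed-down version of the question's file. With the default ordering (PREG_PATTERN_ORDER), $matches[0] holds every full match and $matches[1] holds what the first parenthesized group captured. Because the question's group wraps the whole pattern, both lists are identical, which is the "repeat" described in the question. Dropping the group leaves a single list in index 0.

```php
<?php
// Trimmed-down version of the question's sample file.
$contents = <<<'CODE'
<?php
    echo 'Hello there';
    include 'some_functions.php';
    include 'include_me.php';
?>
CODE;

// One capturing group: $matches[0] = full matches, $matches[1] = group 1.
preg_match_all('/(include.*?;)/is', $contents, $matches);

// No capturing group: only $plain[0] exists, one entry per include statement.
preg_match_all('/include.*?;/is', $contents, $plain);
$includes = $plain[0];

print_r($includes);
```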

1 Comment

I went through all the text in the documentation and see what you mean now. When I remove the parentheses from my expression, it only returns each include once. It's still in a single nested array, but I can work with that. Thank you for the response; I wasn't aware I had misunderstood how the function worked.
