Regex and PHP to get Java/PHP class content from a source file

Question

I need to parse a text file, searching for PHP classes. For example, if I have a text file with this source:

... some text ...

... some other text ...

class Foo{

function Bar($param){ ... do stuff ... }

}

... some other text ...

class Bar{

function Foo(){ ... do something .... }

}

... some else ...

In this case, my regular expression must match both classes along with their contents, returning these results:

first result:

class Foo{

function Bar($param){ ... do stuff ... }

}

second result:

class Bar{

function Foo(){ ... do something .... }

}

I've tried many times without success. My latest attempt was

/^[\n\r\t ](?:abstract|class|interface){1}(.)[^(?:class|interface)]*$/im

but it only matches

class Foo{

and

class Bar{

without the content of the class.

Thanks for your help :)

Are you asking how to match the contents of a possibly nested { .. } block structure?
– tchrist, Commented Nov 11, 2010 at 12:19
Hi and welcome to Stack Overflow. For posting code, please don't use > but rather paste the code as is, select it and press Ctrl-K. This is much better.
– Tim Pietzcker, Commented Nov 11, 2010 at 12:23

Tim Pietzcker · Accepted Answer · 2010-11-11 14:19:18Z

This cannot be done with "classic" regular expressions because you'd need to handle arbitrarily nested braces, and nested structures like these are by definition not regular. Some regex flavors (.NET, PCRE, Perl 5.6 and up) have augmented regular expressions to support recursive or balanced matching, but many implementations can't handle recursion.
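For illustration only, here is a rough sketch of what recursive matching can look like in PHP, whose preg_* functions are built on PCRE. The `(?1)` construct re-enters capture group 1, which keeps nested braces balanced. The sample text and the variable names are assumptions made up for this example, and a `{` or `}` inside a string or comment would still throw it off.

```php
<?php
// Hypothetical sample input, modeled on the question's example.
$source = <<<'TXT'
... some text ...

class Foo{

function Bar($param){ if ($param) { return 1; } }

}

... some other text ...

class Bar{

function Foo(){ return 2; }

}

... some else ...
TXT;

// (?1) recurses into capture group 1, so nested { ... } pairs stay balanced.
$pattern = '/
    ^[\t ]*                                  # optional indentation at line start
    (?:abstract\s+)?(?:class|interface)\b    # keyword
    [^{]*                                    # name, extends, implements
    (\{ (?: [^{}]++ | (?1) )* \})            # group 1: balanced braces
/mx';

preg_match_all($pattern, $source, $matches);

// Full matches, with any leading indentation trimmed off.
$classes = array_map('trim', $matches[0]);
print_r($classes);
```

Under these assumptions, `$classes` should hold two strings, one per class, each running from the keyword to its matching closing brace.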

I'd also wager that even if your favorite language's regex engine can handle recursion, it's usually not the best way to go. Most of the time, you would rather use a parser for this.
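To give an idea of the parser-style route, here is a minimal hand-written scanner, not a real PHP parser. It finds a class header with a simple regex, then walks the text character by character, counting braces while skipping quoted strings and comments. The function names `findClosingBrace` and `extractClasses` are hypothetical, and the sketch ignores heredocs, `#` comments and other corner cases.

```php
<?php
/**
 * Hypothetical helper: return the index of the "}" that closes the "{" at
 * $open, skipping quoted strings and comments; null if it is never closed.
 */
function findClosingBrace(string $src, int $open): ?int
{
    $depth = 0;
    $len = strlen($src);
    for ($i = $open; $i < $len; $i++) {
        $c = $src[$i];
        $two = substr($src, $i, 2);
        if ($c === '"' || $c === "'") {
            // Skip a quoted string, honoring backslash escapes.
            for ($i++; $i < $len && $src[$i] !== $c; $i++) {
                if ($src[$i] === '\\') {
                    $i++;
                }
            }
        } elseif ($two === '/*') {
            // Skip a block comment.
            $end = strpos($src, '*/', $i + 2);
            if ($end === false) {
                return null;
            }
            $i = $end + 1;
        } elseif ($two === '//') {
            // Skip a line comment.
            $end = strpos($src, "\n", $i);
            if ($end === false) {
                return null;
            }
            $i = $end;
        } elseif ($c === '{') {
            $depth++;
        } elseif ($c === '}' && --$depth === 0) {
            return $i;
        }
    }
    return null;
}

/** Hypothetical helper: collect every class or interface block in $src. */
function extractClasses(string $src): array
{
    $classes = [];
    $offset = 0;
    $header = '/^[\t ]*(?:abstract\s+)?(?:class|interface)\b[^{]*\{/m';
    while (preg_match($header, $src, $m, PREG_OFFSET_CAPTURE, $offset)) {
        [$text, $start] = $m[0];
        $open = $start + strlen($text) - 1;   // position of the opening "{"
        $close = findClosingBrace($src, $open);
        if ($close === null) {
            $offset = $open + 1;              // unbalanced: move on
            continue;
        }
        $classes[] = trim(substr($src, $start, $close - $start + 1));
        $offset = $close + 1;
    }
    return $classes;
}

// Hypothetical input with braces hidden in a string and a comment.
$sample = <<<'TXT'
intro text
class Foo {
    function bar() { return "baz{"; /* {{comment} */ }
}
trailing text
TXT;

$result = extractClasses($sample);
print_r($result);
```

The scanner only starts tracking quotes once it is inside a class body, so apostrophes in the surrounding prose should not matter. An apostrophe inside a comment within the class would still need more careful handling.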

That said, even without recursive regexes you might have a chance if your code is formatted in a consistent manner (start column of the class definition == column of the closing }, no mix of tabs and spaces, and every sub-level structure is indented).

Then you could try

/^([\t ]*)(?:abstract|class|interface).*?^\1\}/sim

But this is sure to fail horribly if your code is not exactly formatted according to those rules.

Explanation:

^                             # start of line
([\t\ ]*)                     # match and remember whitespace
(?:abstract|class|interface)  # match keyword
.*?                           # match as few characters as possible
^\1                           # until the next line that starts with the same amount of whitespace
\}                            # followed by a }
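As a quick way to try the pattern from PHP, here is a small sketch using preg_match_all. The sample text is an assumption: it is made up and deliberately follows the formatting rules above, with class bodies indented and each closing } in the same column as its class keyword.

```php
<?php
// Hypothetical, consistently indented sample input.
$source = <<<'TXT'
... some text ...

class Foo{

    function Bar($param){ return $param; }

}

... some other text ...

class Bar{

    function Foo(){ return 1; }

}

... some else ...
TXT;

// s: "." also matches newlines; i: case-insensitive; m: "^" matches at each line start.
$pattern = '/^([\t ]*)(?:abstract|class|interface).*?^\1\}/sim';

preg_match_all($pattern, $source, $matches);

// $matches[0] holds the full text of each class block found.
print_r($matches[0]);
```

With this input, `$matches[0]` should contain two entries, one per class. Indenting a closing } differently from its class keyword is enough to make the match run on to a later } or fail.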

edited Nov 11, 2010 at 14:19

answered Nov 11, 2010 at 12:19

Tim Pietzcker

5 Comments

tchrist Over a year ago

Tim Tim Tim, please stop saying this "cannot be done with regexes" stuff. It's not true.

Tim Pietzcker Over a year ago

@tchrist: OK, I have clarified my answer. A little :). I still don't think it's a good thing to use recursion in regular expressions even if some modern dialects can. Regexes are hard enough already...

tchrist Over a year ago

Not perl6. perl5 has had it since at least 5.6 from back last millennium. The cooler buffer recursion thing though is from 5.10 and about three years old.

tchrist Over a year ago

@TimPietzcker: It depends on what you’re doing. I think a regex can be very maintainable, more so than a dedicated parser. You just have to use "grammatical" regexes, like here and here.

Tim Pietzcker Over a year ago

@tchrist: How about handling } s inside comments or strings? Is it feasible to write a regex that finds the correct matching brace for { foo { bar "baz{" /* {{comment} */ tutu } tata }?
