1

I have a regular expression in PHP and I need to convert it to Java. Is it possible to do so? If yes, how can I do it?

Thanks in advance

$region_pattern = "/<a href=\"#\"><img src=\"images\/ponto_[^\.]+\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\(<]+)\((?P<region>[^\)]+)\)<\/strong><\/a>/i" ;
4
  • 1
    Are you using the PHP ereg or preg methods? What is stopping you? Commented Nov 4, 2011 at 21:39
  • 4
    Also, don't use RegEx to parse HTML lest you unleash a demon: codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html Commented Nov 4, 2011 at 21:40
  • 1
    Can you post some sample data of what you are trying to match with this? Commented Nov 4, 2011 at 21:43
  • @Ben I am trying to use this pattern on the page source of this web site: cgesp.org/pontosdealagamento_dia.php Commented Nov 4, 2011 at 22:56

3 Answers

4

A typical conversion from any regex to Java is to:

  • Exclude the pattern delimiters: remove the leading and trailing /.
  • Remove the flags (here, the trailing i), since in Java they are applied to the Pattern object. Either pass the flag when you create your Pattern object or prepend it to the regex in the form (?i)<regex>.
  • Replace every \ with \\, except the ones that escape quotation marks (\"). The backslash already has a meaning in Java string literals (it is the escape character), so to get a backslash into a regex you have to write \\ instead of \. Thus \w becomes \\w, and \\ becomes \\\\.

The regex above would become:

Pattern.compile("<a href=\"#\"><img src=\"images\\/ponto_[^\\.]+\\.gif\"[^>]*>[&nbsp;]*<strong>(?P<neighborhood>[^\\(<]+)\\((?P<region>[^\\)]+)\\)<\\/strong><\\/a>", Pattern.CASE_INSENSITIVE);

This will fail, however. I think it is because of the ?P construct, which as far as I know does not exist in Java, so it is an invalid regex there.
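The three steps above can be sketched as a small helper. This is only an illustration under stated assumptions: PhpRegex and toJava are hypothetical names, the pattern is assumed to use the same character as opening and closing delimiter, only the i, m, s and x flags are mapped (approximately), and the ?P rewrite is a naive text replacement. Backslash doubling does not appear in the helper because it only matters when the regex is typed as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: turns a PHP preg-style
// pattern (delimiters plus trailing flags) into a java.util.regex.Pattern.
class PhpRegex {

    static Pattern toJava(String phpRegex) {
        // Step 1: strip the delimiters. Assumes the same character opens and
        // closes the pattern (e.g. / or ~), not bracket pairs such as {...}.
        char delimiter = phpRegex.charAt(0);
        int end = phpRegex.lastIndexOf(delimiter);
        if (end <= 0) {
            throw new IllegalArgumentException("No closing delimiter in: " + phpRegex);
        }
        String body = phpRegex.substring(1, end);

        // Step 2: translate the trailing flags into Pattern flags.
        int flags = 0;
        for (char flag : phpRegex.substring(end + 1).toCharArray()) {
            switch (flag) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default:
                    throw new IllegalArgumentException("Unhandled flag: " + flag);
            }
        }

        // Step 3: rewrite Python-style named groups to the Java 7+ form.
        // Naive text replacement; it would also touch an escaped "\(?P<".
        body = body.replace("(?P<", "(?<");

        return Pattern.compile(body, flags);
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled; the string itself holds
        // the PHP text  /<b>(?P<word>[a-z]+)<\/b>/i
        Pattern p = toJava("/<b>(?P<word>[a-z]+)<\\/b>/i");
        Matcher m = p.matcher("<B>Hello</B>");
        if (m.find()) {
            System.out.println(m.group("word"));
        }
    }
}
```

Without step 3, Pattern.compile throws a PatternSyntaxException on the (?P<...> groups, which matches the failure described above.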


2 Comments

The inline modifier syntax takes the form (?i)regex (mode is on for the rest of the pattern or until turned off with (?-i)) or (?i:regex) (mode is on only within enclosing group). And the ?P is part of the Python-style named-group syntax, which is also valid in PHP but not in Java (but see my answer for an alternative).
Ah, thanks. I thought you could just prepend it without putting it between parentheses. I never use it; I always just put it in my Pattern construction.
3

There are some problems with the original regex that have to be cleared away first. First, there's [&nbsp;], which matches one of the characters &, n, b, s, p or ;. To match an actual non-breaking space character, you should use \xA0.
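To see the difference, here is a minimal Java sketch comparing the character class with \xA0. NbspDemo and its constants are names made up for the illustration. The third pattern, (?:&nbsp;)*, is an extra assumption on my part: it would be the one to use if the page source contains the entity text &nbsp; rather than the actual character.

```java
import java.util.regex.Pattern;

class NbspDemo {
    // What the original fragment really is: a set of six single characters.
    static final Pattern CHAR_CLASS = Pattern.compile("[&nbsp;]*");
    // The actual non-breaking space character, U+00A0.
    static final Pattern NBSP_CHAR = Pattern.compile("\\xA0*");
    // Assumption: the literal entity text "&nbsp;", grouped so * repeats it whole.
    static final Pattern NBSP_ENTITY = Pattern.compile("(?:&nbsp;)*");

    public static void main(String[] args) {
        // Any mix of the characters & n b s p ; satisfies the character class.
        System.out.println(CHAR_CLASS.matcher("spbn;;&").matches());
        // The character class does not match a real non-breaking space.
        System.out.println(CHAR_CLASS.matcher("\u00A0").matches());
        System.out.println(NBSP_CHAR.matcher("\u00A0\u00A0").matches());
        System.out.println(NBSP_ENTITY.matcher("&nbsp;&nbsp;").matches());
    }
}
```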

You also have a lot of unneeded backslashes in there. You can get rid of some by changing the regex delimiter to something other than /; others aren't needed because they're inside character classes, where most metacharacters lose their special meanings. That leaves you with this PHP regex:

"~<a href=\"#\"><img src=\"images/ponto_[^.]+\.gif\"[^>]*>\xA0*<strong>(?P<neighborhood>[^(<]+)\((?P<region>[^)]+)\)</strong></a>~i"

There are three things that make this regex incompatible with Java. One is the delimiters (/ originally, ~ in the version above) along with the trailing i modifier. Java doesn't use regex delimiters at all, so just drop those. The modifier can be moved into the regex itself by using the inline form, (?i), at the beginning of the regex. (That will work in PHP too, by the way.)
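As a quick check that the inline form and the flag on the Pattern object behave the same, here is a small sketch. InlineFlagDemo is just a name picked for the example, and the </strong> fragment is borrowed from the regex above.

```java
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Flag passed to the Pattern object.
    static final Pattern BY_CONSTANT =
            Pattern.compile("</strong>", Pattern.CASE_INSENSITIVE);
    // Same flag embedded at the start of the regex.
    static final Pattern BY_INLINE = Pattern.compile("(?i)</strong>");
    // No flag at all, for comparison.
    static final Pattern CASE_SENSITIVE = Pattern.compile("</strong>");

    public static void main(String[] args) {
        String input = "</STRONG>";
        System.out.println(BY_CONSTANT.matcher(input).matches());
        System.out.println(BY_INLINE.matcher(input).matches());
        System.out.println(CASE_SENSITIVE.matcher(input).matches());
    }
}
```

Either style should do; the inline form has the advantage that the regex string stays portable between PHP and Java.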

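Next are the backslashes. The ones that escape quotation marks remain as they are, but all the others get doubled, because Java is stricter about escape sequences in string literals: an unrecognized escape such as \. is a compile-time error in Java, whereas PHP passes it through to the regex engine unchanged.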
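A short sketch of the backslash rule, using fragments of the regex above. BackslashDemo and its constant names are made up for the example; the file name in main is invented sample input.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Java source "\\." yields the two-character regex \. (a literal dot).
    // Writing "\." instead is rejected by the compiler as an illegal escape.
    static final String GIF_NAME = "images/ponto_[^.]+\\.gif";

    // Escaped quotation marks stay single: \" is just a " inside the string.
    static final String QUOTED_SRC = "src=\"[^\"]*\"";

    // A regex for one literal backslash is \\ , which takes four in source.
    static final String ONE_BACKSLASH = "\\\\";

    public static void main(String[] args) {
        System.out.println(Pattern.matches(GIF_NAME, "images/ponto_azul.gif"));
        System.out.println(Pattern.matches(QUOTED_SRC, "src=\"a.gif\""));
        // The string "\\" in source is a single backslash character.
        System.out.println(Pattern.matches(ONE_BACKSLASH, "\\"));
    }
}
```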

Finally, there are the named groups. Java 6 and earlier didn't support named groups at all; Java 7 supports them, but with the shorter (?<name>...) syntax favored by .NET, not the Pythonesque (?P<name>...) syntax. (By the way, the shorter (?<name>...) version should work in PHP too, as should (?'name'...), also introduced by .NET.)
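Here is a minimal sketch of the Java 7+ named-group syntax, applied only to the <strong>...</strong> part of the regex to keep it short. NamedGroupDemo and parse are hypothetical names, and the input in main is invented sample text, not data from the real page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Java group names may contain only letters and digits, so a PHP name
    // with an underscore would have to be renamed as well.
    static final Pattern STRONG = Pattern.compile(
            "(?i)<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong>");

    // Returns {neighborhood, region}, or null when nothing matches.
    static String[] parse(String html) {
        Matcher m = STRONG.matcher(html);
        if (!m.find()) {
            return null;
        }
        // The first group captures the space before "(", hence trim().
        return new String[] {
            m.group("neighborhood").trim(),
            m.group("region").trim()
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("<STRONG>Some Neighborhood (Some Region)</STRONG>");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```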

So the Java 7 version of your regex would be:

"(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>(?<neighborhood>[^(<]+)\\((?<region>[^)]+)\\)</strong></a>"

For Java 6 or earlier you would use:

"(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*<strong>([^(<]+)\\(([^)]+)\\)</strong></a>"

...and you'd have to use numbers instead of names to refer to the group captures.
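For completeness, a sketch of the numbered-group variant that collects every match in a page. NumberedGroupDemo and regions are hypothetical names, the HTML in main is invented sample input, and the code sticks to constructs that Java 6 accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberedGroupDemo {
    // The Java 6 regex from above, split across two lines for readability.
    static final Pattern REGION = Pattern.compile(
            "(?i)<a href=\"#\"><img src=\"images/ponto_[^.]+\\.gif\"[^>]*>\\xA0*"
            + "<strong>([^(<]+)\\(([^)]+)\\)</strong></a>");

    // Returns one "neighborhood / region" entry per match, in document order.
    static List<String> regions(String html) {
        List<String> out = new ArrayList<String>();
        Matcher m = REGION.matcher(html);
        while (m.find()) {
            // Group 1 is the neighborhood, group 2 the region.
            out.add(m.group(1).trim() + " / " + m.group(2).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String html =
                "<a href=\"#\"><img src=\"images/ponto_a.gif\"><strong>A (R1)</strong></a>"
                + "<A HREF=\"#\"><IMG SRC=\"IMAGES/PONTO_B.GIF\"><STRONG>B (R2)</STRONG></A>";
        System.out.println(regions(html));
    }
}
```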


0

REGEX is REGEX regardless of language. The REGEX you've posted will work in both Java and PHP. You do need to make some adjustments, as the two languages don't accept the pattern in exactly the same form (though the pattern itself will work in both languages).

Points to Consider

  • Java's Pattern object takes flags as a separate argument, so you don't have to specify them in the pattern string itself.
  • Delimiters should not be included either, only the pattern itself.

8 Comments

@Ben How so? Where will that pattern fail in Java?
I've tried it on a Java regex tester and I got a pattern syntax exception.
@Ben I'm interested to know that too.
@alexandre In that case, might you have an actual error in the REGEX pattern? How did you test it?
Double escaping, no leading and trailing /, and flags being applied to the Pattern object are the incompatibilities to start with. Fixing the escape characters will be the most work, since there is also triple and quadruple escaping in the Java regex flavour. Believe me, any regex that works in Perl or PHP will NOT work in Java after a simple copy-paste.