
I have some PHP code that uses preg_grep to match several words, in any order, that can appear in any context. I'm trying to convert it to Java, but I can't seem to figure it out.

My PHP code for converting the keywords to a regex string is:

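function createRegexSearch($keywords)
{
    // One lookahead per keyword, so the words may appear in any order
    $regex = '';
    foreach ($keywords as $key) {
        $regex .= '(?=.*' . $key . ')';
    }
    // Anchor at the start of the string; /i makes the match case-insensitive
    return '/^' . $regex . '/i';
}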

It creates a regex string similar to /^(?=.*bot)/i, which should match robot, robots, bots, etc. The same regex string doesn't seem to work in Java, which is leaving me confused. Currently I get a similar effect in Java with contains(), but I would rather use a regex:

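// mKeyList, keywords and ret are declared elsewhere (not shown in the question)
for (Map.Entry<String, String> entry : mKeyList.entrySet())
{
    boolean found = true;
    String val = entry.getValue().toLowerCase();
    for (int i = 0; i < keywords.length; i++)
    {
        // every keyword must occur somewhere in the value, in any order
        if (!val.contains(keywords[i].toLowerCase()))
        {
            found = false;
            break; // no need to check the remaining keywords
        }
    }

    if (found)
        ret.add(entry.getValue());
}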
  • Can you post the Java code that doesn't work? (Commented Jun 15, 2012 at 17:22)

4 Answers


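One thing Java does differently from many languages is that it has two ways of "matching" a regex against a target: matches() and find(). matches() succeeds only if the whole target matches, which is the equivalent of putting ^ and $ at the beginning and end of your expression, while find() looks for the next match wherever it occurs in the string. For example, find() will locate .*bot in the target string robots, but it would not be true to say that .*bot matches() the target, because the trailing s is left over. The lookahead makes this stricter still: (?=.*bot) is zero-width and consumes no characters, so a pattern made only of lookaheads can never satisfy matches() against a non-empty string. Use find() with such a pattern, or append .* so that the whole string is consumed.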
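A minimal sketch of the difference, using the .*bot / robots example from this answer and the lookahead pattern from the question. The class and method names are illustrative only, not part of any API:

```java
import java.util.regex.Pattern;

// Compares Matcher.matches() (the whole input must match) with
// Matcher.find() (any matching subsequence is enough).
class MatchesVsFind {

    // true only if the regex matches the entire input
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // true if the regex matches somewhere in the input
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ".*bot" can be found inside "robots" but cannot consume the trailing "s"
        System.out.println(partialMatch(".*bot", "robots")); // true
        System.out.println(wholeMatch(".*bot", "robots"));   // false

        // The lookahead consumes nothing, so matches() fails on its own;
        // appending ".*" lets the match consume the whole input.
        System.out.println(partialMatch("^(?=.*bot)", "robots")); // true
        System.out.println(wholeMatch("^(?=.*bot)", "robots"));   // false
        System.out.println(wholeMatch("^(?=.*bot).*", "robots")); // true
    }
}
```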

Without the Java code that contains the problem, it's hard to tell where you might be going wrong, but my guess is that it could very easily be in this area.

Also, the Java (and .NET) equivalent of putting /i at the end of your expression is putting (?i) at the beginning of your expression (or at the start of any region you want to be case-insensitive). Thus, /[a-f0-9]/i is equivalent to (?i)[a-f0-9].
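Putting the two points together, a Java counterpart of the question's createRegexSearch() could look like the sketch below. It is an illustration, not the asker's code: it uses (?i) instead of /i, drops the /.../ delimiters that Java does not use, and calls find() rather than matches(). It also wraps each keyword in Pattern.quote(), which the PHP version does not do, so characters such as . or + in a keyword are matched literally. The grep() helper is a hypothetical stand-in for PHP's preg_grep().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class KeywordRegex {

    // One lookahead per keyword, so the keywords may appear in any order.
    // "(?i)" replaces PHP's /i modifier; Java needs no /.../ delimiters.
    // Pattern.quote() is an addition: it makes metacharacters match literally.
    static Pattern createRegexSearch(String[] keywords) {
        StringBuilder regex = new StringBuilder("(?i)^");
        for (String key : keywords) {
            regex.append("(?=.*").append(Pattern.quote(key)).append(')');
        }
        return Pattern.compile(regex.toString());
    }

    // Hypothetical preg_grep()-like helper: keeps the values in which the
    // pattern is found. find() is required because a lookahead-only
    // pattern never satisfies matches().
    static List<String> grep(Pattern pattern, Iterable<String> values) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (pattern.matcher(value).find()) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = createRegexSearch(new String[] {"bot", "ro"});
        System.out.println(p.pattern()); // (?i)^(?=.*\Qbot\E)(?=.*\Qro\E)
        System.out.println(grep(p, Arrays.asList("Robot", "bots", "ROBOTS", "rotor")));
        // [Robot, ROBOTS]
    }
}
```

Compiling the pattern once and reusing it for every entry avoids recompiling the regex inside the loop; whether this ends up faster than the contains() version depends on the data, so it is worth measuring.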


String.contains() is case-sensitive, so the PHP code will behave case-insensitively because of the /i modifier, but the Java code will behave case-sensitively. So there will be differences in behavior.

If that is the difference, convert both strings to the same case, for example with toUpperCase(), before the contains() check.
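A minimal sketch of that suggestion applied to the keyword check from the question. containsAllIgnoreCase() is a hypothetical helper name, and Locale.ROOT is an addition that keeps the case conversion independent of the JVM's default locale (for example, the Turkish dotless i):

```java
import java.util.Locale;

class CaseInsensitiveContains {

    // Hypothetical helper: true if every keyword occurs in text, ignoring case.
    // Both sides are upper-cased with Locale.ROOT before contains().
    static boolean containsAllIgnoreCase(String text, String[] keywords) {
        String upperText = text.toUpperCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (!upperText.contains(keyword.toUpperCase(Locale.ROOT))) {
                return false; // stop at the first missing keyword
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] keywords = {"bot", "RO"};
        System.out.println("ROBOT".contains("bot"));                // false: contains() is case-sensitive
        System.out.println(containsAllIgnoreCase("ROBOT", keywords)); // true
        System.out.println(containsAllIgnoreCase("bots", keywords));  // false: no "ro"
    }
}
```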

Also, you are using a regex in the PHP code but not in the Java code; is there a specific reason for this?

Regards Ajai G

1 Comment

Yeah, the regex code I used in PHP didn't seem to work in Java. I do change everything to lowercase, but for the set of data I have it's taking about half a second, which I think could be reduced with a regex.

You can use the embedded flag expression (?i), so the regex you should be using to match bot, robot, bots and robots is (?i)^(.*bots?)$. This should work with either String.matches() or Pattern/Matcher.
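A small sketch checking the pattern from this answer both ways; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BotsRegex {

    // The regex from the answer; (?i) makes it case-insensitive.
    static final String REGEX = "(?i)^(.*bots?)$";
    static final Pattern PATTERN = Pattern.compile(REGEX);

    // String.matches() compiles REGEX on every call
    static boolean viaStringMatches(String s) {
        return s.matches(REGEX);
    }

    // Reuses the precompiled PATTERN
    static boolean viaMatcher(String s) {
        Matcher m = PATTERN.matcher(s);
        return m.matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"bot", "Robot", "bots", "ROBOTS", "bottle"}) {
            // prints the word followed by the result of each approach
            System.out.println(word + " " + viaStringMatches(word) + " " + viaMatcher(word));
        }
    }
}
```

Because of the ^...$ anchors, this pattern only accepts strings that end in bot or bots, so a word such as bottle is rejected, whereas the lookahead in the question accepts any string that contains bot. Since String.matches() recompiles the regex each time, a precompiled Pattern is the better choice inside a loop.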


JMPL is a simple Java library that can emulate some pattern-matching features using Java 8 features.

   import org.kl.state.Else;
   import static org.kl.pattern.DeconstructPattern.matches;
   import static org.kl.pattern.DeconstructPattern.foreach;
   import static org.kl.pattern.DeconstructPattern.let;

   // figure, listRectangles, Rectangle and Circle are assumed to be defined elsewhere
   let(figure, (int w, int h) -> {
      System.out.println("border: " + w + " " + h);
   });

   matches(figure).as(
      Rectangle.class, (int w, int h) -> System.out.println("area: " + (w * h)),
      Circle.class,    (int r)        -> System.out.println("area: " + (Math.PI * r * r)),
      Else.class,      ()             -> System.out.println("Default area: " + 0)
   );

   foreach(listRectangles, (int w, int h) -> {
      System.out.println("area: " + (w * h));
   });
