
I have a string that I want to split into an array:

SEQUENCE: 1A→2B→3C

I tried the following regular expression:

((.*\s)|([\x{2192}]*))

1. \x{2192} is the arrow character (→)
2. There is a space after the colon; I used it as the anchor for matching the first part

It works in regex testers (Patterns on OS X), but in Java it splits the string into this:

[, , 1, A, , 2, B, , 3, C]

How can I get the following instead?

[1A,2B,3C]

This is the test code:

String str = "SEQUENCE: 1A→2B→3C"; //Note that there's an extra space after the colon
System.out.println(Arrays.toString(str.split("(.*\\s)|([\\x{2192}]*)")));

2 Answers


As noted in Richard Sitze's post, the main problem with the regex is that it should use + rather than *. Additionally, there are further improvements you can make to your regex:

  • Instead of \\x{2192}, use \u2192. And because it's a single character, you don't need to put it into a character class ([...]), you can just use \u2192+ directly.
  • Also, because | binds more loosely than .*\\s and \u2192+, you won't need the parentheses there either. So your final expression is simply ".*\\s|\u2192+".
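A minimal sketch that applies the answer's final expression with String.split, assuming Java 8 or later (where a positive-width match at index 0 still produces an empty leading element). The class and method names are illustrative and do not come from the question.

```java
import java.util.Arrays;

public class AnswerRegexDemo {
    // The answer's expression: ".*\\s" consumes everything up to and including the
    // last whitespace (the "SEQUENCE: " prefix); the second alternative consumes
    // one or more arrow characters (U+2192).
    static final String DELIMITER = ".*\\s|\u2192+";

    static String[] splitSequence(String input) {
        return input.split(DELIMITER);
    }

    public static void main(String[] args) {
        String str = "SEQUENCE: 1A\u21922B\u21923C";
        // The prefix match starts at index 0, so split() emits one empty leading element.
        System.out.println(Arrays.toString(splitSequence(str))); // prints [, 1A, 2B, 3C]
    }
}
```

The empty first element comes from .*\\s matching the "SEQUENCE: " prefix at index 0; the arrows themselves no longer produce empty pieces.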

2 Comments

Thanks for the nice explanation. It works, but how do I get rid of the leading "match"? The array looks like this: [, 1A, 2B, 3C]; I don't want the first, empty element.
Chop off the "SEQUENCE: " prefix first, then split: str.replaceFirst("SEQUENCE: ", "").split("\u2192+")
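A sketch of the comment's approach, assuming the prefix is exactly "SEQUENCE: " with a single space after the colon. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class StripPrefixThenSplit {
    // Remove the fixed "SEQUENCE: " prefix first, then split only on runs of arrows
    // (U+2192), so no empty leading element is produced.
    static String[] splitSequence(String input) {
        return input.replaceFirst("SEQUENCE: ", "").split("\u2192+");
    }

    public static void main(String[] args) {
        String str = "SEQUENCE: 1A\u21922B\u21923C";
        System.out.println(Arrays.toString(splitSequence(str))); // prints [1A, 2B, 3C]
    }
}
```

If the input might have more than one space after the colon (an assumption, not something the comment addresses), a pattern such as "^SEQUENCE:\\s+" would be a more tolerant first argument to replaceFirst.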

The \u2192* matches zero or more arrows, so it also matches the empty string at every position between characters; that is why the string is split at every character. Change * to + so that at least one arrow is required.
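A small comparison of the two quantifiers on the arrow-separated part alone, assuming Java 8 or later (older versions could also emit a leading empty string for a zero-width match at index 0). The class and method names are illustrative.

```java
import java.util.Arrays;

public class StarVersusPlus {
    // "\u2192*" can match the empty string, so split() cuts at every non-arrow position too.
    static String[] splitOnStar(String s) {
        return s.split("\u2192*");
    }

    // "\u2192+" needs at least one arrow, so split() cuts only where arrows occur.
    static String[] splitOnPlus(String s) {
        return s.split("\u2192+");
    }

    public static void main(String[] args) {
        String s = "1A\u21922B\u21923C";
        System.out.println(Arrays.toString(splitOnStar(s))); // prints [1, A, , 2, B, , 3, C]
        System.out.println(Arrays.toString(splitOnPlus(s))); // prints [1A, 2B, 3C]
    }
}
```

The star version reproduces the tail of the output reported in the question; the two extra leading empty strings there come from the "SEQUENCE: " prefix.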

