1

I wanted to get a list of numbers from a sequence of characters (that is, letters and digits), so I wrote this code:

class A {
  public static void main(String[] args) {
    String msg = "aa811b22";
    String[] numbers = msg.split("\\D+");
    for (int i = 0; i < numbers.length; i++) {
      System.out.println(">" + numbers[i] + "<");
    }

  }
}

Surprisingly, it prints:

 $ java A
><
>811<
>22<

OK, so somehow it matched an empty string. I told myself that "" (the empty string) actually matches the non-digit pattern \D+. Nothing is not a digit, right? (But then why did it return only one empty string? There are infinitely many empty strings inside any string.)

To check, I tried to extract the words from the same string:

class A {
  public static void main(String[] args) {
    String msg = "aa811b22";
    String[] words = msg.split("\\d+");
    for (int i = 0; i < words.length; i++) {
      System.out.println(">" + words[i] + "<");
    }

  }
}

which actually prints what I expected (no empty strings returned):

 $ java A
>aa<
>b<

But I ran a few more tests that completely confused me:

System.out.println("a".split("\\D+").length);
// => 0 (Why not 1? Shouldn't there be an empty string here?)
System.out.println("a1".split("\\D+").length);
// => 2 (so now it splits into an empty string and "1")
System.out.println("1a".split("\\D+").length);
// => 1 (now it returns just the expected "1" string)

So my questions are:

  • Why does split return an empty string in my examples?
  • Why does "a".split("\\D+").length return 0?
  • Why is "a1".split("\\D+").length 2 (and not 1)?
  • How does "1a".split("\\D+").length differ from "a1".split("\\D+").length when splitting?

2 Answers

2
  • Why does split return an empty string in my examples?

'a' is not a digit, so the leading "aa" matches \D+ and acts as a separator. There is an element on either side of a separator, and the empty string is what sits to the left of "aa". If the separator were ",", then from the string ",a,b" you would expect 3 elements: "", "a", and "b". Here "aa" is the separator, just like "," in my example.

  • Why does "a".split("\\D+").length return 0?

'a' is not a digit, so it's a separator. The presence of the separator means that two substrings are split out of the original String, both empty, one on either side of the a. However, the single-argument split method discards trailing empty strings. They're all empty, so they're all discarded, and the length is 0.

  • Why is "a1".split("\\D+").length 2 (and not 1)?

Only trailing empty strings are discarded, so the elements are "" and "1".

  • How does "1a".split("\\D+").length differ from "a1".split("\\D+").length when splitting?

"1a" will have one trailing empty string discarded, but "a1" will not have a trailing empty string discarded (it's leading).
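To see the pieces before the trailing empty strings are dropped, the two-argument split can be called with a negative limit, which keeps every piece. The sketch below is only an illustration: the class name SplitLimitDemo and the helpers trimmed and untrimmed are made up for this example, while split(String) and split(String, int) are the standard String methods.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Default split: behaves like a limit of 0, so trailing empty strings are dropped.
    static String[] trimmed(String s) {
        return s.split("\\D+");
    }

    // Negative limit: every piece is kept, including trailing empty strings.
    static String[] untrimmed(String s) {
        return s.split("\\D+", -1);
    }

    public static void main(String[] args) {
        // Print the untrimmed pieces next to what the default split returns.
        for (String s : new String[] {"a", "a1", "1a", "a1a", "aa811b22"}) {
            System.out.println(s + ": " + Arrays.toString(untrimmed(s))
                    + " -> " + Arrays.toString(trimmed(s)));
        }
    }
}
```

For "a" the untrimmed array has two empty strings and the default one has none; for "a1" both arrays are the same, because the only empty string is the leading one.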

2 Comments

Regarding why "a1".split("\\D+").length is 2 (and not 1): in the "a".split example above you said that the trailing strings (on both sides of "a") are discarded. Isn't the empty string before 'a' a trailing string too? Shouldn't it be discarded?
No, here "trailing" means at the end, not at the beginning. So if you have "a1a".split("\\D+"), the split first yields 3 elements: "", "1", and "". Only the last element is empty and trailing, so it's removed, and the array ["", "1"] is returned. Both empty strings are discarded from "a".split("\\D+") because once the ending empty string is removed, the remaining empty string is now trailing as well, so it's removed too.
1

It's not matching an empty string. Rather, it's matching the "aa" at the beginning of your string as a delimiter. The first element is empty because there is only an empty string before the first delimiter. In contrast, for trailing delimiters there is no empty string returned, as mentioned in the documentation for split():

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
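If the goal is only the list of numbers, one way to avoid the leading empty string altogether is to match the digit runs directly instead of splitting on the non-digits. This is a minimal sketch, not something from the question: the class name DigitRuns and the helper numbers are hypothetical, while Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRuns {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects each maximal run of digits; nothing is treated as a
    // delimiter, so no empty strings can appear in the result.
    static List<String> numbers(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers("aa811b22")); // [811, 22]
        System.out.println(numbers("a"));        // []
    }
}
```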

2 Comments

So shouldn't System.out.println("a".split("\\D+").length); print 1, not 0?
No, because all zero-length strings are discarded from the end of the array before it is returned, not just the empty string that follows the final delimiter.
