I'm looking to parse a string that can be in one of the following formats:

"[a]"
"[a-b]"
"[a-b/9]"
"[a-b, b-c]"
"[a-b/9, b-c]"

In words, the part after - is optional, and if present, may in turn have an optional weight separated by /. The separator character - may change.

Here's my code:

import scala.util.matching.Regex

case class Edge[A, B](u: A, v: A, data: Option[B])

def parseEdge(
    s: String,
    sep: Char
): (List[Edge[String, String]], List[String]) =
  if !s.startsWith("[") || !s.endsWith("]")
  then throw IllegalArgumentException("string must be enclosed by '[' and ']'")
  else
    val edgePattern: Regex = raw"""^(.+?)(?:$sep(.+?)(?:\/(.+?))??)??$$""".r

    s
      .substring(1, s.length() - 1)
      .split(",")
      .map(_.trim())
      .foldRight((List.empty[Edge[String, String]], List.empty[String])) {
        case (x, (es, vs)) =>
          x match
            case edgePattern(u)       => (es, u :: vs)
            case edgePattern(u, v)    => (Edge(u, v, None) :: es, vs)
            case edgePattern(u, v, d) => (Edge(u, v, Some(d)) :: es, vs)
      }

But:

println(parseEdge("[b-c]", '-'))  // (List(Edge(b,c,Some(null))),List())
println(parseEdge("[b-c/9]", '-'))  // (List(Edge(b,c,Some(9))),List())

Why's the first string parsed with a null instead of a None?

1 Answer

This is the only case that can match, because the pattern has 3 capture groups and the extractor always yields one value per group:

case edgePattern(u, v, d)

You can verify the group count by printing it:

val edgePattern: Regex = raw"""^(.+?)(?:$sep(.+?)(?:/(.+?))?)?$$""".r
val m = edgePattern.pattern.matcher("[b-c]")
println(m.groupCount()) // 3

When the optional weight is absent, the third group is null, and wrapping that null in Some(...) gives Some(null).
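To make the null visible, here is a minimal sketch that calls the extractor directly. It assumes '-' as the separator; the names demoPattern and rawGroups are made up for illustration and are not part of the original code.

```scala
import scala.util.matching.Regex

// Same shape as the pattern above, with '-' hard-coded as the separator.
val demoPattern: Regex = """^(.+?)(?:-(.+?)(?:/(.+?))?)?$""".r

// Hypothetical helper: returns the raw list of group values that a
// `case demoPattern(...)` would bind, or None if the input does not match.
// Groups that did not take part in the match come back as null.
def rawGroups(s: String): Option[List[String]] =
  demoPattern.unapplySeq(s)
```

Under these assumptions, rawGroups("a") gives Some(List(a, null, null)), rawGroups("b-c") gives Some(List(b, c, null)), and rawGroups("b-c/9") gives Some(List(b, c, 9)). The list always has three elements, which is why the one- and two-argument cases never fire.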

If you wrap it in Option(...) instead, a null becomes None:

case edgePattern(u, v, d) => (Edge(u, v, Option(d)) :: es, vs)

And then

println(parseEdge("[b-c]", '-'))  // (List(Edge(b,c,None)),List())
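Putting it together, one possible shape for the whole parser is sketched below. It is not the asker's exact code: the name parseEdges, the use of require, the Regex.quote call (added because the question says the separator may change) and the fallback case are all assumptions of this sketch.

```scala
import scala.util.matching.Regex

case class Edge[A, B](u: A, v: A, data: Option[B])

// Sketch: returns (edges, lone vertices) parsed from a string like "[a-b/9, c]".
def parseEdges(s: String, sep: Char): (List[Edge[String, String]], List[String]) =
  require(s.startsWith("[") && s.endsWith("]"), "string must be enclosed by '[' and ']'")
  // Assumption: quote the separator so characters like '.' or '|' are taken literally.
  val q = Regex.quote(sep.toString)
  val edgePattern: Regex = raw"""^(.+?)(?:$q(.+?)(?:/(.+?))?)?$$""".r

  s.substring(1, s.length - 1)
    .split(",")
    .map(_.trim)
    .foldRight((List.empty[Edge[String, String]], List.empty[String])) {
      case (x, (es, vs)) =>
        x match
          // Always bind all three groups; absent groups arrive as null.
          case edgePattern(u, null, _) => (es, u :: vs)
          case edgePattern(u, v, d)    => (Edge(u, v, Option(d)) :: es, vs)
          case _ => throw IllegalArgumentException(s"cannot parse '$x'")
    }
```

With this sketch, parseEdges("[a]", '-') puts a in the vertex list, and parseEdges("[a-b/9, b-c]", '-') yields Edge(a,b,Some(9)) followed by Edge(b,c,None).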

2 Comments

Upvoted, but problem seems to be with the once or not at all regex group (?), not the number of capturing groups that are present, so, this answer is not entirely correct. Removing the ? quantifier results in no match, no matter what case pattern is used. I've asked a question in the Scala user forum.
@AbhijitSarkar groupCount counts the number of groups in the regex, not the number of groups that actually captured something. So the code in the question you posted returns the expected result. Also, the ? in the regex makes the groups optional; if you remove the ?, you change what the pattern is expected to match, so the matches change. See regex101.com/r/2XoItw/1
