Using PostgreSQL v9.6.2 (I haven't tried other versions).

The first query works. The second one, with a kanji word, fails. What gives?

dev=> select 'foo bar' ~ '\ybar\y' v;
 v
---
 t
(1 row)

dev=> select '積極的 積極的' ~ '\y積極的\y' v;
 v
---
 f
(1 row)

Comments

  • stackoverflow.com/questions/280712/javascript-unicode-regexes (JavaScript, but I'm pretty sure that the issue is the same.) Commented May 7, 2017 at 15:56
  • From the docs: "A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype) or an underscore." Try with COLLATE "zh_CN". Commented May 8, 2017 at 12:52

1 Answer

The match succeeds if you drop the enclosing \y:

SELECT '積極的 積極的' ~ '積極的' AS v;
 v 
---
 t
(1 row)

regexp_matches works too:

SELECT regexp_matches('積極的 積極的', '^.*(積極的).*$') AS v;
    v     
----------
 {積極的}
(1 row)

[UPDATE]

Chinese and Japanese characters lie outside ASCII, and not every platform's regex engine treats them as word characters when it evaluates word boundaries. I suspect that the PostgreSQL regex engine, at least in this setup, does not recognize them as word characters, so \y never matches around them.

Some languages, such as Scala (and Java), do handle Unicode at word boundaries:

scala> """\b積極的\b""".r findFirstIn "積極的 積極的"
res4: Option[String] = Some(積極的)

Note that \b, not \y, is used for word boundaries in Scala/Java.
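
Following up on the COLLATE suggestion in the comments, here is a rough sketch of what one could try on the PostgreSQL side. It assumes that a "zh_CN" collation exists in the database, which is not guaranteed because it depends on the locales that were installed when the cluster was initialized, and I have not verified that it changes the result on 9.6.2. The last query avoids \y altogether by spelling out the boundaries as start of string, end of string, or whitespace, so it does not depend on how the engine classifies kanji.

```sql
-- Show the ctype the database uses to classify characters.
-- Under "C", only ASCII letters and digits count as alphanumeric.
SHOW lc_ctype;

-- Assumption: a "zh_CN" collation is available on this system.
-- The collation attached to the input controls character classification.
SELECT '積極的 積極的' COLLATE "zh_CN" ~ '\y積極的\y' AS v;

-- Locale-independent alternative: explicit boundaries instead of \y.
SELECT '積極的 積極的' ~ '(^|\s)積極的(\s|$)' AS v;
```

Unlike \y, the explicit form treats only whitespace and the ends of the string as boundaries, so any punctuation that may sit next to the word would have to be added to the alternation.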

2 Comments

Hmmm, my bad for not noticing word boundary was the focus. Updated my answer.
Thanks, yeah, probably that Postgres doesn't support it. Appreciate your effort.
