PostgreSQL Regex for Word Boundary Not Working

Question

Using PostgreSQL 9.6.2 (I have not tried other versions).

The first query works. The second one, with a Kanji word, fails. What gives?

dev=> select 'foo bar' ~ '\ybar\y' v;
 v
---
 t
(1 row)

dev=> select '積極的 積極的' ~ '\y積極的\y' v;
 v
---
 f
(1 row)

stackoverflow.com/questions/280712/javascript-unicode-regexes (JavaScript, but I'm pretty sure the issue is the same.)
– Dmitri Goldring, Commented May 7, 2017 at 15:56
From the docs: "A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype) or an underscore." Try with COLLATE "zh_CN".
– pozs, Commented May 8, 2017 at 12:52
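
The two comments above suggest a way to narrow the problem down: check whether the regex engine classifies the character as a word character at all, and check which encoding and ctype the database uses. The queries below are a minimal diagnostic sketch, not a confirmed fix. They assume standard_conforming_strings is on (the default), and the last one assumes a collation named "zh_CN" exists in pg_collation on your system, which depends on the operating system locales installed.

```sql
-- Does the regex engine treat the character as a word character?
-- If both columns are false, \y cannot find a boundary next to it.
SELECT '積' ~ '\w'          AS is_word_char,
       '積' ~ '[[:alnum:]]' AS is_alnum;

-- Which encoding, collation, and ctype does the current database use?
SELECT pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();

-- pozs's suggestion: apply an explicit collation to the pattern.
-- Assumption: the collation "zh_CN" is available; otherwise this raises an error.
SELECT '積極的 積極的' ~ ('\y積極的\y' COLLATE "zh_CN") AS v;
```

Whether the explicit collation changes the result may depend on the server version and locale setup, so treat it as something to try rather than a guaranteed answer.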

Leo C · Accepted Answer · 2017-05-08 01:13:47Z

The match succeeds if you drop the enclosing \y:

SELECT '積極的 積極的' ~ '積極的' AS v;
 v 
---
 t
(1 row)

regexp_matches will work too:

SELECT regexp_matches('積極的 積極的', '^.*(積極的).*$') AS v;
    v     
----------
 {積極的}
(1 row)

[UPDATE]

Contemporary Chinese characters are encoded in Unicode, and not every platform's regex engine fully supports Unicode when it comes to word boundaries. I suppose PostgreSQL's regex engine does not apply Unicode-aware word boundaries here.
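
If \y cannot be relied on for these characters, one possible workaround is to spell out the boundary yourself. The sketch below assumes that "word" means "delimited by whitespace or the ends of the string", which fits the sample data in the question but not CJK text written without spaces. The lookaround variant additionally assumes a server that supports lookbehind constraints (PostgreSQL 9.6 or later, as far as I know).

```sql
-- Explicit boundary: start of string or whitespace before,
-- whitespace or end of string after.
SELECT '積極的 積極的' ~ '(^|\s)積極的(\s|$)' AS v;

-- The same test when the word is embedded in a longer run of characters;
-- here the explicit boundary should not match.
SELECT '不積極的' ~ '(^|\s)積極的(\s|$)' AS v;

-- Lookaround variant that does not consume the surrounding whitespace,
-- which matters for regexp_replace or regexp_matches.
-- Assumption: lookbehind (?<! ... ) is available on your server version.
SELECT '積極的 積極的' ~ '(?<!\S)積極的(?!\S)' AS v;
```

The alternation form consumes the delimiter, so it is fine for a yes/no test with ~; for replacement or extraction the lookaround form is usually the more convenient one.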

Some languages, such as Scala (and Java), do support word boundaries on Unicode text:

scala> """\b積極的\b""".r findFirstIn "積極的 積極的"
res4: Option[String] = Some(積極的)

Note that \b, not \y, is used for word boundaries in Scala/Java.

edited May 8, 2017 at 1:13

answered May 7, 2017 at 16:03


2 Comments

Leo C Over a year ago

Hmmm, my bad for not noticing word boundary was the focus. Updated my answer.

user2297550 Over a year ago

Thanks, yeah, probably that Postgres doesn't support it. Appreciate your effort.
