I am trying to run a RethinkDB match query with a user-provided Unicode search term that has been escaped with re.escape():

import re
from rethinkdb import RethinkDB

r = RethinkDB()

search_value = u"\u05e5"  # provided by the user via Flask
# re.escape() prefixes the character with a backslash, giving u'\\\u05e5',
# which is "\ץ" once encoded as UTF-8, as expected.
search_value_escaped = re.escape(search_value)

conn = r.connect(...)  # connect through the RethinkDB() instance created above

results_cursor_a = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value)
).run(conn)  # search_value works fine

results_cursor_b = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value_escaped)
).run(conn)  # search_value_escaped spits an error

The error for search_value_escaped is the following:

ReqlQueryLogicError: Error in regexp `\ץ` (portion `\ץ`): invalid escape sequence: \ץ in:
r.db(...).table(...).order_by(index="id").filter(lambda var_1: var_1.coerce_to('string').match(u'\\\u05e5m'))
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^         

I tried encoding with "utf-8" before and after re.escape(), but I get the same result with different errors. What am I missing? Is it something in my code, or some kind of bug?

EDIT: .coerce_to('string') converts the document to a UTF-8 encoded string. RethinkDB also converts the query to UTF-8 before matching the two, which is why the first query works even though it looks like a Unicode match inside a string.
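One way to confirm where the backslash comes from is to inspect what re.escape() returns for the search term. The sketch below is not part of the original post, and describe_escape is a hypothetical helper name. It relies on the documented behaviour of re.escape(): up to Python 3.6 it put a backslash before every character other than ASCII letters, digits and underscore (so non-ASCII characters were escaped too), while from Python 3.7 on it only escapes characters that can have a special meaning in a regular expression.

```python
import re
import sys


def describe_escape(value):
    """Report what re.escape() does to `value` on the running interpreter.

    Hypothetical helper for illustration only.
    """
    escaped = re.escape(value)
    return {
        "python": sys.version_info[:2],
        "escaped": escaped,
        # True when re.escape() changed the string, i.e. added a backslash
        "backslash_added": escaped != value,
    }


if __name__ == "__main__":
    # The result for the non-ASCII character depends on the Python version.
    print(describe_escape(u"\u05e5"))
    print(describe_escape(u"."))
```

If backslash_added is True for the non-ASCII character, the pattern sent to RethinkDB contains the escape sequence that the error message complains about.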

  • It does look like RethinkDB doesn't accept escaped Unicode characters in its query (an escape that Python usually ignores), so writing my own escape function will be the solution unless someone sheds some light on it. Commented Mar 30, 2019 at 17:36
  • You could leave a comment on this question and ask if they ever solved the problem (the user is still active on SO). Commented Mar 30, 2019 at 17:40

1 Answer

As far as I can tell, RethinkDB rejects escaped Unicode characters, so I wrote a simple workaround: a custom escape function that still delegates to re.escape() for ASCII characters instead of implementing my own replacement logic (for fear of missing a character and creating a security issue).

import re

def no_unicode_escape(u):
    escaped_list = []

    for i in u:
        if ord(i) < 128:
            escaped_list.append(re.escape(i))
        else:
            escaped_list.append(i)

    rv = "".join(escaped_list)
    return rv

or a one-liner:

import re

def no_unicode_escape(u):
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)

This yields the required result of escaping "dangerous" characters and works with RethinkDB as I wanted.
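For reference, a minimal self-contained check of that function, runnable without a RethinkDB server. The sample strings are made up for illustration, and the commented-out query at the end only mirrors the one from the question; it has not been run here.

```python
import re


def no_unicode_escape(u):
    # Escape ASCII characters with re.escape(); pass non-ASCII through untouched.
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)


if __name__ == "__main__":
    user_input = u"a.b\u05e5"  # hypothetical user-provided search term
    pattern = no_unicode_escape(user_input)

    # The dot is escaped, the non-ASCII character is left as-is.
    print(repr(pattern))

    # The escaped pattern still matches the literal input in Python's own re module.
    print(re.fullmatch(pattern, user_input) is not None)

    # Assumed usage, mirroring the query from the question:
    # r.db(...).table(...).order_by(index="id").filter(
    #     lambda doc: doc.coerce_to("string").match(pattern)
    # ).run(conn)
```

Because the ASCII branch still goes through re.escape(), regex metacharacters such as ".", "+" and "?" typed by the user are matched literally, while characters outside ASCII never receive the backslash that triggered the error.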
