3

I am trying to tokenize my table fields with a query.

SELECT regexp_split_to_table(mytable.field_name, E'\\s+') from mytable limit 20;

This works when I execute it from psql shell but when I do:

from django.db import connection cursor=connection.cursor() 
cursor.execute("SELECT regexp_split_to_table(mytable.field_name,E'\\s+')
                FROM mytable LIMIT 20") 
cursor.fetchall()

... it fails to return tokens. What am I doing wrong?

4
  • it seems psycopg doent execute postgres functions. Commented Jan 18, 2012 at 8:30
  • Don't know much about Django, but here's an idea: "SELECT * FROM (SELECT regexp_split_to_table(fld, E'\\s+') FROM tbl LIMIT 20) x LIMIT 20". In case Django doesn't understand that a function can return a set of values. Commented Jan 18, 2012 at 8:41
  • that does not work either. say one on fields is "SOPA blackout", it wont return [("SOPA","blackout")] but will return [("SOPA blackout")] Commented Jan 18, 2012 at 9:30
  • 1
    So the query does return something? Try ' +') instead of E'\\s+') to determine if the escape string is the problem. May be that you have to double the \ to arrive at E'\\\\s+'). Just hat a similar problem here in the comment. Commented Jan 18, 2012 at 9:39

2 Answers 2

4

The backslash is treated as meta-character by Django and is interpreted inside double quotes. So one layer of E'\\s+') gets stripped before the string arrives at the PostgreSQL server, which will see E'\s+'). The escape string will result in 's+' which in turn will make regexp_split_to_table() split your strings at any number of s instead of non-printing space, which the character class shorthand \s stands for in regular expressions.

Double your backslashes in the string to get what you intended: E'\\\\s+'):

"SELECT regexp_split_to_table(field_name, E'\\\\s+') FROM mytable LIMIT 20"

As an alternative, to avoid problems with the special meaning of the backslash \, you can use [[:space:]] to denote the same character class:

"SELECT regexp_split_to_table(field_name, '[[:space:]]+') FROM mytable LIMIT 20"

Details in the chapter "Pattern Matching" in the manual.

Sign up to request clarification or add additional context in comments.

Comments

1

Thanks to new in Django F, Func, and supprot for postgresql ArrayField you can now call this function like this:

from django.db.models import F, Value, TextField
from django.contrib.postgres.fields import ArrayField
from django.db.models.expressions import Func

MyTable.objects.annotate(
  some_field_splitted=Func(
      F('some_field'), 
      Value(","), 
      function='regexp_split_to_array',
      output_field=ArrayField(TextField())
   )
).filter(some_field_splitted__contains=[HERE_SOME_VALUE])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.