I have a Postgres table with a JSONB column; each row contains a large JSONB object (~4500 keys, around 110 KB when serialized as text). I want to query these rows and get the entire JSONB object.

The query itself is fast -- when I run EXPLAIN ANALYZE, or omit the JSONB column, it returns in 100-300 ms. But when I execute the full query, it takes on the order of minutes. The exact same query on a previous version of the data (where each JSONB object was about half as large) was also fast.

Some notes:

  • This ends up in Python (via SQLAlchemy/psycopg2). I'm worried that the query executor converts JSONB to JSON, which is then encoded as text for transfer over the wire, and finally JSON-decoded again on the Python end. Is this correct, and if so, how can I mitigate it? When I select the JSONB column as ::text, the query is roughly twice as fast.

  • I only need a small subset of the JSON (around 300 keys, or 6% of them). I tried filtering the JSON in the query itself, but that caused a substantial further performance hit -- it ended up being faster to return the entire object.
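A minimal sketch of the ::text workaround mentioned in the first note, using psycopg2 directly. The table and column names (`my_table`, `data`) are placeholders, not from the original post:

```python
import json

# Hypothetical query: casting the JSONB column to text makes psycopg2
# return a plain string instead of automatically parsing it into a
# Python object, so the client controls when (and how) parsing happens.
QUERY = "SELECT id, data::text FROM my_table WHERE id = ANY(%s)"

def parse_rows(rows):
    """Parse (id, json_text) tuples fetched from Postgres on the client side."""
    return {row_id: json.loads(raw) for row_id, raw in rows}

# Usage with psycopg2 (connection string is a placeholder):
# import psycopg2
# with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
#     cur.execute(QUERY, ([1, 2, 3],))
#     objects = parse_rows(cur.fetchall())
```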

  • First, try creating and running the query in pure SQL, and run it on the same node as the Postgres server (if possible); this will eliminate any SQLAlchemy/network issues. Next, provide more details about the data, the actual SQL you ran, and the query timings. Commented Sep 30, 2017 at 7:14
  • @JonScott when you say rewrite the query in pure SQL, does this mean I should essentially remove any PostgreSQL specific features (like JSON)? Commented Oct 2, 2017 at 15:05

1 Answer

This is not necessarily a solution, but here is an update:

By casting the JSONB column to text in the Postgres query, I was able to substantially cut down both query execution time and result fetching on the Python end.

On the Python end, calling json.loads for every row in the result set brings the total time right back to that of the regular query. However, with the ujson library I was able to obtain a significant speedup: casting to text in the query and then calling ujson.loads on the Python end is roughly 3x faster than simply returning JSON from the query.
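The decoding step described above can be sketched as follows. ujson is a third-party package (`pip install ujson`); the fallback to the stdlib json module is an addition here for portability and is not part of the original answer:

```python
try:
    import ujson as fast_json  # third-party C-accelerated JSON decoder
except ImportError:
    import json as fast_json   # stdlib fallback; same loads() interface, slower

def decode_rows(rows):
    """Decode (id, json_text) rows fetched with `SELECT id, data::text ...`."""
    return [(row_id, fast_json.loads(raw)) for row_id, raw in rows]
```

The key point is that the query returns raw strings, so all JSON parsing happens exactly once, in whichever decoder is fastest on the client.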
