
I am trying to loop over an ndarray and record each index and value in PostgreSQL. Here is my code:

    for idx, val in enumerate(data):
        cur.execute(
            "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) "
            "VALUES (%s, %s, %s, %s, %s)",
            (idx + 1, spamlabel, 0, 0, dt.now()))

The ndarray has 762k elements, and it took more than 8 hours to insert those values. Is there a more efficient way to do this?

  • That doesn't have anything to do with NumPy, only with the strategy you use with the database library. Which library do you use here? Almost any up-to-date library should support batched INSERTs, which is the way to go here. Commented Mar 24, 2019 at 11:27
  • I am using psycopg2 for PostgreSQL. How can I do batched INSERTs with it for my ndarray? Commented Mar 24, 2019 at 11:39

1 Answer


Use psycopg2's execute_values helper and put the constant columns into the template, so they don't have to be transferred for every row, e.g.:

from psycopg2 import extras

# One multi-row INSERT per page instead of one round trip per row.
# enumerate(data) yields (index, value) pairs; the template maps each
# pair into a row and keeps the constant columns out of the transfer.
extras.execute_values(
    cur,
    "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES %s",
    enumerate(data),
    template="(%s + 1, %s, 0, 0, CURRENT_TIMESTAMP)")

You can also experiment with the page_size parameter (the default is 100 rows per statement) for further throughput tuning.
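For example, a sketch under the same assumptions as above; 1000 is an arbitrary starting point to benchmark, not a recommendation:

extras.execute_values(
    cur,
    "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES %s",
    enumerate(data),
    template="(%s + 1, %s, 0, 0, CURRENT_TIMESTAMP)",
    page_size=1000)  # rows folded into each multi-row INSERT; default is 100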


4 Comments

I have tried what you suggested, but extras.execute_values accepts only one %s placeholder in the query. How can I add %s + 1 as review_id and %s as the value from data?
Sorry, I was using execute_batch first. Updated for the VALUES template.
Thanks for the answer; I think this will work and will try it soon, but I have another problem. If the server closes the connection or any error occurs during execution, the inserts won't be committed and the data will be lost. Do you have any suggestions on how to commit records part by part (in chunks of thousands)?
Sorry for the late reply. In that case, you have to split the data into chunks beforehand and iterate over them. You'd also have to make sure that your Python app can "remember" where it left off (which chunk has already been committed); a sketch follows below.
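A minimal sketch of that chunking approach, assuming the same cur, conn, and data as above; the chunk size of 10000 and the resume bookkeeping are placeholders you'd adapt:

from psycopg2 import extras

chunk_size = 10000  # arbitrary; tune to your workload
for start in range(0, len(data), chunk_size):
    chunk = data[start:start + chunk_size]
    extras.execute_values(
        cur,
        "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES %s",
        # carry the global index so review_id stays consistent across chunks
        ((start + i, val) for i, val in enumerate(chunk)),
        template="(%s + 1, %s, 0, 0, CURRENT_TIMESTAMP)")
    conn.commit()  # rows committed so far survive a later failure
    # persist `start` somewhere durable here if you need to resume after a crash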
