So, I'm working on updating thousands of rows in a Postgres database with Python (v3.6). After cleaning and preparing the data, I'm running into issues with how long the row updates take. I've already indexed the columns used in the query.
I'm using psycopg2's execute_batch to update the table after creating the new column, but the timings don't make any sense. It takes 40 seconds to update 10k rows, and what really puzzles me is that changing the page_size parameter of the function doesn't seem to change the update speed at all.
These two calls give the same timing results:
psycopg2.extras.execute_batch(self.cursor, query, field_list, page_size=1000)
psycopg2.extras.execute_batch(self.cursor, query, field_list, page_size=10)
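
For context, the full pattern looks roughly like this (connection details, table, and column names are placeholders, not my actual schema):

import psycopg2
import psycopg2.extras

# placeholder connection details
conn = psycopg2.connect("dbname=mydb user=myuser")
cursor = conn.cursor()

# one UPDATE per row, matching on the indexed id column
query = "UPDATE my_table SET new_column = %s WHERE id = %s"

# field_list is a list of (value, id) tuples prepared from the cleaned data
field_list = [("value_1", 1), ("value_2", 2)]  # ... ~10k rows in practice

psycopg2.extras.execute_batch(cursor, query, field_list, page_size=1000)
conn.commit()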
Given all this, am I doing something wrong? Do I need to change anything in the database configuration for the page_size argument to make a difference?
So far I've found a post that reports improvements using this method, but I cannot reproduce its results:
https://hakibenita.com/fast-load-data-python-postgresql#measuring-time
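
If it helps, this is roughly how I understand the execute_values approach from that post would translate to my UPDATE (table and column names are placeholders again):

from psycopg2.extras import execute_values

# single statement per page: join the new values against the table
update_query = """
    UPDATE my_table AS t
    SET new_column = data.new_value
    FROM (VALUES %s) AS data (id, new_value)
    WHERE t.id = data.id
"""

# rows is a list of (id, new_value) tuples
rows = [(1, "value_1"), (2, "value_2")]  # ... ~10k rows in practice

execute_values(cursor, update_query, rows, page_size=1000)
conn.commit()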
Any light on this would be awesome.
Many thanks!