
I have a Python script that retrieves a batch of rows from a table in a remote PostgreSQL database, does some processing, and then stores the results back in the database. I will be running this script concurrently on several different machines, so I need to make sure that two instances of the script never retrieve the same row from the table.

I can use SELECT ... FOR UPDATE, but I would still need to record, perhaps in a column of that table or in a separate table, that those rows are being "worked on". If one instance of the script retrieves a batch of rows and starts processing them, another instance could pick up some of the same rows unless I keep track of which ones are in progress.
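Roughly, the two-step version of that approach would look something like the sketch below. The schema is hypothetical (an items table with an id primary key and a status column), and I'm assuming the psycopg2 driver:

```python
# Sketch of the two-step approach: lock a batch, then mark it as in progress.
# Hypothetical schema: table "items" with primary key "id" and a "status" column.
import psycopg2

conn = psycopg2.connect("dbname=mydb host=remote-host user=worker")

with conn:  # commit on success, roll back on error
    with conn.cursor() as cur:
        # Lock a batch of unclaimed rows; other FOR UPDATE readers block
        # on these rows until this transaction commits.
        cur.execute("""
            SELECT id
            FROM items
            WHERE status = 'pending'
            ORDER BY id
            LIMIT 100
            FOR UPDATE
        """)
        ids = [row[0] for row in cur.fetchall()]

        if ids:
            # Record that the rows are being worked on so other instances
            # can exclude them after the locks are released.
            cur.execute(
                "UPDATE items SET status = 'in_progress' WHERE id = ANY(%s)",
                (ids,),
            )
```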

What I need is to retrieve a batch of rows AND update the table to mark them as in progress, all in one atomic step.
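In other words, something along the lines of a single UPDATE ... RETURNING whose subquery locks the rows. The sketch below uses the same hypothetical items/status schema as above, with "payload" standing in for the real data columns; FOR UPDATE SKIP LOCKED requires PostgreSQL 9.5 or later:

```python
# Sketch of the single atomic step: claim a batch and return it in one statement.
# Same hypothetical items/status schema; "payload" stands in for the real columns.
# FOR UPDATE SKIP LOCKED requires PostgreSQL 9.5 or later.
import psycopg2

conn = psycopg2.connect("dbname=mydb host=remote-host user=worker")

with conn:
    with conn.cursor() as cur:
        cur.execute("""
            UPDATE items
            SET status = 'in_progress'
            WHERE id IN (
                SELECT id
                FROM items
                WHERE status = 'pending'
                ORDER BY id
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            )
            RETURNING id, payload
        """)
        batch = cur.fetchall()  # only rows claimed by this instance

# ... process batch, then write the results back in a later transaction.
```

With SKIP LOCKED, a concurrent instance skips rows that another transaction has already locked instead of waiting for them, so each instance ends up with a disjoint batch.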

  • Do you have a unique key in that table? Commented Feb 9, 2017 at 16:29
  • Yes, it has a unique primary key Commented Feb 9, 2017 at 16:35
  • Use a SELECT FOR UPDATE as described here. Commented Feb 9, 2017 at 16:49
  • That looks useful, but it raises another issue for me, which I'll explain in an update to the question. Thanks! Commented Feb 9, 2017 at 17:56
