
We have to refactor our scraping algorithm. To speed it up, we have concluded that we should run the work in multiple threads (limited to a maximum of 3). Generally speaking, scraping consists of the following steps (roughly sketched in code after the list):

  1. Scraping (async request, takes approx 2 sec)
  2. Image processing (async per image, approx 500ms per image)
  3. Changing source item in DB (async request, approx 2 sec)
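For concreteness, the three stages might look roughly like the coroutines below. This is only a hypothetical sketch: the names, signatures and sleep calls merely stand in for the real scraping, image-processing and database calls.

```python
import asyncio

# Hypothetical stand-ins for the real stage functions described above.

async def scrape(url: str) -> dict:
    await asyncio.sleep(2)        # async request, takes approx. 2 s
    return {"url": url, "images": ["a.jpg", "b.jpg"]}

async def process_image(image: str) -> str:
    await asyncio.sleep(0.5)      # approx. 500 ms per image
    return image + ".processed"

async def update_source_item(item: dict) -> None:
    await asyncio.sleep(2)        # async DB request, takes approx. 2 s
```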

What I am aiming to do is create a batch of scraping requests and, while looping through them, build a stack of subsequent async operations: process the images, and as soon as the images are processed, change the source item.

In other words, scraping keeps going, but image processing and changing the source items must run in separate, limited async threads.

The only thing I don't know is how to stack up the batch and limit the threads.

Has anyone come across the same task, and what approach did you use?

1 Answer


What you're looking for is the producer-consumer pattern. Create 3 different queues, and when you finish processing an item from one of them, queue the next piece of work in another. Then run 3 different threads, each of them processing one queue.
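A minimal sketch of that idea in Python, using the standard threading and queue modules; the stage functions here are hypothetical stand-ins for the real scraping, image and DB code:

```python
import queue
import threading
import time

# Hypothetical stage functions; replace these with the real calls.
def scrape_item(url):
    time.sleep(2)                                 # ~2 s scraping request
    return {"url": url, "images": ["a.jpg", "b.jpg"]}

def process_images(item):
    time.sleep(0.5 * len(item["images"]))         # ~500 ms per image
    return item

def update_source_item(item):
    time.sleep(2)                                 # ~2 s DB update
    print("updated", item["url"])

scrape_q, image_q, db_q = queue.Queue(), queue.Queue(), queue.Queue()
SENTINEL = None                                   # marks "no more work"

def scrape_worker():
    while True:
        url = scrape_q.get()
        if url is SENTINEL:
            image_q.put(SENTINEL)                 # propagate shutdown
            break
        image_q.put(scrape_item(url))             # hand off to next stage

def image_worker():
    while True:
        item = image_q.get()
        if item is SENTINEL:
            db_q.put(SENTINEL)
            break
        db_q.put(process_images(item))

def db_worker():
    while True:
        item = db_q.get()
        if item is SENTINEL:
            break
        update_source_item(item)

# 3 threads, each consuming one queue and feeding the next stage.
threads = [threading.Thread(target=w)
           for w in (scrape_worker, image_worker, db_worker)]
for t in threads:
    t.start()

for url in ("http://example.com/1", "http://example.com/2"):
    scrape_q.put(url)
scrape_q.put(SENTINEL)                            # no more URLs

for t in threads:
    t.join()
```

Because each stage runs in its own thread, a slow DB update no longer blocks the next scraping request; the queues act as buffers between the stages.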


1 Comment

Your answer led me to bogotobogo.com/python/Multithread/… Great stuff...
