We have to refactor our scraping algorithm. To speed it up, we concluded that we should multi-thread the processing (limited to a maximum of 3 threads). Generally speaking, scraping consists of the following steps:
- Scraping (async request, takes approx 2 sec)
- Image processing (async per image, approx 500ms per image)
- Changing source item in DB (async request, approx 2 sec)
What I am aiming to do is create a batch of scraping requests and, while looping through them, build a stack of subsequent async operations: process the images and, as soon as the images are processed, change the source item.
In other words, scraping keeps going, but image processing and changing source items must run in separate, limited async threads.
The only thing I don't know is how to stack the batch and limit the threads.
Has anyone come across the same task, and what approach did you use?
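
To make the pipeline above concrete, here is a minimal sketch of one possible shape for it, assuming Python's asyncio (the question does not name a language). The functions `scrape`, `process_image`, and `update_source_item` are hypothetical stand-ins that only sleep, with delays scaled down from the timings listed above. Note that the "threads" here are asyncio tasks, and the limit of 3 is enforced with a semaphore rather than a thread pool.

```python
import asyncio

MAX_CONCURRENT = 3  # the limit mentioned in the question


# Hypothetical stand-ins for the real operations (delays scaled down).
async def scrape(url):
    await asyncio.sleep(0.02)
    return {"url": url, "images": [f"{url}/img{i}" for i in range(2)]}


async def process_image(image):
    await asyncio.sleep(0.005)
    return image.upper()


async def update_source_item(item, images):
    await asyncio.sleep(0.02)
    return {"url": item["url"], "images": images}


async def post_process(item, limiter, stats):
    """Follow-up chain for one item: process its images, then update the item."""
    async with limiter:  # at most MAX_CONCURRENT chains run at the same time
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            images = await asyncio.gather(
                *(process_image(image) for image in item["images"])
            )
            return await update_source_item(item, images)
        finally:
            stats["active"] -= 1


async def run_batch(urls):
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}  # only used to observe the limit
    follow_ups = []
    for url in urls:
        item = await scrape(url)  # scraping keeps going, one request at a time
        # Schedule the follow-up chain without waiting for it to finish.
        follow_ups.append(asyncio.create_task(post_process(item, limiter, stats)))
    results = await asyncio.gather(*follow_ups)  # wait for all stacked chains
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_batch([f"u{i}" for i in range(8)]))
    print(len(results), "items updated; peak concurrent chains:", peak)
```

In this sketch the scraping loop never waits for image processing or the DB update; each item's follow-up chain is scheduled as a task and queues on the semaphore until one of the 3 slots is free. If the image processing is CPU-bound rather than I/O-bound, a process pool would likely be a better fit than asyncio tasks.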