
I am using Python 2.7 on a Windows machine. I have an array of URLs, each accompanied by data and headers, so the POST method is required. Executed sequentially, it works well:

    import urllib
    import urllib2

    rescodeinvalid = []
    success = []
    for host in HostArray:
        data = urllib.urlencode(post_data)
        req = urllib2.Request(host, data)
        try:
            rescode = urllib2.urlopen(req).getcode()
        except urllib2.HTTPError as e:
            rescode = e.code  # urllib2 raises HTTPError for 4xx responses

        if rescode == 400:
            rescodeinvalid.append(host)
        elif rescode == 200:
            success.append(host)

My question: if HostArray is very long, this loop takes a lot of time. How can I check each URL of HostArray in a separate thread? If the response code of a URL is 200, I do a different operation, and I have arrays to store the URLs that return 200 and 400. So, how do I do this with multithreading in Python?


3 Answers


If you want to do each one in a separate thread, you could do something like:

    import threading
    import urllib
    import urllib2

    rescodeinvalid = []
    success = []

    def post_and_handle(url, post_data):
        data = urllib.urlencode(post_data)
        req = urllib2.Request(url, data)
        try:
            rescode = urllib2.urlopen(req).getcode()
        except urllib2.HTTPError as e:
            rescode = e.code  # urllib2 raises HTTPError for 4xx responses

        if rescode == 400:
            rescodeinvalid.append(url)  # list.append is thread safe
        elif rescode == 200:
            success.append(url)  # list.append is thread safe

    workers = []
    for host in HostArray:
        t = threading.Thread(target=post_and_handle, args=(host, post_data))
        t.start()
        workers.append(t)

    # Wait for all of the requests to complete
    for t in workers:
        t.join()

I'd also suggest using requests: http://docs.python-requests.org/en/latest/
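For instance, a minimal sketch of a single POST with requests, assuming the url and post_data from the question (requests returns the response for 4xx statuses instead of raising, so the status check is direct):

    import requests

    response = requests.post(url, data=post_data)  # form-encodes post_data for you
    if response.status_code == 200:
        success.append(url)
    elif response.status_code == 400:
        rescodeinvalid.append(url)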

as well as a thread pool: Threading pool similar to the multiprocessing Pool?

Thread pool usage:

    from multiprocessing.pool import ThreadPool

    # Create the pool here because this must be done in the main thread
    pool = ThreadPool(processes=50)  # use a max of 50 threads

    # do this instead of Thread(target=func, args=args, kwargs=kwargs)
    pool.apply_async(func, args, kwargs)

    pool.close()  # no more tasks can be submitted
    pool.join()   # wait for all pending tasks to finish
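For example, wiring the pool to post_and_handle above (a sketch, assuming HostArray and post_data as in the question; the worker count of 50 is arbitrary):

    from multiprocessing.pool import ThreadPool

    pool = ThreadPool(processes=50)
    for host in HostArray:
        pool.apply_async(post_and_handle, args=(host, post_data))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for every request to finish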

3 Comments

Thanks for the answer. One question: if len(HostArray) is large, how many threads will start? Is there a limit on the number of threads Windows can start?
It'll start a thread for each URL, so it's best to use a thread pool (see the link).
I looked into from multiprocessing.pool import ThreadPool, but I don't see how to fit it into our main code. Can you please suggest? Thanks

scrapy uses the twisted library to call multiple URLs in parallel without the overhead of opening a new thread per request. It also manages an internal queue to accumulate requests and can even prioritize them. As a bonus, you can restrict the number of parallel requests through the maximum-concurrent-requests setting. You can launch a scrapy spider as an external process or from your code; just set the spider's start_urls = HostArray.
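A minimal sketch of such a spider, assuming HostArray, post_data, success, and rescodeinvalid from the question are in scope (check_response is an illustrative callback name; parallelism is capped by the CONCURRENT_REQUESTS setting):

    import scrapy

    class HostSpider(scrapy.Spider):
        name = "hosts"
        # By default scrapy filters out non-2xx responses before the callback;
        # this lets 400 responses through so we can record them.
        handle_httpstatus_list = [400]

        def start_requests(self):
            # POST to each url, form-encoding post_data as in the question
            for url in HostArray:
                yield scrapy.FormRequest(url, formdata=post_data,
                                         callback=self.check_response)

        def check_response(self, response):
            if response.status == 200:
                success.append(response.url)
            elif response.status == 400:
                rescodeinvalid.append(response.url)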


Your case (basically processing one list into another list) looks like an ideal candidate for concurrent.futures (see for example this answer), or you may go all the way to Executor.map. And of course use ThreadPoolExecutor to limit the number of concurrently running threads to something reasonable.
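A sketch of that approach, assuming HostArray and post_data from the question (get_status and the worker count of 50 are illustrative; on Python 2.7 this needs the backport, pip install futures):

    from concurrent.futures import ThreadPoolExecutor
    import urllib
    import urllib2

    def get_status(url):
        data = urllib.urlencode(post_data)
        try:
            return url, urllib2.urlopen(urllib2.Request(url, data)).getcode()
        except urllib2.HTTPError as e:
            return url, e.code  # urllib2 raises HTTPError for 4xx responses

    with ThreadPoolExecutor(max_workers=50) as executor:
        results = list(executor.map(get_status, HostArray))

    success = [url for url, code in results if code == 200]
    rescodeinvalid = [url for url, code in results if code == 400]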
