
I am using Python 2.7 on a Windows machine. I have an array of URLs, each accompanied by data and headers, so the POST method is required. Executed sequentially, it works well:

    import urllib
    import urllib2

    rescodeinvalid = []
    success = []
    for host in HostArray:
        data = urllib.urlencode(post_data)
        req = urllib2.Request(host, data)
        try:
            rescode = urllib2.urlopen(req).getcode()
        except urllib2.HTTPError as e:
            rescode = e.code  # urllib2 raises HTTPError for 4xx responses

        if rescode == 400:
            rescodeinvalid.append(host)
        elif rescode == 200:
            success.append(host)

My question: if HostArray is very long, this loop takes a lot of time. How can I check each URL of HostArray in a separate thread? If the response code of a URL is 200, I do a different operation, and I have arrays to store the URLs that return 200 and 400. So, how do I do this with multithreading in Python?


3 Answers


If you want to do each one in a separate thread, you could do something like:

    import threading
    import urllib
    import urllib2

    rescodeinvalid = []
    success = []

    def post_and_handle(url, post_data):
        data = urllib.urlencode(post_data)
        req = urllib2.Request(url, data)
        try:
            rescode = urllib2.urlopen(req).getcode()
        except urllib2.HTTPError as e:
            rescode = e.code  # urllib2 raises HTTPError for 4xx responses

        if rescode == 400:
            rescodeinvalid.append(url)  # list.append is thread safe
        elif rescode == 200:
            success.append(url)  # list.append is thread safe

    workers = []
    for host in HostArray:
        t = threading.Thread(target=post_and_handle, args=(host, post_data))
        t.start()
        workers.append(t)

    # Wait for all of the requests to complete
    for t in workers:
        t.join()

I'd also suggest using requests: http://docs.python-requests.org/en/latest/
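For instance, a minimal sketch of a single POST with requests, assuming the url and post_data from the question (requests returns the response for 4xx statuses instead of raising, so the status check is direct):

    import requests

    response = requests.post(url, data=post_data)  # form-encodes post_data for you
    if response.status_code == 200:
        success.append(url)
    elif response.status_code == 400:
        rescodeinvalid.append(url)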

as well as a thread pool: Threading pool similar to the multiprocessing Pool?

Thread pool usage:

    from multiprocessing.pool import ThreadPool

    # Create the pool here because this must be done in the main thread
    pool = ThreadPool(processes=50)  # use a max of 50 threads

    # do this instead of Thread(target=func, args=args, kwargs=kwargs)
    pool.apply_async(func, args, kwargs)

    pool.close()  # no more tasks can be submitted
    pool.join()   # wait for all pending tasks to finish
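For example, wiring the pool to post_and_handle above (a sketch, assuming HostArray and post_data as in the question; the worker count of 50 is arbitrary):

    from multiprocessing.pool import ThreadPool

    pool = ThreadPool(processes=50)
    for host in HostArray:
        pool.apply_async(post_and_handle, args=(host, post_data))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for every request to finish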

3 Comments

Thanks for the answer. One question: if len(HostArray) is large, how many threads will start? Is there a limit on the number of threads Windows can start?
It'll start a thread for each URL, so it's best to use a thread pool (see the link).
I looked into from multiprocessing.pool import ThreadPool, but I don't see how to fit it into our main code. Can you please suggest? Thanks

scrapy uses the twisted library to call multiple URLs in parallel without the overhead of opening a new thread per request. It also manages an internal queue to accumulate requests and can even prioritize them. As a bonus, you can restrict the number of parallel requests through the maximum-concurrent-requests setting. You can launch a scrapy spider as an external process or from your code; just set the spider's start_urls = HostArray.
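A minimal sketch of such a spider, assuming HostArray, post_data, success, and rescodeinvalid from the question are in scope (check_response is an illustrative callback name; parallelism is capped by the CONCURRENT_REQUESTS setting):

    import scrapy

    class HostSpider(scrapy.Spider):
        name = "hosts"
        # By default scrapy filters out non-2xx responses before the callback;
        # this lets 400 responses through so we can record them.
        handle_httpstatus_list = [400]

        def start_requests(self):
            # POST to each url, form-encoding post_data as in the question
            for url in HostArray:
                yield scrapy.FormRequest(url, formdata=post_data,
                                         callback=self.check_response)

        def check_response(self, response):
            if response.status == 200:
                success.append(response.url)
            elif response.status == 400:
                rescodeinvalid.append(response.url)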


Your case (basically processing one list into another list) looks like an ideal candidate for concurrent.futures (see for example this answer), or you may go all the way to Executor.map. And of course use ThreadPoolExecutor to limit the number of concurrently running threads to something reasonable.
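A sketch of that approach, assuming HostArray and post_data from the question (get_status and the worker count of 50 are illustrative; on Python 2.7 this needs the backport, pip install futures):

    from concurrent.futures import ThreadPoolExecutor
    import urllib
    import urllib2

    def get_status(url):
        data = urllib.urlencode(post_data)
        try:
            return url, urllib2.urlopen(urllib2.Request(url, data)).getcode()
        except urllib2.HTTPError as e:
            return url, e.code  # urllib2 raises HTTPError for 4xx responses

    with ThreadPoolExecutor(max_workers=50) as executor:
        results = list(executor.map(get_status, HostArray))

    success = [url for url, code in results if code == 200]
    rescodeinvalid = [url for url, code in results if code == 400]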
