
So I have this bit of Python code that runs through a Delicious page and scrapes some links off of it. The extract method contains some magic that pulls out the required content. However, running the page fetches one after another is pretty slow. Is there a way to do this asynchronously in Python, so I can launch several GET requests and process the pages in parallel?

url= "http://www.delicious.com/search?p=varun"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
extract(soup)

count=1
#Follows regexp match onto consecutive pages
while soup.find ('a', attrs={'class': 'pn next'}):
    print "yay"
    print count
    endOfPage = "false"
    try :
        page3 = br.follow_link(text_regex="Next")
        html3 = page3.read()
        soup3 = BeautifulSoup(html3)
        extract(soup3)
    except:
        print "End of Pages"
        endOfPage = "true"
    if valval == "true":
        break
    count = count +1
  • Look into threading and multiprocessing. Commented Dec 19, 2010 at 0:56
  • Are there any particular frameworks that work well with mechanize and BeautifulSoup? Commented Dec 19, 2010 at 2:15

1 Answer


Beautiful Soup is pretty slow. If you want better performance, use lxml instead, or if you have many CPUs, perhaps you can try using multiprocessing with queues.
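For illustration, here is a minimal sketch of the multiprocessing idea, parsing with lxml. It assumes the page URLs can be built up front (the &page= query parameter here is just a hypothetical pattern, not necessarily how Delicious paginates) and that your extract() logic can be rewritten as an XPath or similar over the parsed document; each worker process fetches and parses one page in parallel:

import urllib2
from multiprocessing import Pool
from lxml import html

# Hypothetical list of result pages built in advance; adjust to however
# the site actually paginates.
urls = ["http://www.delicious.com/search?p=varun&page=%d" % i
        for i in range(1, 11)]

def fetch_and_extract(url):
    # Runs in a worker process: fetch one page and parse it with lxml.
    raw = urllib2.urlopen(url).read()
    doc = html.fromstring(raw)
    # Placeholder for your extract() logic.
    return doc.xpath("//a/@href")

if __name__ == "__main__":
    pool = Pool(processes=4)   # four fetches in flight at a time
    results = pool.map(fetch_and_extract, urls)
    pool.close()
    pool.join()
    for links in results:
        print links

pool.map distributes the URLs over the worker processes and collects the results in order, which replaces the sequential follow_link loop in the question.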

