I'm trying to get this function to work asynchronously (I have tried asyncio, ThreadPoolExecutor, ProcessPoolExecutor, and still no luck). It takes around 11 seconds on my PC to complete a batch of 500 items, and there is no difference compared to a plain for loop, so I assume it doesn't work as expected (in parallel).

here is the function:

import re

from unidecode import unidecode
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)

def is_it_bad(word):
    for item in all_names:
        if str(word) in str(item['name']):
            return item
    return {'name': word, 'gender': 2}

def check_word(arr):
    # str.replace treats its argument literally; a regex pattern needs re.sub
    fname = unidecode(str(arr[1]['fullname'] + ' ' + arr[1]['username'])).lower()
    fname = re.sub('[^a-z ]+', ' ', fname)
    fname = fname + ' ' + fname.replace(' ', '')
    fname = fname.split(' ')
    genders = []
    for chunk in fname:
        if len(chunk) > 2:
            genders.append(int(is_it_bad('_' + chunk + '_')['gender']))
    if set(genders) == {2}:
        followers[arr[0]]['gender'] = 2
    elif {0, 1}.issubset(genders):
        followers[arr[0]]['gender'] = 2
    elif 0 in genders:
        followers[arr[0]]['gender'] = 0
    else:
        followers[arr[0]]['gender'] = 1

results = pool.map(check_word, [(idx, name) for idx, name in enumerate(names)])

Can you please help me with this?

2 Answers

You are using the module "multiprocessing.dummy"

According to the documentation provided here,

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

The threading module does not provide the same speedup for CPU-bound work as the multiprocessing module does, because CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the threads effectively run serially. For more information on how to use the multiprocessing module, visit this tutorial (no affiliation).

In it, the author uses both multiprocessing.dummy and multiprocessing to accomplish two different tasks. You'll notice multiprocessing is the module used to provide the speedup. Just switch to that module and you should see an increase.

I am unable to run your code because of the unidecode package, but here is how I used multiprocessing in my previous projects, adapted to your code:

import multiprocessing

# get the number of available CPU cores
max_processes = multiprocessing.cpu_count()
#max_processes = multiprocessing.cpu_count() - 1  # I prefer to leave one core free if I still wish to use my device

# create a pool with that many worker processes
p = multiprocessing.Pool(max_processes)
# map the function over the input in parallel
results = p.map(check_word, [(idx, name) for idx, name in enumerate(names)])

Let me know if this works or helps!

Edit: Added some comments to the code

2 Comments

As far as I can see, this is exactly what I did, except for the hardcoded cpu_count, and as I said, there is no improvement in speed compared to a regular for loop.
@EdgardGomezSennovskaya If something is already really fast, adding multiprocessing will not make it much faster, if at all. For something that takes substantial time per iteration, the difference is easy to see. I prefer a for loop if the code already runs really fast, otherwise I use multiprocessing. Btw, what is the format of the 'names' variable? We may be able to speed up the code there, since you turn it into a list of tuples.
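Following up on the comment about speeding up the data side: the likely bottleneck in the question is that `is_it_bad` scans all of `all_names` on every call. A sketch of replacing that linear scan with a one-time dict build, using hypothetical data shaped like the question's; note this assumes exact-match lookups suffice, which is slightly stricter than the original substring (`in`) test:

```python
# hypothetical data shaped like the question's all_names
all_names = [{'name': '_john_', 'gender': 0}, {'name': '_mary_', 'gender': 1}]

# original approach: O(len(all_names)) scan per lookup
def is_it_bad_linear(word):
    for item in all_names:
        if str(word) in str(item['name']):
            return item
    return {'name': word, 'gender': 2}

# build the index once, then each lookup is O(1)
name_to_gender = {item['name']: item['gender'] for item in all_names}

def is_it_bad_fast(word):
    return {'name': word, 'gender': name_to_gender.get(word, 2)}
```

With ~500 items per batch, cutting each lookup from a full scan to a dict hit may matter far more than how the work is parallelized.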
