I'm trying to get this function to work asynchronously (I have tried asyncio, ThreadPoolExecutor, ProcessPoolExecutor, and still no luck). It takes around 11 seconds on my PC to complete a batch of 500 items, and there is no difference compared to a plain for loop, so I assume it doesn't work as expected (in parallel).

here is the function:

import re

from unidecode import unidecode
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)

def is_it_bad(word):
    for item in all_names:
        if str(word) in str(item['name']):
            return item
    return {'name': word, 'gender': 2}

def check_word(arr):
    # str.replace treats its argument literally; a regex pattern needs re.sub
    fname = unidecode(str(arr[1]['fullname'] + ' ' + arr[1]['username'])).lower()
    fname = re.sub('[^a-z ]+', ' ', fname)
    fname = fname + ' ' + fname.replace(' ', '')
    fname = fname.split(' ')
    genders = []
    for chunk in fname:
        if len(chunk) > 2:
            genders.append(int(is_it_bad('_' + chunk + '_')['gender']))
    if set(genders) == {2}:
        followers[arr[0]]['gender'] = 2
    elif {0, 1}.issubset(genders):
        followers[arr[0]]['gender'] = 2
    elif 0 in genders:
        followers[arr[0]]['gender'] = 0
    else:
        followers[arr[0]]['gender'] = 1

results = pool.map(check_word, [(idx, name) for idx, name in enumerate(names)])

Can you please help me with this?

2 Answers

You are using the module "multiprocessing.dummy"

According to the documentation provided here,

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

The threading module does not provide the same speedup for CPU-bound work as the multiprocessing module does, because CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the threads effectively run serially. For more information on how to use the multiprocessing module, visit this tutorial (no affiliation).

In it, the author uses both multiprocessing.dummy and multiprocessing to accomplish two different tasks. You'll notice multiprocessing is the module used to provide the speedup. Just switch to that module and you should see an increase.

I am unable to run your code because of the unidecode package, but here is how I used multiprocessing in my previous projects, adapted to your code:

import multiprocessing

# get the number of available CPU cores
max_processes = multiprocessing.cpu_count()
#max_processes = multiprocessing.cpu_count() - 1  # I prefer to leave one core free if I still wish to use my device

# create a pool with that many worker processes
p = multiprocessing.Pool(max_processes)
# map the function over the input in parallel
results = p.map(check_word, [(idx, name) for idx, name in enumerate(names)])

Let me know if this works or helps!

Edit: Added some comments to the code

2 Comments

As far as I can see, this is exactly what I did, except for the hardcoded cpu_count, and as I said, there is no improvement in speed compared to a regular for loop.
@EdgardGomezSennovskaya If something is already really fast, adding multiprocessing will not make it much faster, if at all. For something that takes substantial time per iteration, the difference is easy to see. I prefer a for loop if the code already runs really fast, otherwise I use multiprocessing. Btw, what is the format of the 'names' variable? We may be able to speed up the code there, since you turn it into a list of tuples.
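Following up on the comment about speeding up the data side: the likely bottleneck in the question is that `is_it_bad` scans all of `all_names` on every call. A sketch of replacing that linear scan with a one-time dict build, using hypothetical data shaped like the question's; note this assumes exact-match lookups suffice, which is slightly stricter than the original substring (`in`) test:

```python
# hypothetical data shaped like the question's all_names
all_names = [{'name': '_john_', 'gender': 0}, {'name': '_mary_', 'gender': 1}]

# original approach: O(len(all_names)) scan per lookup
def is_it_bad_linear(word):
    for item in all_names:
        if str(word) in str(item['name']):
            return item
    return {'name': word, 'gender': 2}

# build the index once, then each lookup is O(1)
name_to_gender = {item['name']: item['gender'] for item in all_names}

def is_it_bad_fast(word):
    return {'name': word, 'gender': name_to_gender.get(word, 2)}
```

With ~500 items per batch, cutting each lookup from a full scan to a dict hit may matter far more than how the work is parallelized.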
