
I am trying to write a program which retrieves a list of tasks to perform from a file and executes the tasks asynchronously. Each task has the form: read data from a file, do some computations (which take a couple minutes), and write the results to another file. There is no overlap in the files which must be read from and written to; indeed, the tasks are completely independent.

Googling around, it seems there is some native support for this sort of thing in Python 3.5, but unfortunately I am constrained to Python 3.4 at the moment. Upon further Googling it seems that the solution will involve generators and yields, but all the examples that I've found seem much more complicated than what I'm trying to do.

Feel free to recommend specific packages if they exist, but note that this is not a "what is the best tool" question. I'm just looking for a simple and reliable way to solve the problem.

2 Answers


You should read about pool.map_async; I've used it many times to perform asynchronous tasks. Basically, you write a function that takes two arguments, the in_file and the out_file, instantiate the pool, and hand it the function together with its list of argument tuples: [(in_file1, out_file1), (in_file2, out_file2), ...].
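One wrinkle with map_async: it hands each element of the iterable to the function as a single argument, so with tuples you need a small wrapper that unpacks them. A minimal sketch (process_pair is a hypothetical name; foo and files are as in the example further down):

import multiprocessing as mp

def process_pair(args):
    # map_async delivers each (in_file, out_file) tuple as one argument,
    # so unpack it before calling the real worker
    in_file, out_file = args
    return foo(in_file, out_file)

pool = mp.Pool(processes=4)
result = pool.map_async(process_pair, files)
pool.close()
result.get()  # block until every task has finished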

Caution! While the files may not overlap, you are still using an IO device, which incurs a lot of overhead waiting for reads and writes. So try to separate the file IO from the main logic: read the file, process all of the data in RAM, then write the result file out.

EDIT1: apparently Python 3 has starmap, which to my understanding lets you pass an iterable of iterables, where each inner iterable is unpacked as the function's arguments. So I changed the code to use starmap. You should also consider adding a timeout; you can do that using the get method of the AsyncResult (see the sketch after the example code).

I'll include some example code (originally written for python2, but since the starmap edit it needs Python 3.3+, which your Python 3.4 satisfies):

import multiprocessing as mp

def foo(in_file, out_file):
    in_data = ""
    # this is just an example of how to read a file;
    # text mode ("r") matches the str accumulator above
    with open(in_file, "r") as f:
        for line in f:
            in_data += line
    ...
    out_data = process(in_data)  # process() stands in for your couple-minute computation
    ...
    with open(out_file, "w") as f:
        f.write(out_data)


def main():
    files = [("/infile1", "/outfile1"), ("/infile2", "/outfile2"), ...]
    # choose how many processes you wish to instantiate;
    # leaving processes unset defaults to the number of CPUs available
    pool = mp.Pool(processes=4)
    result = pool.starmap_async(foo, files)
    pool.close()
    result.get()  # wait for every task to finish (re-raises any worker error)


if __name__ == "__main__":
    main()
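If you want the timeout mentioned in the edit, the AsyncResult returned by starmap_async takes it through get. A minimal sketch (the 600-second limit is an arbitrary assumption, and note that the timeout covers the whole batch, not a single task):

result = pool.starmap_async(foo, files)
pool.close()
try:
    # get raises multiprocessing.TimeoutError if the batch
    # hasn't finished within the given number of seconds
    result.get(timeout=600)
except mp.TimeoutError:
    pool.terminate()  # give up on any stragglers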



You should try out the high-level API provided by the multiprocessing module; in particular, have a look at Pool in the Python documentation. Keep in mind that to run CPU-bound tasks truly in parallel in Python, you have to use multiprocessing instead of multithreading because of the Global Interpreter Lock (GIL).
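A minimal sketch of that API (compute and the file names are hypothetical placeholders):

from multiprocessing import Pool

def compute(in_file):
    # hypothetical per-task work: read the file and return something
    with open(in_file) as f:
        data = f.read()
    return len(data)  # placeholder result

if __name__ == "__main__":
    # Pool() defaults to os.cpu_count() workers and supports
    # the with statement since Python 3.3
    with Pool() as pool:
        results = pool.map(compute, ["in1.txt", "in2.txt", "in3.txt"])
    print(results)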

