
I have a Python script that runs a few external commands using the subprocess module. One of these steps takes a very long time, so I would like to split it across several parallel processes. I need to launch them, check that they have all finished, and then execute the next command, which is not parallel. My code looks something like this:

nproc = 24 
for i in xrange(nproc):
    #Run program in parallel

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    for zline in open('Niben_%s_file%d_structures' % (zfile_name, i)):
        handle.write(zline)
handle.close()

#Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name,zfile_name)

3 Answers


For your example, you just want to shell out in parallel - you don't need threads for that.

Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.htm

Collect the Popen instances for each process you spawned and then wait() for them to finish:

import subprocess

procs = []
for i in xrange(nproc):
    # ARGS_GO_HERE is a placeholder for the actual command and its arguments
    procs.append(subprocess.Popen(ARGS_GO_HERE))  # run program in parallel
for p in procs:
    p.wait()

You can get away with this (as opposed to using the multiprocessing or threading modules) since you aren't really interested in having these processes interoperate - you just want the OS to run them in parallel and to be sure they have all finished when you go to combine the results...


2 Comments

@Daren Thomas: How about if I want to get the result of each process?
@hguser, read up on the module subprocess - you can redirect STDOUT and friends :-)
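As a sketch of what the comment suggests: pass stdout=subprocess.PIPE to Popen and collect each process's output with communicate(). The echo commands here are stand-ins for the real jobs (and assume a POSIX environment):

```python
import subprocess

# Stand-in commands for the real parallel jobs.
cmds = [['echo', 'job %d' % i] for i in range(3)]

# Launch everything at once, capturing stdout through a pipe.
procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE) for cmd in cmds]

# communicate() waits for the process and returns (stdout, stderr).
outputs = [p.communicate()[0].decode().strip() for p in procs]
```

After the loop, outputs holds one string per process, in launch order.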

Running things in parallel can also be implemented using multiple processes in Python. I wrote a blog post on this topic a while ago; you can find it here:

http://multicodecjukebox.blogspot.de/2010/11/parallelizing-multiprocessing-commands.html

Basically, the idea is to use "worker processes" which independently retrieve jobs from a queue and then complete these jobs.

Works quite well in my experience.
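A minimal sketch of that worker-process pattern, using multiprocessing.Process and Queue (the squaring "work" and the helper names are placeholders, not from the post):

```python
import multiprocessing

def worker(job_queue, result_queue):
    # Each worker pulls jobs until it sees the None sentinel.
    for job in iter(job_queue.get, None):
        result_queue.put(job * job)  # placeholder "work": square the job

def run_jobs(jobs, nproc=4):
    jobs = list(jobs)
    job_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker,
                                       args=(job_queue, result_queue))
               for _ in range(nproc)]
    for w in workers:
        w.start()
    for job in jobs:
        job_queue.put(job)
    for _ in workers:
        job_queue.put(None)  # one stop sentinel per worker
    results = [result_queue.get() for _ in jobs]  # drain before joining
    for w in workers:
        w.join()
    return results

if __name__ == '__main__':
    print(sorted(run_jobs(range(10))))
```

Results arrive in whatever order the workers finish, so sort (or tag jobs with an index) if order matters.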



You can do it using threads. Here is a very short (and untested) example, with a very ugly if-else deciding what each thread actually does, but you can write your own worker classes..

import threading

class Worker(threading.Thread):
    def __init__(self, i):
        super(Worker, self).__init__()
        self._i = i

    def run(self):
        if self._i == 1:
            self.result = do_this()
        elif self._i == 2:
            self.result = do_that()

threads = []
nproc = 24 
for i in xrange(nproc):
    #Run program in parallel        
    w = Worker(i)
    threads.append(w)
    w.start()
    w.join()

# ...now all threads are done

#Combine files generated by the parallel step
handle = open('Niben_%s_structures' % (zfile_name), 'w')
for i in xrange(nproc):
    ...etc...

1 Comment

This actually doesn't do anything in parallel, due to the join() blocking (preventing the other threads from starting) until the thread finishes. See my answer for how to get around this.
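The fix the comment alludes to is to start all the threads first and only then join them, so they overlap. A sketch (do_work is a hypothetical stand-in for the real jobs):

```python
import threading

def do_work(i, results):
    results[i] = i * 2  # placeholder for the real job

nproc = 4
results = [None] * nproc
threads = [threading.Thread(target=do_work, args=(i, results))
           for i in range(nproc)]
for t in threads:
    t.start()   # start every thread first...
for t in threads:
    t.join()    # ...then wait for all of them
```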
