I am trying to run a join command within Python, and I'm being foiled by subprocess. I'm combining thousands of large files iteratively, so a dictionary would require a lot of memory. My rationale is that join only has to deal with two files at a time, so my memory overhead will be lower.

I have tried many variations of the call below, and none of them run under subprocess. Can anyone explain why this is not working? When I print cmd and execute it myself in the shell, it runs perfectly.

cmd = "join <(sort %s) <(sort %s)" % (outfile, filename) 
with open(out_temp, 'w') as out:
     return_code = subprocess.call(cmd, stdout=out, shell=True)
if return_code != 0:
     print "not working!"
     break

The error produced looks like this:

/bin/sh: -c: line 0: syntax error near unexpected token `('

I have also tried turning the command into a list, but I'm not sure what the rule is for splitting the command into list elements. Can anyone explain? (outfile and filename are variables.)

["join" , "<(sort" , outfile , ") <(sort" , filename , ")"]

Any help would be appreciated! I'm doing this in Python because I'm heavily parsing filenames upstream to figure out which files to combine.

1 Answer

<(...) is process substitution, a bash extension to standard shell syntax. Notice in the error message that it's running /bin/sh, not /bin/bash; even if /bin/sh is a link to /bin/bash, bash drops many of its extensions when it's run using that link.

You can use bash explicitly with:

cmd = "bash -c 'join <(sort %s) <(sort %s)'" % (outfile, filename) 
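As for the list form you tried: each list element is handed to the program as one literal argument and no shell ever parses it, so "<(sort" reaches join as a plain string. A list does work if bash itself is the program and the whole command line is a single argument to -c. Here is a minimal sketch of that, assuming Python 3 (for shlex.quote) and that bash, join and sort are on the PATH; join_sorted and its parameter names are made up for illustration.

```python
import shlex
import subprocess


def join_sorted(left_path, right_path, out_path):
    """Join two files on their first field, sorting each one on the fly."""
    # shlex.quote protects file names that contain spaces or shell
    # metacharacters before they are pasted into the command string.
    cmd = "join <(sort %s) <(sort %s)" % (
        shlex.quote(left_path),
        shlex.quote(right_path),
    )
    with open(out_path, "w") as out:
        # bash is invoked directly (no shell=True), so /bin/sh never sees
        # the <(...) syntax. The whole command is one argument to -c.
        return subprocess.call(["bash", "-c", cmd], stdout=out)
```

You would call it as return_code = join_sorted(outfile, filename, out_temp). Keeping shell=True and passing executable="/bin/bash" to subprocess.call should also work, since that argument replaces the default /bin/sh.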
1 Comment

Thank you!! Works perfectly now!