For a project I wrote several Python scripts that I want to run on a directory of files from a shell script. The shell script already contains a for loop with multiple commands. The first command is a Python script that BLASTs the input file against a local database and uses most of the available cores. The next commands use far fewer cores, but take a long time. It is very important that, for each file, the commands run in series. To save time, I want to alter the shell script so that once the first command has finished for one file, the remaining commands run on its output while the first command starts on the next file.

Can anybody help me with this? I tried searching myself, but I can't find the answer. I have not tried running this script yet, as I am currently running the Python scripts without a shell script.

This is the script so far:

#!/bin/bash
tsv=/home/user/tsv
fasta=/home/user/fasta/*
clustering=/home/user/clustering

for file in ${fasta}
do
    python blastn_new.py --fasta ${file} --tsv ${tsv}/${file}.tsv &&
    mkdir ${clustering}/${file} &&
    mkdir ${clustering}/${file}/clusters &&
    python blastparsPB.py --clusters ${clustering}/${file}/${file}.txt --fish ${tsv}/${file}.tsv --dir ${clustering}/${file}/clusters/
done
  • Can't you just loop through using a glob to fetch the files and then run? Commented Nov 9, 2018 at 11:34
  • Just to start with, bash is very sensitive about spaces, so your variable definitions won't work like that. They should be without spaces, like tsv="/home/user/tsv". The quotes are not strictly necessary as you don't have any spaces in the paths, but it's good style. Commented Nov 9, 2018 at 11:38
  • It is a bit hard to understand the order of your commands. Can you provide some sort of picture, or something like one > two ; three? Commented Nov 9, 2018 at 11:44
  • This post on unix exchange has some more info on non-blocking commands. Commented Nov 9, 2018 at 12:00

1 Answer

You can run the second script in the background.

The following also has some tangential comments, and reformats your code slightly.

#!/bin/bash

# You cannot have spaces around the equals signs
# Also, avoid hard-coding an absolute path
tsv=./tsv
db=./newpacbiodb/pacbiodb
clustering=./clustering

# Notice proper quoting throughout
for file in ./fasta/*
do
    # Strip the directory part so output paths use the bare file name
    name=${file##*/}

    # The BLAST step stays in the foreground: one file at a time
    python blastn_new.py \
        --fasta "${file}" \
        --tsv "${tsv}/${name}.tsv" || continue

    # A trailing & applies to a whole && chain, so the later steps
    # are grouped in a subshell and only that group is backgrounded
    (
        # mkdir -p creates an entire path if necessary
        # (and works fine even if the directory already exists)
        mkdir -p "${clustering}/${name}/clusters" &&
        python blastparsPB.py \
            --clusters "${clustering}/${name}/${name}.txt" \
            --fish "${tsv}/${name}.tsv" \
            --dir "${clustering}/${name}/clusters/"
    ) &
done

# Do not exit until every background job has finished
wait

This assumes that the second Python script does not mind something else connecting at the same time, e.g. to the database for writing, but from your description that is already a given.
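
If you want to see the overlap before running the real tools, here is a minimal sketch. The functions heavy_step and light_step are hypothetical stand-ins for blastn_new.py and blastparsPB.py, and the file names, sleep durations and temporary log file are assumptions made purely for illustration. The heavy step runs in the foreground, so only one runs at a time; the light step is wrapped in ( ... ) & so that only it is detached.

```shell
#!/bin/bash
# Hypothetical stand-ins for the two Python scripts (assumptions, not the real tools)
log=$(mktemp)

heavy_step() {   # stands in for blastn_new.py
    echo "heavy start $1" >> "$log"
    sleep 0.2
    echo "heavy done $1" >> "$log"
}

light_step() {   # stands in for blastparsPB.py
    sleep 1
    echo "light done $1" >> "$log"
}

for name in a b c
do
    # Foreground: the loop blocks here, so heavy steps never overlap
    heavy_step "$name" || continue
    # Background: only this group is detached, so the loop moves on at once
    ( light_step "$name" ) &
done

# Block until all background jobs have finished
wait
cat "$log"
```

In the log, the heavy step for b should start right after the heavy step for a finishes, well before any light step reports completion. The final wait is what keeps the script from exiting while light steps are still running.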
