
I have a large number of Linux servers to maintain. I frequently need to run a script (script.sh) on all of them to get their health status; this script usually takes about 30-40 seconds to produce output. To make maintenance easier, I'm writing a shell script that uses SSH to loop through all remote hosts, run script.sh, collect its output, and write it to a log file on my local host. For the sake of this question, I have named this script MyScript.sh.

The script works fine; however, it has to wait for the SSH output before continuing to the next host. Because I have so many servers and the commands run in sequence, it takes several minutes to finish. I would like to loop through all servers in parallel, without waiting for a response from each host.

Is there a way to run script.sh remotely on all hosts simultaneously from MyScript.sh? Maybe run the ssh command in the background and somehow collect the output?

The output of script.sh is a single line of pipe-separated fields, such as the following:

host1|49 days|10%|3.77%|27677/63997 MB|43% - /usr|38% - /usr|Optimal|No|40%|No

The output of MyScript.sh is the concatenation of the output from all hosts, formatted as columns without pipes:

    Date       Hostname   Uptime     CPU     I/O      Free MEM           File System               INODES                   STATUS WWW       YYY             ZZZ                   XXX
    ===================================================================================================================================================================================================
    01/31/20   host1      44 days    5%      10.33%   38083/64000 MB     57% - /                   37% - /usr                OPTIMAL         No              40%                    No
    01/31/20   host2      45 days    11%     1.79%    27915/63997 MB     43% - /usr                38% - /usr                OPTIMAL         UP              7%                     OK
    01/31/20   host3      45 days    2%      1.89%    32145/63997 MB     43% - /usr                38% - /usr                OPTIMAL         UP              NO                     OK
    01/31/20   host4      45 days    11%     3.72%    52477/128637 MB    49% - /var                38% - /usr                OPTIMAL         UP              8%                     OK
    01/31/20   host5      45 days    6%      3.21%    65264/128637 MB    46% - /var                38% - /usr                OPTIMAL         UP              NO                     OK
    01/31/20   host6      45 days    7%      5.79%    56369/63997 MB     43% - /usr                38% - /usr                OPTIMAL         UP              NO                     No
    01/31/20   host7      45 days    6%      1.66%    56391/63997 MB     43% - /var                38% - /usr                OPTIMAL         UP              NO                     No
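Each row above is one pipe-delimited record from script.sh, prefixed with the date and re-printed with fixed column widths. A minimal sketch of that conversion as a standalone bash function; format_row is a hypothetical name, and the widths are copied from the awk format in MyScript.sh with the date added as the first column:

```bash
#!/usr/bin/env bash
# Sketch: convert pipe-delimited records from script.sh into fixed-width rows.
# format_row is a hypothetical helper; the column widths are copied from the
# awk format used in MyScript.sh, with the date added as the first column.
# Usage: format_row DATE < records
format_row() {
    awk -F '|' -v d="$1" '{
        printf("%-10s %-10s %-10s %-7s %-8s %-18s %-25s %-25s %-15s %-15s %-25s %-10s\n",
               d, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
    }'
}
```

Because it reads stdin, it can sit at the end of a pipeline, for example `ssh ... 'sudo /tmp/script.sh' | format_row "$(date +%D)" >> "$logfile"`, which also removes the need for a temporary file.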

The core of MyScript.sh is the following:

(
    for ip in $IP_LIST;    # left unquoted on purpose so the list splits into hosts
    do
            echo "Checking $ip"

            ssh -q -t "$user@$ip" 'sudo /tmp/script.sh' > "/tmp/$$"
            current_date=$(date +%D)
            printf "%-10s " "$current_date" >> "$logfile"

            while IFS= read -r line;
            do
                    printf '%s\n' "$line" | awk -F '|' '{printf("%-10s %-10s %-7s %-8s %-18s %-25s %-25s %-15s %-15s %-25s %-10s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11); }' >> "$logfile"

            done < "/tmp/$$"

    done
    )

In summary, I would like to optimize this script to run the above code simultaneously on multiple servers. Thanks!
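The background-and-collect idea from the question can be sketched in plain bash: start one ssh per host with `&`, send each host's output to its own temporary file, `wait` for all of them, then print the files in the original host order so the log stays tidy. This is a sketch under assumptions, not a drop-in replacement: run_remote and check_all_hosts are hypothetical names, SSH logins are assumed to be key-based, and it uses `ssh -n` instead of `-t`, so it assumes `sudo` on the remote hosts does not require a TTY or a password.

```bash
#!/usr/bin/env bash
# Sketch: run the health check on all hosts at once, then collect output in order.
# run_remote and check_all_hosts are hypothetical names.

# Assumptions: key-based SSH login, and sudo on the remote side needs no TTY/password.
# ${user:+$user@} prefixes "user@" only when $user is set.
run_remote() {
    ssh -n -q "${user:+$user@}$1" 'sudo /tmp/script.sh'
}

check_all_hosts() {
    local tmpdir host i=0
    tmpdir=$(mktemp -d) || return 1

    for host in "$@"; do
        # One background job per host, each writing to its own file,
        # so output from different hosts never interleaves.
        run_remote "$host" > "$tmpdir/$i" &
        i=$((i + 1))
    done
    wait    # block until every background job has exited

    # Print results in the same order as the host list.
    for ((i = 0; i < $#; i++)); do
        cat "$tmpdir/$i"
    done
    rm -rf "$tmpdir"
}
```

Called as `check_all_hosts $IP_LIST`, the total run time is roughly that of the slowest host rather than the sum of all hosts; the concatenated records can then be piped through the same awk formatting into `$logfile`.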

  • Very easy with GNU Parallel, see gnu.org/software/parallel/… Commented Feb 5, 2020 at 0:10
  • Note that GNU Parallel is not a library; it is a single file containing a Perl script, and Perl is included by default with most Linux distros and macOS. Commented Feb 5, 2020 at 16:35

2 Answers


A longer-term solution would be to deploy monitoring software with custom checks.

For the parallel SSH problem, without installing any binaries, you could use this script I wrote a while ago. Put it in a file named mssh, run chmod u+x mssh, and then:

./mssh -s SERVER1 -s SERVER2 -C script.sh

The mssh file:

#!/usr/bin/env bash

readonly prog_name="$(basename "$0")"
readonly date="$(date +%Y%m%d_%H%M%S)"

# print help
usage() {
cat <<- EOF
usage: $prog_name options

parallel ssh executions.

OPTIONS:
   -c --cmd CMD              execute command CMD
   -s --host SRV             execute cmd on server SRV
   -C --cmd-file CMD_FILE    execute command contained in CMD_FILE
   -S --hosts-file SRV_FILE  execute cmd on all servers contained in SRV_FILE
   -h --help                 show this help

Examples:
   Run CMD on SERVER1 and SERVER2:
   ./$prog_name -s SERVER1 -s SERVER2 -c "CMD"

EOF
}

# test if an element is in an array
is_element(){
    local search=$1; shift;
    for e in "$@"; do [[ "$e" == "$search" ]] && return 0; done
    return 1
}

# parse arguments
for arg in "$@"; do
    case "$arg" in
        --help)           args+=( -h );;
        --host)           args+=( -s );;
        --hosts-file)     args+=( -S );;
        --cmd)            args+=( -c );;
        --cmd-file)       args+=( -C );;
        *)                args+=("$arg");;
    esac
done
set -- "${args[@]}"
while getopts "hs:S:c:C:" OPTION; do
    case $OPTION in
        h)  usage; exit 0;;
        s)  servers_array+=("$OPTARG");;
        S)  while read -r L; do servers_array+=("$L"); done < <( grep -vE "^ *(#|$)" "$OPTARG");;
        c)  cmd="$OPTARG";;
        C)  cmd="$(< "$OPTARG")"; file=$OPTARG;;
        *)  :;;
    esac
done
if [[ -z ${servers_array[0]} ]] || [[ -z $cmd ]]; then
    usage; exit 1
fi

# clean up created files at exit
trap "rm -f /tmp/pssh*$date" EXIT

[[ -n $file ]] && echo "executing command file : $file"  || echo "executing command : $cmd"
# run cmd on each server
for i in "${!servers_array[@]}"; do
    # run cmd in the background; -n keeps ssh from reading stdin
    ssh -n "${servers_array[$i]}" "$cmd" > "/tmp/pssh_${i}_${servers_array[$i]}_${date}" 2>&1 &
    pid=$!
    pids_array+=("$pid")
    echo "${servers_array[$i]} - $pid"
done

# for each pid, set state to running
ps_state_array=( $(for i in "${!servers_array[@]}"; do echo "running"; done) )

echo "waiting for results..."
echo

# begin finished verifications
continue=true; attempt=0
while $continue; do

    # foreach ps
    for i in "${!pids_array[@]}"; do

        # if already finished skip
        [[ ${ps_state_array[$i]} == "finished" ]] && continue

        # else check if finished
        ps -o pid "${pids_array[$i]}" > /dev/null 2>&1  && ps_finished=false || ps_finished=true
        if $ps_finished; then
            ps_state_array[$i]="finished"
            echo -e "[ ${servers_array[$i]} @ $(date +%H:%M:%S) ]" | grep '.*' --color=always
            cat "/tmp/pssh_${i}_${servers_array[$i]}_${date}"
            rm -f "/tmp/pssh_${i}_${servers_array[$i]}_${date}"
            echo
        fi
    done    

    is_element "running" "${ps_state_array[@]}" || continue=false
    if $continue; then
        (( attempt < 5 )) && attempt=$(( attempt + 1 ))
        sleep $attempt
    fi
done
exit 0
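The polling loop at the end of mssh (ps plus an increasing sleep) works, but bash can also block on each background job directly: `wait PID` returns when that job exits and yields its exit status, so a failed ssh can be reported per host without polling. A minimal sketch of that alternative; it is a swapped-in technique, not part of mssh, and run_one and wait_for_hosts are hypothetical names. Unlike mssh, it prints results in host order rather than in order of completion.

```bash
#!/usr/bin/env bash
# Sketch: wait on each background ssh by PID instead of polling with ps/sleep.
# run_one and wait_for_hosts are hypothetical names (not part of mssh).

# Assumption: non-interactive (key-based) SSH access to every host.
run_one() {
    ssh -n -q "$1" 'sudo /tmp/script.sh'
}

wait_for_hosts() {
    local -a pids=()
    local host tmpdir status i=0
    tmpdir=$(mktemp -d) || return 1

    # Start every host at once; each job writes to its own file.
    for host in "$@"; do
        run_one "$host" > "$tmpdir/$i" 2>&1 &
        pids+=("$!")
        i=$((i + 1))
    done

    # Block on each PID in turn; `wait PID` returns that job's exit status.
    i=0
    for host in "$@"; do
        if wait "${pids[$i]}"; then status=0; else status=$?; fi
        echo "[ $host ] exit $status"
        cat "$tmpdir/$i"
        i=$((i + 1))
    done
    rm -rf "$tmpdir"
}
```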

2 Comments

@Cyrus, thanks, I corrected many of the alerts and more (function is not POSIX, /usr/bin/env bash is better, ...). As mentioned, I wrote the script a while ago.
@BDR, thanks for sharing your code. I reused it and adapted it to my needs; it really solves my problem. Also, I like that it works without installing any binaries.

With GNU Parallel it looks something like this:

doit() {
    local ip="$1"
    echo "Checking $ip" >&2
    current_date=$(date +%D)
    printf "%-10s " "$current_date"

    ssh -q -t "$user@$ip" 'sudo /tmp/script.sh' |
      awk -F '|' '{printf("%-10s %-10s %-7s %-8s %-18s %-25s %-25s %-15s %-15s %-25s %-10s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11); }'
}
export -f doit
export user
parallel -j0 doit ::: $IP_LIST >> "$logfile"
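`-j0` asks GNU Parallel to run as many jobs at once as it can. If you want a fixed cap on concurrent SSH sessions and cannot install GNU Parallel, bash 4.3 or later offers `wait -n`, which returns as soon as any one background job exits. A minimal sketch under those assumptions; run_check and run_throttled are hypothetical names, SSH access is assumed to be non-interactive, and output is buffered per host and printed in input order.

```bash
#!/usr/bin/env bash
# Sketch: cap the number of concurrent ssh sessions with `wait -n` (bash >= 4.3).
# run_check and run_throttled are hypothetical names.

# Assumption: non-interactive (key-based) SSH access to every host.
run_check() {
    ssh -n -q "$1" 'sudo /tmp/script.sh'
}

# Usage: run_throttled MAX_JOBS HOST...
run_throttled() {
    local max_jobs=$1; shift
    local tmpdir host i=0 running=0
    tmpdir=$(mktemp -d) || return 1

    for host in "$@"; do
        # At the cap: block until any one background job exits.
        if (( running >= max_jobs )); then
            wait -n || true    # a failed check should not abort the loop
            running=$((running - 1))
        fi
        run_check "$host" > "$tmpdir/$i" &
        running=$((running + 1))
        i=$((i + 1))
    done
    wait    # let the last jobs finish

    # Print buffered output in the original host order.
    for ((i = 0; i < $#; i++)); do
        cat "$tmpdir/$i"
    done
    rm -rf "$tmpdir"
}
```

For example, `run_throttled 20 $IP_LIST` would keep at most 20 ssh sessions open at a time.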
