So far, my bash script takes two arguments: an input, which can be a file or a directory, and an output, which is the output file. It is meant to find all files recursively. If the input is a file, it finds all occurrences of each word and lists them in the output file, with the count on the left and the word on the right, sorted from greatest to least. Right now it also counts numbers as words, which it shouldn't do. How can I have it count only valid words and no numbers?

Also, in the last if statement (where the input is a directory), I am having trouble getting it to do the same thing it does for a file. It needs to find all files in that directory, and if there is another directory inside it, find all files in that one too, and so on. Then it needs to count all occurrences of each word across all the files and store them in the output file, just as in the file case. I was thinking of storing the file names in an array, but I'm not sure if it's the best way, and my syntax is off because it's not working. How can I do this? Thanks!
#!/bin/bash
INPUT="$1"
OUTPUT="$2"
ARRAY=();
# Check that there are two arguments
if [ "$#" -ne 2 ]
then
    echo "Usage: $0 {dir-name}";
    exit 1
fi
# Check that INPUT is different from OUTPUT
if [ "$INPUT" = "$OUTPUT" ]
then
    echo "$INPUT must be different from $OUTPUT";
fi
# Check if INPUT is a file...if so, find number of occurrences of each word
# and store in OUTPUT file sorted in greatest to least
if [ -f "$INPUT" ]
then
    for name in $INPUT; do
        if [ -f "$name" ]
        then
            xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
        fi
    done
# If INPUT is a directory, find number of occurrences of each word
# and store in OUTPUT file sorted in greatest to least
elif [ -d "$INPUT" ]
then
    find $name -type f > "${ARRAY[@]}"
    for name in "${ARRAY[@]}"; do
        if [ -f "$name" ]
        then
            xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
        fi
    done
fi
for name in $INPUT is supposed to do ... since $INPUT should be one argument?
- and a couple of other things. just a thought. do you use the regular alphabet or are there special characters?
grep -hoP '\b[[:alpha:]]+\b' in place of grep -hoP '\b\w+\b'
name in $INPUT is each filename in the input.
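
For what it's worth, here is a minimal sketch of how both cases might be handled in one place. It rests on a few assumptions: GNU grep is available (the -P option is GNU-specific), the [[:alpha:]] pattern suggested above is what "valid word" means here, and OUTPUT does not live inside the INPUT directory. The function name count_words is made up for illustration and is not part of the original script. Since find prints a regular file when given one and recurses when given a directory, no array or separate branch is needed.

```shell
#!/bin/bash
# count_words INPUT OUTPUT
# Hypothetical helper (name chosen for illustration only).
# INPUT may be a regular file or a directory; OUTPUT is the file to write.
count_words() {
    local input="$1" output="$2"

    # find handles both cases: for a regular file it prints just that file,
    # for a directory it descends into every subdirectory.
    # grep -h drops file names, -o prints each match on its own line, and
    # \b[[:alpha:]]+\b matches runs of letters only, so "42" and "x1" are skipped.
    # Note: writing OUTPUT inside INPUT would make find read the output file too.
    find "$input" -type f -exec grep -hoP '\b[[:alpha:]]+\b' {} + \
        | sort | uniq -c | sort -n -r > "$output"
}
```

In the script above it could be called as count_words "$INPUT" "$OUTPUT" after the argument checks. Counting stays case-sensitive in this sketch, so "The" and "the" are tallied separately.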