
I am having some trouble with my homework assignment. Maybe you guys can advise what to read or what commands I can use to create the following:

Create a shell script test that will act as follows:

  1. The script will display the following message on the terminal screen: Enter file names (wild cards OK)
  2. The script will read the list of names.
  3. For each file on the list that is a proper file, display a table giving the ten most frequently used words in the file, sorted with the most frequent first. Include the count.
  4. Repeat steps 1-3 over and over until the user indicates end-of-file. This is done by entering the single character Ctrl-d as a file name.

Here is what I have so far:

#!/bin/bash
echo 'Enter file names (wild cards OK)'
read input_source
if test -f "$input_source"
then
    : # (the rest is not written yet)
fi
  • Please ask a single, specific question. I.e., not "I'm not sure how to do this assignment." but "How can I find the ten most frequently used words in a file?" Commented Apr 19, 2013 at 19:29
  • Let's start with the first part. You are supposed to read a list of names. Are you treating the input you get as a list (i.e. many things) or as a single item (i.e. one name)? How will it make a difference? Imagine I enter after your prompt: filename1 filename2 filename3. What does your test do with that input? Commented Apr 19, 2013 at 19:33
  • Assuming there is one word per line: sort $input_source | uniq -c | head -10 Commented Apr 19, 2013 at 19:34
  • Also, if you are completely lost about shell scripting, I recommend that you take a look at something like this first: mywiki.wooledge.org/BashGuide. Commented Apr 19, 2013 at 19:34
  • @tigran …which would give you the ten least frequently used words in the file. Commented Apr 19, 2013 at 19:38
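To make the list-versus-single-item question concrete, here is a minimal sketch; the function names read_as_one and read_as_list are hypothetical, used only for illustration. A plain read into one variable keeps the whole line filename1 filename2 filename3 as a single name, so test -f would look for one file literally called "filename1 filename2 filename3". With read -a, bash splits the line on whitespace into an array instead. The array elements are not glob-expanded, so wildcards would still need expanding separately (for example with an unquoted for loop over the line).

```shell
#!/bin/bash

# Read one line into a single variable: the whole line is ONE item.
read_as_one() {
    local input_source
    read -r input_source
    printf '[%s]\n' "$input_source"
}

# Read one line into an array: each whitespace-separated word is an item.
read_as_list() {
    local -a names
    read -r -a names
    printf '[%s]\n' "${names[@]}"
}

# Example:
#   echo 'filename1 filename2 filename3' | read_as_one
#     prints [filename1 filename2 filename3]
#   echo 'filename1 filename2 filename3' | read_as_list
#     prints [filename1], [filename2] and [filename3], one per line
```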

3 Answers


I usually ignore homework questions that don't show some progress and effort to learn something, but you're so beautifully cheeky that I'll make an exception.

Here is what you want:

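#!/bin/bash
# Keep prompting until read fails, i.e. the user presses Ctrl-d (end-of-file)
while read -ep 'Enter file names (wild cards OK) ' files
do
    # $files is deliberately unquoted so wildcards are expanded
    for file in $files
    do
        # skip anything that is not a regular file
        [ -f "$file" ] || continue
        echo "== word counts for $file =="
        # one word per line, count, sort by count (largest first), keep top ten
        tr -cs '[:alnum:]' '\n' < "$file" | sort | uniq -c | sort -nr | head
    done
done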

And now, at least try to understand what the above is doing...
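To help with that, here is the word-counting part of the loop wrapped in a function, with each stage commented. This is a sketch: top_words is a hypothetical name, and the ordering shown (sort numerically by count, then head) is what puts the most frequent words first.

```shell
#!/bin/bash
# top_words (hypothetical name): read text on standard input and print the
# ten most frequent words, most frequent first, each preceded by its count.
#
# tr -c acts on the complement of [:alnum:], i.e. every non-alphanumeric
# character, and turns it into a newline; -s squeezes each run of such
# characters into a single newline, leaving one word per line.
top_words() {
    tr -cs '[:alnum:]' '\n' |   # one word per line
        sort |                  # identical words become adjacent
        uniq -c |               # collapse each run, prefixing its count
        sort -nr |              # order by count, largest first
        head                    # keep the first ten lines
}

# Example:
#   top_words < some_file.txt
```

Words are case-sensitive here (The and the are counted separately), and text that begins with punctuation yields one empty line, which gets counted like a word.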

PS: voting to close...


3 Comments

Wow, thanks. I currently have a couple of books open on shell scripting :)
Your use of ls is adding a bug for files with whitespace: a file named "file with spaces" would go through 3 iterations of the for loop.
Amended. This is the safest way to iterate over files.
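The whitespace issue from this comment can be reproduced with a small sketch; count_unquoted and count_quoted are hypothetical helper names. One subtlety: names produced by a wildcard are not split again, so a pattern such as *.txt that matches "file with spaces.txt" still yields a single iteration. Only a name typed out literally with spaces is broken into pieces.

```shell
#!/bin/bash
# Count how many loop iterations one string produces.
# Hypothetical helper names, for illustration only.

# Unquoted: the string is split on whitespace, then each piece is
# glob-expanded (the expanded filenames are NOT split again).
count_unquoted() {
    local n=0 item
    for item in $1; do
        n=$((n + 1))
    done
    echo "$n"
}

# Quoted: the string is always exactly one item, spaces and all,
# but wildcards are no longer expanded.
count_quoted() {
    local n=0 item
    for item in "$1"; do
        n=$((n + 1))
    done
    echo "$n"
}

# Example:
#   count_unquoted 'file with spaces'   # 3
#   count_quoted   'file with spaces'   # 1
```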

How to find the ten most frequently used words in a file

Assumptions:

  1. The files given have one word per line.
  2. The files are not huge, so efficiency isn't a primary concern.

You can use sort and uniq -c to count how often each distinct value occurs in a file, sort -n to order those counts, tail to cut off all but the last (largest) ten, and a reverse-numeric sort to put them in descending order.

sort "$afile" | uniq -c | sort -n | tail | sort -rn
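Applied to step 3 of the assignment, the pipeline from this answer can be wrapped so that it only runs on proper (regular) files. This is a sketch under the same one-word-per-line assumption; top_ten_file is a hypothetical name, and the counts are sorted numerically before tail so that tail keeps the ten largest.

```shell
#!/bin/bash
# top_ten_file (hypothetical name): print the ten most frequent lines (words)
# in one file, most frequent first, each preceded by its count.
# Assumes one word per line, as stated in this answer.
top_ten_file() {
    local afile=$1
    # Step 3 of the assignment only applies to "proper" (regular) files.
    if [ ! -f "$afile" ]; then
        echo "skipping '$afile': not a regular file" >&2
        return 1
    fi
    # count each distinct line, order by count ascending, keep the last ten
    # (the largest counts), then flip them so the most frequent comes first
    sort "$afile" | uniq -c | sort -n | tail | sort -rn
}

# Example:
#   top_ten_file words.txt
```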



Some tips:

  1. Keep the complete bash manual at hand: it's daunting at first, but it's an invaluable reference (http://www.gnu.org/software/bash/manual/bashref.html).

  2. You can get help about bash builtins at the command line: try help read

  3. The read command can print the prompt for you with the -p option (see the previous tip).

  4. You'll accomplish the last step with a while loop:

    while read -p "the prompt" filenames; do 
        # ...
    done
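Tips 3 and 4 combined, as a minimal sketch (prompt_loop is a hypothetical name): read -p prints the prompt, and because read returns a non-zero status at end-of-file, pressing Ctrl-d at the prompt ends the while loop. Bash only displays the -p prompt when input comes from a terminal, so piped input shows no prompt.

```shell
#!/bin/bash
# prompt_loop (hypothetical name): prompt for file names until end-of-file.
prompt_loop() {
    local filenames
    # read shows the -p prompt (only when reading from a terminal) and
    # returns non-zero at end-of-file, which ends the loop.
    while read -r -p 'Enter file names (wild cards OK) ' filenames; do
        echo "got: $filenames"
    done
    echo "end of input"
}
```

For example, printf 'a.txt b.txt\nc.txt\n' | prompt_loop prints two "got:" lines followed by "end of input".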
    
