Unix script is sorting the input

Question

I am having some trouble with my homework assignment. Could you advise what to read or which commands I can use to create the following:

Create a shell script test that will act as follows:

1. The script will display the following message on the terminal screen: Enter file names (wild cards OK)
2. The script will read the list of names.
3. For each file on the list that is a proper file, display a table giving the ten most frequently used words in the file, sorted with the most frequent first. Include the count.
4. Repeat steps 1-3 over and over until the user indicates end-of-file. This is done by entering the single character Ctrl-d as a file name.

Here is what I have so far:

#!/bin/bash
echo 'Enter file names (wild cards OK)'
read input_source
if test -f "$input_source"
then

Please ask a single, specific question, i.e., not "I'm not sure how to do this assignment," but "How can I find the ten most frequently used words in a file?"
– kojiro, Commented Apr 19, 2013 at 19:29
Let's start with the first part. You are supposed to read a list of names. Are you treating the input you get as a list (i.e. many things) or as a single item (i.e. one name)? How will it make a difference? Imagine I enter after your prompt: filename1 filename2 filename3. What does your test do with that input?
– Telemachus, Commented Apr 19, 2013 at 19:33
Assuming there is one word per line: sort $input_source | uniq -c | head -10
– kofemann, Commented Apr 19, 2013 at 19:34
Also, if you are completely lost about shell scripting, I recommend that you take a look at something like this first: mywiki.wooledge.org/BashGuide.
– Telemachus, Commented Apr 19, 2013 at 19:34
@tigran …which would give you the ten least frequently used words in the file.
– kojiro, Commented Apr 19, 2013 at 19:38

glenn jackman · Accepted Answer · 2013-04-21 13:28:37Z

1

I usually ignore homework questions that show no progress or effort to learn something, but you're so beautifully cheeky that I'll make an exception.

Here is what you want:

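while read -rep 'Files?> ' files
do
    # $files is unquoted on purpose: the shell splits it and expands wildcards
    for file in $files
    do
        [ -f "$file" ] || continue    # skip anything that is not a regular file
        echo "== word counts for $file =="
        # one word per line, count duplicates, most frequent first, keep ten
        tr -cs '[:alnum:]' '\n' < "$file" | sort | uniq -c | sort -nr | head
    done
done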
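
The tr stage is probably the least familiar part of that loop, so here is a minimal sketch of what it does in isolation. The function name split_words and the sample sentence are made up for illustration; only the tr invocation itself comes from the loop above.

```shell
#!/bin/bash
# split_words is a hypothetical helper name used only for this illustration.
# -c complements the set, so every character that is NOT alphanumeric is
# replaced by a newline; -s squeezes each run of newlines into a single one.
split_words() {
    tr -cs '[:alnum:]' '\n'
}

# Punctuation and spaces disappear; each word ends up on its own line.
printf 'the cat, the hat; the end.\n' | split_words
```

Once every word sits on its own line, sort and uniq -c can count them. Upper and lower case are still treated as different words here; whether that matters depends on the assignment.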

And now, at least try to understand what the above is doing...

PS: voting to close...

edited Apr 21, 2013 at 13:28

glenn jackman

answered Apr 19, 2013 at 22:01

clt60

3 Comments

Zerg12 Over a year ago

Wow, thanks. I currently have a couple of books open on shell scripting :)

glenn jackman Over a year ago

Your use of ls introduces a bug for files with whitespace: a file named "file with spaces" will go through 3 iterations of the for loop.

glenn jackman Over a year ago

Amended. This is the safest way to iterate over files.

kojiro · Accepted Answer · 2013-04-19 19:35:30Z

1

How to find the ten most frequently used words in a file

Assumptions:

The files given have one word per line.
The files are not huge, so efficiency isn't a primary concern.

You can use sort and uniq -c to count how often each line occurs, then a reverse numeric sort to put the counts in descending order, and head to keep only the first ten.

sort "$afile" | uniq -c | sort -rn | head
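
As a quick sanity check, here is a minimal sketch that runs such a pipeline on a throwaway file with one word per line. The function name top_words and the sample data are made up for illustration; they are not part of the answer. Note that the numeric sort runs before the output is cut to ten lines, because truncating first would only keep ten lines in alphabetical order, whatever their counts.

```shell
#!/bin/bash
# top_words is a hypothetical helper name, not part of the original answer.
# It prints the ten most frequent lines of a one-word-per-line file,
# each preceded by its count, most frequent first.
top_words() {
    sort "$1" | uniq -c | sort -rn | head -n 10
}

# Build a small sample file: "b" three times, "a" twice, "c" once.
sample=$(mktemp)
printf '%s\n' b a b c a b > "$sample"

top_words "$sample"
rm -f "$sample"
```

The padding in front of each count differs between uniq implementations, so the counts and their order are the part worth checking.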

answered Apr 19, 2013 at 19:35

kojiro

glenn jackman · Accepted Answer · 2013-04-19 21:42:24Z

1

Some tips:

Have access to the complete bash manual: it's daunting at first, but it's an invaluable reference -- http://www.gnu.org/software/bash/manual/bashref.html
You can get help about bash builtins at the command line: try help read
The read command can print the prompt itself with the -p option (see previous tip)

You'll accomplish the last step with a while loop:

while read -p "the prompt" filenames; do 
    # ...
done
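
To make the loop skeleton a bit more concrete, here is a rough sketch of how the prompt, the end-of-file check, and the "proper file" test could fit together. The function name list_regular_files and the scratch directory are assumptions for the demo, and the input is piped in rather than typed so the example runs unattended; it only lists the matching files and leaves the word counting out.

```shell
#!/bin/bash
# list_regular_files is a hypothetical name, not taken from the answer.
# It reads lines of file names from standard input until end-of-file
# (Ctrl-D at a terminal) and prints every name that is a regular file.
list_regular_files() {
    local filenames name
    # read -p shows its prompt only when input comes from a terminal.
    while read -r -p 'Enter file names (wild cards OK) ' filenames; do
        # Unquoted on purpose: the shell splits the line into words and
        # expands wildcards. Names containing spaces are not handled.
        for name in $filenames; do
            if [ -f "$name" ]; then
                printf '%s\n' "$name"
            fi
        done
    done
}

# Demo with simulated input instead of a keyboard: a scratch directory
# holding two files and one subdirectory, matched by a single wildcard.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
mkdir "$demo_dir/sub"
printf '%s\n' "$demo_dir/*" | list_regular_files
rm -rf "$demo_dir"
```

Run interactively, you would call list_regular_files without the pipe; read then fails on Ctrl-D and the while loop ends, which is the "repeat until end-of-file" part of the assignment.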

answered Apr 19, 2013 at 21:42

glenn jackman
