
I have a large file that I need to search in several ways from a bash shell script.

The file looks like this:

TITLE and AUTHOR                                                     ETEXT NO.

Aspects of plant life; with special reference to the British flora,      56900
 by Robert Lloyd Praeger

The Vicar of Morwenstow, by Sabine Baring-Gould                          56899
 [Subtitle: Being a Life of Robert Stephen Hawker, M.A.]

Raamatun tutkisteluja IV, mennessä Charles T. Russell                    56898
 [Subtitle: Harmagedonin taistelu]
 [Language: Finnish]

Raamatun tutkisteluja III, mennessä Charles T. Russell                   56897
 [Subtitle: Tulkoon valtakuntasi]
 [Language: Finnish]

Tom Thatcher's Fortune, by Horatio Alger, Jr.                            56896

A Yankee Flier in the Far East, by Al Avery                              56895
 and George Rutherford Montgomery
 [Illustrator: Paul Laune]

Nancy Brandon's Mystery, by Lillian Garis                                56894

Nervous Ills, by Boris Sidis                                             56893
 [Subtitle: Their Cause and Cure]

Pensées sans langage, par Francis Picabia                                56892
 [Language: French]

Helon's Pilgrimage to Jerusalem, Volume 2 of 2, by Frederick Strauss     56891
 [Subtitle: A picture of Judaism, in the century
  which preceded the advent of our Savior]

Fra Tommaso Campanella, Vol. 1, di Luigi Amabile                         56890
 [Subtitle: la sua congiura, i suoi processi e la sua pazzia]
 [Language: Italian]

The Blue Star, by Fletcher Pratt                                         56889

Importanza e risultati degli incrociamenti in avicoltura,                56888
 di Teodoro Pascal
 [Language: Italian]

The Junior Classics, Volume 3: Tales from Greece and Rome, by Various    56887


~ ~ ~ ~ Posting Dates for the below eBooks:  1 Mar 2018 to 31 Mar 2018 ~ ~ ~ ~

TITLE and AUTHOR                                                     ETEXT NO.

The American Missionary, Volume 41, No. 1, January, 1887, by Various     56886

Morganin miljoonat, mennessä Sven Elvestad                               56885
 [Author a.k.a. Stein Riverton]
 [Subtitle: Salapoliisiromaani]
 [Language: Finnish]

"Trip to the Sunny South" in March, 1885, by L. S. D                     56884

Balaam and His Master, by Joel Chandler Harris                           56883
 [Subtitle: and Other Sketches and Stories]

Susien saaliina, mennessä Jack London                                    56882
 [Language: Finnish]

Forged Egyptian Antiquities, by T. G. Wakeling                           56881

The Secret Doctrine, Vol. 3 of 4, by Helena Petrovna Blavatsky           56880
 [Subtitle: Third Edition]

No Posting                                                               56879

First love and other stories, by Iván Turgénieff                         56878

Now I have to search it by etext number, author name, and title.

For example, if I search by etext number 56900, it should return:

Aspects of plant life; with special reference to the British flora,      56900

I am new to shell scripting, and so far I can only read the file, with this:

#!/bin/bash
# read -p is a bash feature, hence the bash shebang
read -r -p 'string to search ' searchstring
grep --color "$searchstring" GUTINDEX.ALL   # | <condition goes here>

I don't know what kind of condition I should use to search by author name or etext number.

  • grep is a general purpose tool to find strings matching a regular expression - there is no easy way to work with (semi-)structured data. I would suggest that you use a more powerful language to parse your file into some type of object and then look up entries within that object. Commented Apr 28, 2018 at 17:34
  • I have to do it with a bash script. I was told it can be done with grep. Commented Apr 28, 2018 at 17:38
  • It can be done, but it's not the right tool for the job in my opinion. You will need to ask the user which field they want to search on, and then you could write several grep commands to handle each case. Commented Apr 28, 2018 at 17:41
  • What are the regular expressions to match the ETEXT number, author, or title? Getting the regex will do it, no? Commented Apr 28, 2018 at 17:44

2 Answers


As others have already pointed out, grep alone is not how you would really approach this. Using Awk instead of grep would already be a substantial improvement, but for a real production system you would parse the fields into a relational database and search with SQL instead. With database indexing, searches will be much quicker than sequentially scanning the entire index file each time.
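As a rough sketch of the Awk route, assuming every title line ends with the ETEXT number as its last whitespace-separated field (as in the sample in the question), an exact lookup by number could look like the following. The helper name `lookup_etext`, the `CATALOG` variable, and the shortened sample file are all illustrative assumptions.

```shell
#!/bin/sh
# Tiny stand-in for GUTINDEX.ALL (column padding shortened); point CATALOG
# at the real index for actual use.
CATALOG=$(mktemp)
cat > "$CATALOG" <<'EOF'
TITLE and AUTHOR                      ETEXT NO.

Nervous Ills, by Boris Sidis          56893
 [Subtitle: Their Cause and Cure]

The Blue Star, by Fletcher Pratt      56889
EOF

# Hypothetical helper: print the title line whose last field equals the
# requested number. The number is passed in with -v and compared as a value,
# so it is never interpreted as a regular expression.
lookup_etext () {
    # Continuation lines start with a space and are skipped.
    awk -v n="$1" '!/^ / && $NF == n' "$CATALOG"
}

lookup_etext 56889
```

Because the comparison is on the whole last field, a partial number such as 5688 matches nothing, which a plain substring grep would not guarantee.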

But if you are confined to just grep, here is a quick and dirty attempt.

# "$*" joins all arguments into one string; "$@" would pass extra words to grep as file names
author () { grep -E "(by|par|di|mennessä) $*" GUTINDEX.ALL; }
index () { grep " $*\$" GUTINDEX.ALL; }
title () { grep "^$*" GUTINDEX.ALL; }

This declares three shell functions which search different parts of the file, by way of supplying an anchor expression (^ matches beginning of line, $ matches end of line) or a suitable context.
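To see what the anchors buy, here is a small self-contained check against a three-line sample. The sample file and its shortened padding are made up for illustration; only the grep patterns mirror the approach described above.

```shell
#!/bin/sh
# Tiny stand-in for GUTINDEX.ALL (column padding shortened for readability).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Nervous Ills, by Boris Sidis          56893
 [Subtitle: Their Cause and Cure]
The Blue Star, by Fletcher Pratt      56889
EOF

# Unanchored: "The" also matches inside "Their" on the subtitle line.
unanchored=$(grep -c 'The' "$sample")

# ^ pins the match to the start of the line, so only the title line is left.
anchored=$(grep -c '^The' "$sample")

# A leading space plus $ pins the number to the end of the line.
by_number=$(grep ' 56889$' "$sample")

echo "unanchored=$unanchored anchored=$anchored"
echo "$by_number"
rm -f "$sample"
```

On this sample the unanchored search counts two lines and the anchored one counts a single line, which is the difference between a noisy result and the entry you were looking for.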

Having the search expression as a command-line argument instead of requiring interactive input is generally a huge usability improvement. Now, you can use your shell's history mechanism to recall and possibly edit earlier searches, and build new scripts on top of these simple building blocks.

(By the way, "mennessä" is not at all a correct Finnish localization here. I have reported a bug to Project Gutenberg.)


You could start with something like this, but as @tom-fenech points out, it's rather unreliable in the absence of structured input.

For instance, author names are not consistently prefixed: they sometimes appear under a "Subtitle" tag, and only rarely under an "Author" tag.
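One way to soften this, sketched here under the assumption that entries are separated by blank lines as in the sample in the question, is to let Awk read each entry as a single record (paragraph mode) and print the entry's first line whenever the search string occurs anywhere in it. The helper name `find_entry` and the sample file are hypothetical.

```shell
#!/bin/sh
# Tiny stand-in for GUTINDEX.ALL; point CATALOG at the real file instead.
CATALOG=$(mktemp)
cat > "$CATALOG" <<'EOF'
A Yankee Flier in the Far East, by Al Avery      56895
 and George Rutherford Montgomery
 [Illustrator: Paul Laune]

Nancy Brandon's Mystery, by Lillian Garis        56894
EOF

# Hypothetical helper: search whole entries, report their title lines.
find_entry () {
    # RS="" makes awk treat blank-line-separated blocks as records;
    # FS="\n" makes each line of a block a field, so $1 is the title line.
    # index() is a literal substring search, so the key is not a regex.
    awk -v key="$1" 'BEGIN { RS = ""; FS = "\n" }
                     index($0, key) { print $1 }' "$CATALOG"
}

find_entry 'Montgomery'
```

This finds the co-author even though the name sits on a continuation line. It is still a plain text search, so a key that happens to appear in a title or subtitle will match as well.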

#!/bin/bash

CATALOG=/tmp/s

function usage()
{
    echo "Usage:"
    echo "$0 [etext <key>] [author <id>]"
    exit 1;
}

function process_etext()
{
    local searchKey=$1
    # grep -E replaces the deprecated egrep; awk prints the text before the key
    grep -E "${searchKey}" "${CATALOG}" | awk -F"${searchKey}" '{print $1}'
}

function process_author()
{
    local searchKey=$1
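    # -B1 (one line of leading context) rather than -b (byte offsets): the author
    # may sit on a continuation line, so keep the title line just above the match
    grep -E -B1 "${searchKey}" "${CATALOG}" | grep -E "[[:digit:]]{5}"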
}


[ $# -gt 0 ] || usage

# Consume the arguments in <field> <key> pairs until none are left.
while [ $# -gt 0 ]
do
    case $1 in
    etext|author)
        [ $# -ge 2 ] || usage
        "process_$1" "$2"
        shift 2
        ;;
    *)
        usage
        ;;
    esac
done

4 Comments

If you are hoping \d would match a digit, it doesn't in egrep (but if you have grep -P it could work; or use [[:digit:]]{5} instead of \d\d\d\d\d)
@tripleee Thanks for the tip. I'm using Mac, and \d works as used in answer above.
Sure, it might work on some platforms, but it's not portable. Weird that it works on MacOS - they threw out a perfectly good grep -P because it wasn't POSIX.
True. Updated as suggested.
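The portable spelling discussed in these comments can be checked with a short sketch like the one below. The sample line is made up for the test, and `\d` is deliberately left out because its behaviour differs between grep implementations.

```shell
#!/bin/sh
# Made-up sample line in the same shape as the index entries.
line='The Blue Star, by Fletcher Pratt      56889'

# [[:digit:]] is a POSIX character class and {5} a POSIX ERE interval,
# so this form does not rely on GNU or BSD extensions.
if printf '%s\n' "$line" | grep -Eq '[[:digit:]]{5}$'; then
    portable=yes
else
    portable=no
fi

echo "portable class matched: $portable"
```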
