Building a parser to use in script (bash)

Question

I need to discover patterns in a string using bash, and I would like to run the script automatically with crontab.

I have a string that contains data like %d/%m/%Y %H:%i aaa bbb ccc 123456 ddd 7890 eee, and similar. It's a report.

I thought of defining constants as string masks and comparing every substring with my masks. I think I will use a mix of length and character position.

I'm googling for better ideas and to look at other implementations, but I'm not finding useful results.

Any suggestions? Thanks.

Edit: some sample input

01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12) 
02/01/2015 12:03 EXAMPLE 1 (000) 01234567 Baz bax X(04) 
03/01/2015 10:37 EXAMPLE 1 (000) Bam bac (X:1-16). [ SOMEGUY ] 
04/01/2015 11:04 EXAMPLE 2 (001) 12345678 Baz bax X(12) SOMEACTION 
05/01/2015 12:34 EXAMPLE 2 (001) 45678901 Bim bum X(01) SOMEACTION NAME SURNAME
08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ] 
01/01/2015 11:34 EXAMPLE 2 (001) 78901234 Gic gia gim X(01)

As output I need the variables

$date $time $example $codeline $action $message $name $surname

Edit2: I forgot to say I'm looping over those lines with this

while IFS=' ' read -ra field; do
...
done <<< "$line"

As described above, this sounds like a task for grep. Otherwise you need to improve your question with 3 lines of sample input (including one line that should NOT be processed), AND your required output from the sample input. You should read enough about grep (many tutorials available) that you can improve your question with an attempt at a reg-exp to match the lines you want to capture. Otherwise you're likely to get downvotes and close votes. Good luck.
– shellter, Commented Oct 7, 2015 at 14:42
One suggestion: build a very concise example, and show us what the output would be.
– Rubens, Commented Oct 7, 2015 at 14:44
OK, I will try a few examples:
01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)
02/01/2015 12:03 EXAMPLE 1 (000) 01234567 Baz bax X(04)
03/01/2015 10:37 EXAMPLE 1 (000) Bam bac (X:1-16). [ SOMEGUY ]
04/01/2015 11:04 EXAMPLE 2 (001) 12345678 Baz bax X(12) SOMEACTION
05/01/2015 12:34 EXAMPLE 2 (001) 45678901 Bim bum X(01) SOMEACTION NAME SURNAME
08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ]
01/01/2015 11:34 EXAMPLE 2 (001) 78901234 Gic gia gim X(01)
Well, these are examples coming from my real world.
– rivaldid, Commented Oct 7, 2015 at 23:03
Please edit your question to include your sample input and expected outputs. Use the {} tool at the top left of the edit box after highlighting your text with line breaks. Good luck.
– shellter, Commented Oct 9, 2015 at 0:05
Again, edit your question to include the expected output. $date $time is easy, but what about $name $surname $action? It seems like your data is a jumble of incomplete information. You'll need to show that you've tried to solve at least some of this on your own. Have you worked through an awk tutorial or two? It could be very helpful. See grymoire.com/Unix/Awk.html. Good luck.
– shellter, Commented Oct 9, 2015 at 21:25

Community · Accepted Answer · 2017-05-23 12:07:23Z

1

Use date to format your string:

$ date +"%d/%m/%Y %H:%M aaa bbb ccc 123456 ddd 7890 eee"
09/10/2015 14:10 aaa bbb ccc 123456 ddd 7890 eee

if that's what you meant.

Alternatively use printf, for example:

printf "%s/%s/%s %s:%s aa bb cc" 2015 01 01 00 00

or create equivalent sprintf function:

sprintf() { local stdin; read -d '' -u 0 stdin; printf "$@" "$stdin"; }
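
As a side note, bash's built-in printf can already store its result in a variable through the -v option, which covers the usual sprintf use case without a helper. A minimal sketch, where the variable name stamp and the field values are made up for illustration:

```shell
#!/usr/bin/env bash
# printf -v writes the formatted text into a variable instead of stdout.
# "stamp" and the numbers below are illustrative sample data only.
# Plain integers are used because %d would reject "08" or "09" as bad octal.
printf -v stamp '%02d/%02d/%04d %02d:%02d' 1 1 2015 6 20
echo "$stamp"   # prints 01/01/2015 06:20
```

This needs a bash version that supports printf -v; it is not available in plain POSIX sh.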

To go the other way round and split a line into fields, use read, e.g.:

while IFS=':/ ' read -r d m y h min _; do echo "$d $m $y $h $min"; done < data.txt

For more examples, see: How do I split a string on a delimiter in Bash?
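
Applied to one of the sample lines from the question, the read approach could look like the sketch below. The variable names are arbitrary, and it assumes every line starts with a dd/mm/yyyy HH:MM timestamp:

```shell
#!/usr/bin/env bash
# One sample line from the question, used as test input.
line='04/01/2015 11:04 EXAMPLE 2 (001) 12345678 Baz bax X(12) SOMEACTION'

# Split on '/', ':' and space. The first five variables take one field each;
# the last variable ("rest") receives the remainder of the line unsplit.
IFS=':/ ' read -r day month year hour minute rest <<< "$line"

echo "date=$day/$month/$year time=$hour:$minute"
echo "rest=$rest"
```

Only the timestamp is split this way; the free-form tail still needs pattern matching, because its fields are not in fixed positions.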

edited May 23, 2017 at 12:07

CommunityBot

answered Oct 9, 2015 at 13:04

kenorb


Newbie · Accepted Answer · 2015-10-09 21:34:54Z

1

This could be a more complex approach than you need, but you are heading the same way, so:

Have you ever heard of the machine learning techniques used to recognize images? They use many different masks (in your case, string masks), which you choose randomly and then correct stochastically after analysis. XOR the mask with the string and sum the character values into an int. You get one number per mask, so you effectively produce a hash that tells you how well the string matches your masks. Strings whose hashes are similar (int values close to each other) will be similar strings.

This is a tip. You can go simpler or deeper, depending on your requirements.
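
One possible reading of the XOR-and-sum step, as a rough bash sketch. The function name mask_score and the handling of unequal lengths are assumptions, not something the answer specifies, and the score is a crude similarity measure at best:

```shell
#!/usr/bin/env bash
# mask_score MASK STRING
# XORs the character codes position by position and sums the results.
# 0 means the two strings are identical. Positions past the end of the
# shorter string count as code 0 (an assumption made for this sketch).
mask_score() {
    local mask=$1 str=$2 sum=0 i a b len
    len=$(( ${#mask} > ${#str} ? ${#mask} : ${#str} ))
    for (( i = 0; i < len; i++ )); do
        a=0
        b=0
        # printf '%d' "'x" yields the numeric code of the character x.
        if (( i < ${#mask} )); then printf -v a '%d' "'${mask:i:1}"; fi
        if (( i < ${#str} )); then printf -v b '%d' "'${str:i:1}"; fi
        sum=$(( sum + (a ^ b) ))
    done
    echo "$sum"
}

mask_score "abc" "abc"   # prints 0
mask_score "abc" "abd"   # prints 7 (99 XOR 100)
```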

answered Oct 9, 2015 at 21:34

Newbie

2 Comments

rivaldid Over a year ago

Yes, this solution could be the most robust, but also the heaviest. Actually I'm writing a workaround with perl and regex: I define my constants (date, time, foo, bar, baz), combine them into mask lines (variables built from variables, to keep things short), and test them in a given/when construct. It seems quite easy with perl; bash looks hard for this type of operation.

Newbie Over a year ago

You didn't define the scenario exactly, so I gave you the most stable and versatile solution!

rivaldid · Accepted Answer · 2015-10-19 14:30:49Z

0

In the end I solved it with perl and regex: I defined my string masks $FOO, $BAR, $BAZ and then compared my input string against them

if ($myinputstring =~ $FOO) {
    statement
} elsif ($myinputstring =~ $BAR) {
    otherstatement
} elsif ($myinputstring =~ $BAZ) {
    someotherstatement
} else {
    print_to_unmatched_log
}

Thanks
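
For comparison, the same mask-dispatch idea can be written in bash with [[ ... =~ ... ]]. This is only a sketch: the two patterns below are hypothetical stand-ins for the masks, which the answer does not show, and classify is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical masks, written against the sample lines in the question.
FOO='^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2} EXAMPLE [0-9]+ '
BAR='\[ [A-Z]+ \] *$'

# classify LINE: print the name of the first mask that matches.
classify() {
    local s=$1
    # The regex variable must stay unquoted, otherwise bash matches it literally.
    if [[ $s =~ $FOO ]]; then
        echo "foo"
    elif [[ $s =~ $BAR ]]; then
        echo "bar"
    else
        echo "unmatched"   # this is where an unmatched log would be written
    fi
}

classify '01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12) '
classify '08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ] '
```

The order of the branches matters: a line that fits several masks is reported under the first one tested.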

answered Oct 19, 2015 at 14:30

rivaldid


rivaldid · Accepted Answer · 2015-12-02 17:24:49Z

0

In the end I simplified my problem and went back to a bash solution. This is quick pseudocode; tell me what you think.

pre:
myregex1="^[0-9]{2}/[0-9]{2}/[0-9]{4}[[:space:]][0-9]{2}:[0-9]{2}$"
myregex2="^[[:space:]]\([0-9]{3}\)$"
myregex3="^[[:space:]][0-9]{8}$"
myregex4="^foo[[:space:]]bar$"
myregex5="^[[:space:]]baz\([0-9]{3}\)$"
...
nospace() { printf '%s' "$1" | sed -e 's/^[[:space:]]*//'; }



the code:
while IFS= read -r mylinegotfromtextfile; do    # loop over each line of my source text file
    buffer=""; i=0
    while IFS= read -r -N 1 char; do
        buffer+="$char"; i=$(( i + 1 ))
        if [[ $buffer =~ $myregex1 ]]; then printf -v myvar1 '%s' "$(nospace "$buffer")"; i=$(( i - ${#buffer} )); buffer="${buffer::-$i}"
        elif [[ $buffer =~ $myregex2 ]]; then printf -v myvar2 SAME_STATEMENT_BEFORE
        elif SAME_STATEMENT_BEFORE_WITH_MYVAR3
        elif ...
        fi
    done <<< "$mylinegotfromtextfile"
done < "$mytextfile"

That's all. Do you know a better solution?

edited Dec 2, 2015 at 17:24

answered Dec 2, 2015 at 17:18

rivaldid

1 Comment

rivaldid Over a year ago

Just to explain: for every character of the line, I append the character to a buffer and increase an index. If the buffer matches one of my regexes, I strip the leading space and store it in a dedicated variable, then decrease the index by the length of the buffer and strip the last character from the buffer. This way I can also get the unmatched pattern.
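
A possible alternative to scanning character by character is a single regex with capture groups, read back through BASH_REMATCH. The sketch below assumes the layout seen in most of the sample lines (timestamp, EXAMPLE n, a three-digit code in parentheses, an optional eight-digit number, then free text). The function name parse_line, the variable names, and their mapping to the fields listed in the question are guesses:

```shell
#!/usr/bin/env bash
# Assumed layout: date, time, "EXAMPLE n", "(nnn)", optional 8-digit number, free text.
re='^([0-9]{2}/[0-9]{2}/[0-9]{4}) ([0-9]{2}:[0-9]{2}) (EXAMPLE [0-9]+) \(([0-9]{3})\) (([0-9]{8}) )?(.*)$'

# parse_line LINE: fill the variables below, or return 1 if the line does not fit.
parse_line() {
    [[ $1 =~ $re ]] || return 1
    date_part=${BASH_REMATCH[1]}
    time_part=${BASH_REMATCH[2]}
    example=${BASH_REMATCH[3]}
    paren_code=${BASH_REMATCH[4]}
    number=${BASH_REMATCH[6]}     # empty when the 8-digit field is absent
    message=${BASH_REMATCH[7]}
}

if parse_line '04/01/2015 11:04 EXAMPLE 2 (001) 12345678 Baz bax X(12) SOMEACTION'; then
    echo "$date_part | $time_part | $example | $paren_code | $number | $message"
fi
```

Lines that do not fit, such as the 08/08/2015 sample without an EXAMPLE field, make parse_line return 1, so they can be routed to an unmatched log or tried against a second pattern.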
