Parse array based on variable and nth character

Question

I'm looking to search an array for the element that starts with the value of a variable, and take the next 2 characters of that element.

array=( 7501 7302 8403 9904 )

if var = 73, result desired is 02
if var = 75, result desired is 01
if var = 84, result desired is 03
if var = 99, result desired is 04

Sorry if this is an elementary question, but I've tried variations of cut and grep and cannot find the solution.

Any help is greatly appreciated.

anubhava · Accepted Answer · 2017-02-20 22:02:58Z

2

You can use this search function, built on printf and awk:

srch() {
    printf '%s\n' "${array[@]}" |
        awk -v s="$1" 'substr($1, 1, 2) == s { print substr($1, 3) }'
}

Then use it as:

srch 75
01

srch 73
02

srch 84
03

srch 99
04
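
Since the question describes the key as living in a variable, the function's output can be captured with command substitution. A minimal sketch, assuming the srch function above; the variable names var and result are only illustrative:

```bash
#!/usr/bin/env bash
array=( 7501 7302 8403 9904 )

# Print the remainder of every element whose first two characters equal $1.
srch() {
    printf '%s\n' "${array[@]}" |
        awk -v s="$1" 'substr($1, 1, 2) == s { print substr($1, 3) }'
}

var=73                   # hypothetical variable holding the two-character key
result=$(srch "$var")    # capture the matching suffix
echo "$result"           # prints 02
```

Each call starts a subshell and an awk process, which is the constant-factor cost discussed in the comments below.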

answered Feb 20, 2017 at 22:02

anubhava

10 Comments

Charles Duffy Over a year ago

Hmm. If it really is just two-digit keys, you're right that we can't have inputs so long that the difference between O(1) and O(n) is all that big in practice; we'd probably stay in the space where startup and subshell/pipeline costs are the main drivers. OTOH, if this lookup is done inside a tight loop (i.e., once per line of input), then we could get to a place where calls add up, just from the aforementioned constant-factor expenses.

l'L'l Over a year ago

@CharlesDuffy: 1e+37 iterations (awk: r:0.573s, u:0.276s, s:0.253s) vs (map: r:0.215s, u:0.104s, s:0.079s)

anubhava Over a year ago

Thanks for this benchmark. The above function will become faster if I run printf "%s\n" "${array[@]}" > tempFIle.tmp once and then keep only awk -v s="$1" 'substr($1, 1, 2) == s{ print substr($1, 3)}' tempFIle.tmp in the function body.

Charles Duffy Over a year ago

@l'L'l, might I ask for implementation details on that test? I'd like to reproduce -- I'm seeing a much bigger difference than that between for ((i=0; i<1000000; ++i)); do : "${replacements[37]}"; done -- at 15.4s per million iterations -- and for ((i=0; i<1000000; ++i)); do srch 37; done >/dev/null (which I'm too impatient to let complete before posting this comment).

l'L'l Over a year ago

@CharlesDuffy: Sorry, I was away for a few minutes. You have to consider that the test I ran was on a Cray XE6, so it was blazingly fast. gist.github.com/anonymous/27859e55f9726619f339db0fb96d0b30. I shortened the loops since you're probably not on a Cray :) I think you'll see a drastic difference between the map method and awk in a normal setting in the time it actually takes.


Charles Duffy · Accepted Answer · 2017-02-20 22:17:22Z

2

Since bash arrays are sparse, even in older versions of bash that don't have associative arrays (mapping arbitrary strings as keys), you could have a regular array that has keys only for numeric indexes that you wish to map. Consider the following code, which takes your input array and generates an output array of that form:

array=( 7501 7302 8403 9904 )

replacements=( )                    # create an empty array to map source to dest
for arg in "${array[@]}"; do        # for each entry in our array...
  replacements[${arg:0:2}]=${arg:2} # map the first two characters to the remainder.
done

This will create an array that looks like the following (as shown by running declare -p replacements after the above code, which dumps a description of the replacements variable):

# "declare -p replacements" will then print this description of the new array generated...
# ...by the code given above:
declare -a replacements='([73]="02" [75]="01" [84]="03" [99]="04")'

You can then trivially look up any entry in it as a constant-time operation that requires no external commands:

$ echo "${replacements[73]}"
02
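
When the key comes from a variable, as in the question, the same lookup works with the variable as the subscript. A minimal sketch, assuming the array-building loop above; lookup is a hypothetical helper (not part of the original answer) that also reports when no entry has the requested prefix:

```bash
#!/usr/bin/env bash
array=( 7501 7302 8403 9904 )

replacements=( )
for arg in "${array[@]}"; do
  replacements[${arg:0:2}]=${arg:2}
done

# lookup (hypothetical helper): print the suffix stored under key $1,
# or return non-zero if no element had that prefix.
lookup() {
  local key=$1
  [[ ${replacements[$key]+set} ]] || return 1
  printf '%s\n' "${replacements[$key]}"
}

var=84                           # hypothetical variable holding the key
if result=$(lookup "$var"); then
  echo "$result"                 # prints 03
else
  echo "no entry for $var" >&2
fi
```

The `${replacements[$key]+set}` test distinguishes a missing key from a key whose stored suffix is empty.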

...or iterate through the keys and associated values independently:

for key in "${!replacements[@]}"; do
  value=${replacements[$key]}
  echo "Key $key has value $value"
done

...which will emit:

Key 73 has value 02
Key 75 has value 01
Key 84 has value 03
Key 99 has value 04

Notes/References:

See the bash-hackers wiki on parameter expansion for understanding of the syntax used to slice the elements (${arg:0:2} and ${arg:2}).
See BashFAQ #5 or the BashGuide on arrays for more details on the syntax used above.
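
One caveat with the sparse-array approach: subscripts of a regular array are evaluated as arithmetic, so a prefix such as 08 is rejected as an invalid octal number, and non-numeric prefixes do not act as literal keys. Where bash 4.0 or newer is available, an associative array avoids both problems. The following is a sketch under that assumption, with made-up input values chosen to show the difference:

```bash
#!/usr/bin/env bash
# Requires bash 4.0 or newer for "declare -A".
array=( 7501 7302 0805 ab06 )   # hypothetical input; 08 and ab are not usable as sparse-array indexes

declare -A replacements=( )     # keys are arbitrary strings, not arithmetic expressions
for arg in "${arg_list[@]:-${array[@]}}"; do
  replacements["${arg:0:2}"]=${arg:2}
done

echo "${replacements[08]}"      # prints 05
echo "${replacements[ab]}"      # prints 06
```

Lookups and `"${!replacements[@]}"` iteration work the same way as with the regular array, although the iteration order of an associative array is not guaranteed.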

edited Feb 20, 2017 at 22:17

answered Feb 20, 2017 at 22:06

Charles Duffy

3 Comments

l'L'l Over a year ago

Nice solution. Question: what happens if you have duplicate entries (e.g. [73] x 2)?

Charles Duffy Over a year ago

@l'L'l, the latter overwrites the former. I don't see anything specified in the question that that behavior would conflict with.

Charles Duffy Over a year ago

Hmm. I suppose correct behavior depends on what the OP wants -- the way I read the awk-based answer, it would print suffixes for both matching entries if you had a duplicate. Could extend this to do likewise, if desired (or extend the awk answer to use first or last match only), but probably worth getting an explicit spec.
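
If keeping every match is the desired behavior, one possible extension of the array approach is to accumulate suffixes under each key instead of overwriting. This is only a sketch of that idea, not something the question specifies: the input values are made up, and it assumes suffixes never contain whitespace, since a space is used as the separator.

```bash
#!/usr/bin/env bash
array=( 7501 7302 7309 8403 )   # hypothetical input with a duplicated 73 prefix

replacements=( )
for arg in "${array[@]}"; do
  key=${arg:0:2}
  # Append the suffix; add a separating space only if the key already has a value.
  replacements[$key]+="${replacements[$key]:+ }${arg:2}"
done

echo "${replacements[73]}"      # prints 02 09

# Deliberately unquoted so each suffix becomes its own argument: one per line.
printf '%s\n' ${replacements[73]}
```

To keep only the first match instead, the loop body could skip keys that are already set.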
