4

This one works:

arr[0]="XX1 1"
arr[1]="XX2 2" 
arr[2]="XX3 3"
arr[3]="XX4 4"
arr[4]="XX5 5"
arr[5]="XX1 1"
arr[6]="XX7 7"
arr[7]="XX8 8"

duplicate() { printf '%s\n' "${arr[@]}" | sort -cu |& awk -F: '{ print $5 }'; }

duplicate_match=$(duplicate)

echo "array: ${arr[@]}"

# echo "duplicate: $duplicate_match"

[[ ! $duplicate_match ]] || { echo "Found duplicate:$duplicate_match"; exit 0; }

echo "no duplicate"

With the same code, this one doesn't work. Why?

arr[0]="XX"
arr[1]="wXyz" 
arr[2]="ABC"
arr[3]="XX"
5
  • Your code doesn't actually work, because sort -cu fails when the input is not already sorted; the duplicate it finds in the first data set just happens to be the first item that occurs out of sorted order. Commented Feb 26, 2014 at 23:02
  • the pipe-ampersand combination is only valid in c-shell, not in bash Commented Feb 26, 2014 at 23:04
  • @chepner Thanks, I will look into how to sort my array in the right place. Commented Feb 26, 2014 at 23:13
  • @thom |& was added to bash as well in version 4. Commented Feb 27, 2014 at 0:39
  • @chepner thanks, I stand corrected. pipe-ampersand is indeed valid. Commented Feb 27, 2014 at 0:59
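
To make the first comment concrete, here is a minimal sketch (not the asker's original code) of one way to avoid the problem: sort the elements first, then let `uniq -d` print each value that occurs more than once. The helper name `find_dups` is made up for illustration, and the data is the second array from the question.

```shell
#!/usr/bin/env bash
arr=("XX" "wXyz" "ABC" "XX")

# Hypothetical helper: sort the arguments so equal values become adjacent,
# then print each value that appears more than once.
find_dups() { printf '%s\n' "$@" | sort | uniq -d; }

dups=$(find_dups "${arr[@]}")
if [[ -n $dups ]]; then
  echo "Found duplicate: $dups"
else
  echo "no duplicate"
fi
```

Because the input is sorted before it is inspected, the result no longer depends on where the repeated element sits in the array.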

2 Answers 2

6

To check for duplicates, this code is much simpler and works in both cases:

uniqueNum=$(printf '%s\n' "${arr[@]}"|awk '!($0 in seen){seen[$0];c++} END {print c}')

(( uniqueNum != ${#arr[@]} )) && echo "Found duplicates"

EDIT: To print duplicates use this awk:

printf '%s\n' "${arr[@]}"|awk '!($0 in seen){seen[$0];next} 1'

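The awk command checks whether the current line is already a key in the array seen. If it is not, the line is stored in seen and next skips to the following line without printing anything. The 1 at the end is an always-true pattern, so it prints only the lines that reach it, which are the duplicates.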
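
As a quick check of the explanation above, the sketch below runs the same awk program on both arrays from the question. The wrapper function `print_dups` and the variable names are assumptions added for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk program from this answer:
# prints the second and later occurrences of each element, one per line.
print_dups() { printf '%s\n' "$@" | awk '!($0 in seen){seen[$0];next} 1'; }

first=("XX1 1" "XX2 2" "XX3 3" "XX4 4" "XX5 5" "XX1 1" "XX7 7" "XX8 8")
second=("XX" "wXyz" "ABC" "XX")

dups_first=$(print_dups "${first[@]}")
dups_second=$(print_dups "${second[@]}")

echo "first:  ${dups_first:-none}"
echo "second: ${dups_second:-none}"
```

Note that an element occurring three times would be printed twice, since every occurrence after the first reaches the final 1.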


4 Comments

Thanks Anubhava, I need to study your code to fully understand it. How can I echo the duplicate element with it, please? Also, can anyone correct my code, please? I've been on this for two hours, and ending up with someone else's code without understanding why mine fails is frustrating :(
See chepner's answer below why your code failed if you want to understand it.
I have also added some explanation to my answer.
@Neeraj: Try this: printf '%s\n' "${arr[@]}" | awk '!seen[$0]++ {} END {print length(seen)}'
1

Slightly silly solution here. I just wanted to see if I could do this in a single command without explicit pipes. (I think for very large arrays/array elements, explicit pipes might actually be more efficient.)

Note that this is a test for the presence of duplicate array elements, and doesn't output the duplicates themselves, although the awk command on its own will do that. Also note that if you're unlucky enough to have array elements that contain spaces, the below won't evaluate as described.

[[ $( awk -v RS=" " ' a[$0]++ ' <<< "${arr[@]} " ) ]] && echo "dups found"

Explanation:

awk -v RS=" "

  • do the subsequent awk command on each input record with space as the record separator. Basically, this will make awk treat each array element as a separate "line".

' a[$0]++ '

  • awk command that does two things:

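    • Return the value stored at key $0 in array a. If it is greater than 0, the pattern is true and, since no action is given, awk prints the record. Compare the trailing 1 in awk ' { $1=$2 } 1 ', which is an always-true pattern.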

    • Add 1 to the value at key $0 in array a.

<<< "${arr[@]} "

  • as the input of the awk command, use the string created when you print each element in arr as a separate word, i.e. separated by space PLUS AN ADDITIONAL SPACE AT THE END.

  • The space between } and " is actually really important: a here-string appends a newline to its input, so without the trailing space the final record would be the last array element plus that newline, and it would never match an earlier identical element.

[[ $( ... ) ]]

  • If the enclosed awk command produces any output at all, the test succeeds with exit status 0, i.e. TRUE.
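
The caveat about array elements that contain spaces can be sidestepped by not flattening the array into a string at all. The following is a sketch of a different technique, not the one-liner above: it uses a bash associative array (which assumes bash 4 or later), and the function name `has_dups` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) as soon as an argument repeats.
# Assumes bash 4+ for associative arrays.
has_dups() {
  local -A seen=()
  local item
  for item in "$@"; do
    # Keys are prefixed with "x" so that an empty element is still a valid subscript.
    [[ -n ${seen["x$item"]+set} ]] && return 0
    seen["x$item"]=1
  done
  return 1
}

arr=("XX1 1" "XX2 2" "XX1 1")
if has_dups "${arr[@]}"; then
  echo "dups found"
else
  echo "no dups"
fi
```

Each element is compared as a whole string, so "XX1 1" is treated as one value rather than being split into two records.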
