Matches in Array Algorithm

Question

I have an array of items and I need to find the matching ones (duplicates). For now I have the simplest O(n^2) algorithm running. The item type doesn't really matter, but if you want to know, the items are images.

// myarray holds the items to compare
for (int i = 0; i < myarray.length - 1; i++)
    for (int j = i + 1; j < myarray.length; j++)
        if (myarray[i] == myarray[j])   // comparison (==), not assignment (=)
            output(names of items);

I tried Wikipedia and Google, but couldn't come up with an answer. Any links, algorithms, or code in any language would be great.

What's the question? Do you want a smaller O? Increased performance?
– Skizz, Commented May 4, 2012 at 13:09
Well, an O(n) algorithm could outperform an O(log n) one for certain values of n. See codinghorror.com/blog/2007/09/…. Basically, the only way to know whether an algorithm is quicker is to implement it and profile the code. O notation is a measure of complexity. An O(log n) algorithm may require more complex memory usage than an O(n) one, for example.
– Skizz, Commented May 4, 2012 at 13:16
Ah, I get your question now. The total size of the items is 200MB, so low memory usage is not my priority. I want to do things fast.
– İsmet Alkan, Commented May 4, 2012 at 13:22

Skizz · Accepted Answer · 2012-05-04 13:12:48Z

1

Rather than sort and then compare adjacent items, why not add each item to a self-balancing binary tree? That way you get the 'already present' check for free (sort of).
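
A minimal sketch of this idea in Python, assuming the items can be ordered with `<` and compared with `==`. It uses an AVL tree as one possible self-balancing tree (the answer does not name a specific one), and the names `Node`, `insert`, and `find_duplicates_with_tree` are made up for illustration. The insert reports whether the key was already in the tree, so the duplicates come out of a single pass over the array.

```python
class Node:
    """One AVL tree node; `height` is the height of the subtree rooted here."""
    __slots__ = ("key", "left", "right", "height")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def _height(node):
    return node.height if node else 0


def _update(node):
    node.height = 1 + max(_height(node.left), _height(node.right))


def _rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    _update(y)
    _update(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    _update(x)
    _update(y)
    return y


def insert(node, key):
    """Insert key; return (subtree root, True if key was already present)."""
    if node is None:
        return Node(key), False
    if key == node.key:
        return node, True
    if key < node.key:
        node.left, found = insert(node.left, key)
    else:
        node.right, found = insert(node.right, key)
    if found:
        return node, True  # tree unchanged, so no rebalancing is needed
    _update(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1:
        if key > node.left.key:  # left-right case
            node.left = _rotate_left(node.left)
        return _rotate_right(node), False
    if balance < -1:
        if key < node.right.key:  # right-left case
            node.right = _rotate_right(node.right)
        return _rotate_left(node), False
    return node, False


def find_duplicates_with_tree(items):
    """Return each item that repeats an earlier one, in encounter order."""
    root = None
    duplicates = []
    for item in items:
        root, seen_before = insert(root, item)
        if seen_before:
            duplicates.append(item)
    return duplicates
```

Each insert costs O(log n) comparisons, so the whole pass is O(n log n), which matches the discussion in the comments. Whether it beats sorting in practice is something only profiling would show.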

5 Comments

Thomash Over a year ago

«thus you get the 'already present' check for free (sort of)» -> that's wrong, it will be O(n*Log(n))

Skizz Over a year ago

@Thomash: Well, the insertion algorithm can return two values: it's not been seen before (new node in tree) or it's been seen before (there's a node with the same item already). So the problem is a single pass over the 'myarray' data rather than two - sorting then comparing adjacent items. I made no comment on the O value, it was more of a comment on the implementation - you get the matching information from the insertion process.

Thomash Over a year ago

OK, I understand, but I am not convinced that it is actually faster. I think a sort can be faster and use less memory (or no extra memory at all) than a self-balancing tree.

Skizz Over a year ago

@Thomash: Well, some O(log n) sort algorithms have bad worst-case scenarios (qsort has an O(n.log n) worst case, I think). Unless I'm mistaken (which is entirely possible), the worst case for this is still O(n.log n). But then, writing a self-balancing data structure is a lot more complex than calling qsort. The tree needs little extra memory: (3 * n + 1) * sizeof pointer (3 pointers per node - left, right and value - plus the root pointer). We can argue about O notation all day, but in the end it's only when some real code is run through a profiler that the true answer will be known as to what's best.

Thomash Over a year ago

Good sorts are O(n*Log(n)) and qsort is O(n²) in the worst case. But what I wanted to say is that a tree uses extra space whereas a sort can be made in place and even if the complexity is the same, a good qsort is much faster than a balanced tree.

Thomash · Accepted Answer · 2012-05-04 13:09:23Z

1

If you can find an order on the items, sort them. Then it will be very simple to find items that are equal because they will be next to each other.

This is only O(n*Log(n)).
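
As a rough sketch in Python, assuming the items support `<` and `==` (for images that would need some agreed ordering, for example on their raw bytes, which is an assumption and not something stated in the question). The function name `find_adjacent_duplicates` is hypothetical. It sorts indices rather than the items themselves, so the original positions can still be reported.

```python
from typing import List, Sequence, Tuple


def find_adjacent_duplicates(items: Sequence) -> List[Tuple[int, int]]:
    """Return pairs of original indices whose items are equal neighbours
    once the items are put in sorted order."""
    # Sort the indices by item value: O(n log n).
    order = sorted(range(len(items)), key=lambda i: items[i])
    pairs = []
    # Equal items now sit next to each other, so one linear scan suffices.
    for a, b in zip(order, order[1:]):
        if items[a] == items[b]:
            pairs.append((a, b))
    return pairs
```

For `[3, 1, 3, 2, 1]` this returns `[(1, 4), (0, 2)]`: the two 1s are at indices 1 and 4, and the two 3s are at indices 0 and 2.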

3 Comments

Thomash Over a year ago

I don't understand the question. The complexity of the sort is O(n*Log(n)).

Danica Over a year ago

@userunknown Sorting is O(n lg n); the "merge" is O(n).

İsmet Alkan Over a year ago

I would like to create a solution that works for an array that continuously changes, so sorting is not a good option for me.

b.buchhold · Accepted Answer · 2012-05-04 13:25:47Z

1

To find duplicates in your array, you can sort it and scan the list, looking for adjacent identical items, in O(n log n). If you only want to output duplicates, and memory is not an issue, you can keep a hash set of the elements you've already seen: go through the array and check whether the current element is in your set. Output it as a duplicate if it is; insert it into the set otherwise. That would be O(n) on average.
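
A minimal sketch of the hash set approach in Python, assuming the items are hashable and that equal items hash equally. For images you would probably need to derive a hashable key first, such as a digest of the file bytes, which is an assumption here. The function name `find_duplicates` is made up for illustration.

```python
from typing import Hashable, Iterable, List


def find_duplicates(items: Iterable[Hashable]) -> List[Hashable]:
    """Return each item that repeats an earlier one, in encounter order."""
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            duplicates.append(item)  # already seen: report as duplicate
        else:
            seen.add(item)           # first occurrence: remember it
    return duplicates
```

Because the set is updated one element at a time, the same structure can be kept around and fed new items as they arrive, at the cost of the extra memory for the set.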

edited May 4, 2012 at 13:25

answered May 4, 2012 at 13:11

2 Comments

b.buchhold Over a year ago

So what are the "matching ones"? Duplicates?

user unknown Over a year ago

@IsmetAlkan: I needed to look twice too, to see you're looking for duplicates.
