0

I have an array of items and I need to find the matching ones (duplicates). For now I'm running the simplest O(n^2) algorithm. The item type doesn't really matter, but in case it helps, the items are images.

myarray;   // the items to compare
for (int i = 0; i < myarray.length - 1; i++)
    for (int j = i + 1; j < myarray.length; j++)
        if (myarray[i] == myarray[j])   // items i and j match
            output(names of items);

I tried Wikipedia and Google, but couldn't come up with an answer. Any links, algorithms, or code in any language would be great.

4
  • What's the question? Do you want smaller O? Increased performance? Commented May 4, 2012 at 13:09
  • It's obvious that I do want increased performance, I think. Commented May 4, 2012 at 13:11
  • Well, an O(n) algorithm could outperform an O(log n) one for certain values of n. See codinghorror.com/blog/2007/09/…. Basically, the only way to know if an algorithm is quicker is to implement and profile the code. O(n) is a measure of complexity. An O(log n) algorithm may require more complex memory usage than an O(n) one, for example. Commented May 4, 2012 at 13:16
  • Ah, I get your question now. The total size of the items is 200MB, so lower memory usage is not my priority. I want to do things fast. Commented May 4, 2012 at 13:22

3 Answers

1

Rather than sorting and then comparing adjacent items, why not add each item to a self-balancing binary tree? That way you get the 'already present' check for free (sort of).
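
As a rough illustration of the idea above, here is a minimal Python sketch in which the insert routine itself reports whether the item was already in the tree. A few assumptions to flag: the answer does not name a particular tree, so a treap (balanced only in expectation, via random priorities) is swapped in here because it is short; the names `Node`, `insert` and `find_duplicates` are made up for this example; and the items are assumed to support `<` and `==`.

```python
import random


class Node:
    """One treap node: a key plus a random heap priority."""
    __slots__ = ("key", "prio", "left", "right")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()
        self.left = None
        self.right = None


def rotate_right(node):
    # Lift the left child above node, preserving search-tree order.
    left = node.left
    node.left = left.right
    left.right = node
    return left


def rotate_left(node):
    # Lift the right child above node, preserving search-tree order.
    right = node.right
    node.right = right.left
    right.left = node
    return right


def insert(node, key):
    """Insert key below node; return (subtree_root, already_present)."""
    if node is None:
        return Node(key), False
    if key == node.key:
        return node, True  # seen before: nothing is added
    if key < node.key:
        node.left, seen = insert(node.left, key)
        if node.left.prio > node.prio:
            node = rotate_right(node)
    else:
        node.right, seen = insert(node.right, key)
        if node.right.prio > node.prio:
            node = rotate_left(node)
    return node, seen


def find_duplicates(items):
    """Single pass over items; collect each one that was already in the tree."""
    root = None
    duplicates = []
    for item in items:
        root, seen = insert(root, item)
        if seen:
            duplicates.append(item)
    return duplicates
```

Each insert costs O(log n) in expectation, so the whole pass is still O(n log n), in line with the comments below; whether it beats a sort in practice is something only profiling would show.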

5 Comments

«thus you get the 'already present' check for free (sort of)» -> that's wrong, it will be O(n*Log(n))
@Thomash: Well, the insertion algorithm can return two values: it's not been seen before (new node in tree) or it's been seen before (there's a node with the same item already). So the problem is a single pass over the 'myarray' data rather than two - sorting then comparing adjacent items. I made no comment on the O value, it was more of a comment on the implementation - you get the matching information from the insertion process.
OK, I understand, but I am not convinced that it is actually faster. I think a sort can be faster and use less memory (or no extra memory at all) than a self-balancing tree.
@Thomash: Well, some O(log n) sort algorithms have bad worst case scenarios (qsort has O(n.log n) worst case I think). Unless I'm mistaken (which is entirely possible) the worst case for this is still O(n.log n). But then, writing a self-balancing data structure is a lot more complex than calling qsort. The tree needs a little extra memory: (3 * n + 1) * sizeof pointer (3 pointers per node - left, right and value - plus the root pointer). We can argue about O notation all day, but in the end it's only when some real code is run through a profiler that the true answer will be known as to what's best.
Good sorts are O(n*Log(n)), and qsort is O(n²) in the worst case. But what I wanted to say is that a tree uses extra space whereas a sort can be done in place, and even if the complexity is the same, a good qsort is much faster than a balanced tree.
1

If you can find an order on the items, sort them. Then it will be very simple to find items that are equal because they will be next to each other.

This is only O(n*Log(n)).
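
A minimal Python sketch of this sort-then-compare approach, assuming the items can be ordered with `<` (for images that would mean comparing some derived key, which is not shown here). The function name `duplicate_groups` is made up for the example. It sorts indices rather than the items themselves, so the original positions of each match can be reported.

```python
from itertools import groupby


def duplicate_groups(items):
    """Return lists of indices whose items compare equal (groups of 2 or more)."""
    # Sort the positions by item value; equal items end up next to each other.
    order = sorted(range(len(items)), key=lambda i: items[i])
    groups = []
    # groupby collects each run of adjacent equal items in a single O(n) scan.
    for _, run in groupby(order, key=lambda i: items[i]):
        indices = list(run)
        if len(indices) > 1:
            groups.append(indices)
    return groups
```

For example, `duplicate_groups(["b", "a", "b", "c", "a"])` gives `[[1, 4], [0, 2]]`: positions 1 and 4 hold the same item, as do positions 0 and 2.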

3 Comments

I don't understand the question. The complexity of the sort is O(n*Log(n)).
@userunknown Sorting is O(n lg n); the "merge" is O(n).
I would like to create a solution that works for an array that continuously change, so sorting is not a good option for me.
1

To find duplicates in your array you can sort it and scan the sorted list for adjacent identical items, in O(n log n). If you only want to output duplicates, and memory is not an issue, you can keep a hash set of the elements you've already seen: go through the array and check whether the current element is in the set. Output it as a duplicate if it is; insert it into the set otherwise. That would be O(n) on average, assuming a reasonable hash function.
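
The hash-set variant could look roughly like this in Python. This is a sketch under the assumption that the items are hashable; for images one would presumably hash the raw bytes or a digest of them, which is not shown. The function name `duplicates_by_hashing` is made up for the example.

```python
def duplicates_by_hashing(items):
    """Return every item that repeats an earlier one, in encounter order."""
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            # Already encountered: report it as a duplicate.
            duplicates.append(item)
        else:
            # First occurrence: remember it for later lookups.
            seen.add(item)
    return duplicates
```

Because nothing is sorted, this also works when new items keep arriving: the set can be kept around and each new item checked against it as it comes in.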

2 Comments

so what are "matching one"? duplicates?
@IsmetAlkan: I needed to look twice too, to see you're looking for duplicates.
