Lately I had an issue with an array that contained some hundred thousands of values and the only thing I wanted to do was to check whether a value was already present. In my case this were IPs from a webserver log. So basically something like:
in_array(ip2long(ip),$myarray) did the job
However the lookup time increased dramatically and 10k of lookups took around 17 seconds or so.
So in this case I didn't care whether I had duplicates or not, I just needed to check for existence. So I could store the IPs in the index like this:
isset($myarray[ip2long($ip)])
And boom, lookup times went down from 17 seconds (and more) to a static time of 0.8 seconds for 10k lookups. As a value for the array entry I just used int 1.
I think the array index is probably based on some b-tree which should have log(n) lookup time and the index on a hashmap.
In my case using the index worked fine, but are there any data structures where I can use hashmaps as a value index, where multiple values may also occour (i realize that this makes only sense if do not have too many duplicates and I cannot use range/search requests efficiently, which is the primary benefit of tree structures)?