1

I search the most efficient way to get the difference between two arrays of arrays. At this point I don't know if working with hash would be better.

I have two arrays of array containing one id and one datetime converted to int.

a = [[1, 1234],[2, 7345],[3, 12769],[4, 13456], [5, 34765]]
b = [[1, 1234],[3, 12769],[2, 7345],[5, 39875],[4, 13459]]

My goal is to know if the date contained in each arrays of a is superior to the date contained with the same id in b and keep the arrays that match the comparaison otherwise I would have done something like a - b.

What is the fastest and cleanest way even with large amounts of arrays ?

The alternative way would be with hash, I don't really know what to use.

a = [{id: 1, date: 1234},{id: 2, date: 7345},{id: 3, date: 12769},{id: 4, date: 13456},{id: 5, date: 34765}]
b = [{id: 1, date: 1234},{id: 3, date: 12769},{id: 2, date: 7345},{id: 5, date: 39875}, {id: 4, date: 13459}]

What are your thoughts ?

5
  • When you say large amounts of arrays, how large? Are the arrays too big to keep in memory? If so, that complicates things a bit. Are the two data structures relatively sorted to each other like in your example? In other words a[i]==b[i] will always be true? If so, that simplifies things a bit. Commented Jan 25, 2021 at 22:34
  • @PaulNikonowicz I mean array with few ten of thousands ids and dates inside, it might not be sorted like a[i]==b[i] all the time. Commented Jan 25, 2021 at 22:36
  • Well, if it all fits in memory (it sounds like it might), then we could start by sorting both arrays. Using a hash vs an array isn't going to make this any easier to develop, but it might make things easier to read. I think an array is fine though. Commented Jan 25, 2021 at 22:42
  • With a unchanged could b equal [[2, 7345],[1, 1234],[3, 12769],[4, 13459], [5, 39875]]? Commented Jan 25, 2021 at 23:05
  • @CarySwoveland Yes b could be equal to something like that. Commented Jan 25, 2021 at 23:07

2 Answers 2

3

It is easier and better performing to have b be a hash. Fortunately, an array of 2-element arrays can be directly converted to hash with .to_h (and back to array with .to_a).

# this will make an { <id> => <date> } hash
b_hash = b.to_h

Now the filter step just involves a select over a, checking the associated values in b_hash. I use |(id, date)| to destructure the array into its individual elements:

result = a.select do |(id, date)|
  b_hash[id] > date
end

Note that you do want to keep the .to_h call outside the select loop since it's an O(N) operation.

You could do it without converting b.to_h, you would just need to loop through b for each element of a, taking the time complexity from O(N) to O(N^2)

Sign up to request clarification or add additional context in comments.

1 Comment

I think the OP gave an example of an array of arrays vs an array of hashes. I think what you're trying to illustrate is that a Hash (no arrays at all) will perform better here.
3

Suppose a and b are defined as follows.

a = [[1, 1235] ,[2, 7345], [3, 12760],[4, 13466], [5, 34765]]
b = [[5, 39875],[3, 12769],[2, 7345], [1, 1234],  [4, 13459]]

Then

a.select { |id,date| date > b.assoc(id).last }
  #=> [[1, 1235], [4, 13466]]

See Array#assoc. This is not the most efficient solution but it may be fine if a and b are not too large. I posted it mainly because it uses a method that I rarely use.

2 Comments

What an oddly specific method. I guess you found the use case right here!
@maxpleaner, "I rarely use" probably should be "I've never used" (just as I've never used for). Note there is also Array#rassoc, which is even more bizarre.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.