4

I have an array of hashes in the format shown below and I am attempting to sort the :book key of the hash based on a separate array. The order is not alphabetical and for my use case it cannot be alphabetical.

I need to sort based on the following array:

array = ['Matthew', 'Mark', 'Acts', '1John']

Note that I've seen several solutions that leverage Array#index (such as Sorting an Array of hashes based on an Array of sorted values) to perform a custom sort but that will not work with strings.

I've tried various combinations of sorting with Array#sort and Array#sort_by but they don't seem to accept a custom order. What am I missing? Thank you in advance for your help!

Array of Hashes

[{:book=>"Matthew",
  :chapter=>"4",
  :section=>"new_testament"},
 {:book=>"Matthew",
  :chapter=>"22",
  :section=>"new_testament"},
 {:book=>"Mark",
  :chapter=>"6",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"9",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"17",
  :section=>"new_testament"}]

5 Answers 5

9

You can use sort_by with index

arr = [{a: 1}, {a: 3}, {a: 2}] 

order = [2,1,3]  

arr.sort_by { |elem| order.index(elem[:a]) }                                           
# => [{:a=>2}, {:a=>1}, {:a=>3}]  

You can make it slightly faster by indexing the list of elements you want to order by:

order_with_index = order.each.with_object.with_index({}) do |(elem, memo), idx|
  memo[elem] = idx
end

then instead of order.index(<val>) use order_with_index[<val>]

Sign up to request clarification or add additional context in comments.

10 Comments

Ah this one is better than my solution as sort_by is expensive
Max, thank you so much for the quick reply! I'll try this out. I didn't think index could be used to compare string order. This is very elegant--really appreciate your time.
@ChaitanyaKale, sort_by is cheap, not expensive. That's because the construction of a hash that maps elements to the sorting criterion is done only once, prior to the sorting operation. By contrast, with <=> indices must be calculated twice for each pairwise comparison, which is much slower than performing two hash lookups.
Ah Thanks @CarySwoveland! I read ruby-doc.org/core-2.2.0/Enumerable.html#method-i-sort_by which mentions it being expensive "The current implementation of sort_by generates an array of tuples containing the original collection element and the mapped value. This makes sort_by fairly expensive when the keysets are simple."
@ChaitanyaKale, I'm amazed that sort_by uses an array of two-element arrays. I just assumed it would be a hash for fast lookups. Even so, sort_by is much faster than sort in many situations.
|
4

As can be seen from the documentation, Array#index indeed does work for strings (it's even the provided example), so this would work:

books.sort_by { |b| array.index(b[:book]) }

But instead of repeatedly searching through array, you can just determine the order once and then look it up:

order = array.each.with_index.to_h
#=> { "Matthew" => 0, "Mark" => 1, "Acts" => 2, "1John" => 3 }
books.sort_by { |b| order[b[:book]] }

7 Comments

I somehow missed your answer earlier. I expected the hash to speed up sort_by quite a lot, but my benchmarks suggest that it doesn't.
The array is probably too small for that to make a noticeable difference. But for larger arrays I'd certainly use this approach.
Ah, I missed your update with all the BM results above. Seems like even for biggish arrays the difference between my and your solution is negligible. Maybe the implementation of sort_by with a two element array offsets the benefit of using a hash for generating the external order, so I'd say one should use whatever code makes semantically the most sense. Thanks for the benchmarks!
Apologies Michael, I am just seeing this now. Thank you so much for taking the time to put this together. This is definitely a more readable solution and I have upvoted it. It is nice to understand how the backend sorting works but I will definitely go this route in the future.
@KurtW Don't worry, it seems that you learned a lot from this answer and that's what SO is about.
|
4

Since you know the desired order there's no need to sort the array. Here's one way you could do that. (I've called your array of hashes bible.)

bible.group_by { |h| h[:book] }.values_at(*array).flatten
  #=> [{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
  #    {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"},
  #    {:book=>"Mark", :chapter=>"6", :section=>"new_testament"},
  #    {:book=>"Acts", :chapter=>"9", :section=>"new_testament"},
  #    {:book=>"Acts", :chapter=>"17", :section=>"new_testament"},
  #    {:book=>"1John", :chapter=>"1", :section=>"new_testament"},
  #    {:book=>"1John", :chapter=>"1", :section=>"new_testament"}] 

Since Enumerable#group_by, Hash#values_at and Array#flatten each require just one pass through the array bible this may be faster than sorting when bible is large.

Here are the steps.

h = bible.group_by { |h| h[:book] }
  #=> {"Matthew"=>[{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
  #                {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"}],
  #    "Mark"   =>[{:book=>"Mark", :chapter=>"6", :section=>"new_testament"}],
  #    "1John"  =>[{:book=>"1John", :chapter=>"1", :section=>"new_testament"},
  #                {:book=>"1John", :chapter=>"1", :section=>"new_testament"}],
  #    "Acts"   =>[{:book=>"Acts", :chapter=>"9", :section=>"new_testament"}, 
  #                {:book=>"Acts", :chapter=>"17", :section=>"new_testament"}]
  #   } 

a = h.values_at(*array)
  #=> h.values_at('Matthew', 'Mark', 'Acts', '1John')
  #=> [[{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
  #     {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"}],
  #    [{:book=>"Mark", :chapter=>"6", :section=>"new_testament"}],
  #    [{:book=>"Acts", :chapter=>"9", :section=>"new_testament"},
  #     {:book=>"Acts", :chapter=>"17", :section=>"new_testament"}],
  #    [{:book=>"1John", :chapter=>"1", :section=>"new_testament"},
  #     {:book=>"1John", :chapter=>"1", :section=>"new_testament"}]] 

Lastly, a.flatten returns the array shown earlier.

Let's do a benchmark.

require 'fruity'

@bible = [
  {:book=>"Matthew",
   :chapter=>"4",
   :section=>"new_testament"},
  {:book=>"Matthew",
   :chapter=>"22",
   :section=>"new_testament"},
  {:book=>"Mark",
   :chapter=>"6",
   :section=>"new_testament"},
  {:book=>"1John",
   :chapter=>"1",
   :section=>"new_testament"},
  {:book=>"1John",
   :chapter=>"1",
   :section=>"new_testament"},
  {:book=>"Acts",
   :chapter=>"9",
   :section=>"new_testament"},
  {:book=>"Acts",
   :chapter=>"17",
   :section=>"new_testament"}]

@order = ['Matthew', 'Mark', 'Acts', '1John']

def bench_em(n)
  arr = (@bible*((n/@bible.size.to_f).ceil))[0,n].shuffle
  puts "arr contains #{n} elements"
  compare do 
    _sort       { arr.sort { |h1,h2| @order.index(h1[:book]) <=>
                  @order.index(h2[:book]) }.size }
    _sort_by    { arr.sort_by { |h| @order.find_index(h[:book]) }.size }
    _sort_by_with_hash {[email protected]_index.to_h;
                        arr.sort_by {|b| ord[b[:book]]}.size}    
    _values_at  { arr.group_by { |h| h[:book] }.values_at(*@order).flatten.size }
  end
end

@maxpleaner, @ChaitanyaKale and @Michael Kohl contributed _sort, _sort_by, and sort_by_with_hash, respectively.

bench_em    100
arr contains 100 elements
Running each test 128 times. Test will take about 1 second.
_sort_by is similar to _sort_by_with_hash
_sort_by_with_hash is similar to _values_at
_values_at is faster than _sort by 2x ± 1.0

bench_em  1_000
arr contains 1000 elements
Running each test 16 times. Test will take about 1 second.
_sort_by_with_hash is similar to _values_at
_values_at is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

bench_em 10_000
arr contains 10000 elements
Running each test once. Test will take about 1 second.
_values_at is faster than _sort_by_with_hash by 10.000000000000009% ± 10.0%
_sort_by_with_hash is faster than _sort_by by 10.000000000000009% ± 10.0%
_sort_by is faster than _sort by 2x ± 0.1

bench_em 100_000
arr contains 100000 elements
Running each test once. Test will take about 3 seconds.
_values_at is similar to _sort_by_with_hash
_sort_by_with_hash is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

Here's a second run.

bench_em    100
arr contains 100 elements
Running each test 128 times. Test will take about 1 second.
_sort_by_with_hash is similar to _values_at
_values_at is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

bench_em  1_000
arr contains 1000 elements
Running each test 8 times. Test will take about 1 second.
_values_at is faster than _sort_by_with_hash by 10.000000000000009% ± 10.0%
_sort_by_with_hash is similar to _sort_by
_sort_by is faster than _sort by 2.2x ± 0.1

bench_em 10_000
arr contains 10000 elements
Running each test once. Test will take about 1 second.
_values_at is similar to _sort_by_with_hash
_sort_by_with_hash is similar to _sort_by
_sort_by is faster than _sort by 2x ± 1.0

bench_em 100_000
arr contains 100000 elements
Running each test once. Test will take about 3 seconds.
_sort_by_with_hash is similar to _values_at
_values_at is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

1 Comment

Cary, so sorry I didn't catch this yesterday. I was so glad to have a solution that I didn't follow up. This is really good analysis and my @bible array (or whatever it will be called) is quite large. If I get this thing off the ground, I may very well be comparing performance and use your solution in my final code. Thank you!!
2

As the description of Array#sort_by accepts a block. The block should return -1, 0, or +1 depending on the comparison between a and b. You can use find_index on the array to do such comparison.

array_of_hashes.sort_by {|a| array.find_index(a[:book]) } should do the trick.

1 Comment

Thank you for your time Chaitanya!
0

Your mistake is to think that you are sorting. But, in fact, you are not, you already have the order, you just need to place the elements. I'm not proposing a compact or optimal solution, but a simple solution. First convert your large array into a hash indexed by the :book key (which should have been your first data structure), and then just use map:

array = ['Matthew', 'Mark', 'Acts', '1John']
elements = [{:book=>"Matthew",
  :chapter=>"4",
  :section=>"new_testament"},
 {:book=>"Matthew",
  :chapter=>"22",
  :section=>"new_testament"},
 {:book=>"Mark",
  :chapter=>"6",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"9",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"17",
  :section=>"new_testament"}]
by_name = {}
for e in elements
  by_name[e[:book]] = e
end
final = array.map { |x| by_name[x] }

1 Comment

Hmmm. I didn't see there were more than one entry with the same name, forget this.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.