1

How can I remove all substrings of another string within an array of strings? I want this array of strings:

arr = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]

to shrink to:

["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]

Edit:

  • Order does not need to be preserved. Sorting the elements is fine, if it helps.
  • Assume arr is unique, with no dups.

2 Comments

  • What if the array included "ochu"? A small suggestion: when you give examples, assign a variable to each input object (e.g., arr = ["Bochum", ...]). That way readers can refer to those variables (e.g., arr) in comments and answers without having to define them. Commented Jan 7, 2016 at 4:08
  • When you edit your question after an answer has been posted it's best to leave what you had and add to it, making clear that it is an edit (e.g., "Edit: ..."). If you don't do that you may render answers or comments incorrect or meaningless. btw, you didn't answer my question re ochu. Commented Jan 7, 2016 at 14:02

6 Answers

2

If you're not opposed to the use of brute force:

arr = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart",
       "Heesestr.", "Berl", "Berlin"]

arr.each_with_object([]) { |str,a|
  a << str unless arr.any? { |s| s.include?(str) && s.size > str.size } }
  #=> ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"] 
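
The "ochu" case raised in the comments on the question can be checked with the same brute-force idea. This is only a sketch: the method name remove_substrings and the shortened input are made up for illustration and are not part of the answer above.

```ruby
# Hypothetical helper (name chosen for illustration only).
# Keeps a string unless some strictly longer string in the list contains it.
def remove_substrings(strings)
  strings.reject do |str|
    strings.any? { |other| other.size > str.size && other.include?(str) }
  end
end

# "ochu" sits in the middle of "Bochum", so it is dropped along with "Berl".
remove_substrings(["Bochum", "ochu", "Berl", "Berlin"])
#=> ["Bochum", "Berlin"]
```

Because the check uses include?, it catches substrings anywhere in a longer string, not just prefixes.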

0

One-Liner with Sort, Grep, and Count

Assuming every element you want to remove is a prefix of some other element (as in your example), one way to remove them is to sort, which places each prefix before the longer strings that start with it. You can then reject any element that more than one array element starts with. For example:

array = %w[Bochum Stu Stut Stuttt Stutt Stuttgart Heesestr. Berl Berlin]
# Regexp.escape keeps the "." in "Heesestr." from acting as a regex wildcard.
array.sort.reject { |elem| array.grep(/\A#{Regexp.escape(elem)}/).count > 1 }
#=> ["Berlin", "Bochum", "Heesestr.", "Stuttgart", "Stuttt"]

If your array shouldn't be sorted, then this is not the right solution for you. However, it definitely contains the right array elements, and is both short and easy to read. Your mileage may vary.
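
For comparison, here is a rough sketch of the same prefix test written with String#start_with? instead of a regex, which avoids any escaping concerns. The method name remove_prefixes and the shortened input are assumptions for illustration only. It also shows the limit of a prefix-only test: an inner substring such as "ochu" survives.

```ruby
# Hypothetical helper (name chosen for illustration only).
# Drops every element that some *other* element starts with.
def remove_prefixes(strings)
  strings.sort.reject do |elem|
    strings.any? { |other| other != elem && other.start_with?(elem) }
  end
end

# "Berl" is a prefix of "Berlin" and is removed; "ochu" is inside "Bochum"
# but is not a prefix of it, so a prefix-only test keeps it.
remove_prefixes(%w[Bochum ochu Berl Berlin])
#=> ["Berlin", "Bochum", "ochu"]
```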

2 Comments

A possible edge problem: if arr = ["Stu", "Stu"] it returns an empty array. I say "possible" because I don't know what the OP wants in this case or even permits dups.
A tiny tweak to your response, got me to what I've decided to go with: array.sort.select { |elem| array.grep(/\A#{elem}/).one? }
0

A solution that does not preserve the order:

["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart",
   "Heesestr.", "Berlin", "Berl"].sort_by(&:size).reduce([]) do |ary, word|
  ary.reject{|s| word.include?(s)}.push(word)
end
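
As a sketch of why sorting by size helps: each word arrives after every word shorter than it, so anything it contains has already been collected and can be dropped at that point. The wrapper name drop_contained and the shortened input below are assumptions for illustration. Since sort_by is not guaranteed to be stable, the result is sorted before being shown.

```ruby
# Hypothetical wrapper (name chosen for illustration only) around the
# sort_by/reduce idea: process words from shortest to longest, and each
# time a new word arrives, discard any kept word that it contains.
def drop_contained(words)
  words.sort_by(&:size).reduce([]) do |kept, word|
    kept.reject { |s| word.include?(s) }.push(word)
  end
end

# Sorted only to make the output order predictable.
drop_contained(%w[Bochum ochu Berl Berlin]).sort
#=> ["Berlin", "Bochum"]
```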

0

No need for Rails, plain Ruby will do:

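my_array = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]

# Build the joined string once, and use select rather than keep_if so the
# array is not modified while it is still being scanned.
joined = my_array.join(",")
my_array.select { |x| joined.scan(x).length == 1 }
#=> ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]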
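
A possible variation, offered only as a sketch: count how many elements contain each string, rather than scanning one joined string. This avoids relying on "," never appearing in the data. The method name keep_uncontained is an assumption for illustration. Note that it counts containing elements, not occurrences, so it is not an exact equivalent of the scan version.

```ruby
# Hypothetical helper (name chosen for illustration only).
# Keeps a string only if exactly one element of the list (the string itself)
# contains it. Assumes the list has no duplicates, as the question states.
def keep_uncontained(strings)
  strings.select { |x| strings.count { |s| s.include?(x) } == 1 }
end

keep_uncontained(["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart",
                  "Heesestr.", "Berl", "Berlin"])
#=> ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]
```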

1 Comment

yes, that should be my_array. Was trying to clean up variable names, and left one out... sorry..
0

Here's an implementation using a Trie-like data structure. It achieves the goal by simply losing information :-)

(I've assumed you only care about strings being prefixes of each other, rather than substrings...)

class LossyTrie
  def initialize; @dict = {}; end

  def add(str)
    # Break the new string apart into characters, traversing down the trie at each step.
    # As a side effect, if a prefix of str was already present, it will be forgotten.
    # Similarly, if str itself is a prefix of an existing string, nothing will change.
    dict = @dict
    str.each_char do |c|
      dict = (dict[c] ||= {})
    end
  end

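  def all_strings
    strs = []
    traverse(@dict, "") { |leaf| strs << leaf }
    strs
  end

  private

  # Depth-first walk; calls the block with the full string at each leaf
  # (a node with no children).
  def traverse(dict, so_far, &block)
    dict.each do |k, v|
      if v.empty?
        block.call(so_far + k)
      else
        traverse(v, so_far + k, &block)
      end
    end
  end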
end

strs = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]

trie = LossyTrie.new
strs.each { |s| trie.add(s) }

trie.all_strings # => ["Bochum", "Berlin", "Stuttt", "Stuttgart", "Heesestr."]

0

Find the substrings and remove them. This might not be the best approach, but it is clear. Note that it only checks prefixes of each string:

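ar = ["Bochum", "Stu", "Stut", "Stuttt", "Stutt", "Stuttgart", "Heesestr.", "Berl", "Berlin"]
sub_strings = []
ar.each do |string|
  # Collect every proper prefix of string that is itself an element of ar.
  (1...string.length).each do |index|
    prefix = string[0...index]
    sub_strings << prefix if ar.include?(prefix)
  end
end
results = ar - sub_strings
#=> ["Bochum", "Stuttt", "Stuttgart", "Heesestr.", "Berlin"]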

1 Comment

@CarySwoveland thank you! haha just too many Objective C code due to it.
