4

I am trying to find all substrings within a given string. For a random string like rymis the subsequences would be [i, is, m, mi, mis, r, ry, rym, rymi, rymis, s, y, ym, ymi, ymis]. From Wikipedia, a string of a length of n will have n * (n + 1) / 2 total substrings.

Which can be found by doing the following snippet of code:

    final Set<String> substring_set = new TreeSet<String>();
    final String text = "rymis";

    for(int iter = 0; iter < text.length(); iter++)
    {
        for(int ator = 1; ator <= text.length() - iter; ator++)
        {
            substring_set.add(text.substring(iter, iter + ator));
        }
    }

Which works for small String lengths but obviously slows down for larger lengths as the algorithm is near O(n^2).

Also reading up on Suffix Trees which can do insertions in O(n) and noticed the same subsequences could be obtained by repeatedly inserting substrings by removing 1 character from the right until the string is empty. Which should be about O(1 + … + (n-1) + n) which is a summation of n -> n(n+1)/2 -> (n^2 + n)/ 2, which again is near O(n^2). Although there seems to be some Suffix Trees that can do insertions in log2(n) time which would be a factor better being O(n log2(n)).

Before I delve into Suffix Trees is this the correct route to be taking, is there some another algorithm that would be more efficient for this, or is O(n^2) as good as this will get?

7
  • 5
    Since the set contains n * (n + 1) / 2 values, you'll have to do n * (n + 1) / 2 insertions to the set, so I don't see how the algorithm could be less than O(n^2). Commented Feb 22, 2012 at 19:15
  • @JBNizet - I agree, there is no way to avoid touching every substring element. Since the size of the original set is n, and there are roughly n^2 elements to visit, this most likely cannot get more efficient. Commented Feb 22, 2012 at 19:28
  • This is not homework. Playing with the two other algorithms posted they are both slower than my original but I noticed the data structure may not be optimal. If the algorithm produces unique substrings already then TreeSet is not needed (the data structure can be sorted later) and a dynamic array also would be inefficient because of the large size of inserts (needing to expand its internal array and copy). Commented Feb 22, 2012 at 20:33
  • From some testing all three algorithms will generate the correct answer. My algorithm in the original post is empirically the quickest because it has the less constant time costs but it is not a significant difference. The problem is compounded when a String with repeat characters is added like aba where the substrings then start to become duplicates then it can no longer be assured that a structure contains only unique elements. Where if it could be assured that it did then a data structure like LinkedList would be a good speed boost over a *Set or ArrayList. Commented Feb 22, 2012 at 21:05
  • @ntin - Duplicates should not be an issue because you can always remove them faster than the current bottleneck. HeapSort the array, then iterate through it removing current item if previous item is the same. These two operations should be O(n log n) and O(n) respectively. Commented Feb 22, 2012 at 21:31

3 Answers 3

1

I am fairly sure you can't beat O(n^2) for this as has been mentioned in comments to the question.

I was interested in different ways of coding that so I made one quickly, and I decided to post it here.

The solution I put here isn't asymptotically faster I don't think, but when counting the inner and outer loops there are less. There are also less duplicate insertions here - no duplicate insertions.

String str = "rymis";
ArrayList<String> subs = new ArrayList<String>();
while (str.length() > 0) {
    subs.add(str);
    for (int i=1;i<str.length();i++) {
        subs.add(str.substring(i));
        subs.add(str.substring(0,i));
    }
    str = str.substring(1, Math.max(str.length()-1, 1));
}
Sign up to request clarification or add additional context in comments.

Comments

1

This is an inverted way of your example, but still o(n^2).

string s = "rymis";
ArrayList<string> al = new ArrayList<string>();
for(int i = 1; i < s.length(); i++){//collect substrings of length i
 for(int k = 0; k < s.length(); k++){//start index for sbstr len i
  if(i + k > s.length())break;//if the sbstr len i runs over end of s move on
  al.add(s.substring(k, k + i));//add sbstr len i at index k to al
 }
}

Let me see if I can post a recursive example. I started doing a couple recursive tries and came up with this iterative approach using dual sliding windows as a sort of improvement to the above method. I had a recursive example in mind but was having issues reducing the tree size.

string s = "rymis";
ArrayList<string> al = new ArrayList<string>();
for(int i = 1; i < s.length() + 1; i ++)
{
 for(int k = 0; k < s.length(); k++)
 {
  int a = k;//left bound window 1
  int b = k + i;//right bound window 1
  int c = s.length() - 1 - k - i;//left bound window 2
  int d = s.length() - 1 - k;//right bound window 2
  al.add(s.substring(a,b));//add window 1
  if(a < c)al.add(s.substring(c,d));//add window 2
 }
}

There was an issue mentioned with using arraylist affecting performance so this next one will be with more basic structures.

string s = "rymis";
StringBuilder sb = new StringBuilder();
for(int i = 1; i < s.length() + 1; i ++)
{
 for(int k = 0; k < s.length(); k++)
 {
  int a = k;//left bound window 1
  int b = k + i;//right bound window 1
  int c = s.length() - 1 - k - i;//left bound window 2
  int d = s.length() - 1 - k;//right bound window 2
  if(i > 1 && k > 0)sb.append(",");
  sb.append(s.substring(a,b));//add window 1
  if(a < c){
   sb.append(",");
   sb.append(s.substring(c,d));//add window 2
  }
 }
}
string s = sb.toString();
String[] sArray = s.split("\\,");

Comments

1

I am not sure about the exact algorithm but you may look into Ropes:

http://en.wikipedia.org/wiki/Rope_(computer_science)

In summary, ropes are better suited when the data is large and frequently modified.

I believe Rope outperforms String for your problem.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.