I am trying to find all substrings of a given string. For a random string like rymis, the substrings would be [i, is, m, mi, mis, r, ry, rym, rymi, rymis, s, y, ym, ymi, ymis]. According to Wikipedia, a string of length n has n * (n + 1) / 2 non-empty substrings in total.
They can be found with the following snippet of code:
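    import java.util.Set;
    import java.util.TreeSet;

    // A TreeSet keeps the substrings sorted and silently drops duplicates.
    final Set<String> substring_set = new TreeSet<String>();
    final String text = "rymis";
    // iter is the start index of the substring, ator is its length.
    for (int iter = 0; iter < text.length(); iter++)
    {
        for (int ator = 1; ator <= text.length() - iter; ator++)
        {
            substring_set.add(text.substring(iter, iter + ator));
        }
    }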
This works for short strings but obviously slows down for longer ones, since the loops generate on the order of n^2 substrings, which makes the algorithm at least O(n^2).
I have also been reading up on suffix trees, which can do insertions in O(n), and noticed that the same substrings could be obtained by repeatedly inserting the string while removing one character from the left each time (that is, inserting every suffix) until the string is empty. That should cost about O(n + (n-1) + … + 1), which sums to n(n+1)/2 = (n^2 + n)/2 and is again O(n^2). There do seem to be some suffix trees that can do insertions in log2(n) time, which would be better, giving O(n log2(n)).
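To make the idea concrete, here is a minimal sketch of a naive suffix trie (not a compressed suffix tree, and not a linear-time construction such as Ukkonen's). It inserts every suffix character by character, so each non-root node corresponds to exactly one distinct substring. The class and method names (SuffixTrieSketch, build, distinctSubstrings, countDistinctSubstrings) are my own, chosen only for illustration, and building the trie is still O(n^2) in the worst case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SuffixTrieSketch {
    // One trie node; each child edge is labelled with a single character.
    // A TreeMap keeps the children sorted so traversal is lexicographic.
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
    }

    // Inserts the suffix starting at every index (hypothetical helper).
    // Worst case this visits n + (n-1) + ... + 1 nodes, i.e. O(n^2).
    static Node build(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.computeIfAbsent(text.charAt(i), c -> new Node());
            }
        }
        return root;
    }

    // Counts distinct non-empty substrings: one per non-root node.
    static int countDistinctSubstrings(String text) {
        return countNodes(build(text)) - 1;
    }

    private static int countNodes(Node node) {
        int count = 1;
        for (Node child : node.children.values()) {
            count += countNodes(child);
        }
        return count;
    }

    // Materialises the distinct substrings in sorted order via a pre-order walk.
    static List<String> distinctSubstrings(String text) {
        List<String> out = new ArrayList<>();
        collect(build(text), new StringBuilder(), out);
        return out;
    }

    private static void collect(Node node, StringBuilder prefix, List<String> out) {
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            prefix.append(entry.getKey().charValue());
            out.add(prefix.toString());
            collect(entry.getValue(), prefix, out);
            prefix.setLength(prefix.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        System.out.println(countDistinctSubstrings("rymis"));
        System.out.println(distinctSubstrings("rymis"));
    }
}
```

For rymis this should yield the same 15 substrings listed at the top. Note that writing the substrings out as String objects costs time proportional to their total length regardless of how they were found, so the trie mainly helps with de-duplication and counting.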
Before I delve into suffix trees: is this the correct route to be taking, is there another algorithm that would be more efficient for this, or is O(n^2) as good as this will get?
With a string such as aba, where the substrings start to become duplicates, it can no longer be assured that a structure contains only unique elements. If it could be assured that it did, a data structure like LinkedList would be a good speed boost over a *Set or ArrayList.
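As a rough illustration of the duplicate problem, the sketch below collects every substring into a plain list and adds a cheap check for whether duplicates can occur at all. It relies on one observation: a substring can only appear twice if its first character appears twice, so the substrings are all unique exactly when no character repeats. The names (DuplicateSubstringsSketch, allSubstrings, allSubstringsUnique) are hypothetical and only for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateSubstringsSketch {
    // Collects all n * (n + 1) / 2 substrings, duplicates included.
    static List<String> allSubstrings(String text) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    // O(n) check: substrings are all unique iff no character repeats.
    static boolean allSubstringsUnique(String text) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < text.length(); i++) {
            if (!seen.add(text.charAt(i))) {
                return false; // repeated character, so "x" occurs twice
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"rymis", "aba"}) {
            List<String> all = allSubstrings(text);
            System.out.println(text + ": unique=" + allSubstringsUnique(text)
                    + ", total=" + all.size()
                    + ", distinct=" + new HashSet<>(all).size());
        }
    }
}
```

For aba the list holds 6 entries but only 5 distinct values, because a occurs twice. When the check returns true, a plain list could be used without de-duplication; otherwise some form of set or trie is still needed.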