I have a data file in which each line represents a single record and each record may contain a list of keywords, each preceded by a "+".
foo1 foofoo foo foo foo +key1 +key2 +key3
foo2 foo foo foofoo foo
foo3 foo foofoo foo +key1 key1 key1 +key2
There will be between zero and a theoretically unlimited number of keywords per record. Keywords will ALWAYS be preceded by a +. An individual keyword MAY be a single word or a phrase with spaces.
I would like to read the keywords from each record into an array, String keywords[]. I'm using lineBuffer to bring the data in. Here's my strategy for identifying keywords, and what I have so far:
// PSEUDOCODE
counter = [number of occurrences of + in the line];
for (int i = 0; i < counter; i++) {
    Pattern p = [regex representing + to the next occurrence of + -or- end of line];
    Match pattern;
    keywords[i] = Match.group(1);
}
I may be over-thinking this, but does Java know to go to the next instance of my pattern in the same line? Looking at these few lines of code, it seems that my pattern matcher would read the line, find the first instance of a keyword, and write it to the array on every pass through the loop. It would never GET to the second keyword.
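To make the concern concrete, here is a minimal sketch of the loop written against java.util.regex. It relies on the documented behaviour of Matcher.find(): each call continues from where the previous match ended, provided the same Matcher object is reused instead of being recreated inside the loop. The names KeywordFinder, extractKeywords, and KEYWORD are made up for illustration, and the regex assumes a keyword never contains a literal +.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordFinder {
    // A '+' followed by everything up to the next '+' or the end of the line.
    // Assumption: a keyword never contains a literal '+'.
    private static final Pattern KEYWORD = Pattern.compile("\\+([^+]+)");

    // Hypothetical helper: returns the keywords of one record, in order.
    static String[] extractKeywords(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = KEYWORD.matcher(line);   // one Matcher per line, created once
        // Each find() resumes after the previous match, so no counter is needed.
        while (m.find()) {
            found.add(m.group(1).trim());    // trim the space before the next '+'
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "foo3 foo foofoo foo +key1 key1 key1 +key2";
        for (String keyword : extractKeywords(line)) {
            System.out.println(keyword);
        }
    }
}
```

Because the list grows as matches are found, there is no need to count the + characters up front. A record without any + simply yields an empty array.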
Is there a better way to think about this? A better strategy for creating this array?
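One alternative strategy that might be worth considering is to skip the matcher entirely and split everything after the first + on the remaining + characters. The sketch below is only an illustration under stated assumptions: KeywordSplitter and splitKeywords are hypothetical names, the record text before the first + is assumed never to contain a +, and every + is assumed to be followed by a non-empty keyword.

```java
import java.util.Arrays;

class KeywordSplitter {
    // Hypothetical helper: returns the keywords of one record, in order.
    static String[] splitKeywords(String line) {
        int first = line.indexOf('+');
        if (first < 0) {
            return new String[0];            // record has no keywords
        }
        // Drop everything up to and including the first '+', then split on
        // the remaining '+' signs, swallowing the whitespace around them.
        return line.substring(first + 1).trim().split("\\s*\\+\\s*");
    }

    public static void main(String[] args) {
        String line = "foo1 foofoo foo foo foo +key1 +key2 +key3";
        System.out.println(Arrays.toString(splitKeywords(line)));
        // prints [key1, key2, key3]
    }
}
```

This keeps multi-word keywords intact because the split happens only on +, never on spaces. It is less forgiving than a regex loop if a line ends with a stray +, which would produce an empty entry.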