1

I have a data file in which each line represents a single record and each record may contain a list of keywords, each preceded by a "+".

foo1 foofoo foo foo foo +key1 +key2 +key3
foo2 foo foo foofoo foo 
foo3 foo foofoo foo +key1 key1 key1 +key2

There will be between zero and a theoretically unlimited number of keywords. Keywords will ALWAYS be preceded by a +. Individual keywords MAY be a single word or a phrase with spaces. My strategy for identifying keywords:

I would like to read these records into an array, String keywords[]. I'm using lineBuffer to bring the data in, and here's what I have so far.

// PSEUDOCODE
counter = [number of occurrences of + in the line];
for (int i = 0; i < counter; i++) {
    Pattern p = [regex representing + to the next occurrence of + -or- end of line];
    Match pattern;
    keyword[i] = Match.group(1);
}

I may be over-thinking this, but does Java know to go to the next instance of my pattern in the same line? Looking at these few lines of code, it seems that my pattern matcher would read the line, find the first instance of a keyword and write it to the array i number of times. It would never GET to the second keyword.

Is there a better way to think about this? A better strategy for creating this array?

2 Answers

2

If you know that there is no + in the keys, you could simply split the string:

String[] ss = s.split(" \\+");

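Then discard the first entry, which holds the non-keyword part of the record (the foo foofoo ...). Because the split happens on " +", a keyword that is a phrase with spaces stays in one piece.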
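
For completeness, here is a minimal sketch of the split approach applied to the sample records from the question. The class name KeywordSplit and the method name extractKeywords are made up for illustration, and the sketch assumes, as above, that no keyword contains a + and that every + is preceded by a space.

```java
import java.util.Arrays;

class KeywordSplit {
    // Returns the keywords of one record.
    // Assumption: keywords never contain '+', and each '+' follows a space.
    static String[] extractKeywords(String line) {
        // split() takes a regex, so the literal '+' must be escaped.
        String[] parts = line.split(" \\+");
        // parts[0] is the non-keyword prefix; everything after it is a keyword.
        String[] keywords = new String[Math.max(0, parts.length - 1)];
        for (int i = 1; i < parts.length; i++) {
            keywords[i - 1] = parts[i].trim();
        }
        return keywords;
    }

    public static void main(String[] args) {
        // prints [key1, key2, key3]
        System.out.println(Arrays.toString(
                extractKeywords("foo1 foofoo foo foo foo +key1 +key2 +key3")));
        // prints []
        System.out.println(Arrays.toString(
                extractKeywords("foo2 foo foo foofoo foo ")));
        // prints [key1 key1 key1, key2]
        System.out.println(Arrays.toString(
                extractKeywords("foo3 foo foofoo foo +key1 key1 key1 +key2")));
    }
}
```

A record without keywords yields an empty array, so the caller does not need a special case for it.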

EDIT

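Regarding the pattern/regex question, you could also do it this way. Each call to Matcher.find() resumes searching after the previous match, so the loop visits every keyword on the line: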

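// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// Note: \\w+ matches a single word, so a multi-word keyword is cut at its first space.
Pattern p = Pattern.compile(" \\+(\\w+)");
Matcher m = p.matcher(s);
while (m.find()) {
    String key = m.group(1); // the text captured after the "+"
    System.out.println(key);
}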
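
Since the question says a keyword may be a phrase with spaces, a variant of the same loop can capture everything up to the next + instead of a single word. This is only a sketch: the names KeywordRegex and extractKeywords are hypothetical, and it assumes a + never appears inside a keyword.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordRegex {
    // A literal '+' followed by a run of non-'+' characters, captured as group 1.
    private static final Pattern KEYWORD = Pattern.compile("\\+([^+]+)");

    static List<String> extractKeywords(String line) {
        List<String> keywords = new ArrayList<>();
        Matcher m = KEYWORD.matcher(line);
        // find() continues from the end of the previous match on each call.
        while (m.find()) {
            // trim() drops the space that separates one keyword from the next '+'.
            keywords.add(m.group(1).trim());
        }
        return keywords;
    }

    public static void main(String[] args) {
        // prints [key1 key1 key1, key2]
        System.out.println(extractKeywords("foo3 foo foofoo foo +key1 key1 key1 +key2"));
        // prints []
        System.out.println(extractKeywords("foo2 foo foo foofoo foo "));
    }
}
```

Collecting into a List avoids having to count the + characters up front; call toArray(new String[0]) on the result if an array is needed.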

2 Comments

I think this is the simpler way of doing it. To answer the part about "does Java know to go to the next instance of my pattern in the same line?", see the following link for information on the Matcher object, which can do just that: docs.oracle.com/javase/tutorial/essential/regex/matcher.html
@JTMon I have added an example.
1

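This would be pretty easy to do with a Scanner, as long as each keyword is a single word:

// Requires java.util.Scanner; keyword is the String array from the question.
Scanner s = new Scanner(line);
int i = 0;
while (s.hasNext()) {
    String token = s.next(); // tokens are whitespace-delimited by default
    if (token.startsWith("+")) {
        keyword[i] = token.substring(1); // drop the leading "+"
        i++;
    }
}
s.close();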
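
If keywords can be phrases, one possible adaptation is to change the Scanner's delimiter so that tokens are separated by + rather than by whitespace. The sketch below uses made-up names (KeywordScanner, extractKeywords) and assumes that every record starts with non-keyword text, which is then skipped as the first token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class KeywordScanner {
    static List<String> extractKeywords(String line) {
        List<String> keywords = new ArrayList<>();
        // Treat optional whitespace followed by a literal '+' as the delimiter.
        try (Scanner s = new Scanner(line).useDelimiter("\\s*\\+")) {
            // Assumption: the first token is the non-keyword prefix, so skip it.
            if (s.hasNext()) {
                s.next();
            }
            while (s.hasNext()) {
                keywords.add(s.next().trim());
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        // prints [key1 key1 key1, key2]
        System.out.println(extractKeywords("foo3 foo foofoo foo +key1 key1 key1 +key2"));
        // prints []
        System.out.println(extractKeywords("foo2 foo foo foofoo foo "));
    }
}
```

Note that a record beginning directly with a + would lose its first keyword under this assumption, so the split or regex approaches are safer if that can happen.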
