5

I have a file containing records of the following format:

1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css

Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] is a single field).

I want to extract some of the fields, say 7, 8, and 9. Is it possible to extract these fields using a Java regex?

Can regex be used to match multiple patterns for the above?

From the above record, I need to extract the fields

f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css  
f2: 02/Oct/2010:00:00:38 +0530  
f3: je02121
1
  • Is there any delimiter (e.g. a space) that partitions the fields? Commented May 11, 2011 at 12:36

4 Answers

14

Match the fields one after another with repeated find() calls rather than capturing everything in one big pattern. If you have many lines like this, split the input into lines first, and extract the compiled Pattern to a constant:

String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}

Output:

Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'

Regex Pattern explained:

\\[    match an opening square bracket
.*?    and anything (as little as possible) up to a
\\]    closing square bracket
|      or
\\S+   any sequence of one or more non-whitespace characters
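
As a minimal sketch of the advice to keep the compiled Pattern in a constant, the tokenizer could be wrapped in a helper and the three requested fields picked by index. The class name LogFieldTokenizer and the method tokenize are made up for this illustration, and it assumes that fields never contain whitespace outside the bracketed timestamp:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldTokenizer {
    // Compiled once and reused for every line (hypothetical helper, not from the answer).
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    /** Splits one log line into fields, keeping a bracketed timestamp together. */
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(line);
        // The question counts fields from 1, so fields 7, 8, 9 are indexes 6, 7, 8.
        System.out.println("f1: " + fields.get(6));
        System.out.println("f2: " + fields.get(7));
        System.out.println("f3: " + fields.get(8));
    }
}
```

With this pattern the timestamp field still carries its surrounding brackets.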

4 Comments

I believe attacking it with two patterns is the right approach. Recently I had to parse very similar logs. Fortunately, the timestamp in brackets was the first field, so I split the log line at the closing bracket and used white space regex delimiter on the rest.
@Olaf if you want to split(), you need two patterns, agreed. But in my version you only need 1 pattern, which I personally prefer.
I just went back to my code and incorporated your suggestion. Now my code can handle all permutations of the Extended Squid format, not only the case when date/time in brackets is the first field. Thanks!
Worked well. I don't want to include the opening and closing brackets ([]) in the output, though. How can I do that?
5

Assuming that the only place where spaces are allowed within a field is between the brackets in the date field, and that there are no empty fields, you could use this:

Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6}   # first 6 fields\n" +
    "(\\S+)\\s+         # field 7\n" +
    "\\[([^]]+)\\]\\s+  # field 8\n" +
    "(\\S+)             # field 9", 
    Pattern.MULTILINE | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
} 
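
For reference, here is one way the same idea might look as a complete program. This is only a sketch: the class name FixedPositionFields and the method extract are invented for the example, the pattern is written on one line without Pattern.COMMENTS, and the closing bracket inside the character class is escaped for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedPositionFields {
    // Skip six whitespace-separated fields, then capture field 7,
    // the bracketed field 8 (without brackets), and field 9.
    private static final Pattern LINE = Pattern.compile(
            "^(?:\\S+\\s+){6}(\\S+)\\s+\\[([^\\]]+)\\]\\s+(\\S+)");

    /** Returns the three captured fields, or an empty list if the line does not match. */
    static List<String> extract(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(line)) {
            System.out.println(field);
        }
    }
}
```

Because the brackets sit outside the second capturing group, the date comes back without them.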

1 Comment

Less generic than mine, but answers the question perfectly (+1)
1

Use split with the regex "[\t\s]+?" and store the results in an array, say s.

Then s[6], s[7]+s[8], and s[9] will be the expected result.
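
A rough sketch of the split approach follows. It is not the answerer's exact code: it splits on \s+ rather than the pattern quoted above, rejoins the two halves of the date with the space that split() removed, and strips the brackets. The class name SplitFields and the method extract are hypothetical, and the sketch assumes every line has at least ten tokens and that the date contains exactly one space.

```java
class SplitFields {
    /** Returns {url, timestamp, user} from one log line using String.split. */
    static String[] extract(String line) {
        String[] s = line.trim().split("\\s+");
        String url = s[6];
        // split() cut the timestamp at its inner space, so rejoin the two halves.
        String joined = s[7] + " " + s[8];
        // Drop the leading '[' and the trailing ']'.
        String timestamp = joined.substring(1, joined.length() - 1);
        String user = s[9];
        return new String[] {url, timestamp, user};
    }

    public static void main(String[] args) {
        String line = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(line)) {
            System.out.println(field);
        }
    }
}
```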

2 Comments

a) \s already includes \t, so use \s+ instead of [\t\s]+. b) Field no. 8 contains a space, so that won't help.
@stackoverflow.com/users/342852/sean-patrick-floyd That's why I've added s[7] + s[8], to get the entire date.
0

This option does not include the opening and closing brackets ([]) in the output:

    String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
    Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);
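
Another way to leave the brackets out, sketched here as an alternative rather than as this answer's method, is to keep a simple two-branch tokenizer and put a capturing group inside the brackets. The names BracketFreeTokenizer and tokenize are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketFreeTokenizer {
    // Group 1: text inside [ ... ]; group 2: any other run of non-whitespace.
    private static final Pattern FIELD = Pattern.compile("\\[(.*?)\\]|(\\S+)");

    /** Splits a log line into fields and drops the brackets around bracketed fields. */
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        int nr = 0;
        for (String field : tokenize(input)) {
            System.out.println("Match no. " + ++nr + ": '" + field + "'");
        }
    }
}
```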
