
I have a Java app that retrieves logs stored in Elasticsearch. The logs are stored like this (this is the response Elasticsearch returns):

{
	"took":1013,
	"timed_out":false,
	"_shards":{"total":40,"successful":40,"failed":0},
	"hits":{"total":28,"max_score":null,"hits":
    [
      {
      "_shard":"[logstash-2017.09.06][0]",
      "_node":"_G934CTGTjKypnI_D1b1Lg",
      "_index":"logstash-2017.09.06",
      "_type":"logs",
      "_id":"AV5WyiTlbV8ga6rEI4b8",
      "_score":null,
      "_source":{"@timestamp":"2017-09-06T10:44:01.691Z",
      "@version":"1",
      "message":"{
        \"log\":\"2017-09-19 09:26:09,149 INFO [com.mycompany.class.MyClass] (default task-23) Some log to retrieve\",
        \"stream\":\"stderr\",
        \"docker\":{
            \"container_id\":\"61b34e11002c636b289e7c40d6fbc6718e0deec58bf8a3410d598e3bd561672d\"
            },
        \"metadata\":{
            \"container_name\":\"router\",
            \"namespace_name\":\"default\",
            \"cluster_name\":\"cluster\"
            }
         }"
      },
      "sort":[1504694641691]
      }
    ]
  }
}

To get only the logs that contain, for example, the word 'INFO', I want to query "message". However, the log text is inside \"log\", and I want to match only words that appear inside \"log\".

I thought querying "message.log" might work, but it didn't. The field isn't nested JSON ( "message":{key:value, key:value} ); it's "message":"{string}" (note the double quotes) :_(
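To make the "JSON inside a string" point concrete, here is a minimal Java sketch of filtering on the inner "log" value on the client side, after the hits have been retrieved. It is an illustration under assumptions, not a replacement for a server-side query: it assumes the decoded "message" value contains plain double quotes (the backslashes in the response above being JSON escaping), and that the log text itself contains no escaped quotes. The class and method names (LogFieldFilter, extractLog, logContains) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldFilter {
    // Captures the value of "log" in the decoded message text.
    // Assumption: the value contains no escaped double quotes.
    private static final Pattern LOG_FIELD =
            Pattern.compile("\"log\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the text of the inner "log" field, if the message has one.
    static Optional<String> extractLog(String message) {
        Matcher m = LOG_FIELD.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True only if the word occurs inside "log", not in "stream", "docker", etc.
    static boolean logContains(String message, String word) {
        return extractLog(message).map(log -> log.contains(word)).orElse(false);
    }

    public static void main(String[] args) {
        String message = "{\n"
                + "  \"log\":\"2017-09-19 09:26:09,149 INFO [com.mycompany.class.MyClass] (default task-23) Some log to retrieve\",\n"
                + "  \"stream\":\"stderr\"\n"
                + "}";
        System.out.println(logContains(message, "INFO"));   // true
        System.out.println(logContains(message, "stderr")); // false: only in "stream"
    }
}
```

Filtering this way means every candidate hit is transferred before it is checked, so it only makes sense when the result set is small.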

It would be easy if the logs were stored like "log":"The log" with nothing else, but I can't change the behaviour of the Logstash instance that puts the logs into Elasticsearch.

So I tried using regex ( QueryBuilders.regexpQuery("message", "Some_regex") ) with the following regex:

.*\"log\\\":\\\".*INFO.*},\\\"metadata\\\":{

I know that this regex also affects \"stream\" or \"docker\", but it's not a problem.

I tested this regex on http://regexr.com/ and https://regex101.com/ and it should work, but when I run the query I get 0 results (and there should be results).
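One possible reason for the difference, offered as a hedged note rather than a confirmed diagnosis: Elasticsearch's regexp query uses Lucene's regular expression syntax, in which patterns are always anchored, so the pattern has to match the entire indexed term, whereas online testers usually report a match found anywhere in the text. On an analyzed text field the indexed terms are individual tokens rather than the whole message, which would also prevent a whole-message pattern from matching. The sketch below only imitates the anchoring difference with Java's find() and matches(); java.util.regex is not Lucene's engine, the sample string assumes the decoded message holds plain quotes, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AnchoringDemo {
    // Assumed decoded value of "message" (plain quotes, single line).
    static final String MESSAGE =
            "{\"log\":\"2017-09-19 09:26:09,149 INFO [com.mycompany.class.MyClass] "
            + "(default task-23) Some log to retrieve\",\"stream\":\"stderr\"}";

    // Unanchored search, similar to what online regex testers do.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Anchored match: the pattern must cover the whole input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String partial = "\"log\":\".*INFO";      // regex: "log":".*INFO
        String full = ".*\"log\":\".*INFO.*";     // regex: .*"log":".*INFO.*

        System.out.println(matchesAnywhere(partial, MESSAGE)); // true
        System.out.println(matchesWhole(partial, MESSAGE));    // false
        System.out.println(matchesWhole(full, MESSAGE));       // true
    }
}
```

Under anchored semantics a pattern needs a leading and a trailing .* to behave like a "contains" search.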

I tried more regexes by trial and error, but the query found no results whenever I added anything after

.*\"log\\\":\\\".*INFO

I'm not very experienced with regex, and I wanted to solve this without asking for help, but I'm a bit lost right now...

Thank you in advance, and sorry for my bad English.

1 Answer


Here is one regex that captures the text after the INFO part:

.*\\"log\\":\\".* INFO (.*)

Explanation:

- \\ is required for matching one backslash
- the parentheses (.*) are for grouping; you can retrieve the captured text later

Using it in Java is a bit tricky. For example, if you want to write the input string as a Java String literal, you get this:

String str = "\"message\":\"{\r\n\\\"log\\\":\\\"2017-09-19 09:26:09,149 INFO [com.mycompany.class.MyClass] (default task-23) Some log to retrieve\\\",";

It looks strange, but you can always check whether the escaping is correct by printing the string:

System.out.println(str);

So, this is how to use the regex in Java:

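    // imports needed at the top of the file:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Lots of escape characters :( Each \\\\\\\" in the literal becomes \\\" in
    // the regex, which matches one backslash followed by a double quote.
    // The compiled pattern can be reused.
    Pattern p = Pattern.compile(".*\\\\\\\"log\\\\\\\":\\\\\\\".* INFO (.*)");
    // this is how you match against the string defined above
    Matcher m = p.matcher(str);
    if (m.find()) {
        // group(1) holds the text captured after INFO
        System.out.println(m.group(1));
    } else {
        System.out.println("--> no match");
    }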

2 Comments

Thank you for your help, but it isn't Java that needs to use this regex (with a Pattern and a Matcher); it's Elasticsearch, through a regexpQuery. In Java the regex works, but when I use it against Elasticsearch the query doesn't return any hits.
The regex works with the JavaScript dialect too. Can you try it?
