
How to return unique combinations of multiple fields using an Elasticsearch query?

My documents contain duplicate name and job values. I would like to use an Elasticsearch query to get all the unique values, with the name and job returned together in the same response so that they stay tied to each other.

[
{
    "name": "albert",
    "job": "teacher",
    "dob": "11/22/91"
},
{
    "name": "albert",
    "job": "teacher",
    "dob": "11/22/91"
},
{
    "name": "albert",
    "job": "teacher",
    "dob": "11/22/91"
},
{
    "name": "justin",
    "job": "engineer",
    "dob": "1/2/93"
},
{
    "name": "justin",
    "job": "engineer",
    "dob": "1/2/93"
},
{
    "name": "luffy",
    "job": "rubber man",
    "dob": "1/2/99"
}
]

Expected result (any format is fine). I tried using aggs, but I only get one field back:

[
    {
        "name": "albert",
        "job": "teacher"
    },
    {
        "name": "justin",
        "job": "engineer"
    },
    {
        "name": "luffy",
        "job": "rubber man"
    }
]

This is what I tried so far:

GET name.test.index/_search
{
  "size": 0,
  "aggs": {
    "name": {
      "terms": { "field": "name.keyword" }
    }
  }
}

The query above returns the following, which is good because the names are unique:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 95,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Justin",
          "doc_count" : 56
        },
        {
          "key" : "Luffy",
          "doc_count" : 31
        },
        {
          "key" : "Albert",
          "doc_count" : 8
        }
      ]
    }
  }
}

I tried doing nested aggregation but that did not work. Is there an alternative solution for getting multiple unique values or am I missing something?

1 Answer

That's a good start! There are a few ways to achieve what you want. Each one returns a different response format, so you can pick the one you prefer.

The first option is to leverage the top_hits sub-aggregation and return the two fields for each name bucket:

GET name.test.index/_search
{
  "size": 0,
  "aggs": {
    "name": {
      "terms": {
        "field": "name.keyword"
      },
      "aggs": {
        "top": {
          "top_hits": {
            "_source": [
              "name",
              "job"
            ],
            "size": 1
          }
        }
      }
    }
  }
}
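
To get from that response to the flat list in the question, the buckets can be unpacked on the client side. Below is a minimal Python sketch, assuming the response has already been parsed into a dict; the function name `pairs_from_top_hits` and the `sample_response` data are hypothetical, hand-written to mirror the shape of a terms + top_hits response rather than copied from a real cluster.

```python
def pairs_from_top_hits(response):
    """Collect one {"name", "job"} dict per bucket of the "name" aggregation."""
    pairs = []
    for bucket in response["aggregations"]["name"]["buckets"]:
        # "top" is the name given to the top_hits sub-aggregation in the query.
        hits = bucket["top"]["hits"]["hits"]
        if hits:
            source = hits[0]["_source"]
            pairs.append({"name": source.get("name"), "job": source.get("job")})
    return pairs


# Hypothetical sample data in the shape of the aggregation response.
sample_response = {
    "aggregations": {
        "name": {
            "buckets": [
                {"key": "albert", "doc_count": 3,
                 "top": {"hits": {"hits": [
                     {"_source": {"name": "albert", "job": "teacher"}}]}}},
                {"key": "justin", "doc_count": 2,
                 "top": {"hits": {"hits": [
                     {"_source": {"name": "justin", "job": "engineer"}}]}}},
                {"key": "luffy", "doc_count": 1,
                 "top": {"hits": {"hits": [
                     {"_source": {"name": "luffy", "job": "rubber man"}}]}}},
            ]
        }
    }
}

for pair in pairs_from_top_hits(sample_response):
    print(pair)
```

Keep in mind that with "size": 1 in top_hits, each name bucket yields a single document, so a name that appears with several different jobs would only surface one of them.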

The second option is to use a script in your terms aggregation instead of a field to return a compound value:

GET name.test.index/_search
{
  "size": 0,
  "aggs": {
    "name": {
      "terms": {
        "script": "doc['name.keyword'].value + ' - ' + doc['job.keyword'].value"
      }
    }
  }
}
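
With this option each bucket key is a single string such as "albert - teacher", so the two values have to be split apart again on the client. Here is a minimal Python sketch, assuming the parsed response is a dict and that the ' - ' separator from the script never occurs inside a name; `pairs_from_compound_keys` and `sample_response` are hypothetical names and data used only for illustration.

```python
SEPARATOR = " - "  # must match the separator used in the aggregation script


def pairs_from_compound_keys(response):
    """Split each compound bucket key back into its name and job parts."""
    pairs = []
    for bucket in response["aggregations"]["name"]["buckets"]:
        # partition splits on the first separator only, so a job that
        # contains " - " stays intact (a name containing it would not).
        name, _, job = bucket["key"].partition(SEPARATOR)
        pairs.append({"name": name, "job": job})
    return pairs


# Hypothetical sample data in the shape of a terms aggregation response.
sample_response = {
    "aggregations": {
        "name": {
            "buckets": [
                {"key": "albert - teacher", "doc_count": 3},
                {"key": "justin - engineer", "doc_count": 2},
                {"key": "luffy - rubber man", "doc_count": 1},
            ]
        }
    }
}

for pair in pairs_from_compound_keys(sample_response):
    print(pair)
```

Unlike the first option, this one produces a separate bucket for every distinct name and job combination, so a name with two different jobs shows up twice.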

The third option is to use two levels of field collapsing:

GET name.test.index/_search
{
  "collapse": {
    "field": "name.keyword",
    "inner_hits": {
      "name": "by_job",
      "collapse": {
        "field": "job.keyword"
      },
      "size": 1
    }
  }
}
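
The collapsed response nests the job-level hits under each name-level hit, inside the inner hits group named by_job. The following Python sketch shows one way to flatten it, assuming the parsed response is a dict and that _source is returned for the inner hits; `pairs_from_collapse` and `sample_response` are hypothetical and only approximate the shape of a real response.

```python
def pairs_from_collapse(response):
    """Flatten name-level hits and their "by_job" inner hits into pairs."""
    pairs = []
    for hit in response["hits"]["hits"]:
        # "by_job" is the inner_hits name chosen in the collapse request.
        inner = hit.get("inner_hits", {}).get("by_job", {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            source = inner_hit["_source"]
            pairs.append({"name": source.get("name"), "job": source.get("job")})
    return pairs


def _inner(name, job):
    """Build one hypothetical collapsed hit with a single inner hit."""
    doc = {"name": name, "job": job}
    return {
        "_source": doc,
        "inner_hits": {"by_job": {"hits": {"hits": [{"_source": doc}]}}},
    }


# Hypothetical sample data in the shape of a collapsed search response.
sample_response = {
    "hits": {
        "hits": [
            _inner("albert", "teacher"),
            _inner("justin", "engineer"),
            _inner("luffy", "rubber man"),
        ]
    }
}

for pair in pairs_from_collapse(sample_response):
    print(pair)
```

Because this is a regular search rather than an aggregation, the number of names returned is governed by the search request's own size and paging settings.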