Elasticsearch - Count duplicated and unique values

I need the same kind of count, but the field sits inside a nested property, as in these documents:

[{
    "firstname": "john",
    "lastname": "doe",
    "addressList": [{
            "addressId": 39640,
            "txt": "sdf"
        }, {
            "addressId": 39641,
            "txt": "NEW"
        }, {
            "addressId": 39640,
            "txt": "sdf"
        }, {
            "addressId": 39641,
            "txt": "NEW"
        }
    ]
}, {
    "firstname": "jane",
    "lastname": "smith",
    "addressList": [{
            "addressId": 39644,
            "txt": "sdf"
        }, {
            "addressId": 39642,
            "txt": "NEW"
        }, {
            "addressId": 39644,
            "txt": "sdf"
        }, {
            "addressId": 39642,
            "txt": "NEW"
        }
    ]
}
]

What would the query be to count duplicate addressId values? I need your help on this, user:3838328.

  • Could you share your mapping details? You can get them via GET <your_index_name>/_mapping. Update your question with the mapping information and I will be able to help you. Commented Nov 21, 2019 at 8:16
  • Thanks, @Kamal. I got the solution, as posted below. The request is POST <your_index_name>/_search. Commented Nov 21, 2019 at 8:35

1 Answer

I got the answer for counting duplicates on the nested field:

POST <your_index_name>/_search

{
    "size": 0,
    "aggs": {
        "prop_counts": {
            "nested": {
                "path": "addressList"
            },
            "aggs": {
                "duplicate_aggs": {
                    "terms": {
                        "field": "addressList.addressId",
                        "min_doc_count": 2,
                        "size": 100
                    }
                },
                "duplicate_bucketcount": {
                    "stats_bucket": {
                        "buckets_path": "duplicate_aggs._count"
                    }
                },
                "nonduplicate_aggs": {
                    "terms": {
                        "field": "addressList.addressId",
                        "size": 100
                    },
                    "aggs": {
                        "equal_one": {
                            "bucket_selector": {
                                "buckets_path": {
                                    "count": "_count"
                                },
                                "script": "params.count == 1"
                            }
                        }
                    }
                },
                "nonduplicate_bucketcount": {
                    "sum_bucket": {
                        "buckets_path": "nonduplicate_aggs._count"
                    }
                }
            }
        }
    }
}

Note the "size": 100 in both terms aggregations. A terms aggregation returns only the top 10 buckets by default, so raise size if you expect more distinct addressId values than that.
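If you build the request from code, the same body can be generated with the terms size as a parameter. This is only a minimal sketch: the function name build_duplicate_count_query and its parameters are made up for illustration, and it only builds the dictionary; sending it to Elasticsearch is left to whichever client you use.

```python
import json


def build_duplicate_count_query(path="addressList", field="addressId", terms_size=100):
    """Build the nested duplicate/unique count request body as a plain dict.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    full_field = f"{path}.{field}"
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            "prop_counts": {
                "nested": {"path": path},
                "aggs": {
                    # buckets for values that occur at least twice
                    "duplicate_aggs": {
                        "terms": {
                            "field": full_field,
                            "min_doc_count": 2,
                            "size": terms_size,
                        }
                    },
                    "duplicate_bucketcount": {
                        "stats_bucket": {"buckets_path": "duplicate_aggs._count"}
                    },
                    # all buckets, then keep only those with a count of exactly 1
                    "nonduplicate_aggs": {
                        "terms": {"field": full_field, "size": terms_size},
                        "aggs": {
                            "equal_one": {
                                "bucket_selector": {
                                    "buckets_path": {"count": "_count"},
                                    "script": "params.count == 1",
                                }
                            }
                        },
                    },
                    "nonduplicate_bucketcount": {
                        "sum_bucket": {"buckets_path": "nonduplicate_aggs._count"}
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a POST <your_index_name>/_search request.
    print(json.dumps(build_duplicate_count_query(), indent=2))
```

Passing a larger terms_size changes both terms aggregations at once, so the duplicate and non-duplicate lists stay in step.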

Response:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "prop_counts": {
            "doc_count": 8,
            "duplicate_aggs": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [{
                        "key": 39640,
                        "doc_count": 2
                    }, {
                        "key": 39641,
                        "doc_count": 2
                    }, {
                        "key": 39644,
                        "doc_count": 2
                    }, {
                        "key": 39642,
                        "doc_count": 2
                    }
                ]
            },
            "nonduplicate_aggs": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": []
            },
            "duplicate_bucketcount": {
                "count": 4,
                "min": 2,
                "max": 2,
                "avg": 2,
                "sum": 8
            },
            "nonduplicate_bucketcount": {
                "value": 0
            }
        }
    }
}
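To sanity-check these numbers without a cluster, the same counting can be reproduced in plain Python on the two sample documents from the question. This is only an illustration of what the aggregation counts, not how Elasticsearch computes it; the names SAMPLE_DOCS and count_address_ids are made up for this sketch.

```python
from collections import Counter

# The two sample documents from the question (hypothetical constant name).
SAMPLE_DOCS = [
    {
        "firstname": "john",
        "lastname": "doe",
        "addressList": [
            {"addressId": 39640, "txt": "sdf"},
            {"addressId": 39641, "txt": "NEW"},
            {"addressId": 39640, "txt": "sdf"},
            {"addressId": 39641, "txt": "NEW"},
        ],
    },
    {
        "firstname": "jane",
        "lastname": "smith",
        "addressList": [
            {"addressId": 39644, "txt": "sdf"},
            {"addressId": 39642, "txt": "NEW"},
            {"addressId": 39644, "txt": "sdf"},
            {"addressId": 39642, "txt": "NEW"},
        ],
    },
]


def count_address_ids(documents):
    """Count addressId occurrences across all nested addressList entries."""
    counts = Counter(
        address["addressId"]
        for doc in documents
        for address in doc["addressList"]
    )
    # Values seen at least twice, mirroring "min_doc_count": 2.
    duplicates = {key: n for key, n in counts.items() if n >= 2}
    # Values seen exactly once, mirroring the "params.count == 1" selector.
    uniques = [key for key, n in counts.items() if n == 1]
    return counts, duplicates, uniques


if __name__ == "__main__":
    counts, duplicates, uniques = count_address_ids(SAMPLE_DOCS)
    print("nested entries:", sum(counts.values()))
    print("duplicated ids:", duplicates)
    print("unique ids:", uniques)
```

On the sample data this gives 8 nested entries, 4 duplicated ids with 2 occurrences each, and no unique ids, which lines up with doc_count, duplicate_bucketcount, and nonduplicate_bucketcount in the response.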

4 Comments

Cool. Accept your answer by clicking the grey check mark. Also, if you have not done so yet, could you please delete the post you added under my answer?
Hi Kamal, if there are more duplicate counts, they are not all shown on a single page. So what does "size": 0 mean? And if I want to delete the duplicate child entries in the given example, what should the query be? I need help on this if possible.
The size at the very top controls how many of the original documents are returned; if you only want the aggregation results, set it to 0. I've added size inside the terms aggregations, which should give you what you want. Let me know if that helps.
Hi Kamal, thanks for your answer. Yes, that helped me a lot. Now I want to delete all these duplicate child elements and keep only a single child element. Please help me with this if possible.
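The follow-up about removing the duplicate child elements is not answered in this thread. One possible approach, offered here purely as an assumption, is to deduplicate addressList on the client side and then write the cleaned document back to the index. The sketch below covers only the in-memory step; the helper name dedupe_address_list is hypothetical, and the write-back to Elasticsearch is deliberately left out.

```python
def dedupe_address_list(doc, key="addressId"):
    """Return a copy of doc whose addressList keeps the first entry per key value.

    Hypothetical helper: it works on a plain dict and does not touch Elasticsearch.
    """
    seen = set()
    unique_addresses = []
    for address in doc.get("addressList", []):
        if address[key] not in seen:
            seen.add(address[key])
            unique_addresses.append(address)
    # Copy the top-level fields and swap in the deduplicated list.
    return {**doc, "addressList": unique_addresses}


if __name__ == "__main__":
    john = {
        "firstname": "john",
        "lastname": "doe",
        "addressList": [
            {"addressId": 39640, "txt": "sdf"},
            {"addressId": 39641, "txt": "NEW"},
            {"addressId": 39640, "txt": "sdf"},
            {"addressId": 39641, "txt": "NEW"},
        ],
    }
    print(dedupe_address_list(john))
```

Keeping the first occurrence is an arbitrary choice; if the duplicates can differ in txt, you would need to decide which one wins before writing anything back.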
