Using the Elasticsearch scan-and-scroll feature, is it possible to control both the size of the batches returned, as well as the limit on the number of matches?

According to the Elasticsearch scan-and-scroll documentation:

Although we specified a size of 1,000, we get back many more documents. When scanning, the size is applied to each shard, so you will get back a maximum of size * number_of_primary_shards documents in each batch.

This seems to indicate that the size parameter is used differently in scan-and-scroll than in a query-then-fetch search (where it limits the number of matches), and that there is no separate knob for limiting the total number of matches.

Update

A use case for this is:

  • I have many indices (2 shards each).
    • They're organized by day for some good reasons that I can't change.
  • Some queries will be along the lines of "give me everything for one day, no order needed", which could return many results (hundreds of thousands). It seems the query size should be 0 (or some really high number) to allow the user to eventually page through everything, if necessary.
  • I'd like the first page of results to display quickly. The first page can show a variable number of results depending on the UI setup (on the order of hundreds). It seems I should be able to control this and fetch that many results with the first scroll ID.

Scan-and-scroll seems like a good choice, but perhaps there's a better way to do this?

1 Answer

size is used differently in scan and scroll. It does limit the number of documents returned with each scroll, but it is applied per shard, so you get up to size * num_of_primary_shards back.
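To make the arithmetic concrete, here is a minimal sketch in Python. The function names are hypothetical (they are not part of any Elasticsearch client), and it assumes that when a scan spans several indices, the shard count that matters is the total number of primary shards across all of them.

```python
import math


def max_docs_per_batch(size: int, primary_shards: int) -> int:
    """Upper bound on documents in one scroll batch when size is applied per shard."""
    return size * primary_shards


def per_shard_size_for(target_batch: int, primary_shards: int) -> int:
    """Per-shard size to request so that one batch can hold at least target_batch documents."""
    return math.ceil(target_batch / primary_shards)


# The documentation's example of size=1000, here assuming 5 primary shards.
print(max_docs_per_batch(1000, 5))  # 5000

# One daily index with 2 shards and a first page of roughly 100 results.
print(per_shard_size_for(100, 2))  # 50
```

So for a single daily index with 2 shards, asking for a size of 50 should give batches of at most 100 documents. A batch can still come back smaller if the matches are spread unevenly across shards.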

In general you are correct, but you could limit the hits returned using a limit filter (or limit query in 2.0). That seems a little odd, though; I'd make sure scan and scroll is the best approach if limiting it this way is the desired behavior.
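As a rough sketch of what that could look like, the Python below only builds the request parameters and body as plain dicts; it does not talk to a cluster. The helper name is hypothetical, the body assumes the 1.x-style filtered query wrapping a limit filter, and, as far as I recall, the limit value is applied per shard as well, so treat the numbers as per-shard caps rather than a global total.

```python
import json


def scan_request_with_limit(per_shard_limit, per_shard_size, scroll="1m"):
    """Build query-string params and a body for a scan search capped by a limit filter.

    Assumes the Elasticsearch 1.x query DSL; both values are per-shard.
    """
    params = {"search_type": "scan", "scroll": scroll, "size": per_shard_size}
    body = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"limit": {"value": per_shard_limit}},
            }
        }
    }
    return params, body


params, body = scan_request_with_limit(per_shard_limit=500, per_shard_size=50)
print(params)
print(json.dumps(body, indent=2))
```

The params would go on the query string of the initial search request and the body in its payload; subsequent batches are then fetched with the returned scroll ID.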


2 Comments

Thank you for the response, as well as the limit filter/query suggestion... I updated to clarify my use case; does this help?
I would suggest not using scan and scroll for this, and instead just using the paging behavior of a normal search. The initial page of results will display quickly (and really quite a few pages after that, depending on your page size); however, it is true that paging deeply in this manner can be slow. Setting up a scan and scroll is not inexpensive, and I don't think creating one per user performing the search will scale very well.
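For illustration, normal paging comes down to the from and size parameters of a regular search. A minimal Python sketch that only builds the request bodies (the helper name is hypothetical, and match_all stands in for the real "everything for one day" query):

```python
def page_request(page, page_size):
    """Body for the given zero-based page of a normal (query-then-fetch) search."""
    return {
        "from": page * page_size,  # number of hits to skip
        "size": page_size,         # total hits returned, not per shard
        "query": {"match_all": {}},
    }


first_page = page_request(0, 100)
third_page = page_request(2, 100)
print(first_page)
print(third_page)
```

Here size is the total number of hits returned, so the UI can pick the page size directly. The cost grows with from, which is the deep-paging slowdown mentioned above.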
