
I have a case where I can't make a request to get the scroll_id, so I need some way to build the URLs for the following pages offline. (I am making GET requests against a site that exposes its Elasticsearch instance.)

Basically, I have a URL containing an Elasticsearch query, and it returns only 20 of the 40 results (20 per request is the maximum size). I want a URL for the next pages. If I had an Internet connection, I would just take the scroll_id from the first response and use it to make the next requests.

But I want to avoid that and see whether I can write a helper class that builds scroll ids by itself.

Is it possible?

Thanks in advance.

1 Answer


The scroll_id is tied directly to internal state managed by ES (i.e. the search context of the initial query), and that state eventually times out after a given period of time.

Once that period expires, the search context is cleared and the scroll_id is no longer valid. I'm afraid there's no way you can craft a scroll_id by hand.
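To illustrate why the id cannot be built offline, here is a minimal sketch of the two requests involved in scrolling. The function names, `base_url` and the index name are hypothetical; only the `scroll` query parameter and the `_search/scroll` endpoint with its `scroll` and `scroll_id` body fields come from the Elasticsearch REST API. Note that the second function needs a `scroll_id` as input, and that value only ever appears in a server response.

```python
import json
from urllib.parse import urlencode


def initial_search_request(base_url, index, query, size=20, keep_alive="1m"):
    """Build the first request; only its response carries a scroll_id."""
    url = f"{base_url}/{index}/_search?" + urlencode({"scroll": keep_alive})
    body = json.dumps({"size": size, "query": query})
    return url, body


def next_scroll_request(base_url, scroll_id, keep_alive="1m"):
    """Build a follow-up request from a scroll_id returned by the server."""
    url = f"{base_url}/_search/scroll"
    body = json.dumps({"scroll": keep_alive, "scroll_id": scroll_id})
    return url, body


if __name__ == "__main__":
    # Hypothetical host and index, for illustration only.
    print(initial_search_request("http://localhost:9200", "my-index", {"match_all": {}}))
```

Nothing here contacts a server; it only shows that the follow-up request depends on a value the client cannot compute on its own.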

But if the result set contains 40 results and you can only retrieve 20 at a time, I suggest you simply set from: 20 in your second query and you'll be fine.
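As a rough sketch of that suggestion, the search bodies for each page can be generated offline, since `from` and `size` are plain offsets. The helper name `page_bodies` and the `match_all` query are assumptions for illustration, and this only helps if the server accepts the resulting `from + size` values.

```python
def page_bodies(query, total_hits, page_size=20):
    """Return one search body per page, using from/size offsets."""
    return [
        {"from": offset, "size": page_size, "query": query}
        for offset in range(0, total_hits, page_size)
    ]


if __name__ == "__main__":
    # 40 hits at 20 per page gives two bodies, with from = 0 and from = 20.
    for body in page_bodies({"match_all": {}}, total_hits=40):
        print(body)
```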


4 Comments

Thank you, but I tried that already - setting from doesn't work in this case.
Why not? Care to explain more?
"reason": "Result window is too large, from + size must be less than or equal to: [20] but was [40].
Oh, they lowered the result window on that index from the default of 10000 to 20. That's uncool!
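The arithmetic behind that error can be sketched as follows. The function name is hypothetical; the rule itself (`from + size` must not exceed the window, 10000 by default) is taken from the error message and the comment above.

```python
def fits_result_window(from_, size, max_result_window=10000):
    """Check the from + size <= max_result_window rule quoted in the error."""
    return from_ + size <= max_result_window


if __name__ == "__main__":
    # First page on this site: 0 + 20 <= 20, so it is accepted.
    print(fits_result_window(0, 20, max_result_window=20))
    # Second page: 20 + 20 = 40 > 20, which matches the reported error.
    print(fits_result_window(20, 20, max_result_window=20))
```

With the window set to 20, only the first page can be reached through from/size on this site.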
