
Is it possible to reduce the number of shards of an Elasticsearch index once the index has been created?

I tried :

$ curl -XPUT 'localhost:9200/myindex/_settings' -d '{"index" : {"number_of_shards" : 3}}'

But it returns an error:

{"error":"ElasticsearchIllegalArgumentException[can't change the number of shards for an index]","status":400}
  • what version of es are you using? Commented Apr 30, 2015 at 8:27
  • @eliasah On my development server: Version: 1.4.4, Build: c88f77f/2015-02-19T13:05:36Z, JVM: 1.7.0_75 Commented Apr 30, 2015 at 8:47
  • Your only option is to create a new index with fewer shards and reindex all data from the old index into the new one with a tool like stream2es Commented Apr 30, 2015 at 9:09

3 Answers


This is no longer true: with 5.x you can shrink an index down to a whole factor of its shard count. For example, from 12 shards you could go down to 1, 2, 3 or 6 (see the docs). But you must first put the index into read-only mode, and the shrink process naturally requires a lot of I/O.
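As a sketch of that flow (assuming a 12-shard index named `myindex`; the target index name `myindex_shrunk` and the node name `shrink-node` are made up here), the 5.x shrink might look like:

```shell
# Prerequisites for _shrink: block writes (read-only mode) and
# relocate a copy of every shard onto a single node
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "shrink-node"
}'

# Shrink the 12-shard index into a new 3-shard index
curl -XPOST 'localhost:9200/myindex/_shrink/myindex_shrunk' -d '{
  "settings": { "index.number_of_shards": 3 }
}'
```

Once the new index is green and verified, you would typically delete the old index and point an alias at the shrunk one.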

Alternatively, since version 2.3 you could use the reindex API, which would allow you to change to any number of shards. Reindex would need much more resources than the shrink API, because it would have to go through the process of indexing each document from scratch.
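A minimal reindex sketch (the target index name `myindex_v2` is a made-up example):

```shell
# Create the target index with the desired shard count up front,
# since it cannot be changed afterwards
curl -XPUT 'localhost:9200/myindex_v2' -d '{
  "settings": { "index": { "number_of_shards": 3 } }
}'

# Copy every document over; each one is re-analyzed from scratch,
# which is why this costs more than _shrink
curl -XPOST 'localhost:9200/_reindex' -d '{
  "source": { "index": "myindex" },
  "dest":   { "index": "myindex_v2" }
}'
```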




No, it's not possible. You can change many index settings dynamically - e.g. the number of replicas per shard - but not the number of shards.

For more information - take a look here - http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-update-settings.html
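For instance, the replica count is one of the settings that can be updated on a live index, in contrast to the shard count that produced the error above:

```shell
# Allowed: change the number of replicas on an existing index
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'
```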



OK. As @Mysterion said, it's not possible to change the number of shards directly with an index settings update. But there is a way around it.

You'll need to re-index your old index into a new index created with the desired number of shards (so, as noted, no zero downtime).

For that you can use the Scroll Search API:

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

Client support for scrolling and reindexing : Some of the officially supported clients provide helpers to assist with scrolled searches and reindexing of documents from one index to another:

Perl See Search::Elasticsearch::Bulk and Search::Elasticsearch::Scroll

Python See elasticsearch.helpers.*

For more information about the Scroll Search API, I suggest the official documentation.
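A rough sketch of the scroll loop in curl form, on the 1.x API the question is running (the scroll ID below is a placeholder; in practice you would feed each returned batch of hits into the bulk API of the new index and repeat until no hits come back):

```shell
# Open a scroll over the old index; keep the context alive 1 minute per batch
curl -XGET 'localhost:9200/myindex/_search?scroll=1m' -d '{
  "size": 500,
  "query": { "match_all": {} }
}'

# Fetch the next batch, passing the _scroll_id from the previous response
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<scroll_id from previous response>'
```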

And you might also want to take a look at this answer here, maybe it can also give you some ideas in case you are using Java.

3 Comments

You can use github.com/taskrabbit/elasticsearch-dump to copy the data to a new index with the correct number of shards and then remove the old one. That tool makes it easier than using the scroll search API directly.
It's a very good project, but when you have tens of millions of documents in your index it's extremely slow. I've benchmarked it against an optimized scan and scroll with the official Python API, and the latter runs at least 10 times faster. I'll let you judge ;-)
You need to increase the batch size; the default of 100 makes it slower. Also, my first run took about 2 hours and the second run took 35 min, so index caches make a huge difference.
