Contents
  1. Elasticsearch Dump
    1.1. Installation[1]
    1.2. Usages[1]
    1.3. Useful Options
  2. Summary
  3. References

A very common problem we encounter in Elasticsearch cluster management is how to copy an index to another cluster. For example, we may need to make a backup copy of all our current data, or copy data to a staging cluster to test query performance.

The intuitive way would be to manually fetch all documents under the specific index, create another index using the same settings and mappings, and then index all documents into the newly created index. Fortunately, we do not need to do that; there are already several good solutions. One of them is a third-party tool named Elasticsearch Dump.

Elasticsearch Dump

Elasticsearch Dump is a node package for moving and saving indices. To use this handy tool, you need to have node installed.

Installation[1]

Assuming you already have node, install it with:

Install elasticdump
npm install elasticdump -g

You probably need to add sudo before the above command if there is any permission problem.

Usages[1]

The elasticdump command should then be available from your command line. The command works by specifying an input and an output; each can be an Elasticsearch URL, a file, or even a standard I/O stream of the terminal.

To fully copy an index from production to staging, we give the command the production hostname, port, and index name as input, and the staging hostname, port, and index name as output. We should move the analyzer first, the mapping second, and finally the data.

Copy an index from production to staging with analyzer and mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=analyzer
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data

I encountered an error when trying to dump data; it was related to how many connections can be set up to the target host at one time. If your host also has a connection limit, refer to the limit and maxSockets options in the Useful Options table.
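
To work around such a limit, a sketch along the lines of the copy command above (the host names are the same placeholders as before, and the particular values are illustrative, not recommendations):

```shell
# Copy data in smaller batches (50 documents per bulk operation) and
# cap simultaneous HTTP connections to 2, to stay under host limits.
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data \
  --limit=50 \
  --maxSockets=2
```

Smaller batches and fewer sockets make the dump slower but gentler on both clusters.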

We may also specify output as a local file instead of a url.

Backup index data to a file
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index.json \
--type=data
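
Restoring works the same way in reverse: swap input and output so the file feeds the cluster. A sketch, assuming the files were produced by the backup commands above:

```shell
# Restore the mapping first, then the data, from the backup files.
elasticdump \
  --input=/data/my_index_mapping.json \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=/data/my_index.json \
  --output=http://staging.es.com:9200/my_index \
  --type=data
```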

We can also pipe the output to gzip via stdout.

Back up an index to a gzip file using stdout
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=$ \
| gzip > /data/my_index.json.gz
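
To restore from the gzipped backup, one approach is to decompress to a temporary file first and feed that file back to elasticdump. A sketch, with paths following the example above (the temporary path is arbitrary):

```shell
# Decompress the gzipped dump, then load it into the target cluster.
gunzip -c /data/my_index.json.gz > /tmp/my_index.json
elasticdump \
  --input=/tmp/my_index.json \
  --output=http://staging.es.com:9200/my_index \
  --type=data
```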

Useful Options

There are many options for the elasticdump command. I list the most useful ones with comments below.

limit (default: 100)
    How many objects to move in bulk per operation; the limit is approximate for file streams. Lowering it helps avoid generating too much traffic at one time.
debug (default: false)
    Display the Elasticsearch commands being used. Useful when an error happens.
type (default: data)
    What to export: data, mapping, or analyzer.
delete (default: false)
    Delete documents one by one from the input as they are moved; the source index itself is not deleted. Useful if you want to remove old data at the same time.
searchBody (default: {"query": {"match_all": {}}})
    Perform a partial extract based on search results. Provide a query to select the target data.
all (default: false)
    Load/store documents from ALL indexes.
bulk (default: false)
    Leverage the Elasticsearch Bulk API when writing documents. Worth enabling by default for efficiency.
ignore-errors (default: false)
    Continue the read/write loop on a write error.
scrollTime (default: 10m)
    How long the nodes will hold the requested search context open.
maxSockets (default: 5 for node <= v0.10.x, Infinity for node >= v0.11.x)
    How many simultaneous HTTP requests can be made.
help
    Show the help page. Check it if you forget an option.
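
As an illustration of searchBody, the sketch below extracts only documents matching a query instead of the whole index; the field name status and its value are hypothetical:

```shell
# Dump only documents whose (hypothetical) status field is "published".
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_published.json \
  --type=data \
  --searchBody='{"query": {"term": {"status": "published"}}}'
```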

Summary

Elasticsearch Dump adequately satisfies the need to copy an index to another cluster or back it up to a file.

There are other options as well. The native solution is Snapshot & Restore[2]. You can also use plugins such as Elasticsearch InOut[3] and Elasticsearch Knapsack[4], or other third-party tools like Elasticsearch Exporter[5].

References

  1. Elasticsearch-dump Github Page
  2. Snapshot and Restore, Elasticsearch documentation
  3. Elasticsearch Inout Plugin, Github Page
  4. Knapsack plugin for Elasticsearch, Github Page
  5. Elasticsearch Exporter, Github Page