Export Index Using Elasticsearch Dump
A very common problem we encounter in Elasticsearch cluster management is how to copy an index to another cluster. For example, we may need to make a backup copy of all our current data, or copy data to a staging cluster to test query performance.
The intuitive way would be to manually fetch all documents under the index, create another index with the same settings and mappings, and then index all documents into the newly created one. Fortunately, we do not need to do that; there are already several good solutions. One of them is a third-party tool named Elasticsearch Dump.
Elasticsearch Dump
ElasticSearch Dump is a Node.js package for moving and saving indices. To use this handy tool, you need to have Node.js installed.
Installation[1]
Assume that you already have Node.js and npm available.
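The package is published on npm; a global install along the following lines makes the command available:

```bash
# Install elasticdump globally so the elasticdump command lands on your PATH
npm install elasticdump -g
```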
You may need to prefix the command above with `sudo` if you run into permission problems.
Usage[1]
The `elasticdump` command should then be available from your command line. The command works by specifying an input and an output; each can be an Elasticsearch URL, a file, or even a standard I/O stream from the terminal.
To fully copy an index from production to staging, we give the command the production hostname, port, and index name as input, and the staging hostname, port, and index name as output. We should move the analyzer first, the mapping second, and finally the data.
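A sketch of those three steps; the hostnames, port, and index name are placeholders to substitute with your own:

```bash
# Copy the analyzer first, then the mapping, then the data.
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=http://staging.example.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=http://staging.example.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=http://staging.example.com:9200/my_index \
  --type=data
```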
I encountered an error when trying to dump data. It was related to how many connections can be opened to the target host at one time. If your host has a similar limit, refer to the `limit` and `maxSockets` options in the Useful Options table.
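As a rough sketch, those two options can throttle the transfer; the values and hosts below are only illustrative:

```bash
# Smaller batches and fewer concurrent sockets to stay under the connection limit
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=http://staging.example.com:9200/my_index \
  --type=data \
  --limit=50 \
  --maxSockets=2
```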
We may also specify the output as a local file instead of a URL.
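For example, dumping an index's mapping and data to local JSON files might look like this; the paths are placeholders:

```bash
# Save the mapping and the data to separate local files
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=/data/my_index_mapping.json \
  --type=mapping
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=/data/my_index.json \
  --type=data
```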
We can also compress the output with gzip by writing to stdout.
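elasticdump accepts `$` as the output to denote stdout, so the dump can be piped straight into gzip; the host and path below are placeholders:

```bash
# Stream the dump to stdout and compress it on the fly
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz
```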
Useful Options
There are many options for the `elasticdump` command. I list a few of the most useful ones, with comments, in the table below.
Option | Explanation | Default | Comment |
---|---|---|---|
limit | How many objects to move in bulk per operation; the limit is approximate for file streams | 100 | Helps avoid generating too much traffic at one time |
debug | Display the Elasticsearch commands being used | false | Useful when an error happens |
type | What to export: data, mapping, or analyzer | data | Can export the mapping or analyzer instead of only the data |
delete | Delete documents one by one from the input as they are moved; will not delete the source index | false | Useful if you want to remove old data at the same time |
searchBody | Perform a partial extract based on search results | {"query": {"match_all": {}}} | Provide a query to select the target data |
all | Load/store documents from ALL indexes | false | |
bulk | Leverage the Elasticsearch Bulk API when writing documents | false | Use bulk by default for better efficiency |
ignore-errors | Continue the read/write loop on a write error | false | |
scrollTime | Time the nodes will hold the requested search in order | 10m | |
maxSockets | How many simultaneous HTTP requests can we make? | 5 [node <= v0.10.x], Infinity [node >= v0.11.x] | |
help | This page | - | Check help if you forget an option |
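Combining a few of these options, a partial extract driven by `searchBody` might look like the following; the query and field name are made up for illustration:

```bash
# Extract only documents matching a query into a local file
elasticdump \
  --input=http://production.example.com:9200/my_index \
  --output=/data/my_index_subset.json \
  --type=data \
  --limit=200 \
  --searchBody='{"query": {"term": {"status": "active"}}}'
```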
Summary
Elasticsearch Dump satisfies this need well: copying or backing up an index takes only a few commands.
There are also other options. The native solution is Snapshot & Restore[2]. You can also use plugins such as Elasticsearch InOut[3] and Elasticsearch knapsack[4], or other third-party tools like ElasticSearch Exporter[5].