API Documentation

A Word About Forward-Compatibility Kwargs

In the following documentation, the phrase “other kwargs listed below” refers to the kwargs documented in a subsequent Parameters section. It also implicitly includes any other kwargs the caller wants passed to ES as query string parameters, such as ones pyelasticsearch doesn't know about yet. These must be prefixed with es_ for forward compatibility; the prefix is stripped and the values are converted to strings, as discussed in Features.
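As an illustration only, here is a minimal sketch of the prefix-stripping step, assuming the caller's kwargs arrive as a plain dict. The helper name es_query_params is hypothetical, and value conversion is simplified to str(); the real conversion rules (for booleans, lists, and so on) are the ones described in Features.

```python
def es_query_params(kwargs):
    """Turn es_-prefixed kwargs into ES query string parameters.

    Hypothetical helper: strips the es_ prefix and stringifies values.
    Unprefixed kwargs are simply left out of this sketch.
    """
    prefix = 'es_'
    return {
        key[len(prefix):]: str(value)
        for key, value in kwargs.items()
        if key.startswith(prefix)
    }


params = es_query_params({'es_preference': '_local', 'es_from': 20})
print(params)  # {'preference': '_local', 'from': '20'}
```

With a live client, such a kwarg is simply passed to the method, for example es.search('title:cats', index='library', es_preference='_local'), assuming your ES version supports the preference parameter.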

ElasticSearch Class

Unless otherwise indicated, methods return the JSON-decoded response sent by elasticsearch. This way, you don’t lose any part of the return value, no matter how esoteric. But fear not: if there was an error, an exception will be raised, so it’ll be hard to miss.

class pyelasticsearch.ElasticSearch(urls='http://localhost', timeout=60, max_retries=0, port=9200, username=None, password=None, ca_certs=<path to certifi's bundled cacert.pem>, client_cert=None)

An object which manages connections to elasticsearch and acts as a go-between for API calls to it.

This object is thread-safe. You can create one instance and share it among all threads.

Parameters:

urls – A URL or iterable of URLs of ES nodes. These can be full URLs with port numbers, like http://elasticsearch.example.com:9200, or you can pass the port separately using the port kwarg. To do HTTP basic authentication, you can use RFC-2617-style URLs like http://someuser:somepassword@example.com:9200 or the separate username and password kwargs below.
timeout – Number of seconds to wait for each request before raising Timeout
max_retries – How many other servers to try, in series, after a request times out or a connection fails
username – Authentication username to send via HTTP basic auth
password – Password to use in HTTP basic auth. If a username and password are embedded in a URL, those are favored.
port – The default port to connect on, for URLs that don’t include an explicit port
ca_certs – A path to a bundle of CA certificates to trust. The default is to use Mozilla’s bundle, the same one used by Firefox.
client_cert – A certificate to authenticate the client to the server
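The precedence rules for credentials and ports can be modelled with the standard library's urllib.parse. This is not pyelasticsearch's code: resolve_node is a hypothetical helper that only restates the rules documented above, namely that credentials embedded in a URL win over the username and password kwargs, and that port applies only when the URL has no explicit port.

```python
from urllib.parse import urlsplit


def resolve_node(url, port=9200, username=None, password=None):
    """Return (host, port, username, password) for one ES node URL.

    Hypothetical helper mirroring the documented precedence rules.
    """
    parts = urlsplit(url)
    # An explicit port in the URL beats the default port kwarg.
    effective_port = parts.port if parts.port is not None else port
    # Credentials embedded in the URL are favored over the kwargs.
    if parts.username is not None:
        username, password = parts.username, parts.password
    return parts.hostname, effective_port, username, password


print(resolve_node('http://someuser:somepassword@example.com:9200'))
# ('example.com', 9200, 'someuser', 'somepassword')
print(resolve_node('http://localhost', username='admin', password='s3cret'))
# ('localhost', 9200, 'admin', 's3cret')
```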

json_encoder = <class 'pyelasticsearch.client.JsonEncoder'>

You can set this attribute on an instance to customize JSON encoding. The stock JsonEncoder class maps Python datetimes to ES-style datetimes and Python sets to ES lists. You can subclass it to add more.
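Subclassing an encoder usually means overriding json.JSONEncoder.default(), assuming JsonEncoder follows that standard pattern. To keep the sketch self-contained, BaseEncoder below is a stand-in that only approximates the stock behavior described above; the isoformat() datetime format is an assumption, not necessarily the exact format JsonEncoder emits.

```python
import json
from datetime import datetime
from decimal import Decimal


class BaseEncoder(json.JSONEncoder):
    """Stand-in for the stock encoder: handles datetimes and sets only."""

    def default(self, value):
        if isinstance(value, datetime):
            return value.isoformat()  # assumed ES-style datetime format
        if isinstance(value, (set, frozenset)):
            return list(value)
        return super().default(value)  # raises TypeError for anything else


class DecimalEncoder(BaseEncoder):
    """Adds Decimal support, serialized as a string to avoid float rounding."""

    def default(self, value):
        if isinstance(value, Decimal):
            return str(value)
        return super().default(value)


doc = {'price': Decimal('19.99'), 'added': datetime(2013, 5, 1, 12, 30), 'tags': {'cats'}}
print(json.dumps(doc, cls=DecimalEncoder, sort_keys=True))
# {"added": "2013-05-01T12:30:00", "price": "19.99", "tags": ["cats"]}
```

With the real library, you would subclass pyelasticsearch.client.JsonEncoder instead of the stand-in and assign the subclass with es.json_encoder = DecimalEncoder.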

Bulk Indexing Methods

class pyelasticsearch.ElasticSearch

bulk(actions, index=None, doc_type=None[, other kwargs listed below])

Perform multiple index, delete, create, or update actions per request.

Used with helper routines index_op(), delete_op(), and update_op(), this provides an efficient, readable way to do large-scale changes. This contrived example illustrates the structure:

es.bulk([es.index_op({'title': 'All About Cats', 'pages': 20}),
         es.index_op({'title': 'And Rats', 'pages': 47}),
         es.index_op({'title': 'And Bats', 'pages': 23})],
        doc_type='book',
        index='library')

More often, you’ll want to index (or delete or update) a larger number of documents. In those cases, yield your documents from a generator, and use bulk_chunks() to divide them into multiple requests:

from pyelasticsearch import bulk_chunks

def documents():
    for book in books:
        yield es.index_op({'title': book.title, 'pages': book.pages})
        # index_op() also takes kwargs like index= and id= in case
        # you want more control.
        #
        # You could also yield some delete_ops or update_ops here.

# bulk_chunks() breaks your documents into smaller requests for speed:
for chunk in bulk_chunks(documents(),
                         docs_per_chunk=500,
                         bytes_per_chunk=10000):
    # We specify a default index and doc type here so we don't
    # have to repeat them in every operation:
    es.bulk(chunk, doc_type='book', index='library')

Parameters:

actions – An iterable of bulk actions, generally the output of bulk_chunks() but sometimes a list of calls to index_op(), delete_op(), and update_op() directly. Specifically, an iterable of JSON-encoded bytestrings that can be joined with newlines and sent to ES.
index – Default index to operate on
doc_type – Default type of document to operate on. Cannot be specified without index.
consistency – See the ES docs.
refresh – See the ES docs.
replication – See the ES docs.
routing – See the ES docs.
timeout – See the ES docs.

Return the decoded JSON response on success.

Raise BulkError if any of the individual actions fail. The exception provides enough information about the failed actions to identify them for retrying.

Sometimes there is an error with the request in general, not with any individual actions. If there is a connection error, timeout, or other transport error, a more general exception will be raised, as with other methods; see Error Handling.

See ES’s bulk API for more detail.

index_op(doc, doc_type=None, overwrite_existing=True, **meta)

Return a document-indexing operation that can be passed to bulk(). (See there for examples.)

Specifically, return a 2-line, JSON-encoded bytestring.

Parameters:

doc – A mapping of property names to values.
doc_type – The type of the document to index, if different from the one you pass to bulk()
overwrite_existing – Whether we should overwrite existing documents of the same ID and doc type. (If False, this does a create operation.)
meta – Other args controlling how the document is indexed, like id (most common), index (next most common), version, and routing. See ES’s bulk API for details on these.
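To make "a 2-line, JSON-encoded bytestring" concrete, here is a sketch of the shape of such an operation: an action line naming the operation (index, or create when overwrite_existing is False) along with its metadata, followed by the document itself. make_index_op is hypothetical, and it assumes meta keys go on the action line under ES's underscore-prefixed names (_id, _index, _type); it is not how pyelasticsearch builds the string internally.

```python
import json


def make_index_op(doc, doc_type=None, overwrite_existing=True, **meta):
    """Hypothetical sketch of a 2-line bulk index/create operation."""
    operation = 'index' if overwrite_existing else 'create'
    # Assumption: metadata goes on the action line with a leading underscore.
    action_meta = {'_' + key: value for key, value in meta.items()}
    if doc_type is not None:
        action_meta['_type'] = doc_type
    action_line = json.dumps({operation: action_meta}, sort_keys=True)
    source_line = json.dumps(doc, sort_keys=True)
    return (action_line + '\n' + source_line).encode('utf-8')


op = make_index_op({'title': 'All About Cats', 'pages': 20}, id=1, overwrite_existing=False)
print(op.decode('utf-8'))
# {"create": {"_id": 1}}
# {"pages": 20, "title": "All About Cats"}
```

A bulk request body is then just such operations joined with newlines.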

delete_op(doc_type=None, **meta)

Return a document-deleting operation that can be passed to bulk().

def actions():
    ...
    yield es.delete_op(id=7)
    yield es.delete_op(id=9,
                       index='some-non-default-index',
                       doc_type='some-non-default-type')
    ...

es.bulk(actions(), ...)

Specifically, return a JSON-encoded bytestring.

Parameters:

doc_type – The type of the document to delete, if different from the one passed to bulk()
meta – A description of what document to delete and how to do it. Example: {"index": "library", "id": 2, "version": 4}. See ES’s bulk API for a list of all the options.

update_op(doc=None, doc_type=None, upsert=None, doc_as_upsert=None, script=None, params=None, lang=None, **meta)

Return a document-updating operation that can be passed to bulk().

def actions():
    ...
    yield es.update_op(doc={'pages': 4},
                       id=7,
                       version=21)
    ...

es.bulk(actions(), ...)

Specifically, return a JSON-encoded bytestring.

Parameters:

doc – A partial document to be merged into the existing document
doc_type – The type of the document to update, if different from the one passed to bulk()
upsert – The content for the new document created if the document does not exist
doc_as_upsert – The provided document will be inserted if the document does not already exist
script – The script to be used to update the document
params – A dict of the params to be put in scope of the script
lang – The language of the script. Omit to use the default, specified by script.default_lang.
meta – Other args controlling what document to update and how to do it, like id, index, and retry_on_conflict, destined for the action line itself rather than the payload. See ES’s bulk API for details on these.

bulk_index(index, doc_type, docs, id_field='id', parent_field='_parent'[, other kwargs listed below])

Index a list of documents as efficiently as possible.

Note

This is deprecated in favor of bulk(), which supports all types of bulk actions, not just indexing, is compatible with bulk_chunks() for batching, and has a simpler, more flexible design.

Parameters:

index – The name of the index to which to add the document. Pass None if you will specify indices individually in each doc.
doc_type – The type of the document
docs – An iterable of Python mapping objects, convertible to JSON, representing documents to index
id_field – The field of each document that holds its ID. Removed from document before indexing.
parent_field – The field of each document that holds its parent ID, if any. Removed from document before indexing.
index_field – The field of each document that holds the index to put it into, if different from the index arg. Removed from document before indexing.
type_field – The field of each document that holds the doc type it should become, if different from the doc_type arg. Removed from the document before indexing.
consistency – See the ES docs.
refresh – See the ES docs.
replication – See the ES docs.
routing – See the ES docs.
timeout – See the ES docs.

Raise BulkError if the request as a whole succeeded but some of the individual actions failed. You can pull enough information about the failed actions out of the exception to identify them for retrying.

See ES’s bulk API for more detail.
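Moving from bulk_index() to bulk() mostly means taking the ID out of each document yourself instead of relying on id_field. A minimal sketch, assuming your documents are plain dicts: split_meta is a hypothetical helper whose output pairs you would hand to es.index_op(doc, **meta).

```python
def split_meta(docs, id_field='id'):
    """Yield (doc_without_id, meta) pairs, mimicking bulk_index's id_field.

    Hypothetical helper: the ID is removed from a copy of each document,
    just as bulk_index() removes it before indexing.
    """
    for doc in docs:
        doc = dict(doc)  # copy so the caller's dicts are left untouched
        meta = {}
        if id_field in doc:
            meta['id'] = doc.pop(id_field)
        yield doc, meta


pairs = list(split_meta([{'id': 7, 'title': 'And Rats'}, {'title': 'No ID'}]))
print(pairs)  # [({'title': 'And Rats'}, {'id': 7}), ({'title': 'No ID'}, {})]

# With a live client, roughly:
# es.bulk((es.index_op(doc, **meta) for doc, meta in pairs),
#         index='library', doc_type='book')
```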

There’s also a helper function, outside the ElasticSearch class:

pyelasticsearch.bulk_chunks(actions, docs_per_chunk=300, bytes_per_chunk=None)

Return groups of bulk-indexing operations to send to bulk().

Return an iterable of chunks, each of which is a group of actions; each action is a JSON-encoded line or pair of lines in the format understood by ES’s bulk API.

Parameters:

actions – An iterable of bulk actions, JSON-encoded. Ideally, pass the outputs of index_op(), delete_op(), and update_op(), either as a list or from a generator.
docs_per_chunk – The number of documents (or, more technically, actions) to put in each chunk. Set to None to use only bytes_per_chunk.
bytes_per_chunk – The maximum number of bytes of HTTP body payload to put in each chunk. Leave at None to use only docs_per_chunk. This option helps prevent timeouts when you have occasional very large documents. Without it, you may get unlucky: several large docs might land in one chunk, and ES might time out.

Chunks are capped by docs_per_chunk or bytes_per_chunk, whichever is reached first. A chunk can't be smaller than a single action, so an action larger than bytes_per_chunk still gets sent, in a chunk that exceeds the limit; we do the best we can. If both docs_per_chunk and bytes_per_chunk are None, all docs end up in one big chunk (and you might as well not use this at all).
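The capping rule can be sketched as follows. chunk_actions is a hypothetical, simplified restatement of the documented behavior, not the library's implementation; it assumes each action is a bytestring and counts one extra byte per action for the newline that joins them.

```python
def chunk_actions(actions, docs_per_chunk=300, bytes_per_chunk=None):
    """Group actions into lists capped by count or by byte size."""
    chunk, chunk_bytes = [], 0
    for action in actions:
        size = len(action) + 1  # +1 for the joining newline
        full_by_docs = docs_per_chunk is not None and len(chunk) >= docs_per_chunk
        full_by_bytes = (bytes_per_chunk is not None and len(chunk) > 0
                         and chunk_bytes + size > bytes_per_chunk)
        if full_by_docs or full_by_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        # An action bigger than bytes_per_chunk still goes in, alone.
        chunk.append(action)
        chunk_bytes += size
    if chunk:
        yield chunk


ops = [b'x' * 10] * 5
print([len(c) for c in chunk_actions(ops, docs_per_chunk=2)])  # [2, 2, 1]
print([len(c) for c in chunk_actions(ops, docs_per_chunk=None, bytes_per_chunk=25)])  # [2, 2, 1]
print([len(c) for c in chunk_actions([b'x' * 100, b'y'], bytes_per_chunk=10)])  # [1, 1]
print([len(c) for c in chunk_actions(ops, docs_per_chunk=None)])  # [5]
```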

Other Methods

class pyelasticsearch.ElasticSearch

aliases(index=None[, other kwargs listed below])

close_index(index)

Close an index.

Parameters:	index – The index to close

See ES’s close-index API for more detail.

cluster_state(metric='_all', index='_all'[, other kwargs listed below])

Return state information about the cluster.

Parameters:

metric – Which metric to return: one of “version”, “master_node”, “nodes”, “routing_table”, “metadata”, or “blocks”, an iterable of them, or a comma-delimited string of them. Defaults to all metrics.
index – An index or iterable of indexes to return info about
local – See the ES docs.

See ES’s cluster-state API for more detail.
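Many methods here accept either a single name or an iterable of names, and metric above also accepts a comma-delimited string. A sketch of that normalization, assuming ES takes multiple names as one comma-separated path segment; join_names is a hypothetical helper, not part of pyelasticsearch's public API.

```python
def join_names(names):
    """Collapse a name, a comma-delimited string, or an iterable into one string."""
    if isinstance(names, str):
        return names  # already a single name or an 'a,b' string
    return ','.join(names)


print(join_names('nodes'))                     # nodes
print(join_names(['version', 'master_node']))  # version,master_node
print(join_names(('library', 'archive')))      # library,archive
```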

count(query[, other kwargs listed below])

Execute a query against one or more indices and get the hit count.

Parameters:

query – A dictionary that will convert to ES’s query DSL or a string that will serve as a textual query to be passed as the q query string parameter
index – An index or iterable of indexes to search. Omit to search all.
doc_type – A document type or iterable thereof to search. Omit to search all.
df – See the ES docs.
analyzer – See the ES docs.
default_operator – See the ES docs.
source – See the ES docs.
routing – See the ES docs.

See ES’s count API for more detail.

create_index(index, settings=None)

Create an index with optional settings.

Parameters:

index – The name of the index to create
settings – A dictionary of settings

If the index already exists, raise IndexAlreadyExistsError.

See ES’s create-index API for more detail.

delete(index, doc_type, id[, other kwargs listed below])

Delete a typed JSON document from a specific index based on its ID.

Parameters:

index – The name of the index from which to delete
doc_type – The type of the document to delete
id – The (string or int) ID of the document to delete
routing – See the ES docs.
parent – See the ES docs.
replication – See the ES docs.
consistency – See the ES docs.
refresh – See the ES docs.

See ES’s delete API for more detail.

delete_all_indexes()

Delete all indexes.

delete_all(index, doc_type[, other kwargs listed below])

Delete all documents of the given doc type from an index.

Parameters:

index – The name of the index from which to delete. ES does not support this being empty, “_all”, or a comma-delimited list of index names (as of ES 0.19.9).
doc_type – The name of a document type
routing – See the ES docs.
parent – See the ES docs.
replication – See the ES docs.
consistency – See the ES docs.
refresh – See the ES docs.

See ES’s delete API for more detail.

delete_by_query(index, doc_type, query[, other kwargs listed below])

Delete typed JSON documents from a specific index based on query.

Parameters:

index – An index or iterable thereof from which to delete
doc_type – The type of document or iterable thereof to delete
query – A dictionary that will convert to ES’s query DSL or a string that will serve as a textual query to be passed as the q query string parameter. (Passing the q kwarg yourself is deprecated.)
q – See the ES docs.
df – See the ES docs.
analyzer – See the ES docs.
default_operator – See the ES docs.
source – See the ES docs.
routing – See the ES docs.
replication – See the ES docs.
consistency – See the ES docs.

See ES’s delete-by-query API for more detail.

delete_index(index)

Delete an index.

Parameters:	index – An index or iterable thereof to delete

If the index is not found, raise ElasticHttpNotFoundError.

See ES’s delete-index API for more detail.

flush(index=None[, other kwargs listed below])

Flush one or more indices (clear memory).

Parameters:

index – An index or iterable of indexes
refresh – See the ES docs.

See ES’s flush API for more detail.

gateway_snapshot(index=None)

Gateway snapshot one or more indices.

Parameters:	index – An index or iterable of indexes

See ES’s gateway-snapshot API for more detail.

get(index, doc_type, id[, other kwargs listed below])

Get a typed JSON document from an index by ID.

Parameters:

index – The name of the index from which to retrieve
doc_type – The type of document to get
id – The ID of the document to retrieve
realtime – See the ES docs.
fields – See the ES docs.
routing – See the ES docs.
preference – See the ES docs.
refresh – See the ES docs.

See ES’s get API for more detail.

get_mapping(index=None, doc_type=None)

Fetch the mapping definition for a specific index and type.

Parameters:

index – An index or iterable thereof
doc_type – A document type or iterable thereof

Omit both arguments to get mappings for all types and indexes.

See ES’s get-mapping API for more detail.

get_settings(index[, other kwargs listed below])

Get the settings of one or more indexes.

Parameters:	index – An index or iterable of indexes

See ES’s get-settings API for more detail.

health(index=None[, other kwargs listed below])

Report on the health of the cluster or certain indices.

Parameters:

index – The index or iterable of indexes to examine
level – See the ES docs.
wait_for_status – See the ES docs.
wait_for_relocating_shards – See the ES docs.
wait_for_nodes – See the ES docs.
timeout – See the ES docs.

See ES’s cluster-health API for more detail.

index(index, doc_type, doc, id=None, overwrite_existing=True[, other kwargs listed below])

Put a typed JSON document into a specific index to make it searchable.

Parameters:

index – The name of the index to which to add the document
doc_type – The type of the document
doc – A Python mapping object, convertible to JSON, representing the document
id – The ID to give the document. Omit it to have ES generate one.
overwrite_existing – Whether we should overwrite existing documents of the same ID and doc type
routing – A value hashed to determine which shard this indexing request is routed to
parent – The ID of a parent document, which leads this document to be routed to the same shard as the parent, unless routing overrides it.
timestamp – An explicit value for the (typically automatic) timestamp associated with a document, for use with ttl and such
ttl – The time until this document is automatically removed from the index. Can be an integral number of milliseconds or a duration like ‘1d’.
percolate – An indication of which percolator queries, registered against this index, should be checked against the new document: ‘*’ or a query string like ‘color:green’
consistency – An indication of how many active shards the contact node should demand to see in order to let the index operation succeed: ‘one’, ‘quorum’, or ‘all’
replication – Set to ‘async’ to return from ES before finishing replication.
refresh – Pass True to refresh the index after adding the document.
timeout – A duration to wait for the relevant primary shard to become available, in the event that it isn’t: for example, “5m”
fields – See the ES docs.

See ES’s index API for more detail.

more_like_this(index, doc_type, id, fields, body=''[, other kwargs listed below])

Execute a “more like this” search query against one or more fields and get back search hits.

Parameters:

index – The index to search and where the document for comparison lives
doc_type – The type of document to find others like
id – The ID of the document to find others like
mlt_fields – The list of fields to compare on
body – A dictionary that will convert to ES’s query DSL and be passed as the request body
search_type – See the ES docs.
search_indices – See the ES docs.
search_types – See the ES docs.
search_scroll – See the ES docs.
search_size – See the ES docs.
search_from – See the ES docs.
like_text – See the ES docs.
percent_terms_to_match – See the ES docs.
min_term_freq – See the ES docs.
max_query_terms – See the ES docs.
stop_words – See the ES docs.
min_doc_freq – See the ES docs.
max_doc_freq – See the ES docs.
min_word_len – See the ES docs.
max_word_len – See the ES docs.
boost_terms – See the ES docs.
boost – See the ES docs.
analyzer – See the ES docs.

See ES’s more-like-this API for more detail.

multi_get(ids, index=None, doc_type=None, fields=None[, other kwargs listed below])

Get multiple typed JSON documents from ES.

Parameters:

ids – An iterable, each element of which can be either a dict or an ID (int or string). IDs are taken to be document IDs. Dicts are passed through the Multi Get API essentially verbatim, except that any missing _type, _index, or fields keys are filled in from the defaults given in the doc_type, index, and fields args.
index – Default index name from which to retrieve
doc_type – Default type of document to get
fields – Default fields to return

See ES’s Multi Get API for more detail.
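The default-filling rule for ids can be sketched like this. expand_ids is hypothetical, and it assumes a default is applied only when it is not None and the dict does not already supply that key; the resulting list corresponds to the docs array of a Multi Get request body.

```python
def expand_ids(ids, index=None, doc_type=None, fields=None):
    """Build Multi Get 'docs' entries from bare IDs and/or dicts."""
    defaults = {'_index': index, '_type': doc_type, 'fields': fields}
    docs = []
    for item in ids:
        # Bare IDs (int or string) become {'_id': ...}; dicts are copied.
        doc = dict(item) if isinstance(item, dict) else {'_id': item}
        for key, value in defaults.items():
            if value is not None:
                doc.setdefault(key, value)  # never override what the dict says
        docs.append(doc)
    return docs


print(expand_ids([1, {'_id': 2, '_index': 'archive'}], index='library', doc_type='book'))
# [{'_id': 1, '_index': 'library', '_type': 'book'}, {'_id': 2, '_index': 'archive', '_type': 'book'}]
```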

open_index(index)

Open an index.

Parameters:	index – The index to open

See ES’s open-index API for more detail.

optimize(index=None[, other kwargs listed below])

Optimize one or more indices.

Parameters:

index – An index or iterable of indexes
max_num_segments – See the ES docs.
only_expunge_deletes – See the ES docs.
refresh – See the ES docs.
flush – See the ES docs.
wait_for_merge – See the ES docs.

See ES’s optimize API for more detail.

percolate(index, doc_type, doc[, other kwargs listed below])

Run a JSON document through the registered percolator queries, and return which ones match.

Parameters:

index – The name of the index to which the document pretends to belong
doc_type – The type the document should be treated as having
doc – A Python mapping object, convertible to JSON, representing the document
routing – See the ES docs.
preference – See the ES docs.
ignore_unavailable – See the ES docs.
percolate_format – See the ES docs.

Use index() to register percolators. See ES’s percolate API for more detail.

put_mapping(index, doc_type, mapping[, other kwargs listed below])

Register a specific mapping definition for a specific type against one or more indices.

Parameters:

index – An index or iterable thereof
doc_type – The document type to set the mapping of
mapping – A dict representing the mapping to install. For example, this dict can have top-level keys that are the names of doc types.
ignore_conflicts – See the ES docs.

See ES’s put-mapping API for more detail.

refresh(index=None)

Refresh one or more indices.

Parameters:	index – An index or iterable of indexes

See ES’s refresh API for more detail.

search(query[, other kwargs listed below])

Execute a search query against one or more indices and get back search hits.

Parameters:

query – A dictionary that will convert to ES’s query DSL or a string that will serve as a textual query to be passed as the q query string parameter
index – An index or iterable of indexes to search. Omit to search all.
doc_type – A document type or iterable thereof to search. Omit to search all.
size – Limit the number of results to size. Use with es_from to implement paginated searching.
routing – See the ES docs.

See ES’s search API for more detail.
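size and es_from combine into ordinary offset pagination. page_kwargs is a hypothetical convenience function; the kwargs it returns would be passed through to search(), with es_from becoming ES's from parameter under the forward-compatibility rule at the top of this page.

```python
def page_kwargs(page, per_page=20):
    """Return search() kwargs for a 1-based page number."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return {'size': per_page, 'es_from': (page - 1) * per_page}


print(page_kwargs(1))      # {'size': 20, 'es_from': 0}
print(page_kwargs(3, 50))  # {'size': 50, 'es_from': 100}

# With a live client, roughly:
# es.search('title:cats', index='library', **page_kwargs(3, 50))
```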

send_request(method, path_components, body='', query_params=None)

Send an HTTP request to ES, and return the JSON-decoded response.

This is mostly an internal method, but it also comes in handy if you need to use a brand new ES API that isn’t yet explicitly supported by pyelasticsearch, while still taking advantage of our connection pooling and retrying.

Retry the request on different servers if the first one is down and the max_retries constructor arg was > 0.

On failure, raise an ElasticHttpError, a ConnectionError, or a Timeout.

Parameters:

method – An HTTP method, like “GET”
path_components – An iterable of path components, to be joined by “/”
body – A map of key/value pairs to be sent as the JSON request body. Alternatively, a string to be sent verbatim, without further JSON encoding.
query_params – A map of querystring param names to values, or None
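How path_components and query_params might become a request path can be sketched with the standard library. build_url_path is hypothetical; it assumes empty or None components are dropped, each component is percent-encoded, and commas stay literal so multi-index segments survive. pyelasticsearch's own handling may differ in detail.

```python
from urllib.parse import quote, urlencode


def build_url_path(path_components, query_params=None):
    """Join components with '/' and append an encoded query string."""
    parts = [quote(str(c), safe=',') for c in path_components if c not in (None, '')]
    path = '/' + '/'.join(parts)
    if query_params:
        path += '?' + urlencode(query_params)
    return path


print(build_url_path(['library', 'book', '_search'], {'size': 10}))
# /library/book/_search?size=10
print(build_url_path(['library', None, '_refresh']))
# /library/_refresh
```

With a live client, calling an API that has no dedicated method would look something like es.send_request('GET', ['library', '_stats']), leaving the joining, retrying, and JSON decoding to pyelasticsearch.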

status(index=None[, other kwargs listed below])

Retrieve the status of one or more indices.

Parameters:

index – An index or iterable thereof
recovery – See the ES docs.
snapshot – See the ES docs.

See ES’s index-status API for more detail.

update(index, doc_type, id, script[, other kwargs listed below])

Update an existing document. Raise TypeError if script, doc and upsert are all unspecified.

Parameters:

index – The name of the index containing the document
doc_type – The type of the document
id – The ID of the document
script – The script to be used to update the document
params – A dict of the params to be put in scope of the script
lang – The language of the script. Omit to use the default, specified by script.default_lang.
doc – A partial document to be merged into the existing document
upsert – The content for the new document created if the document does not exist
doc_as_upsert – The provided document will be inserted if the document does not already exist
routing – See the ES docs.
parent – See the ES docs.
timeout – See the ES docs.
replication – See the ES docs.
consistency – See the ES docs.
percolate – See the ES docs.
refresh – See the ES docs.
retry_on_conflict – See the ES docs.
fields – See the ES docs.

See ES’s Update API for more detail.
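The rule that at least one of script, doc, or upsert must be given can be illustrated with a sketch of the request body. update_body is hypothetical and assumes the body simply carries whichever of these options are not None, under the names ES's Update API uses.

```python
def update_body(script=None, params=None, lang=None, doc=None,
                upsert=None, doc_as_upsert=None):
    """Build an Update API body, rejecting calls with nothing to do."""
    if script is None and doc is None and upsert is None:
        raise TypeError('One of script, doc, or upsert is required.')
    candidates = {'script': script, 'params': params, 'lang': lang,
                  'doc': doc, 'upsert': upsert, 'doc_as_upsert': doc_as_upsert}
    # Leave out anything the caller didn't specify.
    return {key: value for key, value in candidates.items() if value is not None}


print(update_body(doc={'pages': 4}, doc_as_upsert=True))
# {'doc': {'pages': 4}, 'doc_as_upsert': True}
```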

update_aliases(settings[, other kwargs listed below])

Atomically add, remove, or update aliases in bulk.

Parameters:	actions – A list of the actions to perform

See ES’s indices-aliases API.

update_all_settings(settings)

Update the settings of all indexes.

Parameters:	settings – A dictionary of settings

See ES’s update-settings API for more detail.

update_settings(index, settings)

Change the settings of one or more indexes.

Parameters:

index – An index or iterable of indexes
settings – A dictionary of settings

See ES’s update-settings API for more detail.

Error Handling

Any method representing an ES API call can raise one of the following exceptions:

exception pyelasticsearch.exceptions.ConnectionError

Exception raised when there is a connection error and we are out of retries. (See the max_retries argument to ElasticSearch.)

exception pyelasticsearch.exceptions.Timeout

Exception raised when an HTTP request times out and we are out of retries. (See the max_retries argument to ElasticSearch.)

exception pyelasticsearch.exceptions.BulkError

Exception raised when one or more bulk actions fail.

You can extract document IDs from these to retry them.

errors

Return a list of actions that failed, in the format emitted by ES:

{"index" : {
    "_index" : "test",
    "_type" : "type1",
    "_id" : "1",
    "status" : 409,
    "error" : "VersionConflictEngineException[[test][2] [type1][1]: version conflict, current [3], provided [2]]"
  }
},
{"update" : {
    "_index" : "index1",
    "_type" : "type1",
    "_id" : "1",
    "status" : 404,
    "error" : "DocumentMissingException[[index1][-1] [type1][1]: document missing]"
  }
},
...

successes

Return a list of actions that succeeded, in the same format as errors.
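Given the errors format shown above, picking out what to retry means walking each item's single operation key. failed_actions is a hypothetical helper that takes that list (for example, the errors attribute of a caught BulkError) and returns (operation, index, id, status) tuples; the sample data mirrors the example above, with the error messages abbreviated.

```python
def failed_actions(errors):
    """Summarize ES bulk error items as (operation, index, id, status)."""
    summary = []
    for item in errors:
        # Each item looks like {operation: details}, e.g. {"index": {...}}.
        for operation, details in item.items():
            summary.append((operation, details['_index'], details['_id'], details['status']))
    return summary


errors = [
    {'index': {'_index': 'test', '_type': 'type1', '_id': '1', 'status': 409,
               'error': 'VersionConflictEngineException[...]'}},
    {'update': {'_index': 'index1', '_type': 'type1', '_id': '1', 'status': 404,
                'error': 'DocumentMissingException[...]'}},
]
print(failed_actions(errors))
# [('index', 'test', '1', 409), ('update', 'index1', '1', 404)]

# With a live client, roughly:
# from pyelasticsearch.exceptions import BulkError
# try:
#     es.bulk(actions, index='library', doc_type='book')
# except BulkError as exc:
#     to_retry = failed_actions(exc.errors)
```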

exception pyelasticsearch.exceptions.ElasticHttpError

Exception raised when ES returns a non-OK (>=400) HTTP status code.

error

A string error message

status_code

The HTTP status code of the response that precipitated the error

exception pyelasticsearch.exceptions.ElasticHttpNotFoundError

Exception raised when a request to ES returns a 404

exception pyelasticsearch.exceptions.IndexAlreadyExistsError

Exception raised on an attempt to create an index that already exists

exception pyelasticsearch.exceptions.InvalidJsonResponseError

Exception raised in the unlikely event that ES returns a non-JSON response.

input

Return the data we attempted to convert to JSON.

Debugging

pyelasticsearch logs to the elasticsearch.trace logger using the Python logging module. If you configure that to show INFO-level messages, then it’ll show the requests in curl form and their responses. To see when a server is marked as dead, follow the elasticsearch logger.

import logging

logging.getLogger('elasticsearch.trace').setLevel(logging.INFO)
logging.getLogger('elasticsearch').setLevel(logging.INFO)

Note

This assumes that logging is already set up with something like this:

import logging

logging.basicConfig()

pyelasticsearch will log lines like:

INFO:elasticsearch.trace: curl -XGET 'http://localhost:9200/fooindex/testdoc/_search' -d '{"facets": {"topics": {"terms": {"field": "topics"}}}}'

You can copy and paste the curl line, and it’ll work on the command line.
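If you would rather collect the curl lines in one place for copying, a dedicated handler on the elasticsearch.trace logger does it using only the standard logging module. This is a sketch: it writes to an in-memory stream for demonstration (swap in logging.FileHandler to write a file), and the info() call at the end only simulates a trace record; in real use, pyelasticsearch emits those records itself.

```python
import io
import logging

trace_stream = io.StringIO()
handler = logging.StreamHandler(trace_stream)
handler.setFormatter(logging.Formatter('%(message)s'))  # bare curl lines only

trace_logger = logging.getLogger('elasticsearch.trace')
trace_logger.setLevel(logging.INFO)
trace_logger.addHandler(handler)
trace_logger.propagate = False  # keep curl lines out of the root handlers

# Simulated record for demonstration purposes.
trace_logger.info("curl -XGET 'http://localhost:9200/fooindex/_search'")
print(trace_stream.getvalue(), end='')
```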