API Documentation
A Word About Forward-Compatibility Kwargs
In the following documentation, the phrase “other kwargs listed below” refers
to the kwargs documented in a subsequent Parameters section. However, it also
implicitly includes any kwargs the caller might care to make up and have passed
to ES as query string parameters. These kwargs must start with es_ for forward
compatibility and will be unprefixed and converted to strings as discussed in
Features.
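As a rough illustration of that convention, here is a standalone sketch of how es_-prefixed kwargs could be unprefixed and stringified. The helper name to_query_params is hypothetical, and the conversion rules it uses (booleans as true/false, sequences comma-joined, everything else via str()) are assumptions for illustration; Features describes the library's actual rules.

```python
def to_query_params(**kwargs):
    """Turn es_-prefixed kwargs into query-string params (hypothetical helper)."""
    params = {}
    for name, value in kwargs.items():
        if not name.startswith('es_'):
            continue  # In this sketch, only es_-prefixed kwargs are forwarded.
        params[name[len('es_'):]] = _stringify(value)
    return params


def _stringify(value):
    # Assumed conversion rules; see Features for the real ones.
    if isinstance(value, bool):
        return 'true' if value else 'false'
    if isinstance(value, (list, tuple)):
        return ','.join(_stringify(v) for v in value)
    return str(value)


print(to_query_params(es_from=20, es_explain=True, es_fields=['title', 'pages']))
# {'from': '20', 'explain': 'true', 'fields': 'title,pages'}
```

The bool check comes before the generic str() fallback because str(True) would otherwise yield 'True', which ES would not read as a boolean flag.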
ElasticSearch Class
Unless otherwise indicated, methods return the JSON-decoded response sent by elasticsearch. This way, you don’t lose any part of the return value, no matter how esoteric. But fear not: if there was an error, an exception will be raised, so it’ll be hard to miss.
class pyelasticsearch.ElasticSearch(urls='http://localhost', timeout=60, max_retries=0, port=9200, username=None, password=None, ca_certs=<path to certifi's cacert.pem bundle>, client_cert=None)
An object which manages connections to elasticsearch and acts as a go-between for API calls to it.
This object is thread-safe. You can create one instance and share it among all threads.
Parameters: - urls – A URL or iterable of URLs of ES nodes. These can be full URLs with port numbers, like http://elasticsearch.example.com:9200, or you can pass the port separately using the port kwarg. To do HTTP basic authentication, you can use RFC-2617-style URLs like http://someuser:somepassword@example.com:9200 or the separate username and password kwargs below.
- timeout – Number of seconds to wait for each request before raising Timeout
- max_retries – How many other servers to try, in series, after a request times out or a connection fails
- username – Authentication username to send via HTTP basic auth
- password – Password to use in HTTP basic auth. If a username and password are embedded in a URL, those are favored.
- port – The default port to connect on, for URLs that don’t include an explicit port
- ca_certs – A path to a bundle of CA certificates to trust. The default is to use Mozilla’s bundle, the same one used by Firefox.
- client_cert – A certificate to authenticate the client to the server
json_encoder = <class 'pyelasticsearch.client.JsonEncoder'>
You can set this attribute on an instance to customize JSON encoding. The stock JsonEncoder class maps Python datetimes to ES-style datetimes and Python sets to ES lists. You can subclass it to add more.
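A standalone sketch of that kind of subclassing, written against the standard library's json.JSONEncoder so it runs without a cluster. The class name and the added Decimal support are hypothetical, and isoformat() stands in for whatever datetime format the stock JsonEncoder actually emits. With pyelasticsearch, the documented route is to subclass pyelasticsearch.client.JsonEncoder and assign the subclass to an instance's json_encoder attribute; that it uses the same default() hook is an assumption.

```python
import json
from datetime import datetime
from decimal import Decimal


class ExampleJsonEncoder(json.JSONEncoder):
    """Hypothetical encoder mimicking the behavior described for JsonEncoder."""

    def default(self, value):
        if isinstance(value, datetime):
            return value.isoformat()  # Assumed format; the stock class may differ.
        if isinstance(value, (set, frozenset)):
            return list(value)  # Sets become JSON arrays (ES lists).
        if isinstance(value, Decimal):
            return float(value)  # The extra type this subclass adds.
        return super().default(value)  # Raises TypeError for anything else.


doc = {'title': 'All About Cats', 'published': datetime(2013, 5, 1, 12, 30),
       'tags': {'cats'}, 'price': Decimal('9.99')}
print(json.dumps(doc, cls=ExampleJsonEncoder))
```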
Bulk Indexing Methods
class pyelasticsearch.ElasticSearch
bulk(actions, index=None, doc_type=None[, other kwargs listed below])
Perform multiple index, delete, create, or update actions per request.
Used with the helper routines index_op(), delete_op(), and update_op(), this provides an efficient, readable way to do large-scale changes. This contrived example illustrates the structure:

es.bulk([es.index_op({'title': 'All About Cats', 'pages': 20}),
         es.index_op({'title': 'And Rats', 'pages': 47}),
         es.index_op({'title': 'And Bats', 'pages': 23})],
        doc_type='book',
        index='library')
More often, you’ll want to index (or delete or update) a larger number of documents. In those cases, yield your documents from a generator, and use bulk_chunks() to divide them into multiple requests:

from pyelasticsearch import ElasticSearch, bulk_chunks

es = ElasticSearch('http://localhost:9200/')

def documents():
    # books is your own iterable of objects with .title and .pages.
    for book in books:
        yield es.index_op({'title': book.title, 'pages': book.pages})
        # index_op() also takes kwargs like index= and id= in case
        # you want more control.
        #
        # You could also yield some delete_ops or update_ops here.

# bulk_chunks() breaks your documents into smaller requests for speed:
for chunk in bulk_chunks(documents(),
                         docs_per_chunk=500,
                         bytes_per_chunk=10000):
    # We specify a default index and doc type here so we don't
    # have to repeat them in every operation:
    es.bulk(chunk, doc_type='book', index='library')
Parameters: - actions – An iterable of bulk actions, generally the output of bulk_chunks() but sometimes a list of calls to index_op(), delete_op(), and update_op() directly. Specifically, an iterable of JSON-encoded bytestrings that can be joined with newlines and sent to ES.
- index – Default index to operate on
- doc_type – Default type of document to operate on. Cannot be specified without index.
- consistency – See the ES docs.
- refresh – See the ES docs.
- replication – See the ES docs.
- routing – See the ES docs.
- timeout – See the ES docs.
Return the decoded JSON response on success.
Raise BulkError if any of the individual actions fail. The exception provides enough information about the failed actions to identify them for retrying.
Sometimes there is an error with the request in general, not with any individual actions. If there is a connection error, timeout, or other transport error, a more general exception will be raised, as with other methods; see Error Handling.
See ES’s bulk API for more detail.
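To make "joined with newlines and sent to ES" concrete, here is a sketch of turning already-encoded actions into one bulk request body. It uses only the standard library; the helper names build_bulk_body and encode are hypothetical, the hand-built actions merely stand in for index_op()/delete_op() output, and the final newline reflects ES's requirement that a bulk body end with one.

```python
import json


def encode(obj):
    """JSON-encode one line of a bulk action as UTF-8 bytes."""
    return json.dumps(obj).encode('utf-8')


def build_bulk_body(actions):
    """Join JSON-encoded bulk actions (bytes) into one request body."""
    # Each action is one line (delete) or two lines (index, create, update).
    # ES's bulk API wants newline-delimited JSON ending with a newline.
    return b'\n'.join(actions) + b'\n'


# Hand-built actions standing in for index_op() and delete_op() output.
index_action = (encode({'index': {'_id': '1'}}) + b'\n' +
                encode({'title': 'All About Cats', 'pages': 20}))
delete_action = encode({'delete': {'_id': '7'}})

body = build_bulk_body([index_action, delete_action])
print(body.decode('utf-8'))
```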
index_op(doc, doc_type=None, overwrite_existing=True, **meta)
Return a document-indexing operation that can be passed to bulk(). (See there for examples.)
Specifically, return a 2-line, JSON-encoded bytestring.
Parameters: - doc – A mapping of property names to values.
- doc_type – The type of the document to index, if different from the one you pass to bulk()
- overwrite_existing – Whether we should overwrite existing documents of the same ID and doc type. (If False, this does a create operation.)
- meta – Other args controlling how the document is indexed, like id (most common), index (next most common), version, and routing. See ES’s bulk API for details on these.
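The "2-line, JSON-encoded bytestring" can be pictured with the sketch below: an action line naming the operation, then the document itself. The function name is hypothetical, it ignores doc_type, and copying meta onto the action line unchanged is an assumption; how pyelasticsearch actually names those keys on the wire is not documented here.

```python
import json


def sketch_index_op(doc, overwrite_existing=True, **meta):
    """Hypothetical re-creation of index_op()'s documented output shape.

    Returns a 2-line, JSON-encoded bytestring: an action line, then the doc.
    """
    # overwrite_existing=False turns the operation into a create.
    action = 'index' if overwrite_existing else 'create'
    # Meta kwargs (id, index, version, routing...) go on the action line as-is.
    lines = [json.dumps({action: meta}), json.dumps(doc)]
    return '\n'.join(lines).encode('utf-8')


op = sketch_index_op({'title': 'And Rats', 'pages': 47},
                     overwrite_existing=False, id=2)
print(op.decode('utf-8'))
```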
delete_op(doc_type=None, **meta)
Return a document-deleting operation that can be passed to bulk().

def actions():
    ...
    yield es.delete_op(id=7)
    yield es.delete_op(id=9,
                       index='some-non-default-index',
                       doc_type='some-non-default-type')
    ...

es.bulk(actions(), ...)

Specifically, return a JSON-encoded bytestring.
Parameters: - doc_type – The type of the document to delete, if different from the one passed to bulk()
- meta – A description of what document to delete and how to do it. Example: {"index": "library", "id": 2, "version": 4}. See ES’s bulk API for a list of all the options.
update_op(doc=None, doc_type=None, upsert=None, doc_as_upsert=None, script=None, params=None, lang=None, **meta)
Return a document-updating operation that can be passed to bulk().

def actions():
    ...
    yield es.update_op(doc={'pages': 4},
                       id=7,
                       version=21)
    ...

es.bulk(actions(), ...)

Specifically, return a JSON-encoded bytestring.
Parameters: - doc – A partial document to be merged into the existing document
- doc_type – The type of the document to update, if different from the one passed to bulk()
- upsert – The content for the new document created if the document does not exist
- doc_as_upsert – The provided document will be inserted if the document does not already exist
- script – The script to be used to update the document
- params – A dict of the params to be put in scope of the script
- lang – The language of the script. Omit to use the default, specified by script.default_lang.
- meta – Other args controlling what document to update and how to do it, like id, index, and retry_on_conflict, destined for the action line itself rather than the payload. See ES’s bulk API for details on these.
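The split described for meta (action line) versus the update fields (payload) can be sketched as follows. The function name is hypothetical, and leaving unset (None) fields out of the payload is an assumption about how the real helper behaves.

```python
import json


def sketch_update_op(doc=None, upsert=None, doc_as_upsert=None,
                     script=None, params=None, lang=None, **meta):
    """Hypothetical sketch: build the 2-line bulk 'update' operation."""
    # id, index, retry_on_conflict, etc. belong on the action line.
    action_line = json.dumps({'update': meta})
    candidates = {'doc': doc, 'upsert': upsert, 'doc_as_upsert': doc_as_upsert,
                  'script': script, 'params': params, 'lang': lang}
    # Only the fields the caller actually supplied end up in the payload.
    payload = {k: v for k, v in candidates.items() if v is not None}
    return (action_line + '\n' + json.dumps(payload)).encode('utf-8')


op = sketch_update_op(doc={'pages': 4}, id=7, retry_on_conflict=3)
print(op.decode('utf-8'))
```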
bulk_index(index, doc_type, docs, id_field='id', parent_field='_parent'[, other kwargs listed below])
Index a list of documents as efficiently as possible.
Note
This is deprecated in favor of bulk(), which supports all types of bulk actions, not just indexing, is compatible with bulk_chunks() for batching, and has a simpler, more flexible design.
Parameters: - index – The name of the index to which to add the document. Pass None if you will specify indices individually in each doc.
- doc_type – The type of the document
- docs – An iterable of Python mapping objects, convertible to JSON, representing documents to index
- id_field – The field of each document that holds its ID. Removed from document before indexing.
- parent_field – The field of each document that holds its parent ID, if any. Removed from document before indexing.
- index_field – The field of each document that holds the index to put it into, if different from the index arg. Removed from document before indexing.
- type_field – The field of each document that holds the doc type it should become, if different from the doc_type arg. Removed from the document before indexing.
- consistency – See the ES docs.
- refresh – See the ES docs.
- replication – See the ES docs.
- routing – See the ES docs.
- timeout – See the ES docs.
Raise BulkError if the request as a whole succeeded but some of the individual actions failed. You can pull enough information about the failed actions out of the exception to identify them for retrying.
See ES’s bulk API for more detail.
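Because bulk_index() reads its per-document metadata out of the documents themselves, each doc has to be split into metadata and body before indexing. A sketch of that extraction, with a hypothetical helper name and the assumption that the named fields are simply popped from a copy of each doc:

```python
def split_meta(doc, id_field='id', parent_field='_parent',
               index_field=None, type_field=None):
    """Hypothetical: separate bulk_index()-style meta fields from a document.

    Returns (meta, body); the named fields are removed from body only.
    """
    body = dict(doc)  # Work on a copy so the caller's dict is untouched.
    meta = {}
    for key, field in (('id', id_field), ('parent', parent_field),
                       ('index', index_field), ('type', type_field)):
        if field is not None and field in body:
            meta[key] = body.pop(field)
    return meta, body


meta, body = split_meta({'id': 3, '_parent': 1, 'title': 'And Bats'})
print(meta, body)
```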
There’s also a helper function, outside the ElasticSearch class:
pyelasticsearch.bulk_chunks(actions, docs_per_chunk=300, bytes_per_chunk=None)
Return groups of bulk-indexing operations to send to bulk().
Return an iterable of chunks, each of which is a group of actions; each action is a JSON-encoded line or pair of lines in the format understood by ES’s bulk API.
Parameters: - actions – An iterable of bulk actions, JSON-encoded. The best idea is to pass me a list of the outputs from index_op(), delete_op(), and update_op().
- docs_per_chunk – The number of documents (or, more technically, actions) to put in each chunk. Set to None to use only bytes_per_chunk.
- bytes_per_chunk – The maximum number of bytes of HTTP body payload to put in each chunk. Leave at None to use only docs_per_chunk. This option helps prevent timeouts when you have occasional very large documents. Without it, you may get unlucky: several large docs might land in one chunk, and ES might time out.
Chunks are capped by docs_per_chunk or bytes_per_chunk, whichever is reached first. Obviously, we cannot make a chunk smaller than the largest doc in it: a doc bigger than bytes_per_chunk still goes out, in a chunk of its own. If both docs_per_chunk and bytes_per_chunk are None, all docs end up in one big chunk (and you might as well not use this at all).
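The capping rule above can be expressed as a small generator. This is a sketch of the described behavior, not the library's source: the name chunk_actions is hypothetical, and counting one extra byte per action for the joining newline is an assumption.

```python
def chunk_actions(actions, docs_per_chunk=300, bytes_per_chunk=None):
    """Sketch of the chunking rule described above (not the library's code).

    Yields lists of encoded actions. A chunk closes when adding the next action
    would exceed docs_per_chunk or bytes_per_chunk, whichever comes first. An
    action larger than bytes_per_chunk still gets a chunk of its own.
    """
    chunk, chunk_bytes = [], 0
    for action in actions:
        size = len(action) + 1  # +1 for the newline that joins actions.
        too_many = docs_per_chunk is not None and len(chunk) >= docs_per_chunk
        too_big = (bytes_per_chunk is not None
                   and chunk_bytes + size > bytes_per_chunk)
        if chunk and (too_many or too_big):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(action)
        chunk_bytes += size
    if chunk:
        yield chunk  # Emit the leftovers.


actions = [b'a' * 10, b'b' * 10, b'c' * 50, b'd' * 5]
print(list(chunk_actions(actions, docs_per_chunk=3, bytes_per_chunk=30)))
```

In this run, the 50-byte action exceeds the 30-byte cap on its own, so it is emitted alone rather than dropped, matching "we do the best we can".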
Other Methods
class pyelasticsearch.ElasticSearch
close_index(index)
Close an index.
Parameters: index – The index to close
See ES’s close-index API for more detail.
cluster_state(metric='_all', index='_all'[, other kwargs listed below])
Return state information about the cluster.
Parameters: - metric – Which metric to return: one of “version”, “master_node”, “nodes”, “routing_table”, “metadata”, or “blocks”, an iterable of them, or a comma-delimited string of them. Defaults to all metrics.
- index – An index or iterable of indexes to return info about
- local – See the ES docs.
See ES’s cluster-state API for more detail.
count(query[, other kwargs listed below])
Execute a query against one or more indices and get the hit count.
Parameters: - query – A dictionary that will convert to ES’s query DSL or a string that will serve as a textual query to be passed as the q query string parameter
- index – An index or iterable of indexes to search. Omit to search all.
- doc_type – A document type or iterable thereof to search. Omit to search all.
- df – See the ES docs.
- analyzer – See the ES docs.
- default_operator – See the ES docs.
- source – See the ES docs.
- routing – See the ES docs.
See ES’s count API for more detail.
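Since count() (like search()) accepts the query either as a DSL dictionary or as a string, the two forms travel differently: a string rides in the q query-string parameter, while a dict becomes the JSON request body. A sketch of that routing, with a hypothetical helper name and an empty-string body for the string case as an assumption:

```python
import json


def route_query(query, query_params=None):
    """Hypothetical: return (body, params) for a dict-or-string query."""
    params = dict(query_params or {})
    if isinstance(query, str):
        params['q'] = query  # Textual query goes in the query string.
        body = ''
    else:
        body = json.dumps(query)  # DSL dicts become the JSON request body.
    return body, params


print(route_query('title:cats'))
print(route_query({'query': {'match_all': {}}}, {'routing': 'cats'}))
```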
create_index(index, settings=None)
Create an index with optional settings.
Parameters: - index – The name of the index to create
- settings – A dictionary of settings
If the index already exists, raise IndexAlreadyExistsError.
See ES’s create-index API for more detail.
delete(index, doc_type, id[, other kwargs listed below])
Delete a typed JSON document from a specific index based on its ID.
Parameters: - index – The name of the index from which to delete
- doc_type – The type of the document to delete
- id – The (string or int) ID of the document to delete
- routing – See the ES docs.
- parent – See the ES docs.
- replication – See the ES docs.
- consistency – See the ES docs.
- refresh – See the ES docs.
See ES’s delete API for more detail.
delete_all(index, doc_type[, other kwargs listed below])
Delete all documents of the given doc type from an index.
Parameters: - index – The name of the index from which to delete. ES does not support this being empty or “_all” or a comma-delimited list of index names (in 0.19.9).
- doc_type – The name of a document type
- routing – See the ES docs.
- parent – See the ES docs.
- replication – See the ES docs.
- consistency – See the ES docs.
- refresh – See the ES docs.
See ES’s delete API for more detail.
delete_by_query(index, doc_type, query[, other kwargs listed below])
Delete typed JSON documents from a specific index based on query.
Parameters: - index – An index or iterable thereof from which to delete
- doc_type – The type of document or iterable thereof to delete
- query – A dictionary that will convert to ES’s query DSL or a string that will serve as a textual query to be passed as the q query string parameter. (Passing the q kwarg yourself is deprecated.)
- q – See the ES docs.
- df – See the ES docs.
- analyzer – See the ES docs.
- default_operator – See the ES docs.
- source – See the ES docs.
- routing – See the ES docs.
- replication – See the ES docs.
- consistency – See the ES docs.
See ES’s delete-by-query API for more detail.
delete_index(index)
Delete an index.
Parameters: index – An index or iterable thereof to delete
If the index is not found, raise ElasticHttpNotFoundError.
See ES’s delete-index API for more detail.
flush(index=None[, other kwargs listed below])
Flush one or more indices (clear memory).
Parameters: - index – An index or iterable of indexes
- refresh – See the ES docs.
See ES’s flush API for more detail.
gateway_snapshot(index=None)
Gateway snapshot one or more indices.
Parameters: index – An index or iterable of indexes
See ES’s gateway-snapshot API for more detail.
get(index, doc_type, id[, other kwargs listed below])
Get a typed JSON document from an index by ID.
Parameters: - index – The name of the index from which to retrieve
- doc_type – The type of document to get
- id – The ID of the document to retrieve
- realtime – See the ES docs.
- fields – See the ES docs.
- routing – See the ES docs.
- preference – See the ES docs.
- refresh – See the ES docs.
See ES’s get API for more detail.
get_mapping(index=None, doc_type=None)
Fetch the mapping definition for a specific index and type.
Parameters: - index – An index or iterable thereof
- doc_type – A document type or iterable thereof
Omit both arguments to get mappings for all types and indexes.
See ES’s get-mapping API for more detail.
get_settings(index[, other kwargs listed below])
Get the settings of one or more indexes.
Parameters: index – An index or iterable of indexes
See ES’s get-settings API for more detail.
health(index=None[, other kwargs listed below])
Report on the health of the cluster or certain indices.
Parameters: - index – The index or iterable of indexes to examine
- level – See the ES docs.
- wait_for_status – See the ES docs.
- wait_for_relocating_shards – See the ES docs.
- wait_for_nodes – See the ES docs.
- timeout – See the ES docs.
See ES’s cluster-health API for more detail.
index(index, doc_type, doc, id=None, overwrite_existing=True[, other kwargs listed below])
Put a typed JSON document into a specific index to make it searchable.
Parameters: - index – The name of the index to which to add the document
- doc_type – The type of the document
- doc – A Python mapping object, convertible to JSON, representing the document
- id – The ID to give the document. Leave blank to make one up.
- overwrite_existing – Whether we should overwrite existing documents of the same ID and doc type
- routing – A value hashed to determine which shard this indexing request is routed to
- parent – The ID of a parent document, which leads this document to be routed to the same shard as the parent, unless routing overrides it.
- timestamp – An explicit value for the (typically automatic) timestamp associated with a document, for use with ttl and such
- ttl – The time until this document is automatically removed from the index. Can be an integral number of milliseconds or a duration like ‘1d’.
- percolate – An indication of which percolator queries, registered against this index, should be checked against the new document: ‘*’ or a query string like ‘color:green’
- consistency – An indication of how many active shards the contact node should demand to see in order to let the index operation succeed: ‘one’, ‘quorum’, or ‘all’
- replication – Set to ‘async’ to return from ES before finishing replication.
- refresh – Pass True to refresh the index after adding the document.
- timeout – A duration to wait for the relevant primary shard to become available, in the event that it isn’t: for example, “5m”
- fields – See the ES docs.
See ES’s index API for more detail.
more_like_this(index, doc_type, id, fields, body=''[, other kwargs listed below])
Execute a “more like this” search query against one or more fields and get back search hits.
Parameters: - index – The index to search and where the document for comparison lives
- doc_type – The type of document to find others like
- id – The ID of the document to find others like
- mlt_fields – The list of fields to compare on
- body – A dictionary that will convert to ES’s query DSL and be passed as the request body
- search_type – See the ES docs.
- search_indices – See the ES docs.
- search_types – See the ES docs.
- search_scroll – See the ES docs.
- search_size – See the ES docs.
- search_from – See the ES docs.
- like_text – See the ES docs.
- percent_terms_to_match – See the ES docs.
- min_term_freq – See the ES docs.
- max_query_terms – See the ES docs.
- stop_words – See the ES docs.
- min_doc_freq – See the ES docs.
- max_doc_freq – See the ES docs.
- min_word_len – See the ES docs.
- max_word_len – See the ES docs.
- boost_terms – See the ES docs.
- boost – See the ES docs.
- analyzer – See the ES docs.
See ES’s more-like-this API for more detail.
multi_get(ids, index=None, doc_type=None, fields=None[, other kwargs listed below])
Get multiple typed JSON documents from ES.
Parameters: - ids – An iterable, each element of which can be either a dict or an id (int or string). IDs are taken to be document IDs. Dicts are passed through the Multi Get API essentially verbatim, except that any missing _type, _index, or fields keys are filled in from the defaults given in the doc_type, index, and fields args.
- index – Default index name from which to retrieve
- doc_type – Default type of document to get
- fields – Default fields to return
See ES’s Multi Get API for more detail.
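The default-filling described for ids can be sketched like this. The helper name is hypothetical; turning bare IDs into {'_id': ...} entries and wrapping the result in a {'docs': [...]} body follow the shape of ES's Multi Get API, but whether pyelasticsearch builds it exactly this way is an assumption.

```python
def build_docs(ids, index=None, doc_type=None, fields=None):
    """Hypothetical sketch of multi_get's default-filling for the 'docs' list."""
    docs = []
    for item in ids:
        # Dicts pass through mostly verbatim; bare IDs become {'_id': ...}.
        doc = dict(item) if isinstance(item, dict) else {'_id': item}
        if index is not None:
            doc.setdefault('_index', index)
        if doc_type is not None:
            doc.setdefault('_type', doc_type)
        if fields is not None:
            doc.setdefault('fields', fields)
        docs.append(doc)
    return {'docs': docs}


print(build_docs([1, {'_id': 2, '_index': 'archive'}],
                 index='library', doc_type='book'))
```

setdefault() is what lets an explicit '_index' in a dict (here 'archive') win over the index default.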
open_index(index)
Open an index.
Parameters: index – The index to open
See ES’s open-index API for more detail.
optimize(index=None[, other kwargs listed below])
Optimize one or more indices.
Parameters: - index – An index or iterable of indexes
- max_num_segments – See the ES docs.
- only_expunge_deletes – See the ES docs.
- refresh – See the ES docs.
- flush – See the ES docs.
- wait_for_merge – See the ES docs.
See ES’s optimize API for more detail.
percolate(index, doc_type, doc[, other kwargs listed below])
Run a JSON document through the registered percolator queries, and return which ones match.
Parameters: - index – The name of the index to which the document pretends to belong
- doc_type – The type the document should be treated as if it has
- doc – A Python mapping object, convertible to JSON, representing the document
- routing – See the ES docs.
- preference – See the ES docs.
- ignore_unavailable – See the ES docs.
- percolate_format – See the ES docs.
Use index() to register percolators.
See ES’s percolate API for more detail.
put_mapping(index, doc_type, mapping[, other kwargs listed below])
Register a specific mapping definition for a specific type against one or more indices.
Parameters: - index – An index or iterable thereof
- doc_type – The document type to set the mapping of
- mapping – A dict representing the mapping to install. For example, this dict can have top-level keys that are the names of doc types.
- ignore_conflicts – See the ES docs.
See ES’s put-mapping API for more detail.
refresh(index=None)
Refresh one or more indices.
Parameters: index – An index or iterable of indexes
See ES’s refresh API for more detail.
search(query[, other kwargs listed below])
Execute a search query against one or more indices and get back search hits.
Parameters: - query – A dictionary that will convert to ES’s query DSL or a string that will serve as a textual query to be passed as the q query string parameter
- index – An index or iterable of indexes to search. Omit to search all.
- doc_type – A document type or iterable thereof to search. Omit to search all.
- size – Limit the number of results to size. Use with es_from to implement paginated searching.
- routing – See the ES docs.
See ES’s search API for more detail.
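Paginating with size and es_from comes down to a little arithmetic. The helper below is hypothetical; it just computes the kwargs for a 1-based page number so they can be splatted into search().

```python
def page_params(page, per_page=10):
    """Hypothetical helper: search() kwargs for 1-based page `page`."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    # es_from is unprefixed to ES's "from" param; size caps the hit count.
    return {'size': per_page, 'es_from': (page - 1) * per_page}


print(page_params(3, per_page=20))  # {'size': 20, 'es_from': 40}
```

With a live cluster, usage would look like es.search('title:cats', index='library', **page_params(3, per_page=20)).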
send_request(method, path_components, body='', query_params=None)
Send an HTTP request to ES, and return the JSON-decoded response.
This is mostly an internal method, but it also comes in handy if you need to use a brand new ES API that isn’t yet explicitly supported by pyelasticsearch, while still taking advantage of our connection pooling and retrying.
Retry the request on different servers if the first one is down and the max_retries constructor arg was > 0.
On failure, raise an ElasticHttpError, a ConnectionError, or a Timeout.
Parameters: - method – An HTTP method, like “GET”
- path_components – An iterable of path components, to be joined by “/”
- body – A map of key/value pairs to be sent as the JSON request body. Alternatively, a string to be sent verbatim, without further JSON encoding.
- query_params – A map of querystring param names to values or None
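To picture what send_request() puts on the wire, the sketch below joins path components with "/" and appends query_params as a query string, using only urllib.parse. The helper name build_url is hypothetical, and comma-joining a list component (mirroring the "index or iterable of indexes" arguments elsewhere) is an assumption about how such values are flattened.

```python
from urllib.parse import quote, urlencode


def build_url(base, path_components, query_params=None):
    """Hypothetical: join path components with '/' and add a query string."""
    parts = []
    for component in path_components:
        if isinstance(component, (list, tuple)):
            component = ','.join(component)  # e.g. several index names
        parts.append(quote(str(component), safe=',_*'))
    url = base.rstrip('/') + '/' + '/'.join(parts)
    if query_params:
        url += '?' + urlencode(query_params)
    return url


print(build_url('http://localhost:9200',
                [['library', 'archive'], 'book', '_search'],
                {'q': 'title:cats', 'size': 5}))
```

Against a live cluster, the corresponding call would look like es.send_request('GET', ['library', 'book', '_search'], query_params={'q': 'title:cats'}).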
status(index=None[, other kwargs listed below])
Retrieve the status of one or more indices.
Parameters: - index – An index or iterable thereof
- recovery – See the ES docs.
- snapshot – See the ES docs.
See ES’s index-status API for more detail.
update(index, doc_type, id, script[, other kwargs listed below])
Update an existing document. Raise TypeError if script, doc and upsert are all unspecified.
Parameters: - index – The name of the index containing the document
- doc_type – The type of the document
- id – The ID of the document
- script – The script to be used to update the document
- params – A dict of the params to be put in scope of the script
- lang – The language of the script. Omit to use the default, specified by script.default_lang.
- doc – A partial document to be merged into the existing document
- upsert – The content for the new document created if the document does not exist
- doc_as_upsert – The provided document will be inserted if the document does not already exist
- routing – See the ES docs.
- parent – See the ES docs.
- timeout – See the ES docs.
- replication – See the ES docs.
- consistency – See the ES docs.
- percolate – See the ES docs.
- refresh – See the ES docs.
- retry_on_conflict – See the ES docs.
- fields – See the ES docs.
See ES’s Update API for more detail.
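The TypeError rule and the way the optional pieces combine can be sketched as a body builder. The function name is hypothetical, omitting None-valued fields is an assumption, and the script string is only illustrative data, not a claim about which script language your cluster uses.

```python
def build_update_body(script=None, params=None, lang=None,
                      doc=None, upsert=None, doc_as_upsert=None):
    """Hypothetical sketch of assembling an update request body."""
    if script is None and doc is None and upsert is None:
        # Mirrors the documented rule: at least one of these is required.
        raise TypeError('At least one of script, doc, or upsert is required')
    fields = {'script': script, 'params': params, 'lang': lang,
              'doc': doc, 'upsert': upsert, 'doc_as_upsert': doc_as_upsert}
    return {k: v for k, v in fields.items() if v is not None}


print(build_update_body(script='ctx._source.pages += count',
                        params={'count': 1}))
```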
update_aliases(settings[, other kwargs listed below])
Atomically add, remove, or update aliases in bulk.
Parameters: settings – The actions to perform, as the body of an ES aliases request: a dictionary whose "actions" key holds a list of the actions
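A common use is swapping an alias from one index to another in a single atomic call. The sketch builds such a body from (index, alias) pairs; the helper name is hypothetical, and the assumption is that update_aliases() sends its argument as the body of ES's _aliases endpoint, whose format is a dict with an "actions" list of "add"/"remove" entries.

```python
def alias_actions(add=(), remove=()):
    """Hypothetical: build an ES _aliases request body from (index, alias) pairs."""
    actions = [{'add': {'index': index, 'alias': alias}} for index, alias in add]
    actions += [{'remove': {'index': index, 'alias': alias}}
                for index, alias in remove]
    return {'actions': actions}


# Point the "library" alias at library-v2 and away from library-v1.
settings = alias_actions(add=[('library-v2', 'library')],
                         remove=[('library-v1', 'library')])
print(settings)
```

Under that assumption, es.update_aliases(settings) would apply both changes together, so readers of the alias never see it pointing at neither index.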
update_all_settings(settings)
Update the settings of all indexes.
Parameters: settings – A dictionary of settings
See ES’s update-settings API for more detail.
update_settings(index, settings)
Change the settings of one or more indexes.
Parameters: - index – An index or iterable of indexes
- settings – A dictionary of settings
See ES’s update-settings API for more detail.
Error Handling
Any method representing an ES API call can raise one of the following exceptions:
exception pyelasticsearch.exceptions.ConnectionError
Exception raised when there is a connection error and we are out of retries. (See the max_retries argument to ElasticSearch.)
exception pyelasticsearch.exceptions.Timeout
Exception raised when an HTTP request times out and we are out of retries. (See the max_retries argument to ElasticSearch.)
exception pyelasticsearch.exceptions.BulkError
Exception raised when one or more bulk actions fail
You can extract document IDs from these to retry them.
errors
Return a list of actions that failed, in the format emitted by ES:
{"index": {"_index": "test", "_type": "type1", "_id": "1", "status": 409, "error": "VersionConflictEngineException[[test][2] [type1][1]: version conflict, current [3], provided [2]]"}},
{"update": {"_index": "index1", "_type": "type1", "_id": "1", "status": 404, "error": "DocumentMissingException[[index1][-1] [type1][1]: document missing]"}},
...
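Pulling retry candidates out of that list takes only a short loop. The helper name and the optional status filter are hypothetical; the sample data is an abbreviated copy of the entries above. In application code, this would typically run on exc.errors inside an except BulkError as exc: block (with BulkError imported from pyelasticsearch.exceptions).

```python
def failed_ids(errors, retryable_statuses=None):
    """Hypothetical: collect (action, index, type, id) for failed bulk actions.

    errors is a list in the format shown above. If retryable_statuses is given,
    only failures with those HTTP statuses are returned.
    """
    result = []
    for item in errors:
        for action, info in item.items():  # Each item has a single key.
            if retryable_statuses and info.get('status') not in retryable_statuses:
                continue
            result.append((action, info['_index'], info['_type'], info['_id']))
    return result


# Abbreviated copies of the sample entries above.
errors = [
    {'index': {'_index': 'test', '_type': 'type1', '_id': '1', 'status': 409,
               'error': 'VersionConflictEngineException[...]'}},
    {'update': {'_index': 'index1', '_type': 'type1', '_id': '1', 'status': 404,
                'error': 'DocumentMissingException[...]'}},
]
print(failed_ids(errors))
print(failed_ids(errors, retryable_statuses={409}))
```

Filtering by status is a design choice: a 409 version conflict may be worth retrying after re-reading the document, while a 404 on an update usually is not.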
exception pyelasticsearch.exceptions.ElasticHttpError
Exception raised when ES returns a non-OK (>=400) HTTP status code
error
A string error message
status_code
The HTTP status code of the response that precipitated the error
exception pyelasticsearch.exceptions.ElasticHttpNotFoundError
Exception raised when a request to ES returns a 404
Debugging
pyelasticsearch logs to the elasticsearch.trace logger using the Python
logging module. If you configure that to show INFO-level messages, then it’ll
show the requests in curl form and their responses. To see when a server is
marked as dead, follow the elasticsearch logger.
import logging
logging.getLogger('elasticsearch.trace').setLevel(logging.INFO)
logging.getLogger('elasticsearch').setLevel(logging.INFO)
Note
This assumes that logging is already set up with something like this:
import logging
logging.basicConfig()
pyelasticsearch will log lines like:
INFO:elasticsearch.trace: curl -XGET 'http://localhost:9200/fooindex/testdoc/_search' -d '{"facets": {"topics": {"terms": {"field": "topics"}}}}'
You can copy and paste the curl line, and it’ll work on the command line.