Migrated documentation into the main repo

This commit is contained in:
Clinton Gormley 2013-08-29 01:24:34 +02:00
parent b9558edeff
commit 822043347e
316 changed files with 23987 additions and 0 deletions

3
.gitignore vendored
View File

@ -8,6 +8,8 @@ logs/
build/
target/
.local-execution-hints.log
docs/html/
docs/build.log
## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
## The only configuration files which are not ignored are .settings since
@ -19,6 +21,7 @@ target/
*/.project
*/.classpath
*/eclipse-build
.settings/
## netbeans ignores
nb-configuration.xml

View File

@ -0,0 +1,152 @@
== Clients
[float]
=== Perl
* http://github.com/clintongormley/ElasticSearch.pm[ElasticSearch.pm]:
Perl client.
[float]
=== Python
* http://github.com/aparo/pyes[pyes]:
Python client.
* http://github.com/rhec/pyelasticsearch[pyelasticsearch]:
Python client.
* https://github.com/eriky/ESClient[ESClient]:
A lightweight and easy to use Python client for ElasticSearch.
* https://github.com/humangeo/rawes[rawes]:
Python low level client.
* https://github.com/mozilla/elasticutils/[elasticutils]:
A friendly chainable ElasticSearch interface for Python.
* http://intridea.github.io/surfiki-refine-elasticsearch/[Surfiki Refine]:
Python Map-Reduce engine targeting Elasticsearch indices.
[float]
=== Ruby
* http://github.com/karmi/tire[Tire]:
Ruby API & DSL, with ActiveRecord/ActiveModel integration.
* http://github.com/grantr/rubberband[rubberband]:
Ruby client.
* https://github.com/PoseBiz/stretcher[stretcher]:
Ruby client.
* https://github.com/wireframe/elastic_searchable/[elastic_searchable]:
Ruby client + Rails integration.
[float]
=== PHP
* http://github.com/ruflin/Elastica[Elastica]:
PHP client.
* http://github.com/nervetattoo/elasticsearch[elasticsearch]:
PHP client.
* http://github.com/polyfractal/Sherlock[Sherlock]:
PHP client, one-to-one mapping with query DSL, fluid interface.
[float]
=== Java
* https://github.com/searchbox-io/Jest[Jest]:
Java Rest client.
[float]
=== Javascript
* https://github.com/fullscale/elastic.js[Elastic.js]:
A JavaScript implementation of the ElasticSearch Query DSL and Core API.
* https://github.com/phillro/node-elasticsearch-client[node-elasticsearch-client]:
A NodeJS client for elastic search.
* https://github.com/ramv/node-elastical[node-elastical]:
Node.js client for the ElasticSearch REST API
[float]
=== .Net
* https://github.com/Yegoroff/PlainElastic.Net[PlainElastic.Net]:
.NET client.
* https://github.com/Mpdreamz/NEST[NEST]:
.NET client.
* https://github.com/medcl/ElasticSearch.Net[ElasticSearch.NET]:
.NET client.
[float]
=== Scala
* https://github.com/sksamuel/elastic4s[elastic4s]:
Scala DSL.
* https://github.com/scalastuff/esclient[esclient]:
Thin Scala client.
* https://github.com/bsadeh/scalastic[scalastic]:
Scala client.
[float]
=== Clojure
* http://github.com/clojurewerkz/elastisch[Elastisch]:
Clojure client.
[float]
=== Go
* https://github.com/mattbaird/elastigo[elastigo]:
Go client.
* https://github.com/belogik/goes[goes]:
Go lib.
[float]
=== Erlang
* http://github.com/tsloughter/erlastic_search[erlastic_search]:
Erlang client using HTTP.
* https://github.com/dieswaytoofast/erlasticsearch[erlasticsearch]:
Erlang client using Thrift.
* https://github.com/datahogs/tirexs[Tirexs]:
An https://github.com/elixir-lang/elixir[Elixir] based API/DSL, inspired by
http://github.com/karmi/tire[Tire]. Ready to use in pure Erlang
environment.
[float]
=== EventMachine
* http://github.com/vangberg/em-elasticsearch[em-elasticsearch]:
elasticsearch library for eventmachine.
[float]
=== Command Line
* https://github.com/elasticsearch/es2unix[es2unix]:
Elasticsearch API consumable by the Linux command line.
* https://github.com/javanna/elasticshell[elasticshell]:
command line shell for elasticsearch.
[float]
=== OCaml
* https://github.com/tovbinm/ocaml-elasticsearch[ocaml-elasticsearch]:
OCaml client for Elasticsearch
[float]
=== Smalltalk
* http://ss3.gemstone.com/ss/Elasticsearch.html[Elasticsearch]:
Smalltalk client for Elasticsearch

View File

@ -0,0 +1,16 @@
== Front Ends
* https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo[Sense]:
Chrome curl-like plugin for running requests against an Elasticsearch node
* https://github.com/mobz/elasticsearch-head[elasticsearch-head]:
A web front end for an elastic search cluster.
* https://github.com/OlegKunitsyn/elasticsearch-browser[browser]:
Web front-end over elasticsearch data.
* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor]:
Front-end to help debug/diagnose queries and analyzers
* http://elastichammer.exploringelasticsearch.com/[Hammer]:
Web front-end for elasticsearch

View File

@ -0,0 +1,5 @@
== GitHub
A lot of development around *elasticsearch* happens on GitHub. Here is a
simple search for
https://github.com/search?q=elasticsearch&type=Repositories[repositories].

View File

@ -0,0 +1,15 @@
= Community Supported Clients
include::clients.asciidoc[]
include::frontends.asciidoc[]
include::integrations.asciidoc[]
include::misc.asciidoc[]
include::monitoring.asciidoc[]
include::github.asciidoc[]

View File

@ -0,0 +1,71 @@
== Integrations
* http://grails.org/plugin/elasticsearch[Grails]:
ElasticSearch Grails plugin.
* https://github.com/carrot2/elasticsearch-carrot2[carrot2]:
Results clustering with carrot2
* https://github.com/angelf/escargot[escargot]:
ElasticSearch connector for Rails (WIP).
* https://metacpan.org/module/Catalyst::Model::Search::ElasticSearch[Catalyst]:
ElasticSearch and Catalyst integration.
* http://github.com/aparo/django-elasticsearch[django-elasticsearch]:
Django ElasticSearch Backend.
* http://github.com/Aconex/elasticflume[elasticflume]:
http://github.com/cloudera/flume[Flume] sink implementation.
* http://code.google.com/p/terrastore/wiki/Search_Integration[Terrastore Search]:
http://code.google.com/p/terrastore/[Terrastore] integration module with elasticsearch.
* https://github.com/infochimps/wonderdog[Wonderdog]:
Hadoop bulk loader into elasticsearch.
* http://geeks.aretotally.in/play-framework-module-elastic-search-distributed-searching-with-json-http-rest-or-java[Play!Framework]:
Integrate with Play! Framework Application.
* https://github.com/Exercise/FOQElasticaBundle[ElasticaBundle]:
Symfony2 Bundle wrapping Elastica.
* http://drupal.org/project/elasticsearch[Drupal]:
Drupal ElasticSearch integration.
* https://github.com/refuge/couch_es[couch_es]:
elasticsearch helper for couchdb based products (apache couchdb, bigcouch & refuge)
* https://github.com/sonian/elasticsearch-jetty[Jetty]:
Jetty HTTP Transport
* https://github.com/dadoonet/spring-elasticsearch[Spring Elasticsearch]:
Spring Factory for Elasticsearch
* https://camel.apache.org/elasticsearch.html[Apache Camel Integration]:
An Apache camel component to integrate elasticsearch
* https://github.com/tlrx/elasticsearch-test[elasticsearch-test]:
Elasticsearch Java annotations for unit testing with
http://www.junit.org/[JUnit]
* http://searchbox-io.github.com/wp-elasticsearch/[Wp-ElasticSearch]:
ElasticSearch WordPress Plugin
* https://github.com/OlegKunitsyn/eslogd[eslogd]:
Linux daemon that replicates events to a central ElasticSearch server in real-time
* https://github.com/drewr/elasticsearch-clojure-repl[elasticsearch-clojure-repl]:
Plugin that embeds nREPL for run-time introspective adventure! Also
serves as an nREPL transport.
* http://haystacksearch.org/[Haystack]:
Modular search for Django
* https://github.com/cleverage/play2-elasticsearch[play2-elasticsearch]:
ElasticSearch module for Play Framework 2.x
* https://github.com/fullscale/dangle[dangle]:
A set of AngularJS directives that provide common visualizations for elasticsearch based on
D3.

View File

@ -0,0 +1,17 @@
== Misc
* https://github.com/electrical/puppet-elasticsearch[Puppet]:
Elasticsearch puppet module.
* http://github.com/elasticsearch/cookbook-elasticsearch[Chef]:
Chef cookbook for Elasticsearch
* https://github.com/tavisto/elasticsearch-rpms[elasticsearch-rpms]:
RPMs for elasticsearch.
* http://www.github.com/neogenix/daikon[daikon]:
Daikon ElasticSearch CLI
* https://github.com/Aconex/scrutineer[Scrutineer]:
A high performance consistency checker to compare what you've indexed
with your source of truth content (e.g. DB)

View File

@ -0,0 +1,27 @@
== Health and Performance Monitoring
* https://github.com/lukas-vlcek/bigdesk[bigdesk]:
Live charts and statistics for elasticsearch cluster.
* https://github.com/karmi/elasticsearch-paramedic[paramedic]:
Live charts with cluster stats and indices/shards information.
* http://www.elastichq.org/[ElasticSearchHQ]:
Free cluster health monitoring tool
* http://sematext.com/spm/index.html[SPM for ElasticSearch]:
Performance monitoring with live charts showing cluster and node stats, integrated
alerts, email reports, etc.
* https://github.com/radu-gheorghe/check-es[check-es]:
Nagios/Shinken plugins for checking on elasticsearch
* https://github.com/anchor/nagios-plugin-elasticsearch[check_elasticsearch]:
An ElasticSearch availability and performance monitoring plugin for
Nagios.
* https://github.com/rbramley/Opsview-elasticsearch[opsview-elasticsearch]:
Opsview plugin written in Perl for monitoring ElasticSearch
* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy]:
Plugin to watch Lucene segment merges across your cluster

View File

@ -0,0 +1,99 @@
[[anatomy]]
== API Anatomy
Once a <<client,GClient>> has been
obtained, all of the ElasticSearch APIs can be executed on it. Each Groovy
API is exposed using three different mechanisms.
[float]
=== Closure Request
The first option is to simply provide the request as a Closure, which
automatically gets resolved into the respective request instance (for
the index API, it's the `IndexRequest` class). The API returns a special
future, called `GActionFuture`. This is a groovier version of the
elasticsearch Java `ActionFuture` (in turn a nicer extension to Java's own
`Future`) which allows registering listeners (closures) on it for
success and failure, as well as blocking for the response. For example:
[source,js]
--------------------------------------------------
def indexR = client.index {
index "test"
type "type1"
id "1"
source {
test = "value"
complex {
value1 = "value1"
value2 = "value2"
}
}
}
println "Indexed $indexR.response.id into $indexR.response.index/$indexR.response.type"
--------------------------------------------------
In the above example, calling `indexR.response` will simply block for
the response. We can also block for the response for a specific timeout:
[source,js]
--------------------------------------------------
IndexResponse response = indexR.response "5s" // block for 5 seconds, same as:
response = indexR.response 5, TimeValue.SECONDS //
--------------------------------------------------
We can also register closures that will be called on success and on
failure:
[source,js]
--------------------------------------------------
indexR.success = {IndexResponse response ->
println "Indexed $response.id into $response.index/$response.type"
}
indexR.failure = {Throwable t ->
println "Failed to index: $t.message"
}
--------------------------------------------------
[float]
=== Request
This option allows passing the actual instance of the request (instead
of a closure) as a parameter. The rest is similar to the closure-as-a-parameter
option (the `GActionFuture` handling). For example:
[source,js]
--------------------------------------------------
def indexR = client.index (new IndexRequest(
index: "test",
type: "type1",
id: "1",
source: {
test = "value"
complex {
value1 = "value1"
value2 = "value2"
}
}))
println "Indexed $indexR.response.id into $indexR.response.index/$indexR.response.type"
--------------------------------------------------
[float]
=== Java Like
The last option is to provide an actual instance of the API request, and
an `ActionListener` for the callback. This is exactly like the Java API
with the added `gexecute` which returns the `GActionFuture`:
[source,js]
--------------------------------------------------
def indexR = node.client.prepareIndex("test", "type1", "1").setSource({
test = "value"
complex {
value1 = "value1"
value2 = "value2"
}
}).gexecute()
--------------------------------------------------

View File

@ -0,0 +1,58 @@
[[client]]
== Client
Obtaining an elasticsearch Groovy `GClient` (a `GClient` is a simple
wrapper on top of the Java `Client`) is simple. The most common way to
get a client is by starting an embedded `Node` which acts as a node
within the cluster.
[float]
=== Node Client
A Node-based client is the simplest way to get a `GClient` and start
executing operations against elasticsearch.
[source,js]
--------------------------------------------------
import org.elasticsearch.groovy.client.GClient
import org.elasticsearch.groovy.node.GNode
import static org.elasticsearch.groovy.node.GNodeBuilder.nodeBuilder
// on startup
GNode node = nodeBuilder().node();
GClient client = node.client();
// on shutdown
node.close();
--------------------------------------------------
Since elasticsearch can be configured using JSON-based settings, the
configuration itself can be done using a closure that represents the
JSON:
[source,js]
--------------------------------------------------
import org.elasticsearch.groovy.node.GNode
import org.elasticsearch.groovy.node.GNodeBuilder
import static org.elasticsearch.groovy.node.GNodeBuilder.*
// on startup
GNodeBuilder nodeBuilder = nodeBuilder();
nodeBuilder.settings {
node {
client = true
}
cluster {
name = "test"
}
}
GNode node = nodeBuilder.node()
// on shutdown
node.stop().close()
--------------------------------------------------

View File

@ -0,0 +1,22 @@
[[count]]
== Count API
The count API is very similar to the
link:{java}/count.html[Java count API]. The Groovy
extension allows providing the query to execute as a `Closure` (similar
to a GORM criteria builder):
[source,js]
--------------------------------------------------
def count = client.count {
indices "test"
types "type1"
query {
term {
test = "value"
}
}
}
--------------------------------------------------
The query follows the same link:{ref}/query-dsl.html[Query DSL].

View File

@ -0,0 +1,15 @@
[[delete]]
== Delete API
The delete API is very similar to the
link:{java}/delete.html[Java delete API], here is an
example:
[source,js]
--------------------------------------------------
def deleteF = node.client.delete {
index "test"
type "type1"
id "1"
}
--------------------------------------------------

View File

@ -0,0 +1,18 @@
[[get]]
== Get API
The get API is very similar to the
link:{java}/get.html[Java get API]. The main benefit
of using Groovy is handling the source content. It can be automatically
converted to a `Map` which means using Groovy to navigate it is simple:
[source,js]
--------------------------------------------------
def getF = node.client.get {
index "test"
type "type1"
id "1"
}
println "Result of field2: $getF.response.source.complex.field2"
--------------------------------------------------

View File

@ -0,0 +1,50 @@
= Groovy API
:ref: http://www.elasticsearch.org/guide/elasticsearch/reference/current
:java: http://www.elasticsearch.org/guide/elasticsearch/client/java-api/current
[preface]
== Preface
This section describes the http://groovy.codehaus.org/[Groovy] API
elasticsearch provides. All elasticsearch APIs are executed using a
<<client,GClient>>, and are completely
asynchronous in nature (they either accept a listener, or return a
future).
The Groovy API is a wrapper on top of the
link:{java}[Java API] exposing it in a groovier
manner. The execution options for each API follow a similar pattern and are
covered in <<anatomy>>.
[float]
==== Maven Repository
The Groovy API is hosted on
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22elasticsearch-client-groovy%22[Maven
Central].
For example, you can define the latest version in your `pom.xml` file:
[source,xml]
--------------------------------------------------
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-client-groovy</artifactId>
<version>${es.version}</version>
</dependency>
--------------------------------------------------
include::anatomy.asciidoc[]
include::client.asciidoc[]
include::index_.asciidoc[]
include::get.asciidoc[]
include::delete.asciidoc[]
include::search.asciidoc[]
include::count.asciidoc[]

View File

@ -0,0 +1,31 @@
[[index_]]
== Index API
The index API is very similar to the
link:{java}/index_.html[Java index API]. The Groovy
extension to it is the ability to provide the indexed source using a
closure. For example:
[source,js]
--------------------------------------------------
def indexR = client.index {
index "test"
type "type1"
id "1"
source {
test = "value"
complex {
value1 = "value1"
value2 = "value2"
}
}
}
--------------------------------------------------
In the above example, the source closure itself gets transformed into an
XContent (defaults to JSON). In order to change how the source closure
is serialized, a global (static) setting can be set on the `GClient` by
changing the `indexContentType` field.
Note also that the `source` can be set using the typical Java-based
APIs; the `Closure` option is a Groovy extension.

View File

@ -0,0 +1,114 @@
[[search]]
== Search API
The search API is very similar to the
link:{java}/search.html[Java search API]. The Groovy
extension allows providing the search source to execute as a `Closure`,
including the query itself (similar to a GORM criteria builder):
[source,js]
--------------------------------------------------
def search = node.client.search {
indices "test"
types "type1"
source {
query {
term(test: "value")
}
}
}
search.response.hits.each {SearchHit hit ->
println "Got hit $hit.id from $hit.index/$hit.type"
}
--------------------------------------------------
It can also be executed using the "Java API" while still using a closure
for the query:
[source,js]
--------------------------------------------------
def search = node.client.prepareSearch("test").setQuery({
term(test: "value")
}).gexecute();
search.response.hits.each {SearchHit hit ->
println "Got hit $hit.id from $hit.index/$hit.type"
}
--------------------------------------------------
The format of the search `Closure` follows the same JSON syntax as the
link:{ref}/search-search.html[Search API] request.
[float]
=== More examples
Term query where multiple values are provided (see
link:{ref}/query-dsl-terms-query.html[terms]):
[source,js]
--------------------------------------------------
def search = node.client.search {
indices "test"
types "type1"
source {
query {
terms(test: ["value1", "value2"])
}
}
}
--------------------------------------------------
Query string (see
link:{ref}/query-dsl-query-string-query.html[query string]):
[source,js]
--------------------------------------------------
def search = node.client.search {
indices "test"
types "type1"
source {
query {
query_string(
fields: ["test"],
query: "value1 value2")
}
}
}
--------------------------------------------------
Pagination (see
link:{ref}/search-request-from-size.html[from/size]):
[source,js]
--------------------------------------------------
def search = node.client.search {
indices "test"
types "type1"
source {
from = 0
size = 10
query {
term(test: "value")
}
}
}
--------------------------------------------------
Sorting (see link:{ref}/search-request-sort.html[sort]):
[source,js]
--------------------------------------------------
def search = node.client.search {
indices "test"
types "type1"
source {
query {
term(test: "value")
}
sort = [
date : [ order: "desc"]
]
}
}
--------------------------------------------------

View File

@ -0,0 +1,38 @@
[[bulk]]
== Bulk API
The bulk API allows one to index and delete several documents in a
single request. Here is a sample usage:
[source,java]
--------------------------------------------------
import static org.elasticsearch.common.xcontent.XContentFactory.*;
BulkRequestBuilder bulkRequest = client.prepareBulk();
// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elastic Search")
.endObject()
)
);
bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "another post")
.endObject()
)
);
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
// process failures by iterating through each bulk response item
}
--------------------------------------------------

View File

@ -0,0 +1,185 @@
[[client]]
== Client
You can use the *java client* in multiple ways:
* Perform standard <<index_,index>>, <<get,get>>,
<<delete,delete>> and <<search,search>> operations on an
existing cluster
* Perform administrative tasks on a running cluster
* Start full nodes when you want to run Elasticsearch embedded in your
own application or when you want to launch unit or integration tests
Obtaining an elasticsearch `Client` is simple. The most common way to
get a client is by:
1. creating an embedded link:#nodeclient[`Node`] that acts as a node
within a cluster
2. requesting a `Client` from your embedded `Node`.
Another way is to create a link:#transportclient[`TransportClient`]
that connects to a cluster.
*Important:*
______________________________________________________________________________________________________________________________________________________________
Please note that you are encouraged to use the same version on the client
and cluster sides. You may hit some incompatibility issues when mixing
major versions.
______________________________________________________________________________________________________________________________________________________________
[float]
=== Node Client
Instantiating a node based client is the simplest way to get a `Client`
that can execute operations against elasticsearch.
[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;
// on startup
Node node = nodeBuilder().node();
Client client = node.client();
// on shutdown
node.close();
--------------------------------------------------
When you start a `Node`, it joins an elasticsearch cluster. You can have
different clusters by simply setting the `cluster.name` setting, or
explicitly using the `clusterName` method on the builder.
You can define `cluster.name` in the `/src/main/resources/elasticsearch.yml`
file in your project. As long as `elasticsearch.yml` is present on the
classpath, it will be used when you start your node.
[source,java]
--------------------------------------------------
cluster.name=yourclustername
--------------------------------------------------
Or in Java:
[source,java]
--------------------------------------------------
Node node = nodeBuilder().clusterName("yourclustername").node();
Client client = node.client();
--------------------------------------------------
The benefit of using the `Client` is the fact that operations are
automatically routed to the node(s) the operations need to be executed
on, without performing a "double hop". For example, the index operation
will automatically be executed on the shard that the document will end
up residing on.
When you start a `Node`, the most important decision is whether it
should hold data or not. In other words, should indices and shards be
allocated to it. Many times we would like to have the clients just be
clients, without shards being allocated to them. This is simple to
configure by setting either the `node.data` setting to `false` or
`node.client` to `true` (or the respective helper methods on
`NodeBuilder`):
[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;
// on startup
Node node = nodeBuilder().client(true).node();
Client client = node.client();
// on shutdown
node.close();
--------------------------------------------------
Another common usage is to start the `Node` and use the `Client` in
unit/integration tests. In such a case, we would like to start a "local"
`Node` (with a "local" discovery and transport). Again, this is just a
matter of a simple setting when starting the `Node`. Note, "local" here
means local on the JVM (well, actually class loader) level, meaning that
two *local* servers started within the same JVM will discover each other
and form a cluster.
[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;
// on startup
Node node = nodeBuilder().local(true).node();
Client client = node.client();
// on shutdown
node.close();
--------------------------------------------------
[float]
=== Transport Client
The `TransportClient` connects remotely to an elasticsearch cluster
using the transport module. It does not join the cluster, but simply
gets one or more initial transport addresses and communicates with them
in round robin fashion on each action (though most actions will probably
be "two hop" operations).
[source,java]
--------------------------------------------------
// on startup
Client client = new TransportClient()
.addTransportAddress(new InetSocketTransportAddress("host1", 9300))
.addTransportAddress(new InetSocketTransportAddress("host2", 9300));
// on shutdown
client.close();
--------------------------------------------------
Note that you have to set the cluster name if you use one different from
"elasticsearch":
[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...
--------------------------------------------------
Or use the `elasticsearch.yml` file as shown in the link:#nodeclient[Node
Client section].
The client can sniff the rest of the cluster and add those nodes into
its list of machines to use. In this case, note that the IP addresses
used will be the ones that the other nodes were started with (the
"publish" address). In order to enable it, set
`client.transport.sniff` to `true`:
[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
.put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);
--------------------------------------------------
Other transport client level settings include:
[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`client.transport.ignore_cluster_name` |Set to `true` to ignore cluster
name validation of connected nodes. (since 0.19.4)
|`client.transport.ping_timeout` |The time to wait for a ping response
from a node. Defaults to `5s`.
|`client.transport.nodes_sampler_interval` |How often to sample / ping
the nodes listed and connected. Defaults to `5s`.
|=======================================================================
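For example, here is a minimal sketch that sets a couple of these parameters when building the client (the timeout values and hostname are illustrative):
[source,java]
--------------------------------------------------
// Illustrative values; adjust to your environment
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName")
        .put("client.transport.ping_timeout", "10s")
        .put("client.transport.nodes_sampler_interval", "10s")
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300));
--------------------------------------------------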

View File

@ -0,0 +1,38 @@
[[count]]
== Count API
The count API allows you to easily execute a query and get the number of
matches for that query. It can be executed across one or more indices
and across one or more types. The query can be provided using the
link:{ref}/query-dsl.html[Query DSL].
[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.xcontent.FilterBuilders.*;
import static org.elasticsearch.index.query.xcontent.QueryBuilders.*;
CountResponse response = client.prepareCount("test")
.setQuery(termQuery("_type", "type1"))
.execute()
.actionGet();
--------------------------------------------------
For more information on the count operation, check out the REST
link:{ref}/search-count.html[count] docs.
[float]
=== Operation Threading
The count API allows setting the threading model the operation will be
performed with when the actual execution of the API happens on the same
node (the API is executed on a shard that is allocated on the same
server).
There are three threading modes. The `NO_THREADS` mode means that the
count operation will be executed on the calling thread. The
`SINGLE_THREAD` mode means that the count operation will be executed on
a single different thread for all local shards. The `THREAD_PER_SHARD`
mode means that the count operation will be executed on a different
thread for each local shard.
The default mode is `SINGLE_THREAD`.
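A minimal sketch of choosing a mode, assuming the count request builder exposes `setOperationThreading` with the broadcast threading enum, as other broadcast operations do:
[source,java]
--------------------------------------------------
// Assumption: setOperationThreading accepts BroadcastOperationThreading values
CountResponse response = client.prepareCount("test")
        .setQuery(termQuery("_type", "type1"))
        .setOperationThreading(BroadcastOperationThreading.THREAD_PER_SHARD)
        .execute()
        .actionGet();
--------------------------------------------------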

View File

@ -0,0 +1,21 @@
[[delete-by-query]]
== Delete By Query API
The delete by query API allows deleting documents from one or more
indices and one or more types based on a <<query-dsl-queries,query>>. Here
is an example:
[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
.setQuery(termQuery("_type", "type1"))
.execute()
.actionGet();
--------------------------------------------------
For more information on the delete by query operation, check out the
link:{ref}/docs-delete-by-query.html[delete_by_query API]
docs.

View File

@ -0,0 +1,39 @@
[[delete]]
== Delete API
The delete API allows deleting a typed JSON document from a specific
index based on its id. The following example deletes the JSON document
from an index called twitter, under a type called tweet, with id 1:
[source,java]
--------------------------------------------------
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
.execute()
.actionGet();
--------------------------------------------------
For more information on the delete operation, check out the
link:{ref}/docs-delete.html[delete API] docs.
[float]
=== Operation Threading
The delete API allows setting the threading model the operation will be
performed with when the actual execution of the API happens on the same
node (the API is executed on a shard that is allocated on the same
server).
The options are to execute the operation on a different thread, or to
execute it on the calling thread (note that the API is still async). By
default, `operationThreaded` is set to `true` which means the operation
is executed on a different thread. Here is an example that sets it to
`false`:
[source,java]
--------------------------------------------------
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
.setOperationThreaded(false)
.execute()
.actionGet();
--------------------------------------------------

View File

@ -0,0 +1,483 @@
[[facets]]
== Facets
Elasticsearch provides a full Java API to play with facets. See the
link:{ref}/search-facets.html[Facets guide].
Use the factory for facet builders (`FacetBuilders`) to create each facet
you want to compute when querying, and add it to your search request:
[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
.setQuery( /* your query */ )
.addFacet( /* add a facet */ )
.execute().actionGet();
--------------------------------------------------
Note that you can add more than one facet. See
link:{ref}/search-search.html[Search Java API] for details.
To build facet requests, use `FacetBuilders` helpers. Just import them
in your class:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.FacetBuilders;
--------------------------------------------------
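For example, here is a minimal sketch (the facet names and fields are illustrative) that computes two facets in a single request:
[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
        .setQuery(QueryBuilders.matchAllQuery())
        .addFacet(FacetBuilders.termsFacet("brands").field("brand").size(10))
        .addFacet(FacetBuilders.histogramFacet("prices").field("price").interval(10))
        .execute().actionGet();
--------------------------------------------------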
[float]
=== Facets
[float]
==== Terms Facet
Here is how you can use
link:{ref}/search-facets-terms-facet.html[Terms Facet]
with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.termsFacet("f")
.field("brand")
.size(10);
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.terms.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
TermsFacet f = (TermsFacet) sr.facets().facetsAsMap().get("f");
f.getTotalCount(); // Total terms doc count
f.getOtherCount(); // Not shown terms doc count
f.getMissingCount(); // Without term doc count
// For each entry
for (TermsFacet.Entry entry : f) {
entry.getTerm(); // Term
entry.getCount(); // Doc count
}
--------------------------------------------------
[float]
==== Range Facet
Here is how you can use
link:{ref}/search-facets-range-facet.html[Range Facet]
with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.rangeFacet("f")
.field("price") // Field to compute on
.addUnboundedFrom(3) // from -infinity to 3 (excluded)
.addRange(3, 6) // from 3 to 6 (excluded)
.addUnboundedTo(6); // from 6 to +infinity
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.range.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
RangeFacet f = (RangeFacet) sr.facets().facetsAsMap().get("f");
// For each entry
for (RangeFacet.Entry entry : f) {
entry.getFrom(); // Range from requested
entry.getTo(); // Range to requested
entry.getCount(); // Doc count
entry.getMin(); // Min value
entry.getMax(); // Max value
entry.getMean(); // Mean
entry.getTotal(); // Sum of values
}
--------------------------------------------------
[float]
==== Histogram Facet
Here is how you can use
link:{ref}/search-facets-histogram-facet.html[Histogram
Facet] with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
HistogramFacetBuilder facet = FacetBuilders.histogramFacet("f")
.field("price")
.interval(1);
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.histogram.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
HistogramFacet f = (HistogramFacet) sr.facets().facetsAsMap().get("f");
// For each entry
for (HistogramFacet.Entry entry : f) {
entry.getKey(); // Key (X-Axis)
entry.getCount(); // Doc count (Y-Axis)
}
--------------------------------------------------
[float]
==== Date Histogram Facet
Here is how you can use
link:{ref}/search-facets-date-histogram-facet.html[Date
Histogram Facet] with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.dateHistogramFacet("f")
.field("date") // Your date field
.interval("year"); // You can also use "quarter", "month", "week", "day",
// "hour" and "minute" or notation like "1.5h" or "2w"
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.datehistogram.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
DateHistogramFacet f = (DateHistogramFacet) sr.facets().facetsAsMap().get("f");
// For each entry
for (DateHistogramFacet.Entry entry : f) {
entry.getTime(); // Date in ms since epoch (X-Axis)
entry.getCount(); // Doc count (Y-Axis)
}
--------------------------------------------------
[float]
==== Filter Facet (not facet filter)
Here is how you can use
link:{ref}/search-facets-filter-facet.html[Filter Facet]
with Java API.
If you are looking for how to apply a filter to a facet, have a look at
link:#facet-filter[facet filter] using the Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.filterFacet("f",
FilterBuilders.termFilter("brand", "heineken")); // Your Filter here
--------------------------------------------------
See <<query-dsl-filters,Filters>> to
learn how to build filters using Java.
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.filter.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
FilterFacet f = (FilterFacet) sr.facets().facetsAsMap().get("f");
f.getCount(); // Number of docs that matched
--------------------------------------------------
[float]
==== Query Facet
Here is how you can use
link:{ref}/search-facets-query-facet.html[Query Facet]
with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.queryFacet("f",
QueryBuilders.matchQuery("brand", "heineken"));
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.query.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
QueryFacet f = (QueryFacet) sr.facets().facetsAsMap().get("f");
f.getCount(); // Number of docs that matched
--------------------------------------------------
See <<query-dsl-queries,Queries>> to
learn how to build queries using Java.
[float]
==== Statistical Facet
Here is how you can use
link:{ref}/search-facets-statistical-facet.html[Statistical
Facet] with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.statisticalFacet("f")
.field("price");
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.statistical.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
StatisticalFacet f = (StatisticalFacet) sr.facets().facetsAsMap().get("f");
f.getCount(); // Doc count
f.getMin(); // Min value
f.getMax(); // Max value
f.getMean(); // Mean
f.getTotal(); // Sum of values
f.getStdDeviation(); // Standard Deviation
f.getSumOfSquares(); // Sum of Squares
f.getVariance(); // Variance
--------------------------------------------------
[float]
==== Terms Stats Facet
Here is how you can use
link:{ref}/search-facets-terms-stats-facet.html[Terms
Stats Facet] with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.termsStatsFacet("f")
.keyField("brand")
.valueField("price");
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.termsstats.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
TermsStatsFacet f = (TermsStatsFacet) sr.facets().facetsAsMap().get("f");
f.getTotalCount(); // Total terms doc count
f.getOtherCount(); // Not shown terms doc count
f.getMissingCount(); // Without term doc count
// For each entry
for (TermsStatsFacet.Entry entry : f) {
entry.getTerm(); // Term
entry.getCount(); // Doc count
entry.getMin(); // Min value
entry.getMax(); // Max value
entry.getMean(); // Mean
entry.getTotal(); // Sum of values
}
--------------------------------------------------
[float]
==== Geo Distance Facet
Here is how you can use
link:{ref}/search-facets-geo-distance-facet.html[Geo
Distance Facet] with Java API.
[float]
===== Prepare facet request
Here is an example on how to create the facet request:
[source,java]
--------------------------------------------------
FacetBuilders.geoDistanceFacet("f")
.field("pin.location") // Field containing coordinates we want to compare with
.point(40, -70) // Point from where we start (0)
.addUnboundedFrom(10) // 0 to 10 km (excluded)
.addRange(10, 20) // 10 to 20 km (excluded)
.addRange(20, 100) // 20 to 100 km (excluded)
.addUnboundedTo(100) // from 100 km to infinity (and beyond ;-) )
.unit(DistanceUnit.KILOMETERS); // All distances are in kilometers. Can be MILES
--------------------------------------------------
[float]
===== Use facet response
Import Facet definition classes:
[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.geodistance.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
GeoDistanceFacet f = (GeoDistanceFacet) sr.facets().facetsAsMap().get("f");
// For each entry
for (GeoDistanceFacet.Entry entry : f) {
entry.getFrom(); // Distance from requested
entry.getTo(); // Distance to requested
entry.getCount(); // Doc count
entry.getMin(); // Min value
entry.getMax(); // Max value
entry.getTotal(); // Sum of values
entry.getMean(); // Mean
}
--------------------------------------------------
[float]
=== Facet filters (not Filter Facet)
By default, facets are applied to the query result set, regardless of any
filters in the query.
If you need to compute facets with the same filters or even with other
filters, you can add the filter to any facet using the
`AbstractFacetBuilder#facetFilter(FilterBuilder)` method:
[source,java]
--------------------------------------------------
FacetBuilders
.termsFacet("f").field("brand") // Your facet
.facetFilter( // Your filter here
FilterBuilders.termFilter("colour", "pale")
);
--------------------------------------------------
For example, you can reuse the same filter you created for your query:
[source,java]
--------------------------------------------------
// A common filter
FilterBuilder filter = FilterBuilders.termFilter("colour", "pale");
TermsFacetBuilder facet = FacetBuilders.termsFacet("f")
.field("brand")
.facetFilter(filter); // We apply it to the facet
SearchResponse sr = node.client().prepareSearch()
.setQuery(QueryBuilders.matchAllQuery())
.setFilter(filter) // We apply it to the query
.addFacet(facet)
.execute().actionGet();
--------------------------------------------------
See documentation on how to build
<<query-dsl-filters,Filters>>.
[float]
=== Scope
By default, facets are computed within the query result set. But you can
compute facets from all documents in the index, regardless of the query,
using the `global` parameter:
[source,java]
--------------------------------------------------
TermsFacetBuilder facet = FacetBuilders.termsFacet("f")
.field("brand")
.global(true);
--------------------------------------------------

View File

@ -0,0 +1,38 @@
[[get]]
== Get API
The get API allows getting a typed JSON document from the index based on
its id. The following example gets a JSON document from an index called
twitter, under a type called tweet, with id 1:
[source,java]
--------------------------------------------------
GetResponse response = client.prepareGet("twitter", "tweet", "1")
.execute()
.actionGet();
--------------------------------------------------
For more information on the get operation, check out the REST
link:{ref}/docs-get.html[get] docs.
[float]
=== Operation Threading
The get API allows setting the threading model the operation will be
performed with when the actual execution of the API happens on the same
node (the API is executed on a shard that is allocated on the same
server).
The options are to execute the operation on a different thread, or to
execute it on the calling thread (note that the API is still async). By
default, `operationThreaded` is set to `true` which means the operation
is executed on a different thread. Here is an example that sets it to
`false`:
[source,java]
--------------------------------------------------
GetResponse response = client.prepareGet("twitter", "tweet", "1")
.setOperationThreaded(false)
.execute()
.actionGet();
--------------------------------------------------

View File

@ -0,0 +1,61 @@
[[java-api]]
= Java API
:ref: http://www.elasticsearch.org/guide/elasticsearch/reference/current
[preface]
== Preface
This section describes the Java API that elasticsearch provides. All
elasticsearch operations are executed using a
<<client,Client>> object. All
operations are completely asynchronous in nature (they either accept a
listener, or return a future).
Additionally, operations on a client may be accumulated and executed in
<<bulk,Bulk>>.
Note, all the REST APIs are also exposed through the
Java API (in fact, the Java API is used internally to execute them).
[float]
== Maven Repository
Elasticsearch is hosted on
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22elasticsearch%22[Maven
Central].
For example, you can define the latest version in your `pom.xml` file:
[source,xml]
--------------------------------------------------
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${es.version}</version>
</dependency>
--------------------------------------------------
include::client.asciidoc[]
include::index_.asciidoc[]
include::get.asciidoc[]
include::delete.asciidoc[]
include::bulk.asciidoc[]
include::search.asciidoc[]
include::count.asciidoc[]
include::delete-by-query.asciidoc[]
include::facets.asciidoc[]
include::percolate.asciidoc[]
include::query-dsl-queries.asciidoc[]
include::query-dsl-filters.asciidoc[]

View File

@ -0,0 +1,201 @@
[[index_]]
== Index API
The index API allows one to index a typed JSON document into a specific
index and make it searchable.
[float]
=== Generate JSON document
There are different ways of generating a JSON document:
* Manually (aka do it yourself) using native `byte[]` or as a `String`
* Using a `Map` that will be automatically converted to its JSON
equivalent
* Using a third party library to serialize your beans such as
http://wiki.fasterxml.com/JacksonHome[Jackson]
* Using built-in helpers such as `XContentFactory.jsonBuilder()`
Internally, each type is converted to `byte[]` (so a String is converted
to a `byte[]`). Therefore, if the object is in this form already, then
use it. The `jsonBuilder` is a highly optimized JSON generator that
directly constructs a `byte[]`.
[float]
==== Do It Yourself
Nothing really difficult here, but note that you will have to encode
dates according to the
link:{ref}/mapping-date-format.html[Date Format].
[source,java]
--------------------------------------------------
String json = "{" +
"\"user\":\"kimchy\"," +
"\"postDate\":\"2013-01-30\"," +
"\"message\":\"trying out Elastic Search\"," +
"}";
--------------------------------------------------
[float]
==== Using Map
A `Map` is a key-value pair collection. It maps very well to a JSON
structure:
[source,java]
--------------------------------------------------
Map<String, Object> json = new HashMap<String, Object>();
json.put("user","kimchy");
json.put("postDate",new Date());
json.put("message","trying out Elastic Search");
--------------------------------------------------
[float]
==== Serialize your beans
Elasticsearch already uses Jackson but shades it under the
`org.elasticsearch.common.jackson` package. +
So, you can add your own Jackson version to your `pom.xml` file or to
your classpath. See http://wiki.fasterxml.com/JacksonDownload[Jackson
Download Page].
For example:
[source,java]
--------------------------------------------------
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.1.3</version>
</dependency>
--------------------------------------------------
Then, you can start serializing your beans to JSON:
[source,java]
--------------------------------------------------
import com.fasterxml.jackson.databind.*;
// instantiate a JSON mapper
ObjectMapper mapper = new ObjectMapper(); // create once, reuse
// generate json
String json = mapper.writeValueAsString(yourbeaninstance);
--------------------------------------------------
[float]
==== Use Elasticsearch helpers
Elasticsearch provides built-in helpers to generate JSON content.
[source,java]
--------------------------------------------------
import static org.elasticsearch.common.xcontent.XContentFactory.*;
XContentBuilder builder = jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elastic Search")
.endObject()
--------------------------------------------------
Note that you can also add arrays with the `startArray(String)` and
`endArray()` methods. The `field` method accepts many object types:
you can pass numbers, dates and even other `XContentBuilder` objects
directly.
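As a small sketch of these helpers (the field names are illustrative):
[source,java]
--------------------------------------------------
import static org.elasticsearch.common.xcontent.XContentFactory.*;
XContentBuilder builder = jsonBuilder()
    .startObject()
        .field("user", "kimchy")
        .field("postDate", new Date())
        .startArray("tags")           // add an array field
            .value("search")
            .value("elasticsearch")
        .endArray()
    .endObject();
--------------------------------------------------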
If you need to see the generated JSON content, you can use the
`string()` method.
[source,java]
--------------------------------------------------
String json = builder.string();
--------------------------------------------------
[float]
=== Index document
The following example indexes a JSON document into an index called
twitter, under a type called tweet, with id 1:
[source,java]
--------------------------------------------------
import static org.elasticsearch.common.xcontent.XContentFactory.*;
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elastic Search")
.endObject()
)
.execute()
.actionGet();
--------------------------------------------------
Note that you can also index your documents as a JSON string and that you
don't have to give an ID:
[source,java]
--------------------------------------------------
String json = "{" +
"\"user\":\"kimchy\"," +
"\"postDate\":\"2013-01-30\"," +
"\"message\":\"trying out Elastic Search\"," +
"}";
IndexResponse response = client.prepareIndex("twitter", "tweet")
.setSource(json)
.execute()
.actionGet();
--------------------------------------------------
The `IndexResponse` object will give you a report:
[source,java]
--------------------------------------------------
// Index name
String _index = response.index();
// Type name
String _type = response.type();
// Document ID (generated or not)
String _id = response.id();
// Version (if it's the first time you index this document, you will get: 1)
long _version = response.version();
--------------------------------------------------
If you use percolation while indexing, the `IndexResponse` object will give
you the percolators that have matched:
[source,java]
--------------------------------------------------
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
.setSource(json)
.setPercolate("*")
.execute()
.actionGet();
List<String> matches = response.matches();
--------------------------------------------------
For more information on the index operation, check out the REST
link:{ref}/docs-index_.html[index] docs.
[float]
=== Operation Threading
The index API allows setting the threading model the operation will be
performed with when the actual execution of the API happens on the same
node (the API is executed on a shard that is allocated on the same
server).
The options are to execute the operation on a different thread, or to
execute it on the calling thread (note that the API is still async). By
default, `operationThreaded` is set to `true` which means the operation
is executed on a different thread.
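Here is a sketch that sets it to `false`, mirroring the delete and get examples (reusing the `json` string from above):
[source,java]
--------------------------------------------------
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(json)
        .setOperationThreaded(false)
        .execute()
        .actionGet();
--------------------------------------------------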

View File

@ -0,0 +1,48 @@
[[percolate]]
== Percolate API
The percolator allows you to register queries against an index, and then
send `percolate` requests which include a doc, and get back the
queries that match that doc out of the set of registered queries.
Read the main {ref}/search-percolate.html[percolate]
documentation before reading this guide.
[source,java]
--------------------------------------------------
// Static imports for the builders used below
import static org.elasticsearch.common.xcontent.XContentFactory.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

//This is the query we're registering in the percolator
QueryBuilder qb = termQuery("content", "amazing");
//Index the query = register it in the percolator
client.prepareIndex("_percolator", "myIndexName", "myDesignatedQueryName")
.setSource(jsonBuilder()
.startObject()
.field("query", qb) // Register the query
.endObject())
.setRefresh(true) // Needed when the query shall be available immediately
.execute().actionGet();
--------------------------------------------------
This indexes the above term query under the name
*myDesignatedQueryName*.
In order to check a document against the registered queries, use this
code:
[source,java]
--------------------------------------------------
//Build a document to check against the percolator
XContentBuilder docBuilder = XContentFactory.jsonBuilder().startObject();
docBuilder.field("doc").startObject(); //This is needed to designate the document
docBuilder.field("content", "This is amazing!");
docBuilder.endObject(); //End of the doc field
docBuilder.endObject(); //End of the JSON root object
//Percolate
PercolateResponse response =
client.preparePercolate("myIndexName", "myDocumentType").setSource(docBuilder).execute().actionGet();
//Iterate over the results
for(String result : response) {
//Handle the result which is the name of
//the query in the percolator
}
--------------------------------------------------

View File

@ -0,0 +1,459 @@
[[query-dsl-filters]]
== Query DSL - Filters
elasticsearch provides a full Java query DSL in a similar manner to the
REST link:{ref}/query-dsl.html[Query DSL]. The factory for filter
builders is `FilterBuilders`.
Once your query is ready, you can use the <<search,Search API>>.
See also how to build <<query-dsl-queries,Queries>>.
To use `FilterBuilders` just import them in your class:
[source,java]
--------------------------------------------------
import org.elasticsearch.index.query.FilterBuilders;
--------------------------------------------------
Note that you can easily print (aka debug) the generated JSON queries using
the `toString()` method on the `FilterBuilder` object.
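For example, a quick sketch of dumping a filter for inspection:
[source,java]
--------------------------------------------------
FilterBuilder filter = FilterBuilders.termFilter("user", "kimchy");
System.out.println(filter.toString()); // prints the JSON representation of the filter
--------------------------------------------------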
[float]
=== And Filter
See link:{ref}/query-dsl-and-filter.html[And Filter]
[source,java]
--------------------------------------------------
FilterBuilders.andFilter(
FilterBuilders.rangeFilter("postDate").from("2010-03-01").to("2010-04-01"),
FilterBuilders.prefixFilter("name.second", "ba")
);
--------------------------------------------------
Note that you can cache the result using
`AndFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Bool Filter
See link:{ref}/query-dsl-bool-filter.html[Bool Filter]
[source,java]
--------------------------------------------------
FilterBuilders.boolFilter()
.must(FilterBuilders.termFilter("tag", "wow"))
.mustNot(FilterBuilders.rangeFilter("age").from("10").to("20"))
.should(FilterBuilders.termFilter("tag", "sometag"))
.should(FilterBuilders.termFilter("tag", "sometagtag"));
--------------------------------------------------
Note that you can cache the result using
`BoolFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Exists Filter
See link:{ref}/query-dsl-exists-filter.html[Exists Filter].
[source,java]
--------------------------------------------------
FilterBuilders.existsFilter("user");
--------------------------------------------------
[float]
=== Ids Filter
See link:{ref}/query-dsl-ids-filter.html[IDs Filter]
[source,java]
--------------------------------------------------
FilterBuilders.idsFilter("my_type", "type2").addIds("1", "4", "100");
// Type is optional
FilterBuilders.idsFilter().addIds("1", "4", "100");
--------------------------------------------------
[float]
=== Limit Filter
See link:{ref}/query-dsl-limit-filter.html[Limit Filter]
[source,java]
--------------------------------------------------
FilterBuilders.limitFilter(100);
--------------------------------------------------
[float]
=== Type Filter
See link:{ref}/query-dsl-type-filter.html[Type Filter]
[source,java]
--------------------------------------------------
FilterBuilders.typeFilter("my_type");
--------------------------------------------------
[float]
=== Geo Bounding Box Filter
See link:{ref}/query-dsl-geo-bounding-box-filter.html[Geo
Bounding Box Filter]
[source,java]
--------------------------------------------------
FilterBuilders.geoBoundingBoxFilter("pin.location")
.topLeft(40.73, -74.1)
.bottomRight(40.717, -73.99);
--------------------------------------------------
Note that you can cache the result using the
`GeoBoundingBoxFilterBuilder#cache(boolean)` method. See
<<query-dsl-filters-caching>>.
[float]
=== GeoDistance Filter
See link:{ref}/query-dsl-geo-distance-filter.html[Geo
Distance Filter]
[source,java]
--------------------------------------------------
FilterBuilders.geoDistanceFilter("pin.location")
.point(40, -70)
.distance(200, DistanceUnit.KILOMETERS)
.optimizeBbox("memory") // Can be also "indexed" or "none"
.geoDistance(GeoDistance.ARC); // Or GeoDistance.PLANE
--------------------------------------------------
Note that you can cache the result using the
`GeoDistanceFilterBuilder#cache(boolean)` method. See
<<query-dsl-filters-caching>>.
[float]
=== Geo Distance Range Filter
See link:{ref}/query-dsl-geo-distance-range-filter.html[Geo
Distance Range Filter]
[source,java]
--------------------------------------------------
FilterBuilders.geoDistanceRangeFilter("pin.location")
.point(40, -70)
.from("200km")
.to("400km")
.includeLower(true)
.includeUpper(false)
.optimizeBbox("memory") // Can be also "indexed" or "none"
.geoDistance(GeoDistance.ARC); // Or GeoDistance.PLANE
--------------------------------------------------
Note that you can cache the result using the
`GeoDistanceRangeFilterBuilder#cache(boolean)` method. See
<<query-dsl-filters-caching>>.
[float]
=== Geo Polygon Filter
See link:{ref}/query-dsl-geo-polygon-filter.html[Geo Polygon
Filter]
[source,java]
--------------------------------------------------
FilterBuilders.geoPolygonFilter("pin.location")
.addPoint(40, -70)
.addPoint(30, -80)
.addPoint(20, -90);
--------------------------------------------------
Note that you can cache the result using the
`GeoPolygonFilterBuilder#cache(boolean)` method. See
<<query-dsl-filters-caching>>.
[float]
=== Geo Shape Filter
See link:{ref}/query-dsl-geo-shape-filter.html[Geo Shape
Filter]
Note: the `geo_shape` type uses `Spatial4J` and `JTS`, both of which are
optional dependencies. Consequently you must add `Spatial4J` and `JTS`
to your classpath in order to use this type:
[source,xml]
-----------------------------------------------
<dependency>
<groupId>com.spatial4j</groupId>
<artifactId>spatial4j</artifactId>
<version>0.3</version>
</dependency>
<dependency>
<groupId>com.vividsolutions</groupId>
<artifactId>jts</artifactId>
<version>1.12</version>
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
</exclusions>
</dependency>
-----------------------------------------------
[source,java]
--------------------------------------------------
// Import Spatial4J shapes
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.shape.Shape;
import com.spatial4j.core.shape.impl.RectangleImpl;
// Also import ShapeRelation
import org.elasticsearch.common.geo.ShapeRelation;
--------------------------------------------------
[source,java]
--------------------------------------------------
// Shape within another
filter = FilterBuilders.geoShapeFilter("location",
new RectangleImpl(0,10,0,10,SpatialContext.GEO))
.relation(ShapeRelation.WITHIN);
// Intersect shapes
filter = FilterBuilders.geoShapeFilter("location",
new PointImpl(0, 0, SpatialContext.GEO))
.relation(ShapeRelation.INTERSECTS);
// Using pre-indexed shapes
filter = FilterBuilders.geoShapeFilter("location", "New Zealand", "countries")
.relation(ShapeRelation.DISJOINT);
--------------------------------------------------
[float]
=== Has Child / Has Parent Filters
See:
* link:{ref}/query-dsl-has-child-filter.html[Has Child Filter]
* link:{ref}/query-dsl-has-parent-filter.html[Has Parent Filter]
[source,java]
--------------------------------------------------
// Has Child
FilterBuilders.hasChildFilter("blog_tag",
QueryBuilders.termQuery("tag", "something"));
// Has Parent
FilterBuilders.hasParentFilter("blog",
QueryBuilders.termQuery("tag", "something"));
--------------------------------------------------
[float]
=== Match All Filter
See link:{ref}/query-dsl-match-all-filter.html[Match All Filter]
[source,java]
--------------------------------------------------
FilterBuilders.matchAllFilter();
--------------------------------------------------
[float]
=== Missing Filter
See link:{ref}/query-dsl-missing-filter.html[Missing Filter]
[source,java]
--------------------------------------------------
FilterBuilders.missingFilter("user")
.existence(true)
.nullValue(true);
--------------------------------------------------
[float]
=== Not Filter
See link:{ref}/query-dsl-not-filter.html[Not Filter]
[source,java]
--------------------------------------------------
FilterBuilders.notFilter(
FilterBuilders.rangeFilter("price").from("1").to("2"));
--------------------------------------------------
[float]
=== Numeric Range Filter
See link:{ref}/query-dsl-numeric-range-filter.html[Numeric
Range Filter]
[source,java]
--------------------------------------------------
FilterBuilders.numericRangeFilter("age")
.from(10)
.to(20)
.includeLower(true)
.includeUpper(false);
--------------------------------------------------
Note that you can cache the result using the
`NumericRangeFilterBuilder#cache(boolean)` method. See
<<query-dsl-filters-caching>>.
[float]
=== Or Filter
See link:{ref}/query-dsl-or-filter.html[Or Filter]
[source,java]
--------------------------------------------------
FilterBuilders.orFilter(
FilterBuilders.termFilter("name.second", "banon"),
FilterBuilders.termFilter("name.nick", "kimchy")
);
--------------------------------------------------
Note that you can cache the result using the
`OrFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Prefix Filter
See link:{ref}/query-dsl-prefix-filter.html[Prefix Filter]
[source,java]
--------------------------------------------------
FilterBuilders.prefixFilter("user", "ki");
--------------------------------------------------
Note that you can cache the result using the
`PrefixFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Query Filter
See link:{ref}/query-dsl-query-filter.html[Query Filter]
[source,java]
--------------------------------------------------
FilterBuilders.queryFilter(
QueryBuilders.queryString("this AND that OR thus")
);
--------------------------------------------------
Note that you can cache the result using the
`QueryFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Range Filter
See link:{ref}/query-dsl-range-filter.html[Range Filter]
[source,java]
--------------------------------------------------
FilterBuilders.rangeFilter("age")
.from("10")
.to("20")
.includeLower(true)
.includeUpper(false);
// A simplified form using gte, gt, lt or lte
FilterBuilders.rangeFilter("age")
.gte("10")
.lt("20");
--------------------------------------------------
Note that you can ask not to cache the result using the
`RangeFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Script Filter
See link:{ref}/query-dsl-script-filter.html[Script Filter]
[source,java]
--------------------------------------------------
FilterBuilder filter = FilterBuilders.scriptFilter(
"doc['age'].value > param1"
).addParam("param1", 10);
--------------------------------------------------
Note that you can cache the result using the
`ScriptFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Term Filter
See link:{ref}/query-dsl-term-filter.html[Term Filter]
[source,java]
--------------------------------------------------
FilterBuilders.termFilter("user", "kimchy");
--------------------------------------------------
Note that you can ask not to cache the result using the
`TermFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Terms Filter
See link:{ref}/query-dsl-terms-filter.html[Terms Filter]
[source,java]
--------------------------------------------------
FilterBuilders.termsFilter("user", "kimchy", "elasticsearch")
.execution("plain"); // Optional, can be also "bool", "and" or "or"
// or "bool_nocache", "and_nocache" or "or_nocache"
--------------------------------------------------
Note that you can ask not to cache the result using the
`TermsFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[float]
=== Nested Filter
See link:{ref}/query-dsl-nested-filter.html[Nested Filter]
[source,java]
--------------------------------------------------
FilterBuilders.nestedFilter("obj1",
QueryBuilders.boolQuery()
.must(QueryBuilders.matchQuery("obj1.name", "blue"))
.must(QueryBuilders.rangeQuery("obj1.count").gt(5))
);
--------------------------------------------------
Note that you can ask not to cache the result using the
`NestedFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
[[query-dsl-filters-caching]]
[float]
=== Caching
By default, some filters are cached and others are not. You can fine-tune
this behaviour using the `cache(boolean)` method, where it exists. For example:
[source,java]
--------------------------------------------------
FilterBuilder filter = FilterBuilders.andFilter(
FilterBuilders.rangeFilter("postDate").from("2010-03-01").to("2010-04-01"),
FilterBuilders.prefixFilter("name.second", "ba")
)
.cache(true);
--------------------------------------------------

@ -0,0 +1,489 @@
[[query-dsl-queries]]
== Query DSL - Queries
Elasticsearch provides a full Java query DSL in a similar manner to the
REST link:{ref}/query-dsl.html[Query DSL]. The factory for query
builders is `QueryBuilders`. Once your query is ready, you can use the
<<search,Search API>>.
See also how to build <<query-dsl-filters,Filters>>.
To use `QueryBuilders`, just import them statically in your class:
[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.QueryBuilders.*;
--------------------------------------------------
Note that you can easily print (e.g. for debugging) the JSON generated by a
query by calling the `toString()` method on the `QueryBuilder` object.
The `QueryBuilder` can then be used with any API that accepts a query,
such as `count` and `search`.
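As a quick sketch (the index name is illustrative, and the `CountResponse` and `SearchResponse` imports from the action packages are assumed), the same `QueryBuilder` instance can be passed to both a count and a search request:

[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.termQuery("user", "kimchy");

// Count documents matching the query
CountResponse countResponse = client.prepareCount("myIndexName")
        .setQuery(qb)
        .execute().actionGet();

// Run a search with the same query
SearchResponse searchResponse = client.prepareSearch("myIndexName")
        .setQuery(qb)
        .execute().actionGet();
--------------------------------------------------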
[float]
=== Match Query
See link:{ref}/query-dsl-match-query.html[Match Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.matchQuery("name", "kimchy elasticsearch");
--------------------------------------------------
[float]
=== MultiMatch Query
See link:{ref}/query-dsl-multi-match-query.html[MultiMatch
Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.multiMatchQuery(
"kimchy elasticsearch", // Text you are looking for
"user", "message" // Fields you query on
);
--------------------------------------------------
[float]
=== Boolean Query
See link:{ref}/query-dsl-bool-query.html[Boolean Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders
.boolQuery()
.must(termQuery("content", "test1"))
.must(termQuery("content", "test4"))
.mustNot(termQuery("content", "test2"))
.should(termQuery("content", "test3"));
--------------------------------------------------
[float]
=== Boosting Query
See link:{ref}/query-dsl-boosting-query.html[Boosting Query]
[source,java]
--------------------------------------------------
QueryBuilders.boostingQuery()
.positive(QueryBuilders.termQuery("name","kimchy"))
.negative(QueryBuilders.termQuery("name","dadoonet"))
.negativeBoost(0.2f);
--------------------------------------------------
[float]
=== IDs Query
See link:{ref}/query-dsl-ids-query.html[IDs Query]
[source,java]
--------------------------------------------------
QueryBuilders.idsQuery().ids("1", "2");
--------------------------------------------------
[float]
=== Custom Score Query
See link:{ref}/query-dsl-custom-score-query.html[Custom Score
Query]
[source,java]
--------------------------------------------------
QueryBuilders.customScoreQuery(QueryBuilders.matchAllQuery()) // Your query here
.script("_score * doc['price'].value"); // Your script here
// If the script has parameters, use the same script and provide parameters to it.
QueryBuilders.customScoreQuery(QueryBuilders.matchAllQuery())
.script("_score * doc['price'].value / pow(param1, param2)")
.param("param1", 2)
.param("param2", 3.1);
--------------------------------------------------
[float]
=== Custom Boost Factor Query
See
link:{ref}/query-dsl-custom-boost-factor-query.html[Custom
Boost Factor Query]
[source,java]
--------------------------------------------------
QueryBuilders.customBoostFactorQuery(QueryBuilders.matchAllQuery()) // Your query
.boostFactor(3.1f);
--------------------------------------------------
[float]
=== Constant Score Query
See link:{ref}/query-dsl-constant-score-query.html[Constant
Score Query]
[source,java]
--------------------------------------------------
// Using with Filters
QueryBuilders.constantScoreQuery(FilterBuilders.termFilter("name","kimchy"))
.boost(2.0f);
// With Queries
QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("name","kimchy"))
.boost(2.0f);
--------------------------------------------------
[float]
=== Disjunction Max Query
See link:{ref}/query-dsl-dis-max-query.html[Disjunction Max
Query]
[source,java]
--------------------------------------------------
QueryBuilders.disMaxQuery()
.add(QueryBuilders.termQuery("name","kimchy")) // Your queries
.add(QueryBuilders.termQuery("name","elasticsearch")) // Your queries
.boost(1.2f)
.tieBreaker(0.7f);
--------------------------------------------------
[float]
=== Field Query
See link:{ref}/query-dsl-field-query.html[Field Query]
[source,java]
--------------------------------------------------
QueryBuilders.fieldQuery("name", "+kimchy -dadoonet");
// Note that you can write the same query using queryString query.
QueryBuilders.queryString("+kimchy -dadoonet").field("name");
--------------------------------------------------
[float]
=== Fuzzy Like This (Field) Query (flt and flt_field)
See:
* link:{ref}/query-dsl-flt-query.html[Fuzzy Like This Query]
* link:{ref}/query-dsl-flt-field-query.html[Fuzzy Like This Field Query]
[source,java]
--------------------------------------------------
// flt Query
QueryBuilders.fuzzyLikeThisQuery("name.first", "name.last") // Fields
.likeText("text like this one") // Text
.maxQueryTerms(12); // Max num of Terms
// in generated queries
// flt_field Query
QueryBuilders.fuzzyLikeThisFieldQuery("name.first") // Only on single field
.likeText("text like this one")
.maxQueryTerms(12);
--------------------------------------------------
[float]
=== FuzzyQuery
See link:{ref}/query-dsl-fuzzy-query.html[Fuzzy Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.fuzzyQuery("name", "kimzhy");
--------------------------------------------------
[float]
=== Has Child / Has Parent
See:
* link:{ref}/query-dsl-has-child-query.html[Has Child Query]
* link:{ref}/query-dsl-has-parent-query.html[Has Parent Query]
[source,java]
--------------------------------------------------
// Has Child
QueryBuilders.hasChildQuery("blog_tag",
QueryBuilders.termQuery("tag","something"));
// Has Parent
QueryBuilders.hasParentQuery("blog",
QueryBuilders.termQuery("tag","something"));
--------------------------------------------------
[float]
=== MatchAll Query
See link:{ref}/query-dsl-match-all-query.html[Match All
Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.matchAllQuery();
--------------------------------------------------
[float]
=== More Like This (Field) Query (mlt and mlt_field)
See:
* link:{ref}/query-dsl-mlt-query.html[More Like This Query]
* link:{ref}/query-dsl-mlt-field-query.html[More Like This Field Query]
[source,java]
--------------------------------------------------
// mlt Query
QueryBuilders.moreLikeThisQuery("name.first", "name.last") // Fields
.likeText("text like this one") // Text
.minTermFreq(1) // Ignore Threshold
.maxQueryTerms(12); // Max num of Terms
// in generated queries
// mlt_field Query
QueryBuilders.moreLikeThisFieldQuery("name.first") // Only on single field
.likeText("text like this one")
.minTermFreq(1)
.maxQueryTerms(12);
--------------------------------------------------
[float]
=== Prefix Query
See link:{ref}/query-dsl-prefix-query.html[Prefix Query]
[source,java]
--------------------------------------------------
QueryBuilders.prefixQuery("brand", "heine");
--------------------------------------------------
[float]
=== QueryString Query
See link:{ref}/query-dsl-query-string-query.html[QueryString Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.queryString("+kimchy -elasticsearch");
--------------------------------------------------
[float]
=== Range Query
See link:{ref}/query-dsl-range-query.html[Range Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders
.rangeQuery("price")
.from(5)
.to(10)
.includeLower(true)
.includeUpper(false);
--------------------------------------------------
[float]
=== Span Queries (first, near, not, or, term)
See:
* link:{ref}/query-dsl-span-first-query.html[Span First Query]
* link:{ref}/query-dsl-span-near-query.html[Span Near Query]
* link:{ref}/query-dsl-span-not-query.html[Span Not Query]
* link:{ref}/query-dsl-span-or-query.html[Span Or Query]
* link:{ref}/query-dsl-span-term-query.html[Span Term Query]
[source,java]
--------------------------------------------------
// Span First
QueryBuilders.spanFirstQuery(
QueryBuilders.spanTermQuery("user", "kimchy"), // Query
3 // Max End position
);
// Span Near
QueryBuilders.spanNearQuery()
.clause(QueryBuilders.spanTermQuery("field","value1")) // Span Term Queries
.clause(QueryBuilders.spanTermQuery("field","value2"))
.clause(QueryBuilders.spanTermQuery("field","value3"))
.slop(12) // Slop factor
.inOrder(false)
.collectPayloads(false);
// Span Not
QueryBuilders.spanNotQuery()
.include(QueryBuilders.spanTermQuery("field","value1"))
.exclude(QueryBuilders.spanTermQuery("field","value2"));
// Span Or
QueryBuilders.spanOrQuery()
.clause(QueryBuilders.spanTermQuery("field","value1"))
.clause(QueryBuilders.spanTermQuery("field","value2"))
.clause(QueryBuilders.spanTermQuery("field","value3"));
// Span Term
QueryBuilders.spanTermQuery("user","kimchy");
--------------------------------------------------
[float]
=== Term Query
See link:{ref}/query-dsl-term-query.html[Term Query]
[source,java]
--------------------------------------------------
QueryBuilder qb = QueryBuilders.termQuery("name", "kimchy");
--------------------------------------------------
[float]
=== Terms Query
See link:{ref}/query-dsl-terms-query.html[Terms Query]
[source,java]
--------------------------------------------------
QueryBuilders.termsQuery("tags", // field
"blue", "pill") // values
.minimumMatch(1); // How many terms must match
--------------------------------------------------
[float]
=== Top Children Query
See link:{ref}/query-dsl-top-children-query.html[Top Children Query]
[source,java]
--------------------------------------------------
QueryBuilders.topChildrenQuery(
"blog_tag", // field
QueryBuilders.termQuery("tag", "something") // Query
)
.score("max") // max, sum or avg
.factor(5)
.incrementalFactor(2);
--------------------------------------------------
[float]
=== Wildcard Query
See link:{ref}/query-dsl-wildcard-query.html[Wildcard Query]
[source,java]
--------------------------------------------------
QueryBuilders.wildcardQuery("user", "k?mc*");
--------------------------------------------------
[float]
=== Nested Query
See link:{ref}/query-dsl-nested-query.html[Nested Query]
[source,java]
--------------------------------------------------
QueryBuilders.nestedQuery("obj1", // Path
QueryBuilders.boolQuery() // Your query
.must(QueryBuilders.matchQuery("obj1.name", "blue"))
.must(QueryBuilders.rangeQuery("obj1.count").gt(5))
)
.scoreMode("avg"); // max, total, avg or none
--------------------------------------------------
[float]
=== Custom Filters Score Query
See
link:{ref}/query-dsl-custom-filters-score-query.html[Custom Filters Score Query]
[source,java]
--------------------------------------------------
QueryBuilders.customFiltersScoreQuery(
QueryBuilders.matchAllQuery()) // Query
// Filters with their boost factors
.add(FilterBuilders.rangeFilter("age").from(0).to(10), 3)
.add(FilterBuilders.rangeFilter("age").from(10).to(20), 2)
.scoreMode("first"); // first, min, max, total, avg or multiply
--------------------------------------------------
[float]
=== Indices Query
See link:{ref}/query-dsl-indices-query.html[Indices Query]
[source,java]
--------------------------------------------------
// Using another query when no match for the main one
QueryBuilders.indicesQuery(
QueryBuilders.termQuery("tag", "wow"),
"index1", "index2"
)
.noMatchQuery(QueryBuilders.termQuery("tag", "kow"));
// Using all (match all) or none (match no documents)
QueryBuilders.indicesQuery(
QueryBuilders.termQuery("tag", "wow"),
"index1", "index2"
)
.noMatchQuery("all"); // all or none
--------------------------------------------------
[float]
=== GeoShape Query
See link:{ref}/query-dsl-geo-shape-query.html[GeoShape Query]
Note: the `geo_shape` type uses `Spatial4J` and `JTS`, both of which are
optional dependencies. Consequently you must add `Spatial4J` and `JTS`
to your classpath in order to use this type:
[source,xml]
--------------------------------------------------
<dependency>
<groupId>com.spatial4j</groupId>
<artifactId>spatial4j</artifactId>
<version>0.3</version>
</dependency>
<dependency>
<groupId>com.vividsolutions</groupId>
<artifactId>jts</artifactId>
<version>1.12</version>
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
</exclusions>
</dependency>
--------------------------------------------------
[source,java]
--------------------------------------------------
// Import Spatial4J shapes
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.shape.Shape;
import com.spatial4j.core.shape.impl.RectangleImpl;
// Also import ShapeRelation
import org.elasticsearch.common.geo.ShapeRelation;
--------------------------------------------------
[source,java]
--------------------------------------------------
// Shape within another
QueryBuilders.geoShapeQuery("location",
new RectangleImpl(0,10,0,10,SpatialContext.GEO))
.relation(ShapeRelation.WITHIN);
// Intersect shapes
QueryBuilders.geoShapeQuery("location",
new PointImpl(0, 0, SpatialContext.GEO))
.relation(ShapeRelation.INTERSECTS);
// Using pre-indexed shapes
QueryBuilders.geoShapeQuery("location", "New Zealand", "countries")
.relation(ShapeRelation.DISJOINT);
--------------------------------------------------

@ -0,0 +1,137 @@
[[search]]
== Search API
The search API allows you to execute a search query and get back search hits
that match the query. It can be executed across one or more indices and
across one or more types. The query can either be provided using the
<<query-dsl-queries,query Java API>> or
the <<query-dsl-filters,filter Java API>>.
The body of the search request is built using the
`SearchSourceBuilder`. Here is an example:
[source,java]
--------------------------------------------------
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
--------------------------------------------------
[source,java]
--------------------------------------------------
SearchResponse response = client.prepareSearch("index1", "index2")
.setTypes("type1", "type2")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery("multi", "test")) // Query
.setFilter(FilterBuilders.rangeFilter("age").from(12).to(18)) // Filter
.setFrom(0).setSize(60).setExplain(true)
.execute()
.actionGet();
--------------------------------------------------
Note that all parameters are optional. Here is the smallest search call
you can write:
[source,java]
--------------------------------------------------
// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();
--------------------------------------------------
For more information on the search operation, check out the REST
link:{ref}/search.html[search] docs.
[float]
=== Using scrolls in Java
Read the link:{ref}/search-request-scroll.html[scroll documentation]
first!
[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;
QueryBuilder qb = termQuery("multi", "test");
SearchResponse scrollResp = client.prepareSearch("test")
.setSearchType(SearchType.SCAN)
.setScroll(new TimeValue(60000))
.setQuery(qb)
.setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {
scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
for (SearchHit hit : scrollResp.getHits()) {
//Handle the hit...
}
//Break condition: No hits are returned
if (scrollResp.hits().hits().length == 0) {
break;
}
}
--------------------------------------------------
[float]
=== Operation Threading
The search API allows you to set the threading model that the operation
will use when the actual execution of the API is performed on the same
node (i.e. the API is executed on a shard that is allocated on the same
server).
There are three threading modes. The `NO_THREADS` mode means that the
search operation will be executed on the calling thread. The
`SINGLE_THREAD` mode means that the search operation will be executed on
a single, separate thread for all local shards. The `THREAD_PER_SHARD`
mode means that the search operation will be executed on a different
thread for each local shard.
The default mode is `SINGLE_THREAD`.
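As a sketch, assuming your client version still exposes the `SearchOperationThreading` enum and the `setOperationThreading` setter on the search request builder, the mode could be set per request like this:

[source,java]
--------------------------------------------------
// Assumes org.elasticsearch.action.search.SearchOperationThreading is available
SearchResponse response = client.prepareSearch("index1")
        .setOperationThreading(SearchOperationThreading.THREAD_PER_SHARD)
        .setQuery(QueryBuilders.matchAllQuery())
        .execute().actionGet();
--------------------------------------------------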
[float]
=== MultiSearch API
See link:{ref}/search-multi-search.html[MultiSearch API Query]
documentation
[source,java]
--------------------------------------------------
SearchRequestBuilder srb1 = node.client()
.prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
.prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);
MultiSearchResponse sr = node.client().prepareMultiSearch()
.add(srb1)
.add(srb2)
.execute().actionGet();
// You will get all individual responses from MultiSearchResponse#responses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.responses()) {
SearchResponse response = item.response();
nbHits += response.hits().totalHits();
}
--------------------------------------------------
[float]
=== Using Facets
The following code shows how to add two facets within your search:
[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
.setQuery(QueryBuilders.matchAllQuery())
.addFacet(FacetBuilders.termsFacet("f1").field("field"))
.addFacet(FacetBuilders.dateHistogramFacet("f2").field("birth").interval("year"))
.execute().actionGet();
// Get your facet results
TermsFacet f1 = (TermsFacet) sr.facets().facetsAsMap().get("f1");
DateHistogramFacet f2 = (DateHistogramFacet) sr.facets().facetsAsMap().get("f2");
--------------------------------------------------
See <<facets,Facets Java API>>
documentation for details.

@ -0,0 +1,76 @@
[[analysis]]
= Analysis
[partintro]
--
The index analysis module acts as a configurable registry of Analyzers
that can be used both to break down indexed (analyzed) fields when a
document is indexed and to process query strings. It maps to the Lucene
`Analyzer`.
Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
be preceded by one or more <<analysis-charfilters,CharFilters>>. The
analysis module allows one to register `TokenFilters`, `Tokenizers` and
`Analyzers` under logical names that can then be referenced either in
mapping definitions or in certain APIs. The Analysis module
automatically registers (*if not explicitly defined*) built in
analyzers, token filters, and tokenizers.
Here is a sample configuration:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
standard :
type : standard
stopwords : [stop1, stop2]
myAnalyzer1 :
type : standard
stopwords : [stop1, stop2, stop3]
max_token_length : 500
# configure a custom analyzer which is
# exactly like the default standard analyzer
myAnalyzer2 :
tokenizer : standard
filter : [standard, lowercase, stop]
tokenizer :
myTokenizer1 :
type : standard
max_token_length : 900
myTokenizer2 :
type : keyword
buffer_size : 512
filter :
myTokenFilter1 :
type : stop
stopwords : [stop1, stop2, stop3, stop4]
myTokenFilter2 :
type : length
min : 0
max : 2000
--------------------------------------------------
[float]
=== Backwards compatibility
All analyzers, tokenizers, and token filters can be configured with a
`version` parameter to control which Lucene version behavior they should
use. Possible values are: `3.0` - `3.6`, `4.0` - `4.3` (the highest
version number is the default option).
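For example, a configuration sketch (the analyzer name and stopwords are illustrative) that pins an analyzer to Lucene `3.6` behavior might look like this:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_versioned_analyzer :
                type : standard
                version : 3.6
                stopwords : [stop1, stop2]
--------------------------------------------------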
--
include::analysis/analyzers.asciidoc[]
include::analysis/tokenizers.asciidoc[]
include::analysis/tokenfilters.asciidoc[]
include::analysis/charfilters.asciidoc[]
include::analysis/icu-plugin.asciidoc[]

@ -0,0 +1,69 @@
[[analysis-analyzers]]
== Analyzers
Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
be preceded by one or more <<analysis-charfilters,CharFilters>>.
The analysis module allows you to register `Analyzers` under logical
names which can then be referenced either in mapping definitions or in
certain APIs.
Elasticsearch comes with a number of prebuilt analyzers which are
ready to use. Alternatively, you can combine the built in
character filters, tokenizers and token filters to create
<<analysis-custom-analyzer,custom analyzers>>.
[float]
=== Default Analyzers
An analyzer is registered under a logical name. It can then be
referenced from mapping definitions or certain APIs. When no analyzer is
explicitly defined, defaults are used. There is also an option to define
which analyzers will be used by default when none can be derived.
The `default` logical name allows one to configure an analyzer that will
be used both for indexing and for searching APIs. The `default_index`
logical name can be used to configure a default analyzer that will be
used just when indexing, and the `default_search` can be used to
configure a default analyzer that will be used just when searching.
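For example, a configuration sketch (the stopword lists are illustrative) that sets a default analyzer for indexing and a different default for searching:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            default_index :
                type : standard
                stopwords : []
            default_search :
                type : standard
                stopwords : [stop1, stop2]
--------------------------------------------------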
[float]
=== Aliasing Analyzers
Analyzers can be aliased to have several registered lookup names
associated with them. For example, the following will allow
the `standard` analyzer to also be referenced with `alias1`
and `alias2` values.
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
standard :
alias: [alias1, alias2]
type : standard
stopwords : [test1, test2, test3]
--------------------------------------------------
Below is a list of the built in analyzers.
include::analyzers/standard-analyzer.asciidoc[]
include::analyzers/simple-analyzer.asciidoc[]
include::analyzers/whitespace-analyzer.asciidoc[]
include::analyzers/stop-analyzer.asciidoc[]
include::analyzers/keyword-analyzer.asciidoc[]
include::analyzers/pattern-analyzer.asciidoc[]
include::analyzers/lang-analyzer.asciidoc[]
include::analyzers/snowball-analyzer.asciidoc[]
include::analyzers/custom-analyzer.asciidoc[]

@ -0,0 +1,52 @@
[[analysis-custom-analyzer]]
=== Custom Analyzer
An analyzer of type `custom` that allows you to combine a `Tokenizer` with
zero or more `Token Filters`, and zero or more `Char Filters`. The
custom analyzer accepts a logical/registered name of the tokenizer to
use, and a list of logical/registered names of token filters.
The following are settings that can be set for a `custom` analyzer type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical / registered name of the tokenizer to use.
|`filter` |An optional list of logical / registered name of token
filters.
|`char_filter` |An optional list of logical / registered name of char
filters.
|=======================================================================
Here is an example:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
myAnalyzer2 :
type : custom
tokenizer : myTokenizer1
filter : [myTokenFilter1, myTokenFilter2]
char_filter : [my_html]
tokenizer :
myTokenizer1 :
type : standard
max_token_length : 900
filter :
myTokenFilter1 :
type : stop
stopwords : [stop1, stop2, stop3, stop4]
myTokenFilter2 :
type : length
min : 0
max : 2000
char_filter :
my_html :
type : html_strip
escaped_tags : [xxx, yyy]
read_ahead : 1024
--------------------------------------------------

@ -0,0 +1,7 @@
[[analysis-keyword-analyzer]]
=== Keyword Analyzer
An analyzer of type `keyword` that "tokenizes" an entire stream as a
single token. This is useful for data like zip codes, ids and so on.
Note, when using mapping definitions, it might make more sense to simply
mark the field as `not_analyzed`.
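For example, a minimal mapping sketch (the type and field names are illustrative) that marks such a field as `not_analyzed` instead:

[source,js]
--------------------------------------------------
{
    "my_type" : {
        "properties" : {
            "zip_code" : {
                "type" : "string",
                "index" : "not_analyzed"
            }
        }
    }
}
--------------------------------------------------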

@ -0,0 +1,20 @@
[[analysis-lang-analyzer]]
=== Language Analyzers
A set of analyzers aimed at analyzing specific language text. The
following types are supported: `arabic`, `armenian`, `basque`,
`brazilian`, `bulgarian`, `catalan`, `chinese`, `cjk`, `czech`,
`danish`, `dutch`, `english`, `finnish`, `french`, `galician`, `german`,
`greek`, `hindi`, `hungarian`, `indonesian`, `italian`, `norwegian`,
`persian`, `portuguese`, `romanian`, `russian`, `spanish`, `swedish`,
`turkish`, `thai`.
All analyzers support setting custom `stopwords` either internally in
the config, or by using an external stopwords file by setting
`stopwords_path`.
The following analyzers support setting custom `stem_exclusion` list:
`arabic`, `armenian`, `basque`, `brazilian`, `bulgarian`, `catalan`,
`czech`, `danish`, `dutch`, `english`, `finnish`, `french`, `galician`,
`german`, `hindi`, `hungarian`, `indonesian`, `italian`, `norwegian`,
`portuguese`, `romanian`, `russian`, `spanish`, `swedish`, `turkish`.
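For example, a configuration sketch (the analyzer name, stopwords and exclusion terms are illustrative) of an `english` analyzer with custom `stopwords` and a `stem_exclusion` list:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_english" : {
                    "type" : "english",
                    "stopwords" : ["a", "an", "the"],
                    "stem_exclusion" : ["elasticsearch", "indexing"]
                }
            }
        }
    }
}
--------------------------------------------------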

@ -0,0 +1,126 @@
[[analysis-pattern-analyzer]]
=== Pattern Analyzer
An analyzer of type `pattern` that can flexibly separate text into terms
via a regular expression.
The following are settings that can be set for a `pattern` analyzer
type:
[cols="<,<",options="header",]
|===================================================================
|Setting |Description
|`lowercase` |Should terms be lowercased or not. Defaults to `true`.
|`pattern` |The regular expression pattern, defaults to `\W+`.
|`flags` |The regular expression flags.
|===================================================================
*IMPORTANT*: The regular expression should match the *token separators*,
not the tokens themselves.
Flags should be pipe-separated, eg `"CASE_INSENSITIVE|COMMENTS"`. Check
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary[Java
Pattern API] for more details about `flags` options.
[float]
==== Pattern Analyzer Examples
In order to try out these examples, you should delete the `test` index
before running each example:
[source,js]
--------------------------------------------------
curl -XDELETE localhost:9200/test
--------------------------------------------------
[float]
===== Whitespace tokenizer
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
"settings":{
"analysis": {
"analyzer": {
"whitespace":{
"type": "pattern",
"pattern":"\\\\s+"
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo,bar baz'
# "foo,bar", "baz"
--------------------------------------------------
[float]
===== Non-word character tokenizer
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
"settings":{
"analysis": {
"analyzer": {
"nonword":{
"type": "pattern",
"pattern":"[^\\\\w]+"
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'foo,bar baz'
# "foo,bar baz" becomes "foo", "bar", "baz"
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'type_1-type_4'
# "type_1","type_4"
--------------------------------------------------
[float]
===== CamelCase tokenizer
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test?pretty=1' -d '
{
"settings":{
"analysis": {
"analyzer": {
"camel":{
"type": "pattern",
"pattern":"([^\\\\p{L}\\\\d]+)|(?<=\\\\D)(?=\\\\d)|(?<=\\\\d)(?=\\\\D)|(?<=[\\\\p{L}&&[^\\\\p{Lu}]])(?=\\\\p{Lu})|(?<=\\\\p{Lu})(?=\\\\p{Lu}[\\\\p{L}&&[^\\\\p{Lu}]])"
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d '
MooseX::FTPClass2_beta
'
# "moose","x","ftp","class","2","beta"
--------------------------------------------------
The regex above is easier to understand as:
[source,js]
--------------------------------------------------
([^\\p{L}\\d]+) # swallow non letters and numbers,
| (?<=\\D)(?=\\d) # or non-number followed by number,
| (?<=\\d)(?=\\D) # or number followed by non-number,
| (?<=[ \\p{L} && [^\\p{Lu}]]) # or lower case
(?=\\p{Lu}) # followed by upper case,
| (?<=\\p{Lu}) # or upper case
(?=\\p{Lu} # followed by upper case
[\\p{L}&&[^\\p{Lu}]] # then lower case
)
--------------------------------------------------

@ -0,0 +1,6 @@
[[analysis-simple-analyzer]]
=== Simple Analyzer
An analyzer of type `simple` that is built using a
<<analysis-lowercase-tokenizer,Lower
Case Tokenizer>>.

@ -0,0 +1,63 @@
[[analysis-snowball-analyzer]]
=== Snowball Analyzer
An analyzer of type `snowball` that uses the
<<analysis-standard-tokenizer,standard
tokenizer>>, with
<<analysis-standard-tokenfilter,standard
filter>>,
<<analysis-lowercase-tokenfilter,lowercase
filter>>,
<<analysis-stop-tokenfilter,stop
filter>>, and
<<analysis-snowball-tokenfilter,snowball
filter>>.
The Snowball Analyzer is a stemming analyzer from Lucene that is
originally based on the snowball project from
http://snowball.tartarus.org[snowball.tartarus.org].
Sample usage:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"type" : "snowball",
"language" : "English"
}
}
}
}
}
--------------------------------------------------
The `language` parameter can have the same values as the
<<analysis-snowball-tokenfilter,snowball
filter>> and defaults to `English`. Note that not all the language
analyzers have a default set of stopwords provided.
The `stopwords` parameter can be used to provide stopwords for the
languages that have no defaults, or to simply replace the default set
with your custom list. A default set of stopwords for many of these
languages is available from, for instance,
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/[here]
and
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball[here].
A sample configuration (in YAML format) specifying Swedish with
stopwords:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
my_analyzer:
type: snowball
language: Swedish
stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"
--------------------------------------------------

@ -0,0 +1,26 @@
[[analysis-standard-analyzer]]
=== Standard Analyzer
An analyzer of type `standard` that is built using
<<analysis-standard-tokenizer,Standard
Tokenizer>>, with
<<analysis-standard-tokenfilter,Standard
Token Filter>>,
<<analysis-lowercase-tokenfilter,Lower
Case Token Filter>>, and
<<analysis-stop-tokenfilter,Stop
Token Filter>>.
The following are settings that can be set for a `standard` analyzer
type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to the English stop words.
|`max_token_length` |The maximum token length. If a token is seen that
exceeds this length then it is discarded. Defaults to `255`.
|=======================================================================

@ -0,0 +1,21 @@
[[analysis-stop-analyzer]]
=== Stop Analyzer
An analyzer of type `stop` that is built using a
<<analysis-lowercase-tokenizer,Lower
Case Tokenizer>>, with
<<analysis-stop-tokenfilter,Stop
Token Filter>>.
The following are settings that can be set for a `stop` analyzer type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to the English stop words.
|`stopwords_path` |A path (either relative to `config` location, or
absolute) to a stopwords file configuration.
|=======================================================================
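For example, a configuration sketch (the analyzer name and file path are illustrative) of a `stop` analyzer that loads its stopwords from a file:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_stop_analyzer :
                type : stop
                stopwords_path : stopwords/custom_stopwords.txt
--------------------------------------------------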

@ -0,0 +1,6 @@
[[analysis-whitespace-analyzer]]
=== Whitespace Analyzer
An analyzer of type `whitespace` that is built using a
<<analysis-whitespace-tokenizer,Whitespace
Tokenizer>>.

@ -0,0 +1,16 @@
[[analysis-charfilters]]
== Character Filters
Character filters are used to preprocess the string of
characters before it is passed to the <<analysis-tokenizers,tokenizer>>.
A character filter may be used to strip out HTML markup, or to convert
`"&"` characters to the word `"and"`.
Elasticsearch has built-in character filters which can be
used to build <<analysis-custom-analyzer,custom analyzers>>.
include::charfilters/mapping-charfilter.asciidoc[]
include::charfilters/htmlstrip-charfilter.asciidoc[]
include::charfilters/pattern-replace-charfilter.asciidoc[]

@ -0,0 +1,5 @@
[[analysis-htmlstrip-charfilter]]
=== HTML Strip Char Filter
A char filter of type `html_strip` stripping out HTML elements from an
analyzed text.
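For example, a custom analyzer sketch (the analyzer name is illustrative) that strips HTML before tokenizing:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "html_text" : {
                    "type" : "custom",
                    "tokenizer" : "standard",
                    "char_filter" : ["html_strip"]
                }
            }
        }
    }
}
--------------------------------------------------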

@ -0,0 +1,38 @@
[[analysis-mapping-charfilter]]
=== Mapping Char Filter
A char filter of type `mapping` that replaces characters of an analyzed text
with the given mapping.
Here is a sample configuration:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"char_filter" : {
"my_mapping" : {
"type" : "mapping",
"mappings" : ["ph=>f", "qu=>q"]
}
},
"analyzer" : {
"custom_with_char_filter" : {
"tokenizer" : "standard",
"char_filter" : ["my_mapping"]
}
}
}
}
}
--------------------------------------------------
Alternatively, the `mappings_path` setting can specify a file containing
the list of character mappings:
[source,js]
--------------------------------------------------
ph => f
qu => k
--------------------------------------------------

@ -0,0 +1,37 @@
[[analysis-pattern-replace-charfilter]]
=== Pattern Replace Char Filter
The `pattern_replace` char filter allows the use of a regex to
manipulate the characters in a string before analysis. The regular
expression is defined using the `pattern` parameter, and the replacement
string can be provided using the `replacement` parameter (supporting
referencing the original text, as explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
For more information, check the
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilter.html[Lucene
documentation].
Here is a sample configuration:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"char_filter" : {
"my_pattern":{
"type":"pattern_replace",
"pattern":"sample(.*)",
"replacement":"replacedSample $1"
}
},
"analyzer" : {
"custom_with_char_filter" : {
"tokenizer" : "standard",
"char_filter" : ["my_pattern"]
}
}
}
}
}
--------------------------------------------------

@ -0,0 +1,148 @@
[[analysis-icu-plugin]]
== ICU Analysis Plugin
The http://icu-project.org/[ICU] analysis plugin allows for unicode
normalization, collation and folding. The plugin is called
https://github.com/elasticsearch/elasticsearch-analysis-icu[elasticsearch-analysis-icu].
The plugin includes the following analysis components:
[float]
=== ICU Normalization
Normalizes characters as explained
http://userguide.icu-project.org/transforms/normalization[here]. It
registers itself by default under `icu_normalizer` or `icuNormalizer`
using the default settings. It allows the `name` parameter to be provided,
which can take one of the following values: `nfc`, `nfkc`, and `nfkc_cf`.
Here is a sample configuration:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"normalization" : {
"tokenizer" : "keyword",
"filter" : ["icu_normalizer"]
}
}
}
}
}
--------------------------------------------------
[float]
=== ICU Folding
Folding of unicode characters based on `UTR#30`. It registers itself
under `icu_folding` and `icuFolding` names.
The filter also does lowercasing, which means the lowercase filter can
normally be left out. Sample setting:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"folding" : {
"tokenizer" : "keyword",
"filter" : ["icu_folding"]
}
}
}
}
}
--------------------------------------------------
[float]
==== Filtering
The folding can be filtered by a set of unicode characters with the
parameter `unicodeSetFilter`. This is useful for a non-internationalized
search engine where you want to retain a set of national characters which
are primary letters in a specific language. See the syntax for the
UnicodeSet
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html[here].
The following example exempts Swedish characters from the folding. Note
that the filtered characters are NOT lowercased, which is why we add the
lowercase filter below.
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"folding" : {
"tokenizer" : "standard",
"filter" : ["my_icu_folding", "lowercase"]
}
},
"filter" : {
"my_icu_folding" : {
"type" : "icu_folding",
"unicodeSetFilter" : "[^åäöÅÄÖ]"
}
}
}
}
}
--------------------------------------------------
[float]
=== ICU Collation
Uses the collation token filter. It allows you to either specify the rules for
collation (defined
http://www.icu-project.org/userguide/Collate_Customization.html[here])
using the `rules` parameter (which can point to a location or be expressed in
the settings; the location can be relative to the config location), or to use
the `language` parameter (further specialized by country and variant). By
default it registers under `icu_collation` or `icuCollation` and uses the
default locale.
Here is a sample configuration:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"collation" : {
"tokenizer" : "keyword",
"filter" : ["icu_collation"]
}
}
}
}
}
--------------------------------------------------
And here is a sample of custom collation:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"collation" : {
"tokenizer" : "keyword",
"filter" : ["myCollator"]
}
},
"filter" : {
"myCollator" : {
"type" : "icu_collation",
"language" : "en"
}
}
}
}
}
--------------------------------------------------

@ -0,0 +1,71 @@
[[analysis-tokenfilters]]
== Token Filters
Token filters accept a stream of tokens from a
<<analysis-tokenizers,tokenizer>> and can modify tokens
(eg lowercasing), delete tokens (eg remove stopwords)
or add tokens (eg synonyms).
Elasticsearch has a number of built in token filters which can be
used to build <<analysis-custom-analyzer,custom analyzers>>.
include::tokenfilters/standard-tokenfilter.asciidoc[]
include::tokenfilters/asciifolding-tokenfilter.asciidoc[]
include::tokenfilters/length-tokenfilter.asciidoc[]
include::tokenfilters/lowercase-tokenfilter.asciidoc[]
include::tokenfilters/ngram-tokenfilter.asciidoc[]
include::tokenfilters/edgengram-tokenfilter.asciidoc[]
include::tokenfilters/porterstem-tokenfilter.asciidoc[]
include::tokenfilters/shingle-tokenfilter.asciidoc[]
include::tokenfilters/stop-tokenfilter.asciidoc[]
include::tokenfilters/word-delimiter-tokenfilter.asciidoc[]
include::tokenfilters/stemmer-tokenfilter.asciidoc[]
include::tokenfilters/stemmer-override-tokenfilter.asciidoc[]
include::tokenfilters/keyword-marker-tokenfilter.asciidoc[]
include::tokenfilters/keyword-repeat-tokenfilter.asciidoc[]
include::tokenfilters/kstem-tokenfilter.asciidoc[]
include::tokenfilters/snowball-tokenfilter.asciidoc[]
include::tokenfilters/phonetic-tokenfilter.asciidoc[]
include::tokenfilters/synonym-tokenfilter.asciidoc[]
include::tokenfilters/compound-word-tokenfilter.asciidoc[]
include::tokenfilters/reverse-tokenfilter.asciidoc[]
include::tokenfilters/elision-tokenfilter.asciidoc[]
include::tokenfilters/truncate-tokenfilter.asciidoc[]
include::tokenfilters/unique-tokenfilter.asciidoc[]
include::tokenfilters/pattern-capture-tokenfilter.asciidoc[]
include::tokenfilters/pattern_replace-tokenfilter.asciidoc[]
include::tokenfilters/trim-tokenfilter.asciidoc[]
include::tokenfilters/limit-token-count-tokenfilter.asciidoc[]
include::tokenfilters/hunspell-tokenfilter.asciidoc[]
include::tokenfilters/common-grams-tokenfilter.asciidoc[]
include::tokenfilters/normalization-tokenfilter.asciidoc[]

@ -0,0 +1,7 @@
[[analysis-asciifolding-tokenfilter]]
=== ASCII Folding Token Filter
A token filter of type `asciifolding` that converts alphabetic, numeric,
and symbolic Unicode characters which are not in the first 127 ASCII
characters (the "Basic Latin" Unicode block) into their ASCII
equivalents, if one exists.
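For example, a custom analyzer sketch (the analyzer name is illustrative) that folds accented characters to their ASCII equivalents:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folded" : {
                    "type" : "custom",
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "asciifolding"]
                }
            }
        }
    }
}
--------------------------------------------------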

@ -0,0 +1,61 @@
[[analysis-common-grams-tokenfilter]]
=== Common Grams Token Filter
Token filter that generates bigrams for frequently occurring terms.
Single terms are still indexed. It can be used as an alternative to the
<<analysis-stop-tokenfilter,Stop
Token Filter>> when we don't want to completely ignore common terms.
For example, assuming "the", "is" and "a" are common words, the text
"the quick brown is a fox" will be tokenized as
"the", "the_quick", "quick", "brown", "brown_is", "is_a", "a_fox", "fox".
When `query_mode` is enabled, the token filter removes common words and
single terms followed by a common word. This parameter should be enabled
in the search analyzer.
For example, the query "the quick brown is a fox" will be tokenized as
"the_quick", "quick", "brown_is", "is_a", "a_fox", "fox".
The following are settings that can be set:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`common_words` |A list of common words to use.
|`common_words_path` |A path (either relative to `config` location, or
absolute) to a list of common words. Each word should be in its own
"line" (separated by a line break). The file must be UTF-8 encoded.
|`ignore_case` |If true, common words matching will be case insensitive
(defaults to `false`).
|`query_mode` |Generates bigrams then removes common words and single
terms followed by a common word (defaults to `false`).
|=======================================================================
Note, the `common_words` or `common_words_path` field is required.
Here is an example:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
index_grams :
tokenizer : whitespace
filter : [common_grams]
search_grams :
tokenizer : whitespace
filter : [common_grams_query]
filter :
common_grams :
type : common_grams
common_words: [a, an, the]
common_grams_query :
type : common_grams
query_mode: true
common_words: [a, an, the]
--------------------------------------------------

@ -0,0 +1,48 @@
[[analysis-compound-word-tokenfilter]]
=== Compound Word Token Filter
Token filters that allow you to decompose compound words. There are two
types available: `dictionary_decompounder` and
`hyphenation_decompounder`.
The following are settings that can be set for a compound word token
filter type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`word_list` |A list of words to use.
|`word_list_path` |A path (either relative to `config` location, or
absolute) to a list of words.
|`min_word_size` |Minimum word size (Integer). Defaults to 5.
|`min_subword_size` |Minimum subword size (Integer). Defaults to 2.
|`max_subword_size` |Maximum subword size (Integer). Defaults to 15.
|`only_longest_match` |Whether to match only the longest (Boolean). Defaults to
`false`.
|=======================================================================
Here is an example:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
myAnalyzer2 :
type : custom
tokenizer : standard
filter : [myTokenFilter1, myTokenFilter2]
filter :
myTokenFilter1 :
type : dictionary_decompounder
word_list: [one, two, three]
myTokenFilter2 :
type : hyphenation_decompounder
word_list_path: path/to/words.txt
max_subword_size : 22
--------------------------------------------------

@ -0,0 +1,16 @@
[[analysis-edgengram-tokenfilter]]
=== Edge NGram Token Filter
A token filter of type `edgeNGram`.
The following are settings that can be set for an `edgeNGram` token
filter type:
[cols="<,<",options="header",]
|======================================================
|Setting |Description
|`min_gram` |Defaults to `1`.
|`max_gram` |Defaults to `2`.
|`side` |Either `front` or `back`. Defaults to `front`.
|======================================================
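For example, a configuration sketch (the names and gram sizes are illustrative) that uses an `edgeNGram` filter for simple prefix-style autocomplete:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            autocomplete :
                type : custom
                tokenizer : standard
                filter : [lowercase, my_edge_ngram]
        filter :
            my_edge_ngram :
                type : edgeNGram
                min_gram : 1
                max_gram : 5
                side : front
--------------------------------------------------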

@ -0,0 +1,28 @@
[[analysis-elision-tokenfilter]]
=== Elision Token Filter
A token filter which removes elisions. For example, "l'avion" (the
plane) will be tokenized as "avion" (plane).
Accepts an `articles` setting, which is a set of stop word articles. For
example:
[source,js]
--------------------------------------------------
"index" : {
"analysis" : {
"analyzer" : {
"default" : {
"tokenizer" : "standard",
"filter" : ["standard", "elision"]
}
},
"filter" : {
"elision" : {
"type" : "elision",
"articles" : ["l", "m", "t", "qu", "n", "s", "j"]
}
}
}
}
--------------------------------------------------

View File

@ -0,0 +1,116 @@
[[analysis-hunspell-tokenfilter]]
=== Hunspell Token Filter
Basic support for hunspell stemming. Hunspell dictionaries will be
picked up from a dedicated hunspell directory on the filesystem
(defaults to `<path.conf>/hunspell`). Each dictionary is expected to
have its own directory named after its associated locale (language).
This dictionary directory is expected to hold both the \*.aff and \*.dic
files (all of which will automatically be picked up). For example,
assuming the default hunspell location is used, the following directory
layout will define the `en_US` dictionary:
[source,js]
--------------------------------------------------
- conf
|-- hunspell
| |-- en_US
| | |-- en_US.dic
| | |-- en_US.aff
--------------------------------------------------
The location of the hunspell directory can be configured using the
`indices.analysis.hunspell.dictionary.location` settings in
_elasticsearch.yml_.
Each dictionary can be configured with two settings:
`ignore_case`::
If true, dictionary matching will be case insensitive
(defaults to `false`)
`strict_affix_parsing`::
Determines whether errors while reading an
affix rules file will cause an exception or simply be ignored (defaults to
`true`)
These settings can be configured globally in `elasticsearch.yml` using
* `indices.analysis.hunspell.dictionary.ignore_case` and
* `indices.analysis.hunspell.dictionary.strict_affix_parsing`
or for specific dictionaries:
* `indices.analysis.hunspell.dictionary.en_US.ignore_case` and
* `indices.analysis.hunspell.dictionary.en_US.strict_affix_parsing`.
It is also possible to add a `settings.yml` file under the dictionary
directory which holds these settings (this will override any other
settings defined in `elasticsearch.yml`).
One can use the hunspell stem filter by configuring it in the analysis
settings:
[source,js]
--------------------------------------------------
{
"analysis" : {
"analyzer" : {
"en" : {
"tokenizer" : "standard",
"filter" : [ "lowercase", "en_US" ]
}
},
"filter" : {
"en_US" : {
"type" : "hunspell",
"locale" : "en_US",
"dedup" : true
}
}
}
}
--------------------------------------------------
The hunspell token filter accepts four options:
`locale`::
A locale for this filter. If this is unset, the `lang` or
`language` are used instead - so one of these has to be set.
`dictionary`::
The name of a dictionary. The path to your hunspell
dictionaries should be configured via
`indices.analysis.hunspell.dictionary.location` before.
`dedup`::
If only unique terms should be returned, this needs to be
set to `true`. Defaults to `true`.
`recursion_level`::
Configures the recursion level a
stemmer can go into. Defaults to `2`. Some languages (for example, Czech)
give better results when this is set to `1` or `0`, so you should test it out.
(since 0.90.3)
NOTE: As opposed to the snowball stemmers (which are algorithm based)
this is a dictionary lookup based stemmer and therefore the quality of
the stemming is determined by the quality of the dictionary.
[float]
==== References
Hunspell is a spell checker and morphological analyzer designed for
languages with rich morphology and complex word compounding and
character encoding.
1. Wikipedia, http://en.wikipedia.org/wiki/Hunspell
2. Source code, http://hunspell.sourceforge.net/
3. Open Office Hunspell dictionaries, http://wiki.openoffice.org/wiki/Dictionaries
4. Mozilla Hunspell dictionaries, https://addons.mozilla.org/en-US/firefox/language-tools/
5. Chromium Hunspell dictionaries,
http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/


@ -0,0 +1,34 @@
[[analysis-keyword-marker-tokenfilter]]
=== Keyword Marker Token Filter
Protects words from being modified by stemmers. Must be placed before
any stemming filters.
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`keywords` |A list of words to use.
|`keywords_path` |A path (either relative to `config` location, or
absolute) to a list of words.
|`ignore_case` |Set to `true` to lower case all words first. Defaults to
`false`.
|=======================================================================
Here is an example:
[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer :
                type : custom
                tokenizer : standard
                filter : [lowercase, protwords, porterStem]
        filter :
            protwords :
                type : keyword_marker
                keywords_path : analysis/protwords.txt
--------------------------------------------------


@ -0,0 +1,28 @@
[[analysis-keyword-repeat-tokenfilter]]
=== Keyword Repeat Token Filter
The `keyword_repeat` token filter emits each incoming token twice, once
as a keyword and once as a non-keyword, to allow an un-stemmed version of a
term to be indexed side by side with the stemmed version of the term.
Given the nature of this filter each token that isn't transformed by a
subsequent stemmer will be indexed twice. Therefore, consider adding a
`unique` filter with `only_on_same_position` set to `true` to drop
unnecessary duplicates.
Note: this is available from `0.90.0.Beta2` on.
Here is an example:
[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer :
                type : custom
                tokenizer : standard
                filter : [lowercase, keyword_repeat, porterStem, unique_stem]
        filter :
            unique_stem :
                type : unique
                only_on_same_position : true
--------------------------------------------------


@ -0,0 +1,6 @@
[[analysis-kstem-tokenfilter]]
=== KStem Token Filter
The `kstem` token filter is a high performance filter for English. All
terms must already be lowercased (use the `lowercase` filter) for this
filter to work correctly.
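For example, a minimal sketch of a custom analyzer combining it with
`lowercase` (the analyzer name is illustrative):
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_kstem_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "kstem"]
                }
            }
        }
    }
}
--------------------------------------------------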


@ -0,0 +1,16 @@
[[analysis-length-tokenfilter]]
=== Length Token Filter
A token filter of type `length` that removes words that are too long or
too short for the stream.
The following are settings that can be set for a `length` token filter
type:
[cols="<,<",options="header",]
|===========================================================
|Setting |Description
|`min` |The minimum token length. Defaults to `0`.
|`max` |The maximum token length. Defaults to `Integer.MAX_VALUE`.
|===========================================================
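For example, a minimal sketch (the analyzer and filter names are
illustrative) that keeps only tokens between 2 and 10 characters long:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_length"]
                }
            },
            "filter" : {
                "my_length" : {
                    "type" : "length",
                    "min" : 2,
                    "max" : 10
                }
            }
        }
    }
}
--------------------------------------------------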


@ -0,0 +1,32 @@
[[analysis-limit-token-count-tokenfilter]]
=== Limit Token Count Token Filter
Limits the number of tokens that are indexed per document and field.
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_count` |The maximum number of tokens that should be indexed
per document and field. The default is `1`.
|`consume_all_tokens` |If set to `true` the filter exhausts the stream
even if `max_token_count` tokens have been consumed already. The default
is `false`.
|=======================================================================
Here is an example:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
myAnalyzer :
type : custom
tokenizer : standard
filter : [lowercase, five_token_limit]
filter :
five_token_limit :
type : limit
max_token_count : 5
--------------------------------------------------


@ -0,0 +1,37 @@
[[analysis-lowercase-tokenfilter]]
=== Lowercase Token Filter
A token filter of type `lowercase` that normalizes token text to lower
case.
The `lowercase` token filter supports Greek and Turkish lowercasing
through the `language` parameter. Below is a usage example in a
custom analyzer:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
myAnalyzer2 :
type : custom
tokenizer : myTokenizer1
filter : [myTokenFilter1, myGreekLowerCaseFilter]
char_filter : [my_html]
tokenizer :
myTokenizer1 :
type : standard
max_token_length : 900
filter :
myTokenFilter1 :
type : stop
stopwords : [stop1, stop2, stop3, stop4]
myGreekLowerCaseFilter :
type : lowercase
language : greek
char_filter :
my_html :
type : html_strip
escaped_tags : [xxx, yyy]
read_ahead : 1024
--------------------------------------------------


@ -0,0 +1,15 @@
[[analysis-ngram-tokenfilter]]
=== NGram Token Filter
A token filter of type `nGram`.
The following are settings that can be set for an `nGram` token filter
type:
[cols="<,<",options="header",]
|============================
|Setting |Description
|`min_gram` |Defaults to `1`.
|`max_gram` |Defaults to `2`.
|============================
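For example, a minimal sketch (the analyzer and filter names are
illustrative):
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_ngram_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_ngram"]
                }
            },
            "filter" : {
                "my_ngram" : {
                    "type" : "nGram",
                    "min_gram" : 2,
                    "max_gram" : 3
                }
            }
        }
    }
}
--------------------------------------------------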


@ -0,0 +1,15 @@
[[analysis-normalization-tokenfilter]]
=== Normalization Token Filter
There are several token filters available which try to normalize special
characters of a certain language.
You can currently choose between `arabic_normalization` and
`persian_normalization` normalization in your token filter
configuration. For more information check the
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizer.html[ArabicNormalizer]
or the
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizer.html[PersianNormalizer]
documentation.
*Note:* These filters are available since `0.90.2`.
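For example, a minimal sketch of an analyzer using the Arabic variant (the
analyzer name is illustrative, and the filter is assumed to be referenced by
its built-in name):
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_arabic_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "arabic_normalization"]
                }
            }
        }
    }
}
--------------------------------------------------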


@ -0,0 +1,134 @@
[[analysis-pattern-capture-tokenfilter]]
=== Pattern Capture Token Filter
The `pattern_capture` token filter, unlike the `pattern` tokenizer,
emits a token for every capture group in the regular expression.
Patterns are not anchored to the beginning and end of the string, so
each pattern can match multiple times, and matches are allowed to
overlap.
For instance, a pattern like:
[source,js]
--------------------------------------------------
"(([a-z]+)(\d*))"
--------------------------------------------------
when matched against:
[source,js]
--------------------------------------------------
"abc123def456"
--------------------------------------------------
would produce the tokens: [ `abc123`, `abc`, `123`, `def456`, `def`,
`456` ]
If `preserve_original` is set to `true` (the default) then it would also
emit the original token: `abc123def456`.
This is particularly useful for indexing text like camel-cased code, e.g.
`stripHTML`, where a user may search for `"strip html"` or `"striphtml"`:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/ -d '
{
"settings" : {
"analysis" : {
"filter" : {
"code" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)",
"(\\d+)"
]
}
},
"analyzer" : {
"code" : {
"tokenizer" : "pattern",
"filter" : [ "code", "lowercase" ]
}
}
}
}
}
'
--------------------------------------------------
When used to analyze the text
[source,js]
--------------------------------------------------
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml
--------------------------------------------------
this emits the tokens: [ `import`, `static`, `org`, `apache`, `commons`,
`lang`, `stringescapeutils`, `string`, `escape`, `utils`, `escapehtml`,
`escape`, `html` ]
Another example is analyzing email addresses:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/test/ -d '
{
"settings" : {
"analysis" : {
"filter" : {
"email" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"(\\w+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)"
]
}
},
"analyzer" : {
"email" : {
"tokenizer" : "uax_url_email",
"filter" : [ "email", "lowercase", "unique" ]
}
}
}
}
}
'
--------------------------------------------------
When the above analyzer is used on an email address like:
[source,js]
--------------------------------------------------
john-smith_123@foo-bar.com
--------------------------------------------------
it would produce the following tokens: [ `john-smith_123`,
`foo-bar.com`, `john`, `smith_123`, `smith`, `123`, `foo`,
`foo-bar.com`, `bar`, `com` ]
Multiple patterns are required to allow overlapping captures, but this also
means that patterns are less dense and easier to understand.
*Note:* All tokens are emitted in the same position, and with the same
character offsets, so when combined with highlighting, the whole
original token will be highlighted, not just the matching subset. For
instance, querying the above email address for `"smith"` would
highlight:
[source,js]
--------------------------------------------------
<em>john-smith_123@foo-bar.com</em>
--------------------------------------------------
not:
[source,js]
--------------------------------------------------
john-<em>smith</em>_123@foo-bar.com
--------------------------------------------------


@ -0,0 +1,9 @@
[[analysis-pattern_replace-tokenfilter]]
=== Pattern Replace Token Filter
The `pattern_replace` token filter allows to easily handle string
replacements based on a regular expression. The regular expression is
defined using the `pattern` parameter, and the replacement string can be
provided using the `replacement` parameter (supporting referencing the
original text, as explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
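For example, a minimal sketch (the filter name and pattern are illustrative)
that collapses runs of dashes into a single underscore:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "filter" : {
                "my_pattern_replace" : {
                    "type" : "pattern_replace",
                    "pattern" : "-+",
                    "replacement" : "_"
                }
            }
        }
    }
}
--------------------------------------------------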


@ -0,0 +1,5 @@
[[analysis-phonetic-tokenfilter]]
=== Phonetic Token Filter
The `phonetic` token filter is provided as a plugin and located
https://github.com/elasticsearch/elasticsearch-analysis-phonetic[here].


@ -0,0 +1,15 @@
[[analysis-porterstem-tokenfilter]]
=== Porter Stem Token Filter
A token filter of type `porterStem` that transforms the token stream as
per the Porter stemming algorithm.
Note, the input to the stemming filter must already be in lower case, so
you will need to use
<<analysis-lowercase-tokenfilter,Lower
Case Token Filter>> or
<<analysis-lowercase-tokenizer,Lower
Case Tokenizer>> earlier in the analysis chain in order for this to
work properly. For example, when using a custom analyzer, make sure the
`lowercase` filter comes before the `porterStem` filter in the list of
filters.


@ -0,0 +1,4 @@
[[analysis-reverse-tokenfilter]]
=== Reverse Token Filter
A token filter of type `reverse` that simply reverses each token.


@ -0,0 +1,36 @@
[[analysis-shingle-tokenfilter]]
=== Shingle Token Filter
A token filter of type `shingle` that constructs shingles (token
n-grams) from a token stream. In other words, it creates combinations of
tokens as a single token. For example, the sentence "please divide this
sentence into shingles" might be tokenized into shingles "please
divide", "divide this", "this sentence", "sentence into", and "into
shingles".
This filter handles position increments > 1 by inserting filler tokens
(tokens with termtext "_"). It does not handle a position increment of
0.
The following are settings that can be set for a `shingle` token filter
type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_shingle_size` |The maximum shingle size. Defaults to `2`.
|`min_shingle_size` |The minimum shingle size. Defaults to `2`.
|`output_unigrams` |If `true` the output will contain the input tokens
(unigrams) as well as the shingles. Defaults to `true`.
|`output_unigrams_if_no_shingles` |If `output_unigrams` is `false` the
output will contain the input tokens (unigrams) if no shingles are
available. Note if `output_unigrams` is set to `true` this setting has
no effect. Defaults to `false`.
|`token_separator` |The string to use when joining adjacent tokens to
form a shingle. Defaults to `" "`.
|=======================================================================
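For example, a minimal sketch (the analyzer and filter names are
illustrative) that emits shingles of two to three tokens alongside the
original unigrams:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_shingle_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_shingle"]
                }
            },
            "filter" : {
                "my_shingle" : {
                    "type" : "shingle",
                    "min_shingle_size" : 2,
                    "max_shingle_size" : 3,
                    "output_unigrams" : true
                }
            }
        }
    }
}
--------------------------------------------------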


@ -0,0 +1,33 @@
[[analysis-snowball-tokenfilter]]
=== Snowball Token Filter
A filter that stems words using a Snowball-generated stemmer. The
`language` parameter controls the stemmer with the following available
values: `Armenian`, `Basque`, `Catalan`, `Danish`, `Dutch`, `English`,
`Finnish`, `French`, `German`, `German2`, `Hungarian`, `Italian`, `Kp`,
`Lovins`, `Norwegian`, `Porter`, `Portuguese`, `Romanian`, `Russian`,
`Spanish`, `Swedish`, `Turkish`.
For example:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "standard",
"filter" : ["standard", "lowercase", "my_snow"]
}
},
"filter" : {
"my_snow" : {
"type" : "snowball",
"language" : "Lovins"
}
}
}
}
}
--------------------------------------------------


@ -0,0 +1,7 @@
[[analysis-standard-tokenfilter]]
=== Standard Token Filter
A token filter of type `standard` that normalizes tokens extracted with
the
<<analysis-standard-tokenizer,Standard
Tokenizer>>.


@ -0,0 +1,34 @@
[[analysis-stemmer-override-tokenfilter]]
=== Stemmer Override Token Filter
Overrides stemming algorithms, by applying a custom mapping, then
protecting these terms from being modified by stemmers. Must be placed
before any stemming filters.
Rules are separated by "=>"
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`rules` |A list of mapping rules to use.
|`rules_path` |A path (either relative to `config` location, or
absolute) to a list of mappings.
|=======================================================================
Here is an example:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
myAnalyzer :
type : custom
tokenizer : standard
filter : [lowercase, custom_stems, porterStem]
filter:
custom_stems:
type: stemmer_override
rules_path : analysis/custom_stems.txt
--------------------------------------------------


@ -0,0 +1,78 @@
[[analysis-stemmer-tokenfilter]]
=== Stemmer Token Filter
A filter that stems words (similar to `snowball`, but with more
options). The `language`/`name` parameter controls the stemmer with the
following available values:
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Far%2FArabicStemmer.html[arabic],
http://snowball.tartarus.org/algorithms/armenian/stemmer.html[armenian],
http://snowball.tartarus.org/algorithms/basque/stemmer.html[basque],
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fbr%2FBrazilianStemmer.html[brazilian],
http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf[bulgarian],
http://snowball.tartarus.org/algorithms/catalan/stemmer.html[catalan],
http://portal.acm.org/citation.cfm?id=1598600[czech],
http://snowball.tartarus.org/algorithms/danish/stemmer.html[danish],
http://snowball.tartarus.org/algorithms/dutch/stemmer.html[dutch],
http://snowball.tartarus.org/algorithms/english/stemmer.html[english],
http://snowball.tartarus.org/algorithms/finnish/stemmer.html[finnish],
http://snowball.tartarus.org/algorithms/french/stemmer.html[french],
http://snowball.tartarus.org/algorithms/german/stemmer.html[german],
http://snowball.tartarus.org/algorithms/german2/stemmer.html[german2],
http://sais.se/mthprize/2007/ntais2007.pdf[greek],
http://snowball.tartarus.org/algorithms/hungarian/stemmer.html[hungarian],
http://snowball.tartarus.org/algorithms/italian/stemmer.html[italian],
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html[kp],
http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[kstem],
http://snowball.tartarus.org/algorithms/lovins/stemmer.html[lovins],
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Flv%2FLatvianStemmer.html[latvian],
http://snowball.tartarus.org/algorithms/norwegian/stemmer.html[norwegian],
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fno%2FNorwegianMinimalStemFilter.html[minimal_norwegian],
http://snowball.tartarus.org/algorithms/porter/stemmer.html[porter],
http://snowball.tartarus.org/algorithms/portuguese/stemmer.html[portuguese],
http://snowball.tartarus.org/algorithms/romanian/stemmer.html[romanian],
http://snowball.tartarus.org/algorithms/russian/stemmer.html[russian],
http://snowball.tartarus.org/algorithms/spanish/stemmer.html[spanish],
http://snowball.tartarus.org/algorithms/swedish/stemmer.html[swedish],
http://snowball.tartarus.org/algorithms/turkish/stemmer.html[turkish],
http://www.medialab.tfe.umu.se/courses/mdm0506a/material/fulltext_ID%3D10049387%26PLACEBO%3DIE.pdf[minimal_english],
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fen%2FEnglishPossessiveFilter.html[possessive_english],
http://clef.isti.cnr.it/2003/WN_web/22.pdf[light_finish],
http://dl.acm.org/citation.cfm?id=1141523[light_french],
http://dl.acm.org/citation.cfm?id=318984[minimal_french],
http://dl.acm.org/citation.cfm?id=1141523[light_german],
http://members.unine.ch/jacques.savoy/clef/morpho.pdf[minimal_german],
http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf[hindi],
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[light_hungarian],
http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf[indonesian],
http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[light_italian],
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[light_portuguese],
http://www.inf.ufrgs.br/\~buriol/papers/Orengo_CLEF07.pdf[minimal_portuguese],
http://www.inf.ufrgs.br/\~viviane/rslp/index.htm[portuguese],
http://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf[light_russian],
http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[light_spanish],
http://clef.isti.cnr.it/2003/WN_web/22.pdf[light_swedish].
For example:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "standard",
"filter" : ["standard", "lowercase", "my_stemmer"]
}
},
"filter" : {
"my_stemmer" : {
"type" : "stemmer",
"name" : "light_german"
}
}
}
}
}
--------------------------------------------------


@ -0,0 +1,33 @@
[[analysis-stop-tokenfilter]]
=== Stop Token Filter
A token filter of type `stop` that removes stop words from token
streams.
The following are settings that can be set for a `stop` token filter
type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`stopwords` |A list of stop words to use. Defaults to the English stop
words.
|`stopwords_path` |A path (either relative to `config` location, or
absolute) to a stopwords file configuration. Each stop word should be in
its own "line" (separated by a line break). The file must be UTF-8
encoded.
|`enable_position_increments` |Set to `true` if token positions should
record the removed stop words, `false` otherwise. Defaults to `true`.
|`ignore_case` |Set to `true` to lower case all words first. Defaults to
`false`.
|=======================================================================
The `stopwords` parameter allows for custom, language-specific expansion of the
default stopwords. It follows the `_lang_` notation and supports: arabic,
armenian, basque, brazilian, bulgarian, catalan, czech, danish, dutch,
english, finnish, french, galician, german, greek, hindi, hungarian,
indonesian, italian, norwegian, persian, portuguese, romanian, russian,
spanish, swedish, turkish.
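For example, a minimal sketch (the filter name is illustrative) that uses the
predefined English list via the `_lang_` notation:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "filter" : {
                "my_stop" : {
                    "type" : "stop",
                    "stopwords" : ["_english_"],
                    "ignore_case" : true
                }
            }
        }
    }
}
--------------------------------------------------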


@ -0,0 +1,124 @@
[[analysis-synonym-tokenfilter]]
=== Synonym Token Filter
The `synonym` token filter allows to easily handle synonyms during the
analysis process. Synonyms are configured using a configuration file.
Here is an example:
[source,js]
--------------------------------------------------
{
"index" : {
"analysis" : {
"analyzer" : {
"synonym" : {
"tokenizer" : "whitespace",
"filter" : ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/synonym.txt"
}
}
}
}
}
--------------------------------------------------
The above configures a `synonym` filter, with a path of
`analysis/synonym.txt` (relative to the `config` location). The
`synonym` analyzer is then configured with the filter. Additional
settings are: `ignore_case` (defaults to `false`), and `expand`
(defaults to `true`).
The `tokenizer` parameter controls the tokenizer that will be used to
tokenize the synonyms, and defaults to the `whitespace` tokenizer.
As of elasticsearch 0.17.9 two synonym formats are supported: Solr,
WordNet.
[float]
==== Solr synonyms
The following is a sample format of the file:
[source,js]
--------------------------------------------------
# blank lines and lines starting with pound are comments.
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod
#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
--------------------------------------------------
You can also define synonyms for the filter directly in the
configuration file (note use of `synonyms` instead of `synonyms_path`):
[source,js]
--------------------------------------------------
{
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms" : [
"i-pod, i pod => ipod",
"universe, cosmos"
]
}
}
}
--------------------------------------------------
However, it is recommended to define large synonym sets in a file using
`synonyms_path`.
[float]
==== WordNet synonyms
Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
declared using `format`:
[source,js]
--------------------------------------------------
{
"filter" : {
"synonym" : {
"type" : "synonym",
"format" : "wordnet",
"synonyms" : [
"s(100000001,1,'abstain',v,1,0).",
"s(100000001,2,'refrain',v,1,0).",
"s(100000001,3,'desist',v,1,0)."
]
}
}
}
--------------------------------------------------
Using `synonyms_path` to define WordNet synonyms in a file is supported
as well.


@ -0,0 +1,4 @@
[[analysis-trim-tokenfilter]]
=== Trim Token Filter
The `trim` token filter trims the whitespace surrounding a token.


@ -0,0 +1,10 @@
[[analysis-truncate-tokenfilter]]
=== Truncate Token Filter
The `truncate` token filter can be used to truncate tokens to a
specific length. This can come in handy with keyword (single token)
based mapped fields that are used for sorting, in order to reduce memory
usage.
It accepts a `length` parameter which controls the number of characters
to truncate to. It defaults to `10`.
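For example, a minimal sketch (the filter name is illustrative) truncating
tokens to 5 characters:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "filter" : {
                "my_truncate" : {
                    "type" : "truncate",
                    "length" : 5
                }
            }
        }
    }
}
--------------------------------------------------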


@ -0,0 +1,7 @@
[[analysis-unique-tokenfilter]]
=== Unique Token Filter
The `unique` token filter can be used to index only unique tokens during
analysis. By default it is applied to the whole token stream. If
`only_on_same_position` is set to `true`, it will only remove duplicate
tokens on the same position.
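For example, a minimal sketch (the filter name is illustrative):
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "filter" : {
                "my_unique" : {
                    "type" : "unique",
                    "only_on_same_position" : true
                }
            }
        }
    }
}
--------------------------------------------------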


@ -0,0 +1,80 @@
[[analysis-word-delimiter-tokenfilter]]
=== Word Delimiter Token Filter
Named `word_delimiter`, it splits words into subwords and performs
optional transformations on subword groups. Words are split into
subwords with the following rules:
* split on intra-word delimiters (by default, all non alpha-numeric
characters).
* "Wi-Fi" -> "Wi", "Fi"
* split on case transitions: "PowerShot" -> "Power", "Shot"
* split on letter-number transitions: "SD500" -> "SD", "500"
* leading and trailing intra-word delimiters on each subword are
ignored: "//hello---there, 'dude'" -> "hello", "there", "dude"
* trailing "'s" are removed for each subword: "O'Neil's" -> "O", "Neil"
Parameters include:
`generate_word_parts`::
If `true` causes parts of words to be
generated: "PowerShot" => "Power" "Shot". Defaults to `true`.
`generate_number_parts`::
If `true` causes number subwords to be
generated: "500-42" => "500" "42". Defaults to `true`.
`catenate_words`::
If `true` causes maximum runs of word parts to be
catenated: "wi-fi" => "wifi". Defaults to `false`.
`catenate_numbers`::
If `true` causes maximum runs of number parts to
be catenated: "500-42" => "50042". Defaults to `false`.
`catenate_all`::
If `true` causes all subword parts to be catenated:
"wi-fi-4000" => "wifi4000". Defaults to `false`.
`split_on_case_change`::
If `true` causes "PowerShot" to be two tokens
("Power-Shot" remains two parts regardless). Defaults to `true`.
`preserve_original`::
If `true` includes original words in subwords:
"500-42" => "500-42" "500" "42". Defaults to `false`.
`split_on_numerics`::
If `true` causes "j2se" to be three tokens; "j"
"2" "se". Defaults to `true`.
`stem_english_possessive`::
If `true` causes trailing "'s" to be
removed for each subword: "O'Neil's" => "O", "Neil". Defaults to `true`.
Advanced settings include:
`protected_words`::
A list of words protected from being delimited.
Either an array, or you can set `protected_words_path`, which resolves
to a file containing the protected words (one on each line).
It automatically resolves to a `config/` based location if it exists.
`type_table`::
A custom type mapping table, for example (when configured
using `type_table_path`):
[source,js]
--------------------------------------------------
# Map the $, %, '.', and ',' characters to DIGIT
# This might be useful for financial data.
$ => DIGIT
% => DIGIT
. => DIGIT
\\u002C => DIGIT
# in some cases you might not want to split on ZWJ
# this also tests the case where we need a bigger byte[]
# see http://en.wikipedia.org/wiki/Zero-width_joiner
\\u200D => ALPHANUM
--------------------------------------------------


@ -0,0 +1,30 @@
[[analysis-tokenizers]]
== Tokenizers
Tokenizers are used to break a string down into a stream of terms
or tokens. A simple tokenizer might split the string up into terms
wherever it encounters whitespace or punctuation.
Elasticsearch has a number of built in tokenizers which can be
used to build <<analysis-custom-analyzer,custom analyzers>>.
include::tokenizers/standard-tokenizer.asciidoc[]
include::tokenizers/edgengram-tokenizer.asciidoc[]
include::tokenizers/keyword-tokenizer.asciidoc[]
include::tokenizers/letter-tokenizer.asciidoc[]
include::tokenizers/lowercase-tokenizer.asciidoc[]
include::tokenizers/ngram-tokenizer.asciidoc[]
include::tokenizers/whitespace-tokenizer.asciidoc[]
include::tokenizers/pattern-tokenizer.asciidoc[]
include::tokenizers/uaxurlemail-tokenizer.asciidoc[]
include::tokenizers/pathhierarchy-tokenizer.asciidoc[]


@ -0,0 +1,80 @@
[[analysis-edgengram-tokenizer]]
=== Edge NGram Tokenizer
A tokenizer of type `edgeNGram`.
This tokenizer is very similar to `nGram` but only keeps n-grams which
start at the beginning of a token.
The following are settings that can be set for an `edgeNGram` tokenizer
type:
[cols="<,<,<",options="header",]
|=======================================================================
|Setting |Description |Default value
|`min_gram` |Minimum size in codepoints of a single n-gram |`1`.
|`max_gram` |Maximum size in codepoints of a single n-gram |`2`.
|`token_chars` |(Since `0.90.2`) Character classes to keep in the
tokens; Elasticsearch will split on characters that don't belong to any
of these classes. |`[]` (Keep all characters)
|=======================================================================
`token_chars` accepts the following character classes:
[horizontal]
`letter`:: for example `a`, `b`, `ï` or `京`
`digit`:: for example `3` or `7`
`whitespace`:: for example `" "` or `"\n"`
`punctuation`:: for example `!` or `"`
`symbol`:: for example `$` or `√`
[float]
==== Example
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_edge_ngram_analyzer" : {
"tokenizer" : "my_edge_ngram_tokenizer"
}
},
"tokenizer" : {
"my_edge_ngram_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "5",
"token_chars": [ "letter", "digit" ]
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'
# FC, Sc, Sch, Scha, Schal, 04
--------------------------------------------------
[float]
==== `side` deprecated
There used to be a `side` parameter up to `0.90.1` but it is now deprecated. In
order to emulate the behavior of `"side" : "BACK"` a
<<analysis-reverse-tokenfilter,`reverse` token filter>> should be used together
with the <<analysis-edgengram-tokenfilter,`edgeNGram` token filter>>. The
`edgeNGram` filter must be enclosed in `reverse` filters like this:
[source,js]
--------------------------------------------------
"filter" : ["reverse", "edgeNGram", "reverse"]
--------------------------------------------------
which essentially reverses the token, builds front `EdgeNGrams` and reverses
the ngram again. This has the same effect as the previous `"side" : "BACK"` setting.


@ -0,0 +1,15 @@
[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer
A tokenizer of type `keyword` that emits the entire input as a single
token.
The following are settings that can be set for a `keyword` tokenizer
type:
[cols="<,<",options="header",]
|=======================================================
|Setting |Description
|`buffer_size` |The term buffer size. Defaults to `256`.
|=======================================================
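For example, a minimal sketch (the tokenizer name is illustrative) raising
the buffer size:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "tokenizer" : {
                "my_keyword" : {
                    "type" : "keyword",
                    "buffer_size" : 1024
                }
            }
        }
    }
}
--------------------------------------------------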


@ -0,0 +1,7 @@
[[analysis-letter-tokenizer]]
=== Letter Tokenizer
A tokenizer of type `letter` that divides text at non-letters. That's to
say, it defines tokens as maximal strings of adjacent letters. Note,
this does a decent job for most European languages, but does a terrible
job for some Asian languages, where words are not separated by spaces.


@ -0,0 +1,15 @@
[[analysis-lowercase-tokenizer]]
=== Lowercase Tokenizer
A tokenizer of type `lowercase` that performs the function of
<<analysis-letter-tokenizer,Letter
Tokenizer>> and
<<analysis-lowercase-tokenfilter,Lower
Case Token Filter>> together. It divides text at non-letters and converts
them to lower case. While it is functionally equivalent to the
combination of
<<analysis-letter-tokenizer,Letter
Tokenizer>> and
<<analysis-lowercase-tokenfilter,Lower
Case Token Filter>>, there is a performance advantage to doing the two
tasks at once, hence this (redundant) implementation.


@ -0,0 +1,57 @@
[[analysis-ngram-tokenizer]]
=== NGram Tokenizer
A tokenizer of type `nGram`.
The following are settings that can be set for a `nGram` tokenizer type:
[cols="<,<,<",options="header",]
|=======================================================================
|Setting |Description |Default value
|`min_gram` |Minimum size in codepoints of a single n-gram |`1`.
|`max_gram` |Maximum size in codepoints of a single n-gram |`2`.
|`token_chars` |(Since `0.90.2`) Character classes to keep in the
tokens; Elasticsearch will split on characters that don't belong to any
of these classes. |`[]` (Keep all characters)
|=======================================================================
`token_chars` accepts the following character classes:
[horizontal]
`letter`:: for example `a`, `b`, `ï` or `京`
`digit`:: for example `3` or `7`
`whitespace`:: for example `" "` or `"\n"`
`punctuation`:: for example `!` or `"`
`symbol`:: for example `$` or `√`
[float]
==== Example
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_ngram_analyzer" : {
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "3",
"token_chars": [ "letter", "digit" ]
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
--------------------------------------------------


@ -0,0 +1,32 @@
[[analysis-pathhierarchy-tokenizer]]
=== Path Hierarchy Tokenizer
The `path_hierarchy` tokenizer takes something like this:
-------------------------
/something/something/else
-------------------------
And produces tokens:
-------------------------
/something
/something/something
/something/something/else
-------------------------
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`delimiter` |The character delimiter to use, defaults to `/`.
|`replacement` |An optional replacement character to use. Defaults to
the `delimiter`.
|`buffer_size` |The buffer size to use, defaults to `1024`.
|`reverse` |Generates tokens in reverse order, defaults to `false`.
|`skip` |Controls initial tokens to skip, defaults to `0`.
|=======================================================================
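For example, a minimal sketch (the analyzer and tokenizer names are
illustrative) that treats `-` as the path separator:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_path_analyzer" : {
                    "tokenizer" : "my_path_tokenizer"
                }
            },
            "tokenizer" : {
                "my_path_tokenizer" : {
                    "type" : "path_hierarchy",
                    "delimiter" : "-",
                    "reverse" : false
                }
            }
        }
    }
}
--------------------------------------------------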


@ -0,0 +1,29 @@
[[analysis-pattern-tokenizer]]
=== Pattern Tokenizer
A tokenizer of type `pattern` that can flexibly separate text into terms
via a regular expression. Accepts the following settings:
[cols="<,<",options="header",]
|======================================================================
|Setting |Description
|`pattern` |The regular expression pattern, defaults to `\\W+`.
|`flags` |The regular expression flags.
|`group` |Which group to extract into tokens. Defaults to `-1` (split).
|======================================================================
*IMPORTANT*: The regular expression should match the *token separators*,
not the tokens themselves.
`group` set to `-1` (the default) is equivalent to "split". Using group
>= 0 selects the matching group as the token. For example, if you have:
------------------------
pattern = \\'([^\']+)\\'
group = 0
input = aaa 'bbb' 'ccc'
------------------------
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks).
With the same input but using group=1, the output would be: bbb and ccc
(no ' marks).


@ -0,0 +1,18 @@
[[analysis-standard-tokenizer]]
=== Standard Tokenizer
A tokenizer of type `standard` providing a grammar based tokenizer that is
a good choice for most European language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
http://unicode.org/reports/tr29/[Unicode Standard Annex #29].
The following are settings that can be set for a `standard` tokenizer
type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_length` |The maximum token length. If a token is seen that
exceeds this length then it is discarded. Defaults to `255`.
|=======================================================================
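For example, a minimal sketch (the analyzer and tokenizer names are
illustrative) raising the maximum token length:
[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "my_standard"
                }
            },
            "tokenizer" : {
                "my_standard" : {
                    "type" : "standard",
                    "max_token_length" : 900
                }
            }
        }
    }
}
--------------------------------------------------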


@ -0,0 +1,16 @@
[[analysis-uaxurlemail-tokenizer]]
=== UAX Email URL Tokenizer
A tokenizer of type `uax_url_email` which works exactly like the
`standard` tokenizer, but tokenizes email addresses and URLs as single tokens.
The following are settings that can be set for a `uax_url_email`
tokenizer type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_length` |The maximum token length. If a token is seen that
exceeds this length then it is discarded. Defaults to `255`.
|=======================================================================


@ -0,0 +1,4 @@
[[analysis-whitespace-tokenizer]]
=== Whitespace Tokenizer
A tokenizer of type `whitespace` that divides text at whitespace.


@ -0,0 +1,46 @@
[[cluster]]
= Cluster APIs
[partintro]
--
["float",id="cluster-nodes"]
== Nodes
Most cluster level APIs allow to specify which nodes to execute on (for
example, getting the node stats for a node). Nodes can be identified in
the APIs either using their internal node id, the node name, address,
custom attributes, or just the `_local` node receiving the request. For
example, here are some sample executions of nodes info:
[source,js]
--------------------------------------------------
# Local
curl localhost:9200/_cluster/nodes/_local
# Address
curl localhost:9200/_cluster/nodes/10.0.0.3,10.0.0.4
curl localhost:9200/_cluster/nodes/10.0.0.*
# Names
curl localhost:9200/_cluster/nodes/node_name_goes_here
curl localhost:9200/_cluster/nodes/node_name_goes_*
# Attributes (set something like node.rack: 2 in the config)
curl localhost:9200/_cluster/nodes/rack:2
curl localhost:9200/_cluster/nodes/ra*:2
curl localhost:9200/_cluster/nodes/ra*:2*
--------------------------------------------------
--
include::cluster/health.asciidoc[]
include::cluster/state.asciidoc[]
include::cluster/reroute.asciidoc[]
include::cluster/update-settings.asciidoc[]
include::cluster/nodes-stats.asciidoc[]
include::cluster/nodes-info.asciidoc[]
include::cluster/nodes-hot-threads.asciidoc[]
include::cluster/nodes-shutdown.asciidoc[]


@ -0,0 +1,86 @@
[[cluster-health]]
== Cluster Health
The cluster health API allows to get a very simple status on the health
of the cluster.
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "testcluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
--------------------------------------------------
The API can also be executed against one or more indices to get just the
specified indices health:
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health/test1,test2'
--------------------------------------------------
The cluster health status is: `green`, `yellow` or `red`. On the shard
level, a `red` status indicates that the specific shard is not allocated
in the cluster, `yellow` means that the primary shard is allocated but
replicas are not, and `green` means that all shards are allocated. The
index level status is controlled by the worst shard status. The cluster
status is controlled by the worst index status.
One of the main benefits of the API is the ability to wait until the
cluster reaches a certain high water-mark health level. For example, the
following will wait up to 50 seconds for the cluster to reach the `yellow`
level (if it reaches the `green` or `yellow` status beforehand, it
will return immediately):
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s'
--------------------------------------------------
[float]
=== Request Parameters
The cluster health API accepts the following request parameters:
`level`::
Can be one of `cluster`, `indices` or `shards`. Controls the
details level of the health information returned. Defaults to `cluster`.
`wait_for_status`::
One of `green`, `yellow` or `red`. Will wait (until
the timeout provided) until the status of the cluster changes to the one
provided. By default, will not wait for any status.
`wait_for_relocating_shards`::
A number controlling how many relocating
shards to wait for. Usually set to `0` to indicate waiting until all
relocations have finished. Defaults to not waiting.
`wait_for_nodes`::
The request waits until the specified number `N` of
nodes is available. It also accepts `>=N`, `<=N`, `>N` and `<N`.
Alternatively, it is possible to use `ge(N)`, `le(N)`, `gt(N)` and
`lt(N)` notation.
`timeout`::
A time based parameter controlling how long to wait if one of
the `wait_for_XXX` parameters is provided. Defaults to `30s`.
The following is an example of getting the cluster health at the
`shards` level:
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health/twitter?level=shards'
--------------------------------------------------


@ -0,0 +1,16 @@
[[cluster-nodes-hot-threads]]
== Nodes hot_threads
An API allowing to get the current hot threads on each node in the
cluster. Endpoints are `/_nodes/hot_threads`, and
`/_nodes/{nodesIds}/hot_threads`. This API is experimental.
The output is plain text with a breakdown of each node's top hot
threads. Parameters allowed are:
[horizontal]
`threads`:: number of hot threads to provide, defaults to 3.
`interval`:: the interval to do the second sampling of threads.
Defaults to 500ms.
`type`:: The type to sample. Defaults to `cpu`, but supports `wait` and
`block` to see hot threads that are in wait or blocked state.
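For example, a request along these lines (the parameter values are
illustrative):
[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=1s&type=wait'
--------------------------------------------------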


@ -0,0 +1,98 @@
[[cluster-nodes-info]]
== Nodes Info
The cluster nodes info API allows to retrieve one or more (or all) of
the cluster nodes information.
[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_cluster/nodes'
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2'
# Shorter Format
curl -XGET 'http://localhost:9200/_nodes'
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2'
--------------------------------------------------
The first command retrieves information of all the nodes in the cluster.
The second command selectively retrieves nodes information of only
`nodeId1` and `nodeId2`. All the nodes selective options are explained
<<cluster-nodes,here>>.
By default, it just returns the attributes and core settings for a node.
It also allows to get information on `settings`, `os`, `process`, `jvm`,
`thread_pool`, `network`, `transport`, `http` and `plugin`:
[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_nodes?os=true&process=true'
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/?os=true&process=true'
# Or, specific type endpoint:
curl -XGET 'http://localhost:9200/_nodes/process'
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/process'
--------------------------------------------------
The `all` flag can be set to return all the information.
`plugin` - if set, the result will contain details about the loaded
plugins per node:
* `name`: plugin name
* `description`: plugin description if any
* `site`: `true` if the plugin is a site plugin
* `jvm`: `true` if the plugin is a plugin running in the JVM
* `url`: URL if the plugin is a site plugin
The result will look similar to:
[source,js]
--------------------------------------------------
{
"ok" : true,
"cluster_name" : "test-cluster-MacBook-Air-de-David.local",
"nodes" : {
"hJLXmY_NTrCytiIMbX4_1g" : {
"name" : "node4",
"transport_address" : "inet[/172.18.58.139:9303]",
"hostname" : "MacBook-Air-de-David.local",
"version" : "0.90.0.Beta2-SNAPSHOT",
"http_address" : "inet[/172.18.58.139:9203]",
"plugins" : [ {
"name" : "test-plugin",
"description" : "test-plugin description",
"site" : true,
"jvm" : false
}, {
"name" : "test-no-version-plugin",
"description" : "test-no-version-plugin description",
"site" : true,
"jvm" : false
}, {
"name" : "dummy",
"description" : "No description found for dummy.",
"url" : "/_plugin/dummy/",
"site" : false,
"jvm" : true
} ]
}
}
}
--------------------------------------------------
If your `plugin` data is subject to change, use
`plugins.info_refresh_interval` to change or disable the caching
interval:
[source,js]
--------------------------------------------------
# Change cache to 20 seconds
plugins.info_refresh_interval: 20s
# Infinite cache
plugins.info_refresh_interval: -1
# Disable cache
plugins.info_refresh_interval: 0
--------------------------------------------------


@ -0,0 +1,56 @@
[[cluster-nodes-shutdown]]
== Nodes Shutdown
The nodes shutdown API allows to shutdown one or more (or all) nodes in
the cluster. Here is an example of shutting down the `_local` node the
request is directed to:
[source,js]
--------------------------------------------------
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
--------------------------------------------------
Specific node(s) can be shutdown as well using their respective node ids
(or other selective options as explained
<<cluster-nodes,here>> .):
[source,js]
--------------------------------------------------
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
--------------------------------------------------
The master (of the cluster) can also be shutdown using:
[source,js]
--------------------------------------------------
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
--------------------------------------------------
Finally, all nodes can be shutdown using one of the options below:
[source,js]
--------------------------------------------------
$ curl -XPOST 'http://localhost:9200/_shutdown'
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'
--------------------------------------------------
[float]
=== Delay
By default, the shutdown will be executed after a 1 second delay (`1s`).
The delay can be customized by setting the `delay` parameter in a time
value format. For example:
[source,js]
--------------------------------------------------
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown?delay=10s'
--------------------------------------------------
[float]
=== Disable Shutdown
The shutdown API can be disabled by setting `action.disable_shutdown` in
the node configuration.


@ -0,0 +1,100 @@
[[cluster-nodes-stats]]
== Nodes Stats
[float]
=== Nodes statistics
The cluster nodes stats API allows to retrieve one or more (or all) of
the cluster nodes statistics.
[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_cluster/nodes/stats'
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/stats'
# simplified
curl -XGET 'http://localhost:9200/_nodes/stats'
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/stats'
--------------------------------------------------
The first command retrieves stats of all the nodes in the cluster. The
second command selectively retrieves nodes stats of only `nodeId1` and
`nodeId2`. All the nodes selective options are explained
<<cluster-nodes,here>>.
By default, `indices` stats are returned. Other available options are `indices`,
`os`, `process`, `jvm`, `network`, `transport`, `http`, `fs`, and
`thread_pool`. For example:
[horizontal]
`indices`::
Indices stats about size, document count, indexing and
deletion times, search times, field cache size, merges and flushes
`fs`::
File system information, data path, free disk space, read/write
stats
`http`::
HTTP connection information
`jvm`::
JVM stats, memory pool information, garbage collection, buffer
pools
`network`::
TCP information
`os`::
Operating system stats, load average, cpu, mem, swap
`process`::
Process statistics, memory consumption, cpu usage, open
file descriptors
`thread_pool`::
Statistics about each thread pool, including current
size, queue and rejected tasks
`transport`::
Transport statistics about sent and received bytes in
cluster communication
`clear`::
Clears all the flags (first). Useful if you only want to
retrieve specific stats.
[source,js]
--------------------------------------------------
# return indices and os
curl -XGET 'http://localhost:9200/_nodes/stats?os=true'
# return just os and process
curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&os=true&process=true'
# specific type endpoint
curl -XGET 'http://localhost:9200/_nodes/process/stats'
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/process/stats'
# or, if you like the other way
curl -XGET 'http://localhost:9200/_nodes/stats/process'
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/stats/process'
--------------------------------------------------
The `all` flag can be set to return all the stats.
[float]
=== Field data statistics
From 0.90, you can get information about field data memory usage on node
level or on index level.
[source,js]
--------------------------------------------------
# Node Stats
curl localhost:9200/_nodes/stats/indices/fielddata/field1,field2?pretty
# Indices Stat
curl localhost:9200/_stats/fielddata/field1,field2?pretty
# You can use wildcards for field names
curl localhost:9200/_stats/fielddata/field*?pretty
curl localhost:9200/_nodes/stats/indices/fielddata/field*?pretty
--------------------------------------------------


@ -0,0 +1,68 @@
[[cluster-reroute]]
== Cluster Reroute
The reroute command allows to explicitly execute a cluster reroute
allocation, including specific commands. For example, a shard can
be moved from one node to another explicitly, an allocation can be
canceled, or an unassigned shard can be explicitly allocated on a
specific node.
Here is a short example of a simple reroute API call:
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands" : [ {
"move" :
{
"index" : "test", "shard" : 0,
"from_node" : "node1", "to_node" : "node2"
}
},
{
"allocate" : {
"index" : "test", "shard" : 1, "node" : "node3"
}
}
]
}'
--------------------------------------------------
An important aspect to remember is that once an allocation
occurs, the cluster will aim to re-balance its state back to an even
state. For example, if the allocation includes moving a shard from
`node1` to `node2`, in an `even` state, then another shard will be moved
from `node2` to `node1` to even things out.
The cluster can be set to disable allocations, which means that only the
explicit allocations will be performed. Obviously, only once all
commands have been applied will the cluster aim to re-balance its
state.
Another option is to run the commands in `dry_run` (as a URI flag, or in
the request body). This will cause the commands to apply to the current
cluster state, and return the resulting cluster state after the commands (and
re-balancing) have been applied.
The commands supported are:
`move`::
Move a started shard from one node to another node. Accepts
`index` and `shard` for index name and shard number, `from_node` for the
node to move the shard `from`, and `to_node` for the node to move the
shard to.
`cancel`::
Cancel allocation of a shard (or recovery). Accepts `index`
and `shard` for index name and shard number, and `node` for the node to
cancel the shard allocation on. It also accepts `allow_primary` flag to
explicitly specify that it is allowed to cancel allocation for a primary
shard.
`allocate`::
Allocate an unassigned shard to a node. Accepts the
`index` and `shard` for index name and shard number, and `node` to
allocate the shard to. It also accepts `allow_primary` flag to
explicitly specify that it is allowed to explicitly allocate a primary
shard (might result in data loss).


@ -0,0 +1,48 @@
[[cluster-state]]
== Cluster State
The cluster state API allows to get comprehensive state information about
the whole cluster.
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/state'
--------------------------------------------------
By default, the cluster state request is routed to the master node, to
ensure that the latest cluster state is returned.
For debugging purposes, you can retrieve the cluster state local to a
particular node by adding `local=true` to the query string.
[float]
=== Response Filters
It is possible to filter the cluster state response using the following
REST parameters:
`filter_nodes`::
Set to `true` to filter out the `nodes` part of the
response.
`filter_routing_table`::
Set to `true` to filter out the `routing_table`
part of the response.
`filter_metadata`::
Set to `true` to filter out the `metadata` part of the
response.
`filter_blocks`::
Set to `true` to filter out the `blocks` part of the
response.
`filter_indices`::
When not filtering metadata, a comma separated list of
indices to include in the response.
Example follows:
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/state?filter_nodes=true'
--------------------------------------------------


@ -0,0 +1,198 @@
[[cluster-update-settings]]
== Cluster Update Settings
Allows to update cluster wide specific settings. Settings updated can
either be persistent (applied across restarts) or transient (will not
survive a full cluster restart). Here is an example:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"discovery.zen.minimum_master_nodes" : 2
}
}'
--------------------------------------------------
Or:
[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"discovery.zen.minimum_master_nodes" : 2
}
}'
--------------------------------------------------
The cluster responds with the settings updated. So the response for the
last example will be:
[source,js]
--------------------------------------------------
{
"persistent" : {},
"transient" : {
"discovery.zen.minimum_master_nodes" : "2"
}
}
--------------------------------------------------
Cluster wide settings can be returned using:
[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_cluster/settings
--------------------------------------------------
There is a specific list of settings that can be updated; these include:
[float]
=== Cluster settings
[float]
==== Routing allocation
[float]
===== Awareness
`cluster.routing.allocation.awareness.attributes`::
See <<modules-cluster>>.
`cluster.routing.allocation.awareness.force.*`::
See <<modules-cluster>>.
[float]
===== Balanced Shards
`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`.
`cluster.routing.allocation.balance.index`::
Defines a weight factor for the number of shards per index allocated
on a specific node (float). Defaults to `0.5f`.
`cluster.routing.allocation.balance.primary`::
Defines a weight factor for the number of primaries of a specific index
allocated on a node (float). Defaults to `0.05f`.
`cluster.routing.allocation.balance.threshold`::
Minimal optimization value of operations that should be performed (non-negative
float). Defaults to `1.0f`.
[float]
===== Concurrent Rebalance
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are allowed
cluster wide. Defaults to `2` (integer); `-1` means
unlimited. See also <<modules-cluster>>.
[float]
===== Disable allocation
`cluster.routing.allocation.disable_allocation`::
See <<modules-cluster>>.
`cluster.routing.allocation.disable_replica_allocation`::
See <<modules-cluster>>.
`cluster.routing.allocation.disable_new_allocation`::
See <<modules-cluster>>.
[float]
===== Throttling allocation
`cluster.routing.allocation.node_initial_primaries_recoveries`::
See <<modules-cluster>>.
`cluster.routing.allocation.node_concurrent_recoveries`::
See <<modules-cluster>>.
[float]
===== Filter allocation
`cluster.routing.allocation.include.*`::
See <<modules-cluster>>.
`cluster.routing.allocation.exclude.*`::
See <<modules-cluster>>.
`cluster.routing.allocation.require.*` (from 0.90)::
See <<modules-cluster>>.
[float]
==== Metadata
`cluster.blocks.read_only`::
Make the whole cluster read only (indices do not accept write operations) and prevent metadata from being modified (indices cannot be created or deleted).
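
As a minimal sketch, the cluster could be made read only with a transient
settings update (set it back to `false` to re-enable writes):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.blocks.read_only" : true
    }
}'
--------------------------------------------------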
[float]
==== Discovery
`discovery.zen.minimum_master_nodes`::
See <<modules-discovery-zen>>
[float]
==== Threadpools
`threadpool.*`::
See <<modules-threadpool>>
[float]
=== Index settings
[float]
==== Index filter cache
`indices.cache.filter.size`::
See <<index-modules-cache>>
`indices.cache.filter.expire` (time)::
See <<index-modules-cache>>
[float]
==== TTL interval
`indices.ttl.interval` (time)::
See <<mapping-ttl-field>>
[float]
==== Recovery
`indices.recovery.concurrent_streams`::
See <<modules-indices>>
`indices.recovery.file_chunk_size`::
See <<modules-indices>>
`indices.recovery.translog_ops`::
See <<modules-indices>>
`indices.recovery.translog_size`::
See <<modules-indices>>
`indices.recovery.compress`::
See <<modules-indices>>
`indices.recovery.max_bytes_per_sec`::
Since 0.90.1. See <<modules-indices>>
`indices.recovery.max_size_per_sec`::
Deprecated since 0.90.1. See `max_bytes_per_sec` instead.
[float]
==== Store level throttling
`indices.store.throttle.type`::
See <<index-modules-store>>
`indices.store.throttle.max_bytes_per_sec`::
See <<index-modules-store>>
[float]
=== Logger
Logger values can also be updated by using the `logger.` prefix. More
settings will be allowed to be updated over time.
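
For example, a transient update could raise the log level of a logger
(the logger name below is only illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "logger.discovery" : "DEBUG"
    }
}'
--------------------------------------------------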

View File

@ -0,0 +1,45 @@
[[search-common-options]]
== Common Options
=== Pretty Results
When appending `?pretty=true` to any request, the JSON returned
will be pretty formatted (use it for debugging only!). Another option is
to set `format=yaml`, which will cause the result to be returned in the
(sometimes) more readable YAML format.
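
For example (using the cluster settings endpoint purely for illustration;
any endpoint works the same way):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/settings?pretty=true'
$ curl -XGET 'http://localhost:9200/_cluster/settings?format=yaml'
--------------------------------------------------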
=== Parameters
REST parameters (which, when using HTTP, map to HTTP URL parameters)
follow the convention of using underscore casing.
=== Boolean Values
All REST API parameters (both request parameters and the JSON body)
accept the values `false`, `0`, `no` and `off` as boolean "false".
All other values are considered "true". Note that this is not related to
whether fields within an indexed document are treated as boolean fields.
=== Number Values
All REST APIs support providing numeric parameters as a `string`, in
addition to supporting the native JSON number types.
=== Result Casing
All REST APIs accept the `case` parameter. When set to `camelCase`, all
field names in the result will be returned in camel casing; otherwise,
underscore casing will be used. Note that this does not apply to the
source document that was indexed.
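
For example, to get camel-cased field names back (the endpoint is again
just for illustration):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/settings?case=camelCase'
--------------------------------------------------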
=== JSONP
All REST APIs accept a `callback` parameter resulting in a
http://en.wikipedia.org/wiki/JSONP[JSONP] result.
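
For example (the callback name `handleResponse` is just a placeholder):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/settings?callback=handleResponse'
--------------------------------------------------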
=== Request body in query string
For libraries that don't accept a request body for non-POST requests,
you can pass the request body as the `source` query string parameter
instead.
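
A minimal sketch, passing a simple `match_all` search as the `source`
parameter (depending on the client, the JSON may need to be URL encoded):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_search?source={"query":{"match_all":{}}}'
--------------------------------------------------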

View File

@ -0,0 +1,31 @@
[[docs]]
= Document APIs
[partintro]
--
This section describes the REST APIs *elasticsearch* provides (mainly)
using JSON. The API is exposed using
<<modules-http,HTTP>>,
<<modules-thrift,thrift>> and
<<modules-memcached,memcached>>.
--
include::docs/index_.asciidoc[]
include::docs/get.asciidoc[]
include::docs/delete.asciidoc[]
include::docs/update.asciidoc[]
include::docs/multi-get.asciidoc[]
include::docs/bulk.asciidoc[]
include::docs/delete-by-query.asciidoc[]
include::docs/bulk-udp.asciidoc[]

View File

@ -0,0 +1,57 @@
[[docs-bulk-udp]]
== Bulk UDP API
The bulk UDP service listens over UDP for bulk format requests. The
idea is to provide a low latency service that makes it easy to index
data that is not of a critical nature.
The Bulk UDP service is disabled by default, but can be enabled by
setting `bulk.udp.enabled` to `true`.
The bulk UDP service performs internal bulk aggregation of the data and
then flushes it based on several parameters:
`bulk.udp.bulk_actions`::
The number of actions to flush a bulk after,
defaults to `1000`.
`bulk.udp.bulk_size`::
The size at which the current bulk request will be
flushed once exceeded. Defaults to `5mb`.
`bulk.udp.flush_interval`::
An interval after which the current
request is flushed, regardless of the above limits. Defaults to `5s`.
`bulk.udp.concurrent_requests`::
The maximum number of in-flight bulk
requests allowed. Defaults to `4`.
The allowed network settings are:
`bulk.udp.host`::
The host address to bind to. Defaults to `network.host`,
which defaults to any.
`bulk.udp.port`::
The port to use, defaults to `9700-9800`.
`bulk.udp.receive_buffer_size`::
The receive buffer size, defaults to `10mb`.
Here is an example of how it can be used:
[source,js]
--------------------------------------------------
> cat bulk.txt
{ "index" : { "_index" : "test", "_type" : "type1" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "test", "_type" : "type1" } }
{ "field1" : "value1" }
--------------------------------------------------
[source,js]
--------------------------------------------------
> cat bulk.txt | nc -w 0 -u localhost 9700
--------------------------------------------------

View File

@ -0,0 +1,174 @@
[[docs-bulk]]
== Bulk API
The bulk API makes it possible to perform many index/delete operations
in a single API call. This can greatly increase the indexing speed. The
REST API endpoint is `/_bulk`, and it expects the following JSON
structure:
[source,js]
--------------------------------------------------
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
--------------------------------------------------
*NOTE*: the final line of data must end with a newline character `\n`.
The possible actions are `index`, `create`, `delete` and since version
`0.90.1` also `update`. `index` and `create` expect a source on the next
line, and have the same semantics as the `op_type` parameter to the
standard index API (i.e. create will fail if a document with the same
index and type exists already, whereas index will add or replace a
document as necessary). `delete` does not expect a source on the
following line, and has the same semantics as the standard delete API.
`update` expects the partial doc, upsert, or script and its options
to be specified on the next line.
If you're providing text file input to `curl`, you *must* use the
`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
newlines. Example:
[source,js]
--------------------------------------------------
$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}
--------------------------------------------------
Because this format uses literal `\n`'s as delimiters, please be sure
that the JSON actions and sources are not pretty printed. Here is an
example of a correct sequence of bulk commands:
[source,js]
--------------------------------------------------
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
--------------------------------------------------
In the above example `doc` for the `update` action is a partial
document, that will be merged with the already stored document.
The endpoints are `/_bulk`, `/{index}/_bulk`, and `/{index}/{type}/_bulk`.
When the index or the index/type are provided, they will be used by
default on bulk items that don't provide them explicitly.
A note on the format: the idea here is to make processing as fast as
possible. As some of the actions will be redirected to other shards on
other nodes, only `action_and_meta_data` is parsed on the receiving
node side.
Client libraries using this protocol should strive to do something
similar on the client side, and reduce buffering as much as possible.
The response to a bulk action is a large JSON structure with the
individual results of each action that was performed. The failure of a
single action does not affect the remaining actions.
There is no "correct" number of actions to perform in a single bulk
call. You should experiment with different settings to find the optimum
size for your particular workload.
If using the HTTP API, make sure that the client does not send HTTP
chunks, as this will slow things down.
[float]
=== Versioning
Each bulk item can include the version value using the
`_version`/`version` field. It automatically follows the behavior of the
index / delete operation based on the `_version` mapping. It also
supports the `version_type`/`_version_type` field when using `external`
versioning.
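
For example, a bulk index item carrying an external version (a sketch
reusing the index and type names from the examples above):

[source,js]
--------------------------------------------------
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1", "_version" : 5, "_version_type" : "external" } }
{ "field1" : "value1" }
--------------------------------------------------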
[float]
=== Routing
Each bulk item can include the routing value using the
`_routing`/`routing` field. It automatically follows the behavior of the
index / delete operation based on the `_routing` mapping.
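
For example (the routing value `user1` is only an illustration):

[source,js]
--------------------------------------------------
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1", "_routing" : "user1" } }
{ "field1" : "value1" }
--------------------------------------------------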
[float]
=== Percolator
Each bulk index action can include a percolate value using the
`_percolate`/`percolate` field.
[float]
=== Parent
Each bulk item can include the parent value using the `_parent`/`parent`
field. It automatically follows the behavior of the index / delete
operation based on the `_parent` / `_routing` mapping.
[float]
=== Timestamp
Each bulk item can include the timestamp value using the
`_timestamp`/`timestamp` field. It automatically follows the behavior of
the index operation based on the `_timestamp` mapping.
[float]
=== TTL
Each bulk item can include the ttl value using the `_ttl`/`ttl` field.
It automatically follows the behavior of the index operation based on
the `_ttl` mapping.
[float]
=== Write Consistency
When making bulk calls, you can require a minimum number of active
shards in the partition through the `consistency` parameter. The values
allowed are `one`, `quorum`, and `all`. It defaults to the node level
setting of `action.write_consistency`, which in turn defaults to
`quorum`.
For example, in an index with N shards and 2 replicas, there will have
to be at least 2 active shards within the relevant partition (`quorum`)
for the operation to succeed. In an index with N shards and 1 replica,
a single active shard is enough (in this case, `one` and `quorum`
are the same).
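
For example, reusing the `requests` file from above, the consistency
level for the whole bulk call can be set on the URL:

[source,js]
--------------------------------------------------
$ curl -s -XPOST 'localhost:9200/_bulk?consistency=one' --data-binary @requests; echo
--------------------------------------------------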
[float]
=== Refresh
The `refresh` parameter can be set to `true` in order to refresh the
relevant shards immediately after the bulk operation has occurred and
make the documents searchable, instead of waiting for the normal refresh
interval to expire. Setting it to `true` can place additional load on
the cluster, and may slow down indexing.
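
For example, again reusing the `requests` file from above, this time with
the per-index/type endpoint:

[source,js]
--------------------------------------------------
$ curl -s -XPOST 'localhost:9200/test/type1/_bulk?refresh=true' --data-binary @requests; echo
--------------------------------------------------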
[float]
=== Update
When using the `update` action, `_retry_on_conflict` can be used as a field
in the action itself (not in the extra payload line) to specify how many
times an update should be retried in the case of a version conflict.
The `update` action payload supports the following options: `doc`
(partial document), `upsert`, `doc_as_upsert`, `script`, `params` (for
the script) and `lang` (for the script). See the update documentation for
details on these options. A curl example with update actions:
[source,js]
--------------------------------------------------
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : "ctx._source.counter += param1", "lang" : "js", "params" : {"param1" : 1}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
--------------------------------------------------
