Migrated documentation into the main repo

parent b9558edeff
commit 822043347e
@@ -8,6 +8,8 @@ logs/
build/
target/
.local-execution-hints.log
docs/html/
docs/build.log

## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
## The only configuration files which are not ignored are .settings since
@@ -19,6 +21,7 @@ target/
*/.project
*/.classpath
*/eclipse-build
.settings/

## netbeans ignores
nb-configuration.xml
@@ -0,0 +1,152 @@
== Clients

[float]
=== Perl

* http://github.com/clintongormley/ElasticSearch.pm[ElasticSearch.pm]:
Perl client.

[float]
=== Python

* http://github.com/aparo/pyes[pyes]:
Python client.

* http://github.com/rhec/pyelasticsearch[pyelasticsearch]:
Python client.

* https://github.com/eriky/ESClient[ESClient]:
A lightweight and easy to use Python client for ElasticSearch.

* https://github.com/humangeo/rawes[rawes]:
Python low level client.

* https://github.com/mozilla/elasticutils/[elasticutils]:
A friendly chainable ElasticSearch interface for Python.

* http://intridea.github.io/surfiki-refine-elasticsearch/[Surfiki Refine]:
Python Map-Reduce engine targeting Elasticsearch indices.

[float]
=== Ruby

* http://github.com/karmi/tire[Tire]:
Ruby API & DSL, with ActiveRecord/ActiveModel integration.

* http://github.com/grantr/rubberband[rubberband]:
Ruby client.

* https://github.com/PoseBiz/stretcher[stretcher]:
Ruby client.

* https://github.com/wireframe/elastic_searchable/[elastic_searchable]:
Ruby client + Rails integration.

[float]
=== PHP

* http://github.com/ruflin/Elastica[Elastica]:
PHP client.

* http://github.com/nervetattoo/elasticsearch[elasticsearch] PHP client.

* http://github.com/polyfractal/Sherlock[Sherlock]:
PHP client, one-to-one mapping with query DSL, fluid interface.

[float]
=== Java

* https://github.com/searchbox-io/Jest[Jest]:
Java Rest client.

[float]
=== Javascript

* https://github.com/fullscale/elastic.js[Elastic.js]:
A JavaScript implementation of the ElasticSearch Query DSL and Core API.

* https://github.com/phillro/node-elasticsearch-client[node-elasticsearch-client]:
A NodeJS client for elastic search.

* https://github.com/ramv/node-elastical[node-elastical]:
Node.js client for the ElasticSearch REST API

[float]
=== .Net

* https://github.com/Yegoroff/PlainElastic.Net[PlainElastic.Net]:
.NET client.

* https://github.com/Mpdreamz/NEST[NEST]:
.NET client.

* https://github.com/medcl/ElasticSearch.Net[ElasticSearch.NET]:
.NET client.

[float]
=== Scala

* https://github.com/sksamuel/elastic4s[elastic4s]:
Scala DSL.

* https://github.com/scalastuff/esclient[esclient]:
Thin Scala client.

* https://github.com/bsadeh/scalastic[scalastic]:
Scala client.

[float]
=== Clojure

* http://github.com/clojurewerkz/elastisch[Elastisch]:
Clojure client.

[float]
=== Go

* https://github.com/mattbaird/elastigo[elastigo]:
Go client.

* https://github.com/belogik/goes[goes]:
Go lib.

[float]
=== Erlang

* http://github.com/tsloughter/erlastic_search[erlastic_search]:
Erlang client using HTTP.

* https://github.com/dieswaytoofast/erlasticsearch[erlasticsearch]:
Erlang client using Thrift.

* https://github.com/datahogs/tirexs[Tirexs]:
An https://github.com/elixir-lang/elixir[Elixir] based API/DSL, inspired by
http://github.com/karmi/tire[Tire]. Ready to use in pure Erlang
environment.

[float]
=== EventMachine

* http://github.com/vangberg/em-elasticsearch[em-elasticsearch]:
elasticsearch library for eventmachine.

[float]
=== Command Line

* https://github.com/elasticsearch/es2unix[es2unix]:
Elasticsearch API consumable by the Linux command line.

* https://github.com/javanna/elasticshell[elasticshell]:
command line shell for elasticsearch.

[float]
=== OCaml

* https://github.com/tovbinm/ocaml-elasticsearch[ocaml-elasticsearch]:
OCaml client for Elasticsearch

[float]
=== Smalltalk

* http://ss3.gemstone.com/ss/Elasticsearch.html[Elasticsearch] -
Smalltalk client for Elasticsearch
@@ -0,0 +1,16 @@
== Front Ends

* https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo[Sense]:
Chrome curl-like plugin for running requests against an Elasticsearch node

* https://github.com/mobz/elasticsearch-head[elasticsearch-head]:
A web front end for an elastic search cluster.

* https://github.com/OlegKunitsyn/elasticsearch-browser[browser]:
Web front-end over elasticsearch data.

* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor]:
Front-end to help debug/diagnose queries and analyzers

* http://elastichammer.exploringelasticsearch.com/[Hammer]:
Web front-end for elasticsearch
@@ -0,0 +1,5 @@
== GitHub

GitHub is a place where a lot of development is done around
*elasticsearch*. Here is a simple search for
https://github.com/search?q=elasticsearch&type=Repositories[repositories].
@@ -0,0 +1,15 @@
= Community Supported Clients

include::clients.asciidoc[]

include::frontends.asciidoc[]

include::integrations.asciidoc[]

include::misc.asciidoc[]

include::monitoring.asciidoc[]

include::github.asciidoc[]
@@ -0,0 +1,71 @@
== Integrations

* http://grails.org/plugin/elasticsearch[Grails]:
ElasticSearch Grails plugin.

* https://github.com/carrot2/elasticsearch-carrot2[carrot2]:
Results clustering with carrot2

* https://github.com/angelf/escargot[escargot]:
ElasticSearch connector for Rails (WIP).

* https://metacpan.org/module/Catalyst::Model::Search::ElasticSearch[Catalyst]:
ElasticSearch and Catalyst integration.

* http://github.com/aparo/django-elasticsearch[django-elasticsearch]:
Django ElasticSearch Backend.

* http://github.com/Aconex/elasticflume[elasticflume]:
http://github.com/cloudera/flume[Flume] sink implementation.

* http://code.google.com/p/terrastore/wiki/Search_Integration[Terrastore Search]:
http://code.google.com/p/terrastore/[Terrastore] integration module with elasticsearch.

* https://github.com/infochimps/wonderdog[Wonderdog]:
Hadoop bulk loader into elasticsearch.

* http://geeks.aretotally.in/play-framework-module-elastic-search-distributed-searching-with-json-http-rest-or-java[Play!Framework]:
Integrate with Play! Framework Application.

* https://github.com/Exercise/FOQElasticaBundle[ElasticaBundle]:
Symfony2 Bundle wrapping Elastica.

* http://drupal.org/project/elasticsearch[Drupal]:
Drupal ElasticSearch integration.

* https://github.com/refuge/couch_es[couch_es]:
elasticsearch helper for couchdb based products (apache couchdb, bigcouch & refuge)

* https://github.com/sonian/elasticsearch-jetty[Jetty]:
Jetty HTTP Transport

* https://github.com/dadoonet/spring-elasticsearch[Spring Elasticsearch]:
Spring Factory for Elasticsearch

* https://camel.apache.org/elasticsearch.html[Apache Camel Integration]:
An Apache camel component to integrate elasticsearch

* https://github.com/tlrx/elasticsearch-test[elasticsearch-test]:
Elasticsearch Java annotations for unit testing with
http://www.junit.org/[JUnit]

* http://searchbox-io.github.com/wp-elasticsearch/[Wp-ElasticSearch]:
ElasticSearch WordPress Plugin

* https://github.com/OlegKunitsyn/eslogd[eslogd]:
Linux daemon that replicates events to a central ElasticSearch server in real-time

* https://github.com/drewr/elasticsearch-clojure-repl[elasticsearch-clojure-repl]:
Plugin that embeds nREPL for run-time introspective adventure! Also
serves as an nREPL transport.

* http://haystacksearch.org/[Haystack]:
Modular search for Django

* https://github.com/cleverage/play2-elasticsearch[play2-elasticsearch]:
ElasticSearch module for Play Framework 2.x

* https://github.com/fullscale/dangle[dangle]:
A set of AngularJS directives that provide common visualizations for elasticsearch based on
D3.
@@ -0,0 +1,17 @@
== Misc

* https://github.com/electrical/puppet-elasticsearch[Puppet]:
Elasticsearch puppet module.

* http://github.com/elasticsearch/cookbook-elasticsearch[Chef]:
Chef cookbook for Elasticsearch

* https://github.com/tavisto/elasticsearch-rpms[elasticsearch-rpms]:
RPMs for elasticsearch.

* http://www.github.com/neogenix/daikon[daikon]:
Daikon ElasticSearch CLI

* https://github.com/Aconex/scrutineer[Scrutineer]:
A high performance consistency checker to compare what you've indexed
with your source of truth content (e.g. DB)
@@ -0,0 +1,27 @@
== Health and Performance Monitoring

* https://github.com/lukas-vlcek/bigdesk[bigdesk]:
Live charts and statistics for elasticsearch cluster.

* https://github.com/karmi/elasticsearch-paramedic[paramedic]:
Live charts with cluster stats and indices/shards information.

* http://www.elastichq.org/[ElasticSearchHQ]:
Free cluster health monitoring tool

* http://sematext.com/spm/index.html[SPM for ElasticSearch]:
Performance monitoring with live charts showing cluster and node stats, integrated
alerts, email reports, etc.

* https://github.com/radu-gheorghe/check-es[check-es]:
Nagios/Shinken plugins for checking on elasticsearch

* https://github.com/anchor/nagios-plugin-elasticsearch[check_elasticsearch]:
An ElasticSearch availability and performance monitoring plugin for
Nagios.

* https://github.com/rbramley/Opsview-elasticsearch[opsview-elasticsearch]:
Opsview plugin written in Perl for monitoring ElasticSearch

* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy]:
Plugin to watch Lucene segment merges across your cluster
@@ -0,0 +1,99 @@
[[anatomy]]
== API Anatomy

Once a <<client,GClient>> has been
obtained, all of the ElasticSearch APIs can be executed on it. Each Groovy
API is exposed using three different mechanisms.

[float]
=== Closure Request

The first type is to simply provide the request as a Closure, which
automatically gets resolved into the respective request instance (for
the index API, it's the `IndexRequest` class). The API returns a special
future, called `GActionFuture`. This is a groovier version of the
elasticsearch Java `ActionFuture` (in turn a nicer extension to Java's own
`Future`) which allows to register listeners (closures) on it for
success and failure, as well as blocking for the response. For example:

[source,js]
--------------------------------------------------
def indexR = client.index {
    index "test"
    type "type1"
    id "1"
    source {
        test = "value"
        complex {
            value1 = "value1"
            value2 = "value2"
        }
    }
}

println "Indexed $indexR.response.id into $indexR.response.index/$indexR.response.type"
--------------------------------------------------

In the above example, calling `indexR.response` will simply block for
the response. We can also block for the response for a specific timeout:

[source,js]
--------------------------------------------------
IndexResponse response = indexR.response "5s" // block for 5 seconds, same as:
response = indexR.response 5, TimeValue.SECONDS //
--------------------------------------------------

We can also register closures that will be called on success and on
failure:

[source,js]
--------------------------------------------------
indexR.success = {IndexResponse response ->
    println "Indexed $response.id into $response.index/$response.type"
}
indexR.failure = {Throwable t ->
    println "Failed to index: $t.message"
}
--------------------------------------------------

[float]
=== Request

This option allows to pass the actual instance of the request (instead
of a closure) as a parameter. The rest is similar to the closure-as-a-parameter
option (the `GActionFuture` handling). For example:

[source,js]
--------------------------------------------------
def indexR = client.index (new IndexRequest(
    index: "test",
    type: "type1",
    id: "1",
    source: {
        test = "value"
        complex {
            value1 = "value1"
            value2 = "value2"
        }
    }))

println "Indexed $indexR.response.id into $indexR.response.index/$indexR.response.type"
--------------------------------------------------

[float]
=== Java Like

The last option is to provide an actual instance of the API request, and
an `ActionListener` for the callback. This is exactly like the Java API
with the added `gexecute` which returns the `GActionFuture`:

[source,js]
--------------------------------------------------
def indexR = node.client.prepareIndex("test", "type1", "1").setSource({
    test = "value"
    complex {
        value1 = "value1"
        value2 = "value2"
    }
}).gexecute()
--------------------------------------------------
@@ -0,0 +1,58 @@
[[client]]
== Client

Obtaining an elasticsearch Groovy `GClient` (a `GClient` is a simple
wrapper on top of the Java `Client`) is simple. The most common way to
get a client is by starting an embedded `Node` which acts as a node
within the cluster.

[float]
=== Node Client

A Node based client is the simplest form to get a `GClient` to start
executing operations against elasticsearch.

[source,js]
--------------------------------------------------
import org.elasticsearch.groovy.client.GClient
import org.elasticsearch.groovy.node.GNode
import static org.elasticsearch.groovy.node.GNodeBuilder.nodeBuilder

// on startup

GNode node = nodeBuilder().node();
GClient client = node.client();

// on shutdown

node.close();
--------------------------------------------------

Since elasticsearch allows to configure it using JSON based settings,
the configuration itself can be done using a closure that represents the
JSON:

[source,js]
--------------------------------------------------
import org.elasticsearch.groovy.node.GNode
import org.elasticsearch.groovy.node.GNodeBuilder
import static org.elasticsearch.groovy.node.GNodeBuilder.*

// on startup

GNodeBuilder nodeBuilder = nodeBuilder();
nodeBuilder.settings {
    node {
        client = true
    }
    cluster {
        name = "test"
    }
}

GNode node = nodeBuilder.node()

// on shutdown

node.stop().close()
--------------------------------------------------
@@ -0,0 +1,22 @@
[[count]]
== Count API

The count API is very similar to the
link:{java}/count.html[Java count API]. The Groovy
extension allows to provide the query to execute as a `Closure` (similar
to GORM criteria builder):

[source,js]
--------------------------------------------------
def count = client.count {
    indices "test"
    types "type1"
    query {
        term {
            test = "value"
        }
    }
}
--------------------------------------------------

The query follows the same link:{ref}/query-dsl.html[Query DSL].
@@ -0,0 +1,15 @@
[[delete]]
== Delete API

The delete API is very similar to the
link:{java}/delete.html[Java delete API], here is an
example:

[source,js]
--------------------------------------------------
def deleteF = node.client.delete {
    index "test"
    type "type1"
    id "1"
}
--------------------------------------------------
@@ -0,0 +1,18 @@
[[get]]
== Get API

The get API is very similar to the
link:{java}/get.html[Java get API]. The main benefit
of using Groovy is handling the source content. It can be automatically
converted to a `Map` which means using Groovy to navigate it is simple:

[source,js]
--------------------------------------------------
def getF = node.client.get {
    index "test"
    type "type1"
    id "1"
}

println "Result of field2: $getF.response.source.complex.field2"
--------------------------------------------------
@@ -0,0 +1,50 @@
= Groovy API
:ref: http://www.elasticsearch.org/guide/elasticsearch/reference/current
:java: http://www.elasticsearch.org/guide/elasticsearch/client/java-api/current

[preface]
== Preface

This section describes the http://groovy.codehaus.org/[Groovy] API
elasticsearch provides. All elasticsearch APIs are executed using a
<<client,GClient>>, and are completely
asynchronous in nature (they either accept a listener, or return a
future).

The Groovy API is a wrapper on top of the
link:{java}[Java API] exposing it in a groovier
manner. The execution options for each API follow a similar manner and
are covered in <<anatomy>>.

[float]
==== Maven Repository

The Groovy API is hosted on
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22elasticsearch-client-groovy%22[Maven
Central].

For example, you can define the latest version in your `pom.xml` file:

[source,xml]
--------------------------------------------------
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-client-groovy</artifactId>
    <version>${es.version}</version>
</dependency>
--------------------------------------------------

include::anatomy.asciidoc[]

include::client.asciidoc[]

include::index_.asciidoc[]

include::get.asciidoc[]

include::delete.asciidoc[]

include::search.asciidoc[]

include::count.asciidoc[]
@@ -0,0 +1,31 @@
[[index_]]
== Index API

The index API is very similar to the
link:{java}/index_.html[Java index API]. The Groovy
extension to it is the ability to provide the indexed source using a
closure. For example:

[source,js]
--------------------------------------------------
def indexR = client.index {
    index "test"
    type "type1"
    id "1"
    source {
        test = "value"
        complex {
            value1 = "value1"
            value2 = "value2"
        }
    }
}
--------------------------------------------------

In the above example, the source closure itself gets transformed into an
XContent (defaults to JSON). In order to change how the source closure
is serialized, a global (static) setting can be set on the `GClient` by
changing the `indexContentType` field.

Note also that the `source` can be set using the typical Java based
APIs; the `Closure` option is a Groovy extension.
@@ -0,0 +1,114 @@
[[search]]
== Search API

The search API is very similar to the
link:{java}/search.html[Java search API]. The Groovy
extension allows to provide the search source to execute as a `Closure`
including the query itself (similar to GORM criteria builder):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            term(test: "value")
        }
    }
}

search.response.hits.each {SearchHit hit ->
    println "Got hit $hit.id from $hit.index/$hit.type"
}
--------------------------------------------------

It can also be executed using the "Java API" while still using a closure
for the query:

[source,js]
--------------------------------------------------
def search = node.client.prepareSearch("test").setQuery({
    term(test: "value")
}).gexecute();

search.response.hits.each {SearchHit hit ->
    println "Got hit $hit.id from $hit.index/$hit.type"
}
--------------------------------------------------

The format of the search `Closure` follows the same JSON syntax as the
link:{ref}/search-search.html[Search API] request.

[float]
=== More examples

Term query where multiple values are provided (see
link:{ref}/query-dsl-terms-query.html[terms]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            terms(test: ["value1", "value2"])
        }
    }
}
--------------------------------------------------

Query string (see
link:{ref}/query-dsl-query-string-query.html[query string]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            query_string(
                fields: ["test"],
                query: "value1 value2")
        }
    }
}
--------------------------------------------------

Pagination (see
link:{ref}/search-request-from-size.html[from/size]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        from = 0
        size = 10
        query {
            term(test: "value")
        }
    }
}
--------------------------------------------------

Sorting (see link:{ref}/search-request-sort.html[sort]):

[source,js]
--------------------------------------------------
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            term(test: "value")
        }
        sort = [
            date : [ order: "desc"]
        ]
    }
}
--------------------------------------------------
@@ -0,0 +1,38 @@
[[bulk]]
== Bulk API

The bulk API allows one to index and delete several documents in a
single request. Here is a sample usage:

[source,java]
--------------------------------------------------
import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                .startObject()
                    .field("user", "kimchy")
                    .field("postDate", new Date())
                    .field("message", "trying out Elastic Search")
                .endObject()
        )
);

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                .startObject()
                    .field("user", "kimchy")
                    .field("postDate", new Date())
                    .field("message", "another post")
                .endObject()
        )
);

BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
--------------------------------------------------
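The comment above only hints at failure handling. A minimal sketch of iterating
the per-item results, assuming `BulkItemResponse` exposes `isFailed()` and
`getFailureMessage()` (getter names may vary slightly between versions):

[source,java]
--------------------------------------------------
for (BulkItemResponse item : bulkResponse) {
    if (item.isFailed()) {
        // log or collect the failed item; the ids correspond to the requests added above
        System.err.println("Failed [" + item.getIndex() + "/" + item.getType() + "/"
                + item.getId() + "]: " + item.getFailureMessage());
    }
}
--------------------------------------------------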
@@ -0,0 +1,185 @@
[[client]]
== Client

You can use the *java client* in multiple ways:

* Perform standard <<index_,index>>, <<get,get>>,
<<delete,delete>> and <<search,search>> operations on an
existing cluster
* Perform administrative tasks on a running cluster
* Start full nodes when you want to run Elasticsearch embedded in your
own application or when you want to launch unit or integration tests

Obtaining an elasticsearch `Client` is simple. The most common way to
get a client is by:

1. creating an embedded link:#nodeclient[`Node`] that acts as a node
within a cluster
2. requesting a `Client` from your embedded `Node`.

Another manner is by creating a link:#transportclient[`TransportClient`]
that connects to a cluster.

*Important:*

______________________________________________________________________________
Please note that you are encouraged to use the same version on client
and cluster sides. You may hit some incompatibility issues when mixing
major versions.
______________________________________________________________________________

[float]
=== Node Client

Instantiating a node based client is the simplest way to get a `Client`
that can execute operations against elasticsearch.

[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().node();
Client client = node.client();

// on shutdown

node.close();
--------------------------------------------------

When you start a `Node`, it joins an elasticsearch cluster. You can have
different clusters by simply setting the `cluster.name` setting, or
explicitly using the `clusterName` method on the builder.

You can define `cluster.name` in the `/src/main/resources/elasticsearch.yml`
file in your project. As long as `elasticsearch.yml` is present in the
classloader, it will be used when you start your node.

[source,java]
--------------------------------------------------
cluster.name=yourclustername
--------------------------------------------------

Or in Java:

[source,java]
--------------------------------------------------
Node node = nodeBuilder().clusterName("yourclustername").node();
Client client = node.client();
--------------------------------------------------

The benefit of using the `Client` is the fact that operations are
automatically routed to the node(s) the operations need to be executed
on, without performing a "double hop". For example, the index operation
will automatically be executed on the shard where it will end up being
stored.

When you start a `Node`, the most important decision is whether it
should hold data or not. In other words, should indices and shards be
allocated to it. Many times we would like to have the clients just be
clients, without shards being allocated to them. This is simple to
configure by setting either the `node.data` setting to `false` or
`node.client` to `true` (or the respective helper methods on
`NodeBuilder`):

[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().client(true).node();
Client client = node.client();

// on shutdown

node.close();
--------------------------------------------------

Another common usage is to start the `Node` and use the `Client` in
unit/integration tests. In such a case, we would like to start a "local"
`Node` (with a "local" discovery and transport). Again, this is just a
matter of a simple setting when starting the `Node`. Note, "local" here
means local on the JVM (well, actually class loader) level, meaning that
two *local* servers started within the same JVM will discover themselves
and form a cluster.

[source,java]
--------------------------------------------------
import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().local(true).node();
Client client = node.client();

// on shutdown

node.close();
--------------------------------------------------

[float]
=== Transport Client

The `TransportClient` connects remotely to an elasticsearch cluster
using the transport module. It does not join the cluster, but simply
gets one or more initial transport addresses and communicates with them
in round robin fashion on each action (though most actions will probably
be "two hop" operations).

[source,java]
--------------------------------------------------
// on startup

Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));

// on shutdown

client.close();
--------------------------------------------------

Note that you have to set the cluster name if you use one different from
"elasticsearch":

[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
//Add transport addresses and do something with the client...
--------------------------------------------------

Or use the `elasticsearch.yml` file as shown in the link:#nodeclient[Node
Client section].

The client allows to sniff the rest of the cluster, and add those into
its list of machines to use. In this case, note that the IP addresses
used will be the ones that the other nodes were started with (the
"publish" address). In order to enable it, set
`client.transport.sniff` to `true`:

[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = new TransportClient(settings);
--------------------------------------------------

Other transport client level settings include:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`client.transport.ignore_cluster_name` |Set to `true` to ignore cluster
name validation of connected nodes. (since 0.19.4)

|`client.transport.ping_timeout` |The time to wait for a ping response
from a node. Defaults to `5s`.

|`client.transport.nodes_sampler_interval` |How often to sample / ping
the nodes listed and connected. Defaults to `5s`.
|=======================================================================
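These settings can be passed to the `TransportClient` through the same settings
builder used above. A minimal sketch (the values shown are only illustrative):

[source,java]
--------------------------------------------------
// tune the transport client behaviour described in the table above
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.ignore_cluster_name", true)      // skip cluster name validation
        .put("client.transport.ping_timeout", "10s")             // wait longer for ping responses
        .put("client.transport.nodes_sampler_interval", "10s")   // sample connected nodes less often
        .build();

TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300));
--------------------------------------------------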
@@ -0,0 +1,38 @@
[[count]]
== Count API

The count API allows to easily execute a query and get the number of
matches for that query. It can be executed across one or more indices
and across one or more types. The query can be provided using the
link:{ref}/query-dsl.html[Query DSL].

[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.xcontent.FilterBuilders.*;
import static org.elasticsearch.index.query.xcontent.QueryBuilders.*;

CountResponse response = client.prepareCount("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the count operation, check out the REST
link:{ref}/search-count.html[count] docs.

[float]
=== Operation Threading

The count API allows to set the threading model the operation will be
performed in when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).

There are three threading modes. The `NO_THREADS` mode means that the
count operation will be executed on the calling thread. The
`SINGLE_THREAD` mode means that the count operation will be executed on
a single different thread for all local shards. The `THREAD_PER_SHARD`
mode means that the count operation will be executed on a different
thread for each local shard.

The default mode is `SINGLE_THREAD`.
@@ -0,0 +1,21 @@
[[delete-by-query]]
== Delete By Query API

The delete by query API allows to delete documents from one or more
indices and one or more types based on a <<query-dsl-queries,query>>. Here
is an example:

[source,java]
--------------------------------------------------
import static org.elasticsearch.index.query.FilterBuilders.*;
import static org.elasticsearch.index.query.QueryBuilders.*;

DeleteByQueryResponse response = client.prepareDeleteByQuery("test")
        .setQuery(termQuery("_type", "type1"))
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the delete by query operation, check out the
link:{ref}/docs-delete-by-query.html[delete_by_query API]
docs.
@@ -0,0 +1,39 @@
[[delete]]
== Delete API

The delete API allows to delete a typed JSON document from a specific
index based on its id. The following example deletes the JSON document
from an index called twitter, under a type called tweet, with id valued
1:

[source,java]
--------------------------------------------------
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the delete operation, check out the
link:{ref}/docs-delete.html[delete API] docs.

[float]
=== Operation Threading

The delete API allows to set the threading model the operation will be
performed in when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).

The options are to execute the operation on a different thread, or to
execute it on the calling thread (note that the API is still async). By
default, `operationThreaded` is set to `true` which means the operation
is executed on a different thread. Here is an example that sets it to
`false`:

[source,java]
--------------------------------------------------
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();
--------------------------------------------------
@@ -0,0 +1,483 @@
[[facets]]
== Facets

Elasticsearch provides a full Java API to play with facets. See the
link:{ref}/search-facets.html[Facets guide].

Use the factory for facet builders (`FacetBuilders`) to create each facet
you want to compute when querying and add it to your search request:

[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
        .setQuery( /* your query */ )
        .addFacet( /* add a facet */ )
        .execute().actionGet();
--------------------------------------------------

Note that you can add more than one facet. See
link:{ref}/search-search.html[Search Java API] for details.
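For instance, a single request could combine a terms facet and a histogram
facet. A minimal sketch using the builders shown later in this section (the
field names are only illustrative):

[source,java]
--------------------------------------------------
SearchResponse sr = node.client().prepareSearch()
        .setQuery(QueryBuilders.matchAllQuery())
        // two facets computed in the same request
        .addFacet(FacetBuilders.termsFacet("brands").field("brand").size(10))
        .addFacet(FacetBuilders.histogramFacet("prices").field("price").interval(1))
        .execute().actionGet();
--------------------------------------------------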
To build facet requests, use the `FacetBuilders` helpers. Just import them
in your class:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.FacetBuilders;
--------------------------------------------------

[float]
=== Facets

[float]
==== Terms Facet

Here is how you can use
link:{ref}/search-facets-terms-facet.html[Terms Facet]
with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.termsFacet("f")
    .field("brand")
    .size(10);
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.terms.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
TermsFacet f = (TermsFacet) sr.facets().facetsAsMap().get("f");

f.getTotalCount();      // Total terms doc count
f.getOtherCount();      // Not shown terms doc count
f.getMissingCount();    // Without term doc count

// For each entry
for (TermsFacet.Entry entry : f) {
    entry.getTerm();    // Term
    entry.getCount();   // Doc count
}
--------------------------------------------------

[float]
==== Range Facet

Here is how you can use
link:{ref}/search-facets-range-facet.html[Range Facet]
with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.rangeFacet("f")
    .field("price")         // Field to compute on
    .addUnboundedFrom(3)    // from -infinity to 3 (excluded)
    .addRange(3, 6)         // from 3 to 6 (excluded)
    .addUnboundedTo(6);     // from 6 to +infinity
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.range.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
RangeFacet f = (RangeFacet) sr.facets().facetsAsMap().get("f");

// For each entry
for (RangeFacet.Entry entry : f) {
    entry.getFrom();    // Range from requested
    entry.getTo();      // Range to requested
    entry.getCount();   // Doc count
    entry.getMin();     // Min value
    entry.getMax();     // Max value
    entry.getMean();    // Mean
    entry.getTotal();   // Sum of values
}
--------------------------------------------------

[float]
==== Histogram Facet

Here is how you can use
link:{ref}/search-facets-histogram-facet.html[Histogram
Facet] with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
HistogramFacetBuilder facet = FacetBuilders.histogramFacet("f")
    .field("price")
    .interval(1);
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.histogram.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
HistogramFacet f = (HistogramFacet) sr.facets().facetsAsMap().get("f");

// For each entry
for (HistogramFacet.Entry entry : f) {
    entry.getKey();     // Key (X-Axis)
    entry.getCount();   // Doc count (Y-Axis)
}
--------------------------------------------------

[float]
==== Date Histogram Facet

Here is how you can use
link:{ref}/search-facets-date-histogram-facet.html[Date
Histogram Facet] with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.dateHistogramFacet("f")
    .field("date")      // Your date field
    .interval("year");  // You can also use "quarter", "month", "week", "day",
                        // "hour" and "minute" or notation like "1.5h" or "2w"
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.datehistogram.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
DateHistogramFacet f = (DateHistogramFacet) sr.facets().facetsAsMap().get("f");

// For each entry
for (DateHistogramFacet.Entry entry : f) {
    entry.getTime();    // Date in ms since epoch (X-Axis)
    entry.getCount();   // Doc count (Y-Axis)
}
--------------------------------------------------

[float]
==== Filter Facet (not facet filter)

Here is how you can use
link:{ref}/search-facets-filter-facet.html[Filter Facet]
with Java API.

If you are looking on how to apply a filter to a facet, have a look at
link:#facet-filter[facet filter] using Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.filterFacet("f",
    FilterBuilders.termFilter("brand", "heineken"));    // Your Filter here
--------------------------------------------------

See <<query-dsl-filters,Filters>> to
learn how to build filters using Java.

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.filter.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
FilterFacet f = (FilterFacet) sr.facets().facetsAsMap().get("f");

f.getCount();   // Number of docs that matched
--------------------------------------------------

[float]
==== Query Facet

Here is how you can use
link:{ref}/search-facets-query-facet.html[Query Facet]
with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.queryFacet("f",
    QueryBuilders.matchQuery("brand", "heineken"));
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.query.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
QueryFacet f = (QueryFacet) sr.facets().facetsAsMap().get("f");

f.getCount();   // Number of docs that matched
--------------------------------------------------

See <<query-dsl-queries,Queries>> to
learn how to build queries using Java.

[float]
==== Statistical

Here is how you can use
link:{ref}/search-facets-statistical-facet.html[Statistical
Facet] with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.statisticalFacet("f")
    .field("price");
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.statistical.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
StatisticalFacet f = (StatisticalFacet) sr.facets().facetsAsMap().get("f");

f.getCount();           // Doc count
f.getMin();             // Min value
f.getMax();             // Max value
f.getMean();            // Mean
f.getTotal();           // Sum of values
f.getStdDeviation();    // Standard Deviation
f.getSumOfSquares();    // Sum of Squares
f.getVariance();        // Variance
--------------------------------------------------

[float]
==== Terms Stats Facet

Here is how you can use
link:{ref}/search-facets-terms-stats-facet.html[Terms
Stats Facet] with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.termsStatsFacet("f")
    .keyField("brand")
    .valueField("price");
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.termsstats.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
TermsStatsFacet f = (TermsStatsFacet) sr.facets().facetsAsMap().get("f");
f.getTotalCount();      // Total terms doc count
f.getOtherCount();      // Not shown terms doc count
f.getMissingCount();    // Without term doc count

// For each entry
for (TermsStatsFacet.Entry entry : f) {
    entry.getTerm();    // Term
    entry.getCount();   // Doc count
    entry.getMin();     // Min value
    entry.getMax();     // Max value
    entry.getMean();    // Mean
    entry.getTotal();   // Sum of values
}
--------------------------------------------------

[float]
==== Geo Distance Facet

Here is how you can use
link:{ref}/search-facets-geo-distance-facet.html[Geo
Distance Facet] with Java API.

[float]
===== Prepare facet request

Here is an example on how to create the facet request:

[source,java]
--------------------------------------------------
FacetBuilders.geoDistanceFacet("f")
    .field("pin.location")              // Field containing coordinates we want to compare with
    .point(40, -70)                     // Point from where we start (0)
    .addUnboundedFrom(10)               // 0 to 10 km (excluded)
    .addRange(10, 20)                   // 10 to 20 km (excluded)
    .addRange(20, 100)                  // 20 to 100 km (excluded)
    .addUnboundedTo(100)                // from 100 km to infinity (and beyond ;-) )
    .unit(DistanceUnit.KILOMETERS);     // All distances are in kilometers. Can be MILES
--------------------------------------------------

[float]
===== Use facet response

Import Facet definition classes:

[source,java]
--------------------------------------------------
import org.elasticsearch.search.facet.geodistance.*;
--------------------------------------------------

[source,java]
--------------------------------------------------
// sr is here your SearchResponse object
GeoDistanceFacet f = (GeoDistanceFacet) sr.facets().facetsAsMap().get("f");

// For each entry
for (GeoDistanceFacet.Entry entry : f) {
    entry.getFrom();    // Distance from requested
    entry.getTo();      // Distance to requested
    entry.getCount();   // Doc count
    entry.getMin();     // Min value
    entry.getMax();     // Max value
    entry.getTotal();   // Sum of values
    entry.getMean();    // Mean
}
--------------------------------------------------

[float]
=== Facet filters (not Filter Facet)

By default, facets are applied to the query result set, whatever filters
exist.

If you need to compute facets with the same filters or even with other
filters, you can add the filter to any facet using the
`AbstractFacetBuilder#facetFilter(FilterBuilder)` method:

[source,java]
--------------------------------------------------
FacetBuilders
    .termsFacet("f").field("brand")     // Your facet
    .facetFilter(                       // Your filter here
        FilterBuilders.termFilter("colour", "pale")
    );
--------------------------------------------------

For example, you can reuse the same filter you created for your query:

[source,java]
--------------------------------------------------
// A common filter
FilterBuilder filter = FilterBuilders.termFilter("colour", "pale");

TermsFacetBuilder facet = FacetBuilders.termsFacet("f")
    .field("brand")
    .facetFilter(filter);                           // We apply it to the facet

SearchResponse sr = node.client().prepareSearch()
    .setQuery(QueryBuilders.matchAllQuery())
    .setFilter(filter)                              // We apply it to the query
    .addFacet(facet)
    .execute().actionGet();
--------------------------------------------------

See documentation on how to build
<<query-dsl-filters,Filters>>.

[float]
=== Scope

By default, facets are computed within the query result set. But you can
compute facets from all documents in the index whatever the query is,
using the `global` parameter:

[source,java]
--------------------------------------------------
TermsFacetBuilder facet = FacetBuilders.termsFacet("f")
    .field("brand")
    .global(true);
--------------------------------------------------
@@ -0,0 +1,38 @@
[[get]]
== Get API

The get API allows to get a typed JSON document from the index based on
its id. The following example gets a JSON document from an index called
twitter, under a type called tweet, with id valued 1:

[source,java]
--------------------------------------------------
GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .execute()
        .actionGet();
--------------------------------------------------

For more information on the get operation, check out the REST
link:{ref}/docs-get.html[get] docs.

[float]
=== Operation Threading

The get API allows to set the threading model the operation will be
performed in when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).

The options are to execute the operation on a different thread, or to
execute it on the calling thread (note that the API is still async). By
default, `operationThreaded` is set to `true` which means the operation
is executed on a different thread. Here is an example that sets it to
`false`:

[source,java]
--------------------------------------------------
GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .execute()
        .actionGet();
--------------------------------------------------
@@ -0,0 +1,61 @@
[[java-api]]
= Java API
:ref: http://www.elasticsearch.org/guide/elasticsearch/reference/current

[preface]
== Preface
This section describes the Java API that elasticsearch provides. All
elasticsearch operations are executed using a
<<client,Client>> object. All
operations are completely asynchronous in nature (they either accept a
listener or return a future).

Additionally, operations on a client may be accumulated and executed in
<<bulk,Bulk>>.

Note, all the APIs are exposed through the
Java API (actually, the Java API is used internally to execute them).

[float]
== Maven Repository

Elasticsearch is hosted on
http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22elasticsearch%22[Maven
Central].

For example, you can define the latest version in your `pom.xml` file:

[source,xml]
--------------------------------------------------
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${es.version}</version>
</dependency>
--------------------------------------------------


include::client.asciidoc[]

include::index_.asciidoc[]

include::get.asciidoc[]

include::delete.asciidoc[]

include::bulk.asciidoc[]

include::search.asciidoc[]

include::count.asciidoc[]

include::delete-by-query.asciidoc[]

include::facets.asciidoc[]

include::percolate.asciidoc[]

include::query-dsl-queries.asciidoc[]

include::query-dsl-filters.asciidoc[]
@ -0,0 +1,201 @@
|
|||
[[index_]]
|
||||
== Index API
|
||||
|
||||
The index API allows one to index a typed JSON document into a specific
|
||||
index and make it searchable.
|
||||
|
||||
[float]
|
||||
=== Generate JSON document
|
||||
|
||||
There are different ways of generating a JSON document:
|
||||
|
||||
* Manually (aka do it yourself) using native `byte[]` or as a `String`
|
||||
|
||||
* Using a `Map` that will be automatically converted to its JSON
equivalent
|
||||
|
||||
* Using a third party library to serialize your beans such as
|
||||
http://wiki.fasterxml.com/JacksonHome[Jackson]
|
||||
|
||||
* Using built-in helpers such as `XContentFactory.jsonBuilder()`
|
||||
|
||||
Internally, each type is converted to `byte[]` (so a String is converted
to a `byte[]`). Therefore, if the object is in this form already, then
use it. The `jsonBuilder` is a highly optimized JSON generator that
directly constructs a `byte[]`.
|
||||
|
||||
[float]
|
||||
==== Do It Yourself
|
||||
|
||||
Nothing really difficult here, but note that you will have to encode
dates according to the
link:{ref}/mapping-date-format.html[Date Format].
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
String json = "{" +
    "\"user\":\"kimchy\"," +
    "\"postDate\":\"2013-01-30\"," +
    "\"message\":\"trying out Elastic Search\"" +
    "}";
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== Using Map
|
||||
|
||||
A `Map` is a key/value pair collection. It maps naturally to a JSON
structure:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
Map<String, Object> json = new HashMap<String, Object>();
|
||||
json.put("user","kimchy");
|
||||
json.put("postDate",new Date());
|
||||
json.put("message","trying out Elastic Search");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== Serialize your beans
|
||||
|
||||
Elasticsearch already uses Jackson but shades it under the
`org.elasticsearch.common.jackson` package. +
So, you can add your own Jackson version in your `pom.xml` file or in
your classpath. See http://wiki.fasterxml.com/JacksonDownload[Jackson
Download Page].
|
||||
|
||||
For example:
|
||||
|
||||
[source,xml]
|
||||
--------------------------------------------------
|
||||
<dependency>
|
||||
<groupId>com.fasterxml.jackson.core</groupId>
|
||||
<artifactId>jackson-databind</artifactId>
|
||||
<version>2.1.3</version>
|
||||
</dependency>
|
||||
--------------------------------------------------
|
||||
|
||||
Then, you can start serializing your beans to JSON:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import com.fasterxml.jackson.databind.*;
|
||||
|
||||
// instantiate a JSON mapper
|
||||
ObjectMapper mapper = new ObjectMapper(); // create once, reuse
|
||||
|
||||
// generate json
|
||||
String json = mapper.writeValueAsString(yourbeaninstance);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== Use Elasticsearch helpers
|
||||
|
||||
Elasticsearch provides built-in helpers to generate JSON content.
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import static org.elasticsearch.common.xcontent.XContentFactory.*;
|
||||
|
||||
XContentBuilder builder = jsonBuilder()
|
||||
.startObject()
|
||||
.field("user", "kimchy")
|
||||
.field("postDate", new Date())
|
||||
.field("message", "trying out Elastic Search")
|
||||
.endObject();
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can also add arrays with the `startArray(String)` and
`endArray()` methods. By the way, the `field` method +
accepts many object types. You can pass numbers, dates and even
other `XContentBuilder` objects directly.
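For example, a minimal sketch of adding an array field (using the same static
`jsonBuilder()` import as above; the `tags` field and its values are just placeholders):

[source,java]
--------------------------------------------------
XContentBuilder builder = jsonBuilder()
    .startObject()
        .field("user", "kimchy")
        .startArray("tags")
            .value("elasticsearch")
            .value("java")
        .endArray()
    .endObject();
--------------------------------------------------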
|
||||
|
||||
If you need to see the generated JSON content, you can use the
`string()` method.
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
String json = builder.string();
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Index document
|
||||
|
||||
The following example indexes a JSON document into an index called
twitter, under a type called tweet, with an id of 1:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import static org.elasticsearch.common.xcontent.XContentFactory.*;
|
||||
|
||||
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
|
||||
.setSource(jsonBuilder()
|
||||
.startObject()
|
||||
.field("user", "kimchy")
|
||||
.field("postDate", new Date())
|
||||
.field("message", "trying out Elastic Search")
|
||||
.endObject()
|
||||
)
|
||||
.execute()
|
||||
.actionGet();
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can also index your documents as JSON String and that you
|
||||
don't have to give an ID:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
String json = "{" +
    "\"user\":\"kimchy\"," +
    "\"postDate\":\"2013-01-30\"," +
    "\"message\":\"trying out Elastic Search\"" +
    "}";
|
||||
|
||||
IndexResponse response = client.prepareIndex("twitter", "tweet")
|
||||
.setSource(json)
|
||||
.execute()
|
||||
.actionGet();
|
||||
--------------------------------------------------
|
||||
|
||||
The `IndexResponse` object will give you a report:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Index name
|
||||
String _index = response.index();
|
||||
// Type name
|
||||
String _type = response.type();
|
||||
// Document ID (generated or not)
|
||||
String _id = response.id();
|
||||
// Version (if it's the first time you index this document, you will get: 1)
|
||||
long _version = response.version();
|
||||
--------------------------------------------------
|
||||
|
||||
If you use percolation while indexing, the `IndexResponse` object will give
you the percolators that have matched:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
|
||||
.setSource(json)
|
||||
.setPercolate("*")
|
||||
.execute()
|
||||
.actionGet();
|
||||
|
||||
List<String> matches = response.matches();
|
||||
--------------------------------------------------
|
||||
|
||||
For more information on the index operation, check out the REST
|
||||
link:{ref}/docs-index_.html[index] docs.
|
||||
|
||||
[float]
|
||||
=== Operation Threading
|
||||
|
||||
The index API allows one to set the threading model the operation will be
performed in when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).
|
||||
|
||||
The options are to execute the operation on a different thread, or to
|
||||
execute it on the calling thread (note that the API is still async). By
|
||||
default, `operationThreaded` is set to `true` which means the operation
|
||||
is executed on a different thread.
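As with the get API, a sketch of forcing execution on the calling thread might
look as follows (assuming `setOperationThreaded` is exposed on the index request
builder in your version, as it is on the get request builder; `json` is the
source string from the examples above):

[source,java]
--------------------------------------------------
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(json)               // reuse the JSON source built earlier
        .setOperationThreaded(false)   // execute on the calling thread
        .execute()
        .actionGet();
--------------------------------------------------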
@ -0,0 +1,48 @@
|
|||
[[percolate]]
|
||||
== Percolate API
|
||||
|
||||
The percolator allows one to register queries against an index, and then
send `percolate` requests which include a doc, and get back the
queries that match on that doc out of the set of registered queries.
|
||||
|
||||
Read the main {ref}/search-percolate.html[percolate]
|
||||
documentation before reading this guide.
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
//This is the query we're registering in the percolator
|
||||
QueryBuilder qb = termQuery("content", "amazing");
|
||||
|
||||
//Index the query = register it in the percolator
|
||||
client.prepareIndex("_percolator", "myIndexName", "myDesignatedQueryName")
|
||||
.setSource(jsonBuilder()
|
||||
.startObject()
|
||||
.field("query", qb) // Register the query
|
||||
.endObject())
|
||||
.setRefresh(true) // Needed when the query shall be available immediately
|
||||
.execute().actionGet();
|
||||
--------------------------------------------------
|
||||
|
||||
This indexes the above term query under the name
|
||||
*myDesignatedQueryName*.
|
||||
|
||||
In order to check a document against the registered queries, use this
|
||||
code:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
//Build a document to check against the percolator
|
||||
XContentBuilder docBuilder = XContentFactory.jsonBuilder().startObject();
|
||||
docBuilder.field("doc").startObject(); //This is needed to designate the document
|
||||
docBuilder.field("content", "This is amazing!");
|
||||
docBuilder.endObject(); //End of the doc field
|
||||
docBuilder.endObject(); //End of the JSON root object
|
||||
//Percolate
|
||||
PercolateResponse response =
|
||||
client.preparePercolate("myIndexName", "myDocumentType").setSource(docBuilder).execute().actionGet();
|
||||
//Iterate over the results
|
||||
for(String result : response) {
|
||||
//Handle the result which is the name of
|
||||
//the query in the percolator
|
||||
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,459 @@
|
|||
[[query-dsl-filters]]
|
||||
== Query DSL - Filters
|
||||
|
||||
Elasticsearch provides a full Java query DSL in a similar manner to the
|
||||
REST link:{ref}/query-dsl.html[Query DSL]. The factory for filter
|
||||
builders is `FilterBuilders`.
|
||||
|
||||
Once your query is ready, you can use the <<search,Search API>>.
|
||||
|
||||
See also how to build <<query-dsl-queries,Queries>>.
|
||||
|
||||
To use `FilterBuilders` just import them in your class:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import org.elasticsearch.index.query.FilterBuilders;
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can easily print (aka debug) the JSON generated by a filter using
the `toString()` method on the `FilterBuilder` object.
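For example (a minimal sketch):

[source,java]
--------------------------------------------------
FilterBuilder filter = FilterBuilders.termFilter("user", "kimchy");
System.out.println(filter.toString()); // Prints the JSON representation of the filter
--------------------------------------------------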
|
||||
|
||||
[float]
|
||||
=== And Filter
|
||||
|
||||
See link:{ref}/query-dsl-and-filter.html[And Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.andFilter(
|
||||
FilterBuilders.rangeFilter("postDate").from("2010-03-01").to("2010-04-01"),
|
||||
FilterBuilders.prefixFilter("name.second", "ba")
|
||||
);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`AndFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Bool Filter
|
||||
|
||||
See link:{ref}/query-dsl-bool-filter.html[Bool Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.boolFilter()
|
||||
.must(FilterBuilders.termFilter("tag", "wow"))
|
||||
.mustNot(FilterBuilders.rangeFilter("age").from("10").to("20"))
|
||||
.should(FilterBuilders.termFilter("tag", "sometag"))
|
||||
.should(FilterBuilders.termFilter("tag", "sometagtag"));
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`BoolFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Exists Filter
|
||||
|
||||
See link:{ref}/query-dsl-exists-filter.html[Exists Filter].
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.existsFilter("user");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Ids Filter
|
||||
|
||||
See link:{ref}/query-dsl-ids-filter.html[IDs Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.idsFilter("my_type", "type2").addIds("1", "4", "100");
|
||||
|
||||
// Type is optional
|
||||
FilterBuilders.idsFilter().addIds("1", "4", "100");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Limit Filter
|
||||
|
||||
See link:{ref}/query-dsl-limit-filter.html[Limit Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.limitFilter(100);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Type Filter
|
||||
|
||||
See link:{ref}/query-dsl-type-filter.html[Type Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.typeFilter("my_type");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Geo Bounding Box Filter
|
||||
|
||||
See link:{ref}/query-dsl-geo-bounding-box-filter.html[Geo
|
||||
Bounding Box Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.geoBoundingBoxFilter("pin.location")
|
||||
.topLeft(40.73, -74.1)
|
||||
.bottomRight(40.717, -73.99);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`GeoBoundingBoxFilterBuilder#cache(boolean)` method. See
|
||||
<<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== GeoDistance Filter
|
||||
|
||||
See link:{ref}/query-dsl-geo-distance-filter.html[Geo
|
||||
Distance Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.geoDistanceFilter("pin.location")
|
||||
.point(40, -70)
|
||||
.distance(200, DistanceUnit.KILOMETERS)
|
||||
.optimizeBbox("memory") // Can be also "indexed" or "none"
|
||||
.geoDistance(GeoDistance.ARC); // Or GeoDistance.PLANE
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`GeoDistanceFilterBuilder#cache(boolean)` method. See
|
||||
<<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Geo Distance Range Filter
|
||||
|
||||
See link:{ref}/query-dsl-geo-distance-range-filter.html[Geo
|
||||
Distance Range Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.geoDistanceRangeFilter("pin.location")
|
||||
.point(40, -70)
|
||||
.from("200km")
|
||||
.to("400km")
|
||||
.includeLower(true)
|
||||
.includeUpper(false)
|
||||
.optimizeBbox("memory") // Can be also "indexed" or "none"
|
||||
.geoDistance(GeoDistance.ARC); // Or GeoDistance.PLANE
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`GeoDistanceRangeFilterBuilder#cache(boolean)` method. See
|
||||
<<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Geo Polygon Filter
|
||||
|
||||
See link:{ref}/query-dsl-geo-polygon-filter.html[Geo Polygon
|
||||
Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.geoPolygonFilter("pin.location")
|
||||
.addPoint(40, -70)
|
||||
.addPoint(30, -80)
|
||||
.addPoint(20, -90);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`GeoPolygonFilterBuilder#cache(boolean)` method. See
|
||||
<<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Geo Shape Filter
|
||||
|
||||
See link:{ref}/query-dsl-geo-shape-filter.html[Geo Shape
|
||||
Filter]
|
||||
|
||||
Note: the `geo_shape` type uses `Spatial4J` and `JTS`, both of which are
|
||||
optional dependencies. Consequently you must add `Spatial4J` and `JTS`
|
||||
to your classpath in order to use this type:
|
||||
|
||||
[source,xml]
|
||||
-----------------------------------------------
|
||||
<dependency>
|
||||
<groupId>com.spatial4j</groupId>
|
||||
<artifactId>spatial4j</artifactId>
|
||||
<version>0.3</version>
|
||||
</dependency>
|
||||
|
||||
<dependency>
|
||||
<groupId>com.vividsolutions</groupId>
|
||||
<artifactId>jts</artifactId>
|
||||
<version>1.12</version>
|
||||
<exclusions>
|
||||
<exclusion>
|
||||
<groupId>xerces</groupId>
|
||||
<artifactId>xercesImpl</artifactId>
|
||||
</exclusion>
|
||||
</exclusions>
|
||||
</dependency>
|
||||
-----------------------------------------------
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Import Spatial4J shapes
|
||||
import com.spatial4j.core.context.SpatialContext;
|
||||
import com.spatial4j.core.shape.Shape;
|
||||
import com.spatial4j.core.shape.impl.RectangleImpl;
import com.spatial4j.core.shape.impl.PointImpl;
|
||||
|
||||
// Also import ShapeRelation
|
||||
import org.elasticsearch.common.geo.ShapeRelation;
|
||||
--------------------------------------------------
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Shape within another
|
||||
filter = FilterBuilders.geoShapeFilter("location",
|
||||
new RectangleImpl(0,10,0,10,SpatialContext.GEO))
|
||||
.relation(ShapeRelation.WITHIN);
|
||||
|
||||
// Intersect shapes
|
||||
filter = FilterBuilders.geoShapeFilter("location",
|
||||
new PointImpl(0, 0, SpatialContext.GEO))
|
||||
.relation(ShapeRelation.INTERSECTS);
|
||||
|
||||
// Using pre-indexed shapes
|
||||
filter = FilterBuilders.geoShapeFilter("location", "New Zealand", "countries")
|
||||
.relation(ShapeRelation.DISJOINT);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Has Child / Has Parent Filters
|
||||
|
||||
See:
|
||||
* link:{ref}/query-dsl-has-child-filter.html[Has Child Filter]
|
||||
* link:{ref}/query-dsl-has-parent-filter.html[Has Parent Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Has Child
|
||||
FilterBuilders.hasChildFilter("blog_tag",
|
||||
QueryBuilders.termQuery("tag", "something"));
|
||||
|
||||
// Has Parent
|
||||
FilterBuilders.hasParentFilter("blog",
|
||||
QueryBuilders.termQuery("tag", "something"));
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Match All Filter
|
||||
|
||||
See link:{ref}/query-dsl-match-all-filter.html[Match All Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.matchAllFilter();
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Missing Filter
|
||||
|
||||
See link:{ref}/query-dsl-missing-filter.html[Missing Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.missingFilter("user")
|
||||
.existence(true)
|
||||
.nullValue(true);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Not Filter
|
||||
|
||||
See link:{ref}/query-dsl-not-filter.html[Not Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.notFilter(
|
||||
FilterBuilders.rangeFilter("price").from("1").to("2"));
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Numeric Range Filter
|
||||
|
||||
See link:{ref}/query-dsl-numeric-range-filter.html[Numeric
|
||||
Range Filter]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.numericRangeFilter("age")
|
||||
.from(10)
|
||||
.to(20)
|
||||
.includeLower(true)
|
||||
.includeUpper(false);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`NumericRangeFilterBuilder#cache(boolean)` method. See
|
||||
<<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Or Filter
|
||||
|
||||
See link:{ref}/query-dsl-or-filter.html[Or Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.orFilter(
|
||||
FilterBuilders.termFilter("name.second", "banon"),
|
||||
FilterBuilders.termFilter("name.nick", "kimchy")
|
||||
);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`OrFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Prefix Filter
|
||||
|
||||
See link:{ref}/query-dsl-prefix-filter.html[Prefix Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.prefixFilter("user", "ki");
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`PrefixFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Query Filter
|
||||
|
||||
See link:{ref}/query-dsl-query-filter.html[Query Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.queryFilter(
|
||||
QueryBuilders.queryString("this AND that OR thus")
|
||||
);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`QueryFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Range Filter
|
||||
|
||||
See link:{ref}/query-dsl-range-filter.html[Range Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.rangeFilter("age")
|
||||
.from("10")
|
||||
.to("20")
|
||||
.includeLower(true)
|
||||
.includeUpper(false);
|
||||
|
||||
// A simplified form using gte, gt, lt or lte
|
||||
FilterBuilders.rangeFilter("age")
|
||||
.gte("10")
|
||||
.lt("20");
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can ask not to cache the result using
|
||||
`RangeFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Script Filter
|
||||
|
||||
See link:{ref}/query-dsl-script-filter.html[Script Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilder filter = FilterBuilders.scriptFilter(
|
||||
"doc['age'].value > param1"
|
||||
).addParam("param1", 10);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can cache the result using
|
||||
`ScriptFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Term Filter
|
||||
|
||||
See link:{ref}/query-dsl-term-filter.html[Term Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.termFilter("user", "kimchy");
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can ask not to cache the result using
|
||||
`TermFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Terms Filter
|
||||
|
||||
See link:{ref}/query-dsl-terms-filter.html[Terms Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.termsFilter("user", "kimchy", "elasticsearch")
|
||||
.execution("plain"); // Optional, can be also "bool", "and" or "or"
|
||||
// or "bool_nocache", "and_nocache" or "or_nocache"
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can ask not to cache the result using
|
||||
`TermsFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[float]
|
||||
=== Nested Filter
|
||||
|
||||
See link:{ref}/query-dsl-nested-filter.html[Nested Filter]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilders.nestedFilter("obj1",
|
||||
QueryBuilders.boolQuery()
|
||||
.must(QueryBuilders.matchQuery("obj1.name", "blue"))
|
||||
.must(QueryBuilders.rangeQuery("obj1.count").gt(5))
|
||||
);
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can ask not to cache the result using
|
||||
`NestedFilterBuilder#cache(boolean)` method. See <<query-dsl-filters-caching>>.
|
||||
|
||||
[[query-dsl-filters-caching]]
|
||||
[float]
|
||||
=== Caching
|
||||
|
||||
Some filters are cached by default and others are not. You can control
this explicitly using the `cache(boolean)` method where it exists. For example:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
FilterBuilder filter = FilterBuilders.andFilter(
|
||||
FilterBuilders.rangeFilter("postDate").from("2010-03-01").to("2010-04-01"),
|
||||
FilterBuilders.prefixFilter("name.second", "ba")
|
||||
)
|
||||
.cache(true);
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,489 @@
|
|||
[[query-dsl-queries]]
|
||||
== Query DSL - Queries
|
||||
|
||||
Elasticsearch provides a full Java query DSL in a similar manner to the
|
||||
REST link:{ref}/query-dsl.html[Query DSL]. The factory for query
|
||||
builders is `QueryBuilders`. Once your query is ready, you can use the
|
||||
<<search,Search API>>.
|
||||
|
||||
See also how to build <<query-dsl-filters,Filters>>.
|
||||
|
||||
To use `QueryBuilders` just import them in your class:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import org.elasticsearch.index.query.QueryBuilders;
// the static import lets you call the factory methods without the class prefix
import static org.elasticsearch.index.query.QueryBuilders.*;
|
||||
--------------------------------------------------
|
||||
|
||||
Note that you can easily print (aka debug) the JSON generated by a query using
the `toString()` method on the `QueryBuilder` object.
|
||||
|
||||
The `QueryBuilder` can then be used with any API that accepts a query,
|
||||
such as `count` and `search`.
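For example, a minimal sketch of reusing the same `QueryBuilder` for a count
request, assuming an existing `client` instance:

[source,java]
--------------------------------------------------
import org.elasticsearch.action.count.CountResponse;

QueryBuilder qb = QueryBuilders.termQuery("user", "kimchy");

// The same query builder drives the count API
CountResponse countResponse = client.prepareCount("twitter")
        .setQuery(qb)
        .execute()
        .actionGet();

long count = countResponse.count();
--------------------------------------------------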
|
||||
|
||||
[float]
|
||||
=== Match Query
|
||||
|
||||
See link:{ref}/query-dsl-match-query.html[Match Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders.matchQuery("name", "kimchy elasticsearch");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== MultiMatch Query
|
||||
|
||||
See link:{ref}/query-dsl-multi-match-query.html[MultiMatch
|
||||
Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders.multiMatchQuery(
|
||||
"kimchy elasticsearch", // Text you are looking for
|
||||
"user", "message" // Fields you query on
|
||||
);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Boolean Query
|
||||
|
||||
See link:{ref}/query-dsl-bool-query.html[Boolean Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders
|
||||
.boolQuery()
|
||||
.must(termQuery("content", "test1"))
|
||||
.must(termQuery("content", "test4"))
|
||||
.mustNot(termQuery("content", "test2"))
|
||||
.should(termQuery("content", "test3"));
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Boosting Query
|
||||
|
||||
See link:{ref}/query-dsl-boosting-query.html[Boosting Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.boostingQuery()
|
||||
.positive(QueryBuilders.termQuery("name","kimchy"))
|
||||
.negative(QueryBuilders.termQuery("name","dadoonet"))
|
||||
.negativeBoost(0.2f);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== IDs Query
|
||||
|
||||
See link:{ref}/query-dsl-ids-query.html[IDs Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.idsQuery().ids("1", "2");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Custom Score Query
|
||||
|
||||
See link:{ref}/query-dsl-custom-score-query.html[Custom Score
|
||||
Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.customScoreQuery(QueryBuilders.matchAllQuery()) // Your query here
|
||||
.script("_score * doc['price'].value"); // Your script here
|
||||
|
||||
// If the script has parameters, use the same script and provide parameters to it.
|
||||
QueryBuilders.customScoreQuery(QueryBuilders.matchAllQuery())
|
||||
.script("_score * doc['price'].value / pow(param1, param2)")
|
||||
.param("param1", 2)
|
||||
.param("param2", 3.1);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Custom Boost Factor Query
|
||||
|
||||
See
|
||||
link:{ref}/query-dsl-custom-boost-factor-query.html[Custom
|
||||
Boost Factor Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.customBoostFactorQuery(QueryBuilders.matchAllQuery()) // Your query
|
||||
.boostFactor(3.1f);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Constant Score Query
|
||||
|
||||
See link:{ref}/query-dsl-constant-score-query.html[Constant
|
||||
Score Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Using with Filters
|
||||
QueryBuilders.constantScoreQuery(FilterBuilders.termFilter("name","kimchy"))
|
||||
.boost(2.0f);
|
||||
|
||||
// With Queries
|
||||
QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("name","kimchy"))
|
||||
.boost(2.0f);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Disjunction Max Query
|
||||
|
||||
See link:{ref}/query-dsl-dis-max-query.html[Disjunction Max
|
||||
Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.disMaxQuery()
|
||||
.add(QueryBuilders.termQuery("name","kimchy")) // Your queries
|
||||
.add(QueryBuilders.termQuery("name","elasticsearch")) // Your queries
|
||||
.boost(1.2f)
|
||||
.tieBreaker(0.7f);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Field Query
|
||||
|
||||
See link:{ref}/query-dsl-field-query.html[Field Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.fieldQuery("name", "+kimchy -dadoonet");
|
||||
|
||||
// Note that you can write the same query using queryString query.
|
||||
QueryBuilders.queryString("+kimchy -dadoonet").field("name");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Fuzzy Like This (Field) Query (flt and flt_field)
|
||||
|
||||
See:
|
||||
* link:{ref}/query-dsl-flt-query.html[Fuzzy Like This Query]
|
||||
* link:{ref}/query-dsl-flt-field-query.html[Fuzzy Like This Field Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// flt Query
|
||||
QueryBuilders.fuzzyLikeThisQuery("name.first", "name.last") // Fields
|
||||
.likeText("text like this one") // Text
|
||||
.maxQueryTerms(12); // Max num of Terms
|
||||
// in generated queries
|
||||
|
||||
// flt_field Query
|
||||
QueryBuilders.fuzzyLikeThisFieldQuery("name.first") // Only on single field
|
||||
.likeText("text like this one")
|
||||
.maxQueryTerms(12);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== FuzzyQuery
|
||||
|
||||
See link:{ref}/query-dsl-fuzzy-query.html[Fuzzy Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders.fuzzyQuery("name", "kimzhy");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Has Child / Has Parent
|
||||
|
||||
See:
|
||||
* link:{ref}/query-dsl-has-child-query.html[Has Child Query]
|
||||
* link:{ref}/query-dsl-has-parent-query.html[Has Parent]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Has Child
|
||||
QueryBuilders.hasChildQuery("blog_tag",
|
||||
QueryBuilders.termQuery("tag","something"))
|
||||
|
||||
// Has Parent
|
||||
QueryBuilders.hasParentQuery("blog",
|
||||
QueryBuilders.termQuery("tag","something"));
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== MatchAll Query
|
||||
|
||||
See link:{ref}/query-dsl-match-all-query.html[Match All
|
||||
Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders.matchAllQuery();
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== More Like This (Field) Query (mlt and mlt_field)
|
||||
|
||||
See:
|
||||
* link:{ref}/query-dsl-mlt-query.html[More Like This Query]
|
||||
* link:{ref}/query-dsl-mlt-field-query.html[More Like This Field Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// mlt Query
|
||||
QueryBuilders.moreLikeThisQuery("name.first", "name.last") // Fields
|
||||
.likeText("text like this one") // Text
|
||||
.minTermFreq(1) // Ignore Threshold
|
||||
.maxQueryTerms(12); // Max num of Terms
|
||||
// in generated queries
|
||||
|
||||
// mlt_field Query
|
||||
QueryBuilders.moreLikeThisFieldQuery("name.first") // Only on single field
|
||||
.likeText("text like this one")
|
||||
.minTermFreq(1)
|
||||
.maxQueryTerms(12);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Prefix Query
|
||||
|
||||
See link:{ref}/query-dsl-prefix-query.html[Prefix Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.prefixQuery("brand", "heine");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== QueryString Query
|
||||
|
||||
See link:{ref}/query-dsl-query-string-query.html[QueryString Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders.queryString("+kimchy -elasticsearch");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Range Query
|
||||
|
||||
See link:{ref}/query-dsl-range-query.html[Range Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders
|
||||
.rangeQuery("price")
|
||||
.from(5)
|
||||
.to(10)
|
||||
.includeLower(true)
|
||||
.includeUpper(false);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Span Queries (first, near, not, or, term)
|
||||
|
||||
See:
|
||||
* link:{ref}/query-dsl-span-first-query.html[Span First Query]
|
||||
* link:{ref}/query-dsl-span-near-query.html[Span Near Query]
|
||||
* link:{ref}/query-dsl-span-not-query.html[Span Not Query]
|
||||
* link:{ref}/query-dsl-span-or-query.html[Span Or Query]
|
||||
* link:{ref}/query-dsl-span-term-query.html[Span Term Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Span First
|
||||
QueryBuilders.spanFirstQuery(
|
||||
QueryBuilders.spanTermQuery("user", "kimchy"), // Query
|
||||
3 // Max End position
|
||||
);
|
||||
|
||||
// Span Near
|
||||
QueryBuilders.spanNearQuery()
|
||||
.clause(QueryBuilders.spanTermQuery("field","value1")) // Span Term Queries
|
||||
.clause(QueryBuilders.spanTermQuery("field","value2"))
|
||||
.clause(QueryBuilders.spanTermQuery("field","value3"))
|
||||
.slop(12) // Slop factor
|
||||
.inOrder(false)
|
||||
.collectPayloads(false);
|
||||
|
||||
// Span Not
|
||||
QueryBuilders.spanNotQuery()
|
||||
.include(QueryBuilders.spanTermQuery("field","value1"))
|
||||
.exclude(QueryBuilders.spanTermQuery("field","value2"));
|
||||
|
||||
// Span Or
|
||||
QueryBuilders.spanOrQuery()
|
||||
.clause(QueryBuilders.spanTermQuery("field","value1"))
|
||||
.clause(QueryBuilders.spanTermQuery("field","value2"))
|
||||
.clause(QueryBuilders.spanTermQuery("field","value3"));
|
||||
|
||||
// Span Term
|
||||
QueryBuilders.spanTermQuery("user","kimchy");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Term Query
|
||||
|
||||
See link:{ref}/query-dsl-term-query.html[Term Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilder qb = QueryBuilders.termQuery("name", "kimchy");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Terms Query
|
||||
|
||||
See link:{ref}/query-dsl-terms-query.html[Terms Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.termsQuery("tags", // field
|
||||
"blue", "pill") // values
|
||||
.minimumMatch(1); // How many terms must match
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Top Children Query
|
||||
|
||||
See link:{ref}/query-dsl-top-children-query.html[Top Children Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.topChildrenQuery(
|
||||
"blog_tag", // field
|
||||
QueryBuilders.termQuery("tag", "something") // Query
|
||||
)
|
||||
.score("max") // max, sum or avg
|
||||
.factor(5)
|
||||
.incrementalFactor(2);
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Wildcard Query
|
||||
|
||||
See link:{ref}/query-dsl-wildcard-query.html[Wildcard Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.wildcardQuery("user", "k?mc*");
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Nested Query
|
||||
|
||||
See link:{ref}/query-dsl-nested-query.html[Nested Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.nestedQuery("obj1", // Path
|
||||
QueryBuilders.boolQuery() // Your query
|
||||
.must(QueryBuilders.matchQuery("obj1.name", "blue"))
|
||||
.must(QueryBuilders.rangeQuery("obj1.count").gt(5))
|
||||
)
|
||||
.scoreMode("avg"); // max, total, avg or none
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Custom Filters Score Query
|
||||
|
||||
See
|
||||
link:{ref}/query-dsl-custom-filters-score-query.html[Custom Filters Score Query]
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
QueryBuilders.customFiltersScoreQuery(
|
||||
QueryBuilders.matchAllQuery()) // Query
|
||||
// Filters with their boost factors
|
||||
.add(FilterBuilders.rangeFilter("age").from(0).to(10), 3)
|
||||
.add(FilterBuilders.rangeFilter("age").from(10).to(20), 2)
|
||||
.scoreMode("first"); // first, min, max, total, avg or multiply
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Indices Query
|
||||
|
||||
See link:{ref}/query-dsl-indices-query.html[Indices Query]
|
||||
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Using another query when no match for the main one
|
||||
QueryBuilders.indicesQuery(
|
||||
QueryBuilders.termQuery("tag", "wow"),
|
||||
"index1", "index2"
|
||||
)
|
||||
.noMatchQuery(QueryBuilders.termQuery("tag", "kow"));
|
||||
|
||||
// Using all (match all) or none (match no documents)
|
||||
QueryBuilders.indicesQuery(
|
||||
QueryBuilders.termQuery("tag", "wow"),
|
||||
"index1", "index2"
|
||||
)
|
||||
.noMatchQuery("all"); // all or none
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== GeoShape Query
|
||||
|
||||
See link:{ref}/query-dsl-geo-shape-query.html[GeoShape Query]
|
||||
|
||||
|
||||
Note: the `geo_shape` type uses `Spatial4J` and `JTS`, both of which are
|
||||
optional dependencies. Consequently you must add `Spatial4J` and `JTS`
|
||||
to your classpath in order to use this type:
|
||||
|
||||
[source,xml]
|
||||
--------------------------------------------------
|
||||
<dependency>
|
||||
<groupId>com.spatial4j</groupId>
|
||||
<artifactId>spatial4j</artifactId>
|
||||
<version>0.3</version>
|
||||
</dependency>
|
||||
|
||||
<dependency>
|
||||
<groupId>com.vividsolutions</groupId>
|
||||
<artifactId>jts</artifactId>
|
||||
<version>1.12</version>
|
||||
<exclusions>
|
||||
<exclusion>
|
||||
<groupId>xerces</groupId>
|
||||
<artifactId>xercesImpl</artifactId>
|
||||
</exclusion>
|
||||
</exclusions>
|
||||
</dependency>
|
||||
--------------------------------------------------
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Import Spatial4J shapes
|
||||
import com.spatial4j.core.context.SpatialContext;
|
||||
import com.spatial4j.core.shape.Shape;
|
||||
import com.spatial4j.core.shape.impl.RectangleImpl;
import com.spatial4j.core.shape.impl.PointImpl;
|
||||
|
||||
// Also import ShapeRelation
|
||||
import org.elasticsearch.common.geo.ShapeRelation;
|
||||
--------------------------------------------------
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// Shape within another
|
||||
QueryBuilders.geoShapeQuery("location",
|
||||
new RectangleImpl(0,10,0,10,SpatialContext.GEO))
|
||||
.relation(ShapeRelation.WITHIN);
|
||||
|
||||
// Intersect shapes
|
||||
QueryBuilders.geoShapeQuery("location",
|
||||
new PointImpl(0, 0, SpatialContext.GEO))
|
||||
.relation(ShapeRelation.INTERSECTS);
|
||||
|
||||
// Using pre-indexed shapes
|
||||
QueryBuilders.geoShapeQuery("location", "New Zealand", "countries")
|
||||
.relation(ShapeRelation.DISJOINT);
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,137 @@
|
|||
[[search]]
|
||||
== Search API
|
||||
|
||||
The search API allows one to execute a search query and get back search hits
|
||||
that match the query. It can be executed across one or more indices and
|
||||
across one or more types. The query can either be provided using the
|
||||
<<query-dsl-queries,query Java API>> or
|
||||
the <<query-dsl-filters,filter Java API>>.
|
||||
The body of the search request is built using the
|
||||
`SearchSourceBuilder`. Here is an example:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import org.elasticsearch.action.search.SearchResponse;
|
||||
import org.elasticsearch.action.search.SearchType;
|
||||
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;
|
||||
--------------------------------------------------
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
SearchResponse response = client.prepareSearch("index1", "index2")
|
||||
.setTypes("type1", "type2")
|
||||
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
|
||||
.setQuery(QueryBuilders.termQuery("multi", "test")) // Query
|
||||
.setFilter(FilterBuilders.rangeFilter("age").from(12).to(18)) // Filter
|
||||
.setFrom(0).setSize(60).setExplain(true)
|
||||
.execute()
|
||||
.actionGet();
|
||||
--------------------------------------------------
|
||||
|
||||
Note that all parameters are optional. Here is the smallest search call
|
||||
you can write:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
// MatchAll on the whole cluster with all default options
|
||||
SearchResponse response = client.prepareSearch().execute().actionGet();
|
||||
--------------------------------------------------
|
||||
|
||||
For more information on the search operation, check out the REST
|
||||
link:{ref}/search.html[search] docs.
|
||||
|
||||
[float]
|
||||
=== Using scrolls in Java
|
||||
|
||||
Read the link:{ref}/search-request-scroll.html[scroll documentation]
|
||||
first!
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
import static org.elasticsearch.index.query.FilterBuilders.*;
|
||||
import static org.elasticsearch.index.query.QueryBuilders.*;
|
||||
|
||||
QueryBuilder qb = termQuery("multi", "test");
|
||||
|
||||
SearchResponse scrollResp = client.prepareSearch("test")
|
||||
.setSearchType(SearchType.SCAN)
|
||||
.setScroll(new TimeValue(60000))
|
||||
.setQuery(qb)
|
||||
.setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
|
||||
//Scroll until no hits are returned
|
||||
while (true) {
|
||||
scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
|
||||
for (SearchHit hit : scrollResp.getHits()) {
|
||||
//Handle the hit...
|
||||
}
|
||||
//Break condition: No hits are returned
|
||||
if (scrollResp.hits().hits().length == 0) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Operation Threading
|
||||
|
||||
The search API allows one to set the threading model the operation will be
performed in when the actual execution of the API is performed on the same
node (the API is executed on a shard that is allocated on the same
server).
|
||||
|
||||
There are three threading modes. The `NO_THREADS` mode means that the
|
||||
search operation will be executed on the calling thread. The
|
||||
`SINGLE_THREAD` mode means that the search operation will be executed on
|
||||
a single different thread for all local shards. The `THREAD_PER_SHARD`
|
||||
mode means that the search operation will be executed on a different
|
||||
thread for each local shard.
|
||||
|
||||
The default mode is `SINGLE_THREAD`.
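A sketch of setting the mode explicitly, assuming the `SearchOperationThreading`
enum and the corresponding setter are available on the search request builder in
your version:

[source,java]
--------------------------------------------------
import org.elasticsearch.action.search.SearchOperationThreading;

SearchResponse response = client.prepareSearch("twitter")
        .setOperationThreading(SearchOperationThreading.THREAD_PER_SHARD)
        .setQuery(QueryBuilders.termQuery("user", "kimchy"))
        .execute()
        .actionGet();
--------------------------------------------------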
|
||||
|
||||
[float]
|
||||
=== MultiSearch API
|
||||
|
||||
See link:{ref}/search-multi-search.html[MultiSearch API Query]
|
||||
documentation
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
SearchRequestBuilder srb1 = node.client()
|
||||
.prepareSearch().setQuery(QueryBuilders.queryString("elasticsearch")).setSize(1);
|
||||
SearchRequestBuilder srb2 = node.client()
|
||||
.prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);
|
||||
|
||||
MultiSearchResponse sr = node.client().prepareMultiSearch()
|
||||
.add(srb1)
|
||||
.add(srb2)
|
||||
.execute().actionGet();
|
||||
|
||||
// You will get all individual responses from MultiSearchResponse#responses()
|
||||
long nbHits = 0;
|
||||
for (MultiSearchResponse.Item item : sr.responses()) {
|
||||
SearchResponse response = item.response();
|
||||
nbHits += response.hits().totalHits();
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Using Facets
|
||||
|
||||
The following code shows how to add two facets within your search:
|
||||
|
||||
[source,java]
|
||||
--------------------------------------------------
|
||||
SearchResponse sr = node.client().prepareSearch()
|
||||
.setQuery(QueryBuilders.matchAllQuery())
|
||||
.addFacet(FacetBuilders.termsFacet("f1").field("field"))
|
||||
.addFacet(FacetBuilders.dateHistogramFacet("f2").field("birth").interval("year"))
|
||||
.execute().actionGet();
|
||||
|
||||
// Get your facet results
|
||||
TermsFacet f1 = (TermsFacet) sr.facets().facetsAsMap().get("f1");
|
||||
DateHistogramFacet f2 = (DateHistogramFacet) sr.facets().facetsAsMap().get("f2");
|
||||
--------------------------------------------------
|
||||
|
||||
See <<facets,Facets Java API>>
|
||||
documentation for details.
|
|
@ -0,0 +1,76 @@
|
|||
[[analysis]]
|
||||
= Analysis
|
||||
|
||||
[partintro]
|
||||
--
|
||||
The index analysis module acts as a configurable registry of Analyzers
that can be used both to break indexed (analyzed) fields into terms when a
document is indexed and to process query strings. It maps to the Lucene
`Analyzer`.
|
||||
|
||||
|
||||
Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
|
||||
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
|
||||
be preceded by one or more <<analysis-charfilters,CharFilters>>. The
|
||||
analysis module allows one to register `TokenFilters`, `Tokenizers` and
|
||||
`Analyzers` under logical names that can then be referenced either in
|
||||
mapping definitions or in certain APIs. The Analysis module
|
||||
automatically registers (*if not explicitly defined*) built in
|
||||
analyzers, token filters, and tokenizers.
|
||||
|
||||
Here is a sample configuration:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
standard :
|
||||
type : standard
|
||||
stopwords : [stop1, stop2]
|
||||
myAnalyzer1 :
|
||||
type : standard
|
||||
stopwords : [stop1, stop2, stop3]
|
||||
max_token_length : 500
|
||||
# configure a custom analyzer which is
|
||||
# exactly like the default standard analyzer
|
||||
myAnalyzer2 :
|
||||
tokenizer : standard
|
||||
filter : [standard, lowercase, stop]
|
||||
tokenizer :
|
||||
myTokenizer1 :
|
||||
type : standard
|
||||
max_token_length : 900
|
||||
myTokenizer2 :
|
||||
type : keyword
|
||||
buffer_size : 512
|
||||
filter :
|
||||
myTokenFilter1 :
|
||||
type : stop
|
||||
stopwords : [stop1, stop2, stop3, stop4]
|
||||
myTokenFilter2 :
|
||||
type : length
|
||||
min : 0
|
||||
max : 2000
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Backwards compatibility
|
||||
|
||||
All analyzers, tokenizers, and token filters can be configured with a
|
||||
`version` parameter to control which Lucene version behavior they should
|
||||
use. Possible values are: `3.0` - `3.6`, `4.0` - `4.3` (the highest
|
||||
version number is the default option).
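For example, a sketch of pinning an analyzer to an older Lucene version
behaviour (names and values here are illustrative):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_standard :
                type : standard
                version : 3.6
                stopwords : [stop1, stop2]
--------------------------------------------------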
|
||||
|
||||
--
|
||||
|
||||
include::analysis/analyzers.asciidoc[]
|
||||
|
||||
include::analysis/tokenizers.asciidoc[]
|
||||
|
||||
include::analysis/tokenfilters.asciidoc[]
|
||||
|
||||
include::analysis/charfilters.asciidoc[]
|
||||
|
||||
include::analysis/icu-plugin.asciidoc[]
|
||||
|
|
@ -0,0 +1,69 @@
|
|||
[[analysis-analyzers]]
|
||||
== Analyzers
|
||||
|
||||
Analyzers are composed of a single <<analysis-tokenizers,Tokenizer>>
|
||||
and zero or more <<analysis-tokenfilters,TokenFilters>>. The tokenizer may
|
||||
be preceded by one or more <<analysis-charfilters,CharFilters>>.
|
||||
The analysis module allows you to register `Analyzers` under logical
|
||||
names which can then be referenced either in mapping definitions or in
|
||||
certain APIs.
|
||||
|
||||
Elasticsearch comes with a number of prebuilt analyzers which are
|
||||
ready to use. Alternatively, you can combine the built in
|
||||
character filters, tokenizers and token filters to create
|
||||
<<analysis-custom-analyzer,custom analyzers>>.
|
||||
|
||||
[float]
|
||||
=== Default Analyzers
|
||||
|
||||
An analyzer is registered under a logical name. It can then be
|
||||
referenced from mapping definitions or certain APIs. When none are
|
||||
defined, defaults are used. There is an option to define which analyzers
|
||||
will be used by default when none can be derived.
|
||||
|
||||
The `default` logical name allows one to configure an analyzer that will
|
||||
be used both for indexing and for searching APIs. The `default_index`
|
||||
logical name can be used to configure a default analyzer that will be
|
||||
used just when indexing, and the `default_search` can be used to
|
||||
configure a default analyzer that will be used just when searching.
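For example, a minimal sketch of a configuration that overrides these defaults
(the analyzer definitions here are illustrative):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            default :
                type : standard
                stopwords : [stop1, stop2]
            default_search :
                type : simple
--------------------------------------------------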
|
||||
|
||||
[float]
|
||||
=== Aliasing Analyzers
|
||||
|
||||
Analyzers can be aliased to have several registered lookup names
|
||||
associated with them. For example, the following will allow
|
||||
the `standard` analyzer to also be referenced with `alias1`
|
||||
and `alias2` values.
|
||||
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
standard :
|
||||
alias: [alias1, alias2]
|
||||
type : standard
|
||||
stopwords : [test1, test2, test3]
|
||||
--------------------------------------------------
|
||||
|
||||
Below is a list of the built in analyzers.
|
||||
|
||||
include::analyzers/standard-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/simple-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/whitespace-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/stop-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/keyword-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/pattern-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/lang-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/snowball-analyzer.asciidoc[]
|
||||
|
||||
include::analyzers/custom-analyzer.asciidoc[]
|
||||
|
|
@ -0,0 +1,52 @@
|
|||
[[analysis-custom-analyzer]]
|
||||
=== Custom Analyzer
|
||||
|
||||
An analyzer of type `custom` that allows to combine a `Tokenizer` with
|
||||
zero or more `Token Filters`, and zero or more `Char Filters`. The
|
||||
custom analyzer accepts a logical/registered name of the tokenizer to
|
||||
use, and a list of logical/registered names of token filters.
|
||||
|
||||
The following are settings that can be set for a `custom` analyzer type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`tokenizer` |The logical / registered name of the tokenizer to use.
|
||||
|
||||
|`filter` |An optional list of logical / registered name of token
|
||||
filters.
|
||||
|
||||
|`char_filter` |An optional list of logical / registered name of char
|
||||
filters.
|
||||
|=======================================================================
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer2 :
|
||||
type : custom
|
||||
tokenizer : myTokenizer1
|
||||
filter : [myTokenFilter1, myTokenFilter2]
|
||||
char_filter : [my_html]
|
||||
tokenizer :
|
||||
myTokenizer1 :
|
||||
type : standard
|
||||
max_token_length : 900
|
||||
filter :
|
||||
myTokenFilter1 :
|
||||
type : stop
|
||||
stopwords : [stop1, stop2, stop3, stop4]
|
||||
myTokenFilter2 :
|
||||
type : length
|
||||
min : 0
|
||||
max : 2000
|
||||
char_filter :
|
||||
my_html :
|
||||
type : html_strip
|
||||
escaped_tags : [xxx, yyy]
|
||||
read_ahead : 1024
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,7 @@
|
|||
[[analysis-keyword-analyzer]]
|
||||
=== Keyword Analyzer
|
||||
|
||||
An analyzer of type `keyword` that "tokenizes" an entire stream as a
|
||||
single token. This is useful for data like zip codes, ids and so on.
|
||||
Note, when using mapping definitions, it might make more sense to simply
|
||||
mark the field as `not_analyzed`.
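A quick way to see this behaviour is the `_analyze` API (a sketch against a
local node):

[source,js]
--------------------------------------------------
curl 'localhost:9200/_analyze?pretty=1&analyzer=keyword' -d 'New York'
# produces a single token: "New York"
--------------------------------------------------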
@ -0,0 +1,20 @@
|
|||
[[analysis-lang-analyzer]]
|
||||
=== Language Analyzers
|
||||
|
||||
A set of analyzers aimed at analyzing specific language text. The
|
||||
following types are supported: `arabic`, `armenian`, `basque`,
|
||||
`brazilian`, `bulgarian`, `catalan`, `chinese`, `cjk`, `czech`,
|
||||
`danish`, `dutch`, `english`, `finnish`, `french`, `galician`, `german`,
|
||||
`greek`, `hindi`, `hungarian`, `indonesian`, `italian`, `norwegian`,
|
||||
`persian`, `portuguese`, `romanian`, `russian`, `spanish`, `swedish`,
|
||||
`turkish`, `thai`.
|
||||
|
||||
All analyzers support setting custom `stopwords` either internally in
|
||||
the config, or by using an external stopwords file by setting
|
||||
`stopwords_path`.
|
||||
|
||||
The following analyzers support setting custom `stem_exclusion` list:
|
||||
`arabic`, `armenian`, `basque`, `brazilian`, `bulgarian`, `catalan`,
|
||||
`czech`, `danish`, `dutch`, `english`, `finnish`, `french`, `galician`,
|
||||
`german`, `hindi`, `hungarian`, `indonesian`, `italian`, `norwegian`,
|
||||
`portuguese`, `romanian`, `russian`, `spanish`, `swedish`, `turkish`.
|
@ -0,0 +1,126 @@
|
|||
[[analysis-pattern-analyzer]]
|
||||
=== Pattern Analyzer
|
||||
|
||||
An analyzer of type `pattern` that can flexibly separate text into terms
via a regular expression.

The following are settings that can be set for a `pattern` analyzer
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|===================================================================
|
||||
|Setting |Description
|
||||
|`lowercase` |Should terms be lowercased or not. Defaults to `true`.
|
||||
|`pattern` |The regular expression pattern, defaults to `\W+`.
|
||||
|`flags` |The regular expression flags.
|
||||
|===================================================================
|
||||
|
||||
*IMPORTANT*: The regular expression should match the *token separators*,
|
||||
not the tokens themselves.
|
||||
|
||||
Flags should be pipe-separated, eg `"CASE_INSENSITIVE|COMMENTS"`. Check
|
||||
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary[Java
|
||||
Pattern API] for more details about `flags` options.
|
||||
|
||||
[float]
|
||||
==== Pattern Analyzer Examples
|
||||
|
||||
In order to try out these examples, you should delete the `test` index
|
||||
before running each example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XDELETE localhost:9200/test
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
===== Whitespace tokenizer
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT 'localhost:9200/test' -d '
|
||||
{
|
||||
"settings":{
|
||||
"analysis": {
|
||||
"analyzer": {
|
||||
"whitespace":{
|
||||
"type": "pattern",
|
||||
"pattern":"\\\\s+"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}'
|
||||
|
||||
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=whitespace' -d 'foo,bar baz'
|
||||
# "foo,bar", "baz"
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
===== Non-word character tokenizer
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
|
||||
curl -XPUT 'localhost:9200/test' -d '
|
||||
{
|
||||
"settings":{
|
||||
"analysis": {
|
||||
"analyzer": {
|
||||
"nonword":{
|
||||
"type": "pattern",
|
||||
"pattern":"[^\\\\w]+"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}'
|
||||
|
||||
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'foo,bar baz'
|
||||
# "foo,bar baz" becomes "foo", "bar", "baz"
|
||||
|
||||
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=nonword' -d 'type_1-type_4'
|
||||
# "type_1","type_4"
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
===== CamelCase tokenizer
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
|
||||
curl -XPUT 'localhost:9200/test?pretty=1' -d '
|
||||
{
|
||||
"settings":{
|
||||
"analysis": {
|
||||
"analyzer": {
|
||||
"camel":{
|
||||
"type": "pattern",
|
||||
"pattern":"([^\\\\p{L}\\\\d]+)|(?<=\\\\D)(?=\\\\d)|(?<=\\\\d)(?=\\\\D)|(?<=[\\\\p{L}&&[^\\\\p{Lu}]])(?=\\\\p{Lu})|(?<=\\\\p{Lu})(?=\\\\p{Lu}[\\\\p{L}&&[^\\\\p{Lu}]])"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}'
|
||||
|
||||
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=camel' -d '
|
||||
MooseX::FTPClass2_beta
|
||||
'
|
||||
# "moose","x","ftp","class","2","beta"
|
||||
--------------------------------------------------
|
||||
|
||||
The regex above is easier to understand as:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
|
||||
([^\\p{L}\\d]+) # swallow non letters and numbers,
|
||||
| (?<=\\D)(?=\\d) # or non-number followed by number,
|
||||
| (?<=\\d)(?=\\D) # or number followed by non-number,
|
||||
| (?<=[ \\p{L} && [^\\p{Lu}]]) # or lower case
|
||||
(?=\\p{Lu}) # followed by upper case,
|
||||
| (?<=\\p{Lu}) # or upper case
|
||||
(?=\\p{Lu} # followed by upper case
|
||||
[\\p{L}&&[^\\p{Lu}]] # then lower case
|
||||
)
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,6 @@
|
|||
[[analysis-simple-analyzer]]
|
||||
=== Simple Analyzer
|
||||
|
||||
An analyzer of type `simple` that is built using a
|
||||
<<analysis-lowercase-tokenizer,Lower
|
||||
Case Tokenizer>>.
|
|
@ -0,0 +1,63 @@
|
|||
[[analysis-snowball-analyzer]]
|
||||
=== Snowball Analyzer
|
||||
|
||||
An analyzer of type `snowball` that uses the
|
||||
<<analysis-standard-tokenizer,standard
|
||||
tokenizer>>, with
|
||||
<<analysis-standard-tokenfilter,standard
|
||||
filter>>,
|
||||
<<analysis-lowercase-tokenfilter,lowercase
|
||||
filter>>,
|
||||
<<analysis-stop-tokenfilter,stop
|
||||
filter>>, and
|
||||
<<analysis-snowball-tokenfilter,snowball
|
||||
filter>>.
|
||||
|
||||
The Snowball Analyzer is a stemming analyzer from Lucene that is
|
||||
originally based on the snowball project from
|
||||
http://snowball.tartarus.org[snowball.tartarus.org].
|
||||
|
||||
Sample usage:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"my_analyzer" : {
|
||||
"type" : "snowball",
|
||||
"language" : "English"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The `language` parameter can have the same values as the
|
||||
<<analysis-snowball-tokenfilter,snowball
|
||||
filter>> and defaults to `English`. Note that not all the language
|
||||
analyzers have a default set of stopwords provided.
|
||||
|
||||
The `stopwords` parameter can be used to provide stopwords for the
languages that have no defaults, or to simply replace the default set
with your custom list. A default set of stopwords for many of these
languages is available, for instance,
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/[here]
and
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball[here].
|
||||
|
||||
A sample configuration (in YAML format) specifying Swedish with
|
||||
stopwords:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
my_analyzer:
|
||||
type: snowball
|
||||
language: Swedish
|
||||
stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,26 @@
|
|||
[[analysis-standard-analyzer]]
|
||||
=== Standard Analyzer
|
||||
|
||||
An analyzer of type `standard` that is built using
|
||||
<<analysis-standard-tokenizer,Standard
|
||||
Tokenizer>>, with
|
||||
<<analysis-standard-tokenfilter,Standard
|
||||
Token Filter>>,
|
||||
<<analysis-lowercase-tokenfilter,Lower
|
||||
Case Token Filter>>, and
|
||||
<<analysis-stop-tokenfilter,Stop
|
||||
Token Filter>>.
|
||||
|
||||
The following are settings that can be set for a `standard` analyzer
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to the English stop words.
|
||||
|
||||
|`max_token_length` |The maximum token length. If a token is seen that
|
||||
exceeds this length then it is discarded. Defaults to `255`.
|
||||
|=======================================================================
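
For example, a `standard` analyzer with a shorter maximum token length
and a custom stopword list might be configured like this (a sketch; the
analyzer name and values are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_standard" : {
                    "type" : "standard",
                    "max_token_length" : 100,
                    "stopwords" : ["a", "an", "the"]
                }
            }
        }
    }
}
--------------------------------------------------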
|
||||
|
|
@ -0,0 +1,21 @@
|
|||
[[analysis-stop-analyzer]]
|
||||
=== Stop Analyzer
|
||||
|
||||
An analyzer of type `stop` that is built using a
|
||||
<<analysis-lowercase-tokenizer,Lower
|
||||
Case Tokenizer>>, with
|
||||
<<analysis-stop-tokenfilter,Stop
|
||||
Token Filter>>.
|
||||
|
||||
The following are settings that can be set for a `stop` analyzer type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to the English stop words.
|
||||
|
||||
|`stopwords_path` |A path (either relative to `config` location, or
|
||||
absolute) to a stopwords file configuration.
|
||||
|=======================================================================
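
For example, a `stop` analyzer with a custom stopword list (a sketch; the
analyzer name and stopwords are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_stop_analyzer" : {
                    "type" : "stop",
                    "stopwords" : ["and", "is", "the"]
                }
            }
        }
    }
}
--------------------------------------------------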
|
||||
|
|
@ -0,0 +1,6 @@
|
|||
[[analysis-whitespace-analyzer]]
|
||||
=== Whitespace Analyzer
|
||||
|
||||
An analyzer of type `whitespace` that is built using a
|
||||
<<analysis-whitespace-tokenizer,Whitespace
|
||||
Tokenizer>>.
|
|
@ -0,0 +1,16 @@
|
|||
[[analysis-charfilters]]
|
||||
== Character Filters
|
||||
|
||||
Character filters are used to preprocess the string of
|
||||
characters before it is passed to the <<analysis-tokenizers,tokenizer>>.
|
||||
A character filter may be used to strip out HTML markup, or to convert
`"&"` characters to the word `"and"`.
|
||||
|
||||
Elasticsearch has a number of built-in character filters which can be
used to build <<analysis-custom-analyzer,custom analyzers>>.
|
||||
|
||||
include::charfilters/mapping-charfilter.asciidoc[]
|
||||
|
||||
include::charfilters/htmlstrip-charfilter.asciidoc[]
|
||||
|
||||
include::charfilters/pattern-replace-charfilter.asciidoc[]
|
|
@ -0,0 +1,5 @@
|
|||
[[analysis-htmlstrip-charfilter]]
|
||||
=== HTML Strip Char Filter
|
||||
|
||||
A char filter of type `html_strip` that strips out HTML elements from the
analyzed text.
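
A minimal sketch of a custom analyzer that uses this char filter (the
`escaped_tags` setting is the same one shown in the custom analyzer
example; all names here are illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "char_filter" : {
                "my_html" : {
                    "type" : "html_strip",
                    "escaped_tags" : ["b"]
                }
            },
            "analyzer" : {
                "html_text" : {
                    "tokenizer" : "standard",
                    "char_filter" : ["my_html"]
                }
            }
        }
    }
}
--------------------------------------------------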
|
|
@ -0,0 +1,38 @@
|
|||
[[analysis-mapping-charfilter]]
|
||||
=== Mapping Char Filter
|
||||
|
||||
A char filter of type `mapping` that replaces characters of the analyzed text
according to the given mapping.
|
||||
|
||||
Here is a sample configuration:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"char_filter" : {
|
||||
"my_mapping" : {
|
||||
"type" : "mapping",
|
||||
"mappings" : ["ph=>f", "qu=>q"]
|
||||
}
|
||||
},
|
||||
"analyzer" : {
|
||||
"custom_with_char_filter" : {
|
||||
"tokenizer" : "standard",
|
||||
"char_filter" : ["my_mapping"]
|
||||
                }
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Alternatively, the `mappings_path` setting can specify a file where you can
put the list of char mappings:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
ph => f
|
||||
qu => k
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,37 @@
|
|||
[[analysis-pattern-replace-charfilter]]
|
||||
=== Pattern Replace Char Filter
|
||||
|
||||
The `pattern_replace` char filter allows the use of a regex to
|
||||
manipulate the characters in a string before analysis. The regular
|
||||
expression is defined using the `pattern` parameter, and the replacement
|
||||
string can be provided using the `replacement` parameter (supporting
|
||||
referencing the original text, as explained
|
||||
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
|
||||
For more information check the
|
||||
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilter.html[lucene
|
||||
documentation].
|
||||
|
||||
Here is a sample configuration:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"char_filter" : {
|
||||
"my_pattern":{
|
||||
"type":"pattern_replace",
|
||||
"pattern":"sample(.*)",
|
||||
"replacement":"replacedSample $1"
|
||||
}
|
||||
},
|
||||
"analyzer" : {
|
||||
"custom_with_char_filter" : {
|
||||
"tokenizer" : "standard",
|
||||
"char_filter" : ["my_pattern"]
|
||||
                }
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,148 @@
|
|||
[[analysis-icu-plugin]]
|
||||
== ICU Analysis Plugin
|
||||
|
||||
The http://icu-project.org/[ICU] analysis plugin allows for unicode
|
||||
normalization, collation and folding. The plugin is called
|
||||
https://github.com/elasticsearch/elasticsearch-analysis-icu[elasticsearch-analysis-icu].
|
||||
|
||||
The plugin includes the following analysis components:
|
||||
|
||||
[float]
|
||||
=== ICU Normalization
|
||||
|
||||
Normalizes characters as explained
|
||||
http://userguide.icu-project.org/transforms/normalization[here]. It
|
||||
registers itself by default under `icu_normalizer` or `icuNormalizer`
|
||||
using the default settings. Allows for the `name` parameter to be provided,
which can include the following values: `nfc`, `nfkc`, and `nfkc_cf`.
Here is a sample configuration:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"normalization" : {
|
||||
"tokenizer" : "keyword",
|
||||
"filter" : ["icu_normalizer"]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== ICU Folding
|
||||
|
||||
Folding of unicode characters based on `UTR#30`. It registers itself
|
||||
under `icu_folding` and `icuFolding` names.
|
||||
The filter also does lowercasing, which means the lowercase filter can
|
||||
normally be left out. Sample setting:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"folding" : {
|
||||
"tokenizer" : "keyword",
|
||||
"filter" : ["icu_folding"]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== Filtering
|
||||
|
||||
The folding can be filtered by a set of unicode characters with the
parameter `unicodeSetFilter`. This is useful for a non-internationalized
search engine where you want to retain a set of national characters which
are primary letters in a specific language. See the syntax for the
UnicodeSet
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html[here].

The following example exempts Swedish characters from the folding. Note
that the filtered characters are NOT lowercased, which is why we add the
`lowercase` filter below.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"folding" : {
|
||||
"tokenizer" : "standard",
|
||||
"filter" : ["my_icu_folding", "lowercase"]
|
||||
}
|
||||
            },
|
||||
"filter" : {
|
||||
"my_icu_folding" : {
|
||||
"type" : "icu_folding"
|
||||
"unicodeSetFilter" : "[^åäöÅÄÖ]"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== ICU Collation
|
||||
|
||||
Uses the collation token filter. Allows you to either specify the rules for
collation (defined
http://www.icu-project.org/userguide/Collate_Customization.html[here])
using the `rules` parameter (which can point to a location or be expressed in
the settings; the location can be relative to the config location), or use the
`language` parameter (further specialized by country and variant). By
default it registers under `icu_collation` or `icuCollation` and uses the
default locale.
|
||||
|
||||
Here is a sample configuration:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"collation" : {
|
||||
"tokenizer" : "keyword",
|
||||
"filter" : ["icu_collation"]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
And here is a sample of custom collation:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"collation" : {
|
||||
"tokenizer" : "keyword",
|
||||
"filter" : ["myCollator"]
|
||||
}
|
||||
},
|
||||
"filter" : {
|
||||
"myCollator" : {
|
||||
"type" : "icu_collation",
|
||||
"language" : "en"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,71 @@
|
|||
[[analysis-tokenfilters]]
|
||||
== Token Filters
|
||||
|
||||
Token filters accept a stream of tokens from a
|
||||
<<analysis-tokenizers,tokenizer>> and can modify tokens
|
||||
(eg lowercasing), delete tokens (eg remove stopwords)
|
||||
or add tokens (eg synonyms).
|
||||
|
||||
Elasticsearch has a number of built in token filters which can be
|
||||
used to build <<analysis-custom-analyzer,custom analyzers>>.
|
||||
|
||||
include::tokenfilters/standard-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/asciifolding-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/length-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/lowercase-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/ngram-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/edgengram-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/porterstem-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/shingle-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/stop-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/word-delimiter-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/stemmer-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/stemmer-override-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/keyword-marker-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/keyword-repeat-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/kstem-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/snowball-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/phonetic-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/synonym-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/compound-word-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/reverse-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/elision-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/truncate-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/unique-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/pattern-capture-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/pattern_replace-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/trim-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/limit-token-count-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/hunspell-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/common-grams-tokenfilter.asciidoc[]
|
||||
|
||||
include::tokenfilters/normalization-tokenfilter.asciidoc[]
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
[[analysis-asciifolding-tokenfilter]]
|
||||
=== ASCII Folding Token Filter
|
||||
|
||||
A token filter of type `asciifolding` that converts alphabetic, numeric,
|
||||
and symbolic Unicode characters which are not in the first 127 ASCII
|
||||
characters (the "Basic Latin" Unicode block) into their ASCII
|
||||
equivalents, if one exists.
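
A minimal sketch of a custom analyzer that uses this filter (the analyzer
name is only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folded" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "asciifolding"]
                }
            }
        }
    }
}
--------------------------------------------------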
|
|
@ -0,0 +1,61 @@
|
|||
[[analysis-common-grams-tokenfilter]]
|
||||
=== Common Grams Token Filter
|
||||
|
||||
Token filter that generates bigrams for frequently occurring terms.
|
||||
Single terms are still indexed. It can be used as an alternative to the
|
||||
<<analysis-stop-tokenfilter,Stop
|
||||
Token Filter>> when we don't want to completely ignore common terms.
|
||||
|
||||
For example, assuming "the", "is" and "a" are common words, the text
"the quick brown is a fox" will be tokenized as
"the", "the_quick", "quick", "brown", "brown_is", "is_a", "a_fox", "fox".
|
||||
|
||||
When `query_mode` is enabled, the token filter removes common words and
|
||||
single terms followed by a common word. This parameter should be enabled
|
||||
in the search analyzer.
|
||||
|
||||
For example, the query "the quick brown is a fox" will be tokenized as
|
||||
"the_quick", "quick", "brown_is", "is_a", "a_fox", "fox".
|
||||
|
||||
The following are settings that can be set:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`common_words` |A list of common words to use.
|
||||
|
||||
|`common_words_path` |A path (either relative to `config` location, or
|
||||
absolute) to a list of common words. Each word should be in its own
|
||||
"line" (separated by a line break). The file must be UTF-8 encoded.
|
||||
|
||||
|`ignore_case` |If true, common words matching will be case insensitive
|
||||
(defaults to `false`).
|
||||
|
||||
|`query_mode` |Generates bigrams then removes common words and single
|
||||
terms followed by a common word (defaults to `false`).
|
||||
|=======================================================================
|
||||
|
||||
Note, the `common_words` or `common_words_path` field is required.
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
index_grams :
|
||||
tokenizer : whitespace
|
||||
filter : [common_grams]
|
||||
search_grams :
|
||||
tokenizer : whitespace
|
||||
filter : [common_grams_query]
|
||||
filter :
|
||||
common_grams :
|
||||
type : common_grams
|
||||
common_words: [a, an, the]
|
||||
common_grams_query :
|
||||
type : common_grams
|
||||
query_mode: true
|
||||
common_words: [a, an, the]
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,48 @@
|
|||
[[analysis-compound-word-tokenfilter]]
|
||||
=== Compound Word Token Filter
|
||||
|
||||
Token filters that decompose compound words. There are two
types available: `dictionary_decompounder` and
`hyphenation_decompounder`.
|
||||
|
||||
The following are settings that can be set for a compound word token
|
||||
filter type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`word_list` |A list of words to use.
|
||||
|
||||
|`word_list_path` |A path (either relative to `config` location, or
|
||||
absolute) to a list of words.
|
||||
|
||||
|`min_word_size` |Minimum word size(Integer). Defaults to 5.
|
||||
|
||||
|`min_subword_size` |Minimum subword size(Integer). Defaults to 2.
|
||||
|
||||
|`max_subword_size` |Maximum subword size(Integer). Defaults to 15.
|
||||
|
||||
|`only_longest_match` |Only matching the longest(Boolean). Defaults to
|
||||
`false`
|
||||
|=======================================================================
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer2 :
|
||||
type : custom
|
||||
tokenizer : standard
|
||||
filter : [myTokenFilter1, myTokenFilter2]
|
||||
filter :
|
||||
myTokenFilter1 :
|
||||
type : dictionary_decompounder
|
||||
word_list: [one, two, three]
|
||||
myTokenFilter2 :
|
||||
type : hyphenation_decompounder
|
||||
word_list_path: path/to/words.txt
|
||||
max_subword_size : 22
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,16 @@
|
|||
[[analysis-edgengram-tokenfilter]]
|
||||
=== Edge NGram Token Filter
|
||||
|
||||
A token filter of type `edgeNGram`.
|
||||
|
||||
The following are settings that can be set for a `edgeNGram` token
|
||||
filter type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|======================================================
|
||||
|Setting |Description
|
||||
|`min_gram` |Defaults to `1`.
|
||||
|`max_gram` |Defaults to `2`.
|
||||
|`side` |Either `front` or `back`. Defaults to `front`.
|
||||
|======================================================
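
A sketch of an `edgeNGram` filter wired into a custom analyzer (the names
and values are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "autocomplete" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_edge_ngram"]
                }
            },
            "filter" : {
                "my_edge_ngram" : {
                    "type" : "edgeNGram",
                    "min_gram" : 1,
                    "max_gram" : 5,
                    "side" : "front"
                }
            }
        }
    }
}
--------------------------------------------------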
|
||||
|
|
@ -0,0 +1,28 @@
|
|||
[[analysis-elision-tokenfilter]]
|
||||
=== Elision Token Filter
|
||||
|
||||
A token filter which removes elisions. For example, "l'avion" (the
plane) will be tokenized as "avion" (plane).

Accepts an `articles` setting, which is a set of stop-word articles. For
example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"default" : {
|
||||
"tokenizer" : "standard",
|
||||
"filter" : ["standard", "elision"]
|
||||
}
|
||||
},
|
||||
"filter" : {
|
||||
"elision" : {
|
||||
"type" : "elision",
|
||||
"articles" : ["l", "m", "t", "qu", "n", "s", "j"]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,116 @@
|
|||
[[analysis-hunspell-tokenfilter]]
|
||||
=== Hunspell Token Filter
|
||||
|
||||
Basic support for hunspell stemming. Hunspell dictionaries will be
|
||||
picked up from a dedicated hunspell directory on the filesystem
|
||||
(defaults to `<path.conf>/hunspell`). Each dictionary is expected to
|
||||
have its own directory named after its associated locale (language).
|
||||
This dictionary directory is expected to hold both the \*.aff and \*.dic
|
||||
files (all of which will automatically be picked up). For example,
|
||||
assuming the default hunspell location is used, the following directory
|
||||
layout will define the `en_US` dictionary:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
- conf
|
||||
|-- hunspell
|
||||
| |-- en_US
|
||||
| | |-- en_US.dic
|
||||
| | |-- en_US.aff
|
||||
--------------------------------------------------
|
||||
|
||||
The location of the hunspell directory can be configured using the
|
||||
`indices.analysis.hunspell.dictionary.location` settings in
|
||||
_elasticsearch.yml_.
|
||||
|
||||
Each dictionary can be configured with two settings:
|
||||
|
||||
`ignore_case`::
|
||||
If true, dictionary matching will be case insensitive
|
||||
(defaults to `false`)
|
||||
|
||||
`strict_affix_parsing`::
|
||||
Determines whether errors while reading an
affix rules file will cause an exception or simply be ignored (defaults to
`true`)
|
||||
|
||||
These settings can be configured globally in `elasticsearch.yml` using
|
||||
|
||||
* `indices.analysis.hunspell.dictionary.ignore_case` and
|
||||
* `indices.analysis.hunspell.dictionary.strict_affix_parsing`
|
||||
|
||||
or for specific dictionaries:
|
||||
|
||||
* `indices.analysis.hunspell.dictionary.en_US.ignore_case` and
|
||||
* `indices.analysis.hunspell.dictionary.en_US.strict_affix_parsing`.
|
||||
|
||||
It is also possible to add a `settings.yml` file under the dictionary
directory which holds these settings (this will override any other
settings defined in `elasticsearch.yml`).
|
||||
|
||||
One can use the hunspell stem filter by configuring it in the analysis
settings:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"en" : {
|
||||
"tokenizer" : "standard",
|
||||
"filter" : [ "lowercase", "en_US" ]
|
||||
}
|
||||
},
|
||||
"filter" : {
|
||||
"en_US" : {
|
||||
"type" : "hunspell",
|
||||
"locale" : "en_US",
|
||||
"dedup" : true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The hunspell token filter accepts four options:
|
||||
|
||||
`locale`::
|
||||
A locale for this filter. If this is unset, the `lang` or
`language` parameter is used instead - so one of these has to be set.
|
||||
|
||||
`dictionary`::
|
||||
The name of a dictionary. The path to your hunspell
dictionaries should be configured via
`indices.analysis.hunspell.dictionary.location` beforehand.
|
||||
|
||||
`dedup`::
|
||||
If only unique terms should be returned, this needs to be
|
||||
set to `true`. Defaults to `true`.
|
||||
|
||||
`recursion_level`::
|
||||
Configures the recursion level a
|
||||
stemmer can go into. Defaults to `2`. Some languages (for example czech)
|
||||
give better results when set to `1` or `0`, so you should test it out.
|
||||
(since 0.90.3)
|
||||
|
||||
NOTE: As opposed to the snowball stemmers (which are algorithm based)
|
||||
this is a dictionary lookup based stemmer and therefore the quality of
|
||||
the stemming is determined by the quality of the dictionary.
|
||||
|
||||
[float]
|
||||
==== References
|
||||
|
||||
Hunspell is a spell checker and morphological analyzer designed for
|
||||
languages with rich morphology and complex word compounding and
|
||||
character encoding.
|
||||
|
||||
1. Wikipedia, http://en.wikipedia.org/wiki/Hunspell
|
||||
|
||||
2. Source code, http://hunspell.sourceforge.net/
|
||||
|
||||
3. Open Office Hunspell dictionaries, http://wiki.openoffice.org/wiki/Dictionaries
|
||||
|
||||
4. Mozilla Hunspell dictionaries, https://addons.mozilla.org/en-US/firefox/language-tools/
|
||||
|
||||
5. Chromium Hunspell dictionaries,
|
||||
http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/
|
|
@ -0,0 +1,34 @@
|
|||
[[analysis-keyword-marker-tokenfilter]]
|
||||
=== Keyword Marker Token Filter
|
||||
|
||||
Protects words from being modified by stemmers. Must be placed before
|
||||
any stemming filters.
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`keywords` |A list of words to use.
|
||||
|
||||
|`keywords_path` |A path (either relative to `config` location, or
|
||||
absolute) to a list of words.
|
||||
|
||||
|`ignore_case` |Set to `true` to lower case all words first. Defaults to
|
||||
`false`.
|
||||
|=======================================================================
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer :
|
||||
type : custom
|
||||
tokenizer : standard
|
||||
filter : [lowercase, protwods, porterStem]
|
||||
filter :
|
||||
protwods :
|
||||
type : keyword_marker
|
||||
keywords_path : analysis/protwords.txt
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,28 @@
|
|||
[[analysis-keyword-repeat-tokenfilter]]
|
||||
=== Keyword Repeat Token Filter
|
||||
|
||||
The `keyword_repeat` token filter emits each incoming token twice, once
as a keyword and once as a non-keyword, to allow an un-stemmed version of a
term to be indexed side by side with the stemmed version of the term.
Given the nature of this filter, each token that isn't transformed by a
subsequent stemmer will be indexed twice. Therefore, consider adding a
`unique` filter with `only_on_same_position` set to `true` to drop
unnecessary duplicates.
|
||||
|
||||
Note: this is available from `0.90.0.Beta2` on.
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer :
|
||||
type : custom
|
||||
tokenizer : standard
|
||||
filter : [lowercase, keyword_repeat, porterStem, unique_stem]
|
||||
        filter :
            unique_stem :
                type : unique
                only_on_same_position : true
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,6 @@
|
|||
[[analysis-kstem-tokenfilter]]
|
||||
=== KStem Token Filter
|
||||
|
||||
The `kstem` token filter is a high performance filter for English. All
terms must already be lowercased (use the `lowercase` filter) for this
filter to work correctly.
|
|
@ -0,0 +1,16 @@
|
|||
[[analysis-length-tokenfilter]]
|
||||
=== Length Token Filter
|
||||
|
||||
A token filter of type `length` that removes words that are too long or
|
||||
too short for the stream.
|
||||
|
||||
The following are settings that can be set for a `length` token filter
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|===========================================================
|
||||
|Setting |Description
|
||||
|`min` |The minimum token length. Defaults to `0`.
|`max` |The maximum token length. Defaults to `Integer.MAX_VALUE`.
|
||||
|===========================================================
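
A sketch of a `length` filter wired into a custom analyzer (the names and
values are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_length"]
                }
            },
            "filter" : {
                "my_length" : {
                    "type" : "length",
                    "min" : 2,
                    "max" : 10
                }
            }
        }
    }
}
--------------------------------------------------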
|
||||
|
|
@ -0,0 +1,32 @@
|
|||
[[analysis-limit-token-count-tokenfilter]]
|
||||
=== Limit Token Count Token Filter
|
||||
|
||||
Limits the number of tokens that are indexed per document and field.
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`max_token_count` |The maximum number of tokens that should be indexed
|
||||
per document and field. The default is `1`
|
||||
|
||||
|`consume_all_tokens` |If set to `true` the filter exhausts the stream
|
||||
even if `max_token_count` tokens have been consumed already. The default
|
||||
is `false`.
|
||||
|=======================================================================
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer :
|
||||
type : custom
|
||||
tokenizer : standard
|
||||
filter : [lowercase, five_token_limit]
|
||||
filter :
|
||||
five_token_limit :
|
||||
type : limit
|
||||
max_token_count : 5
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,37 @@
|
|||
[[analysis-lowercase-tokenfilter]]
|
||||
=== Lowercase Token Filter
|
||||
|
||||
A token filter of type `lowercase` that normalizes token text to lower
|
||||
case.
|
||||
|
||||
The lowercase token filter supports Greek and Turkish lowercasing
through the `language` parameter. Below is a usage example in a
custom analyzer:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer2 :
|
||||
type : custom
|
||||
tokenizer : myTokenizer1
|
||||
filter : [myTokenFilter1, myGreekLowerCaseFilter]
|
||||
char_filter : [my_html]
|
||||
tokenizer :
|
||||
myTokenizer1 :
|
||||
type : standard
|
||||
max_token_length : 900
|
||||
filter :
|
||||
myTokenFilter1 :
|
||||
type : stop
|
||||
stopwords : [stop1, stop2, stop3, stop4]
|
||||
myGreekLowerCaseFilter :
|
||||
type : lowercase
|
||||
language : greek
|
||||
char_filter :
|
||||
my_html :
|
||||
type : html_strip
|
||||
escaped_tags : [xxx, yyy]
|
||||
read_ahead : 1024
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,15 @@
|
|||
[[analysis-ngram-tokenfilter]]
|
||||
=== NGram Token Filter
|
||||
|
||||
A token filter of type `nGram`.
|
||||
|
||||
The following are settings that can be set for a `nGram` token filter
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|============================
|
||||
|Setting |Description
|
||||
|`min_gram` |Defaults to `1`.
|
||||
|`max_gram` |Defaults to `2`.
|
||||
|============================
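
A sketch of an `nGram` filter wired into a custom analyzer (the names and
values are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_ngram_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_ngram"]
                }
            },
            "filter" : {
                "my_ngram" : {
                    "type" : "nGram",
                    "min_gram" : 2,
                    "max_gram" : 3
                }
            }
        }
    }
}
--------------------------------------------------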
|
||||
|
|
@ -0,0 +1,15 @@
|
|||
[[analysis-normalization-tokenfilter]]
|
||||
=== Normalization Token Filter
|
||||
|
||||
There are several token filters available which try to normalize special
|
||||
characters of a certain language.
|
||||
|
||||
You can currently choose between `arabic_normalization` and
|
||||
`persian_normalization` normalization in your token filter
|
||||
configuration. For more information check the
|
||||
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizer.html[ArabicNormalizer]
|
||||
or the
|
||||
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizer.html[PersianNormalizer]
|
||||
documentation.
|
||||
|
||||
*Note:* These filters are available since `0.90.2`.
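
A minimal sketch of an analyzer using one of these filters (the analyzer
name is only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "arabic_text" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "arabic_normalization"]
                }
            }
        }
    }
}
--------------------------------------------------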
|
|
@ -0,0 +1,134 @@
|
|||
[[analysis-pattern-capture-tokenfilter]]
|
||||
=== Pattern Capture Token Filter
|
||||
|
||||
The `pattern_capture` token filter, unlike the `pattern` tokenizer,
|
||||
emits a token for every capture group in the regular expression.
|
||||
Patterns are not anchored to the beginning and end of the string, so
|
||||
each pattern can match multiple times, and matches are allowed to
|
||||
overlap.
|
||||
|
||||
For instance a pattern like:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"(([a-z]+)(\d*))"
|
||||
--------------------------------------------------
|
||||
|
||||
when matched against:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"abc123def456"
|
||||
--------------------------------------------------
|
||||
|
||||
would produce the tokens: [ `abc123`, `abc`, `123`, `def456`, `def`,
|
||||
`456` ]
|
||||
|
||||
If `preserve_original` is set to `true` (the default) then it would also
|
||||
emit the original token: `abc123def456`.
|
||||
|
||||
This is particularly useful for indexing text like camel-case code, eg
|
||||
`stripHTML` where a user may search for `"strip html"` or `"striphtml"`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT localhost:9200/test/ -d '
|
||||
{
|
||||
"settings" : {
|
||||
"analysis" : {
|
||||
"filter" : {
|
||||
"code" : {
|
||||
"type" : "pattern_capture",
|
||||
"preserve_original" : 1,
|
||||
"patterns" : [
|
||||
"(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)",
|
||||
"(\\d+)"
|
||||
]
|
||||
}
|
||||
},
|
||||
"analyzer" : {
|
||||
"code" : {
|
||||
"tokenizer" : "pattern",
|
||||
"filter" : [ "code", "lowercase" ]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
'
|
||||
--------------------------------------------------
|
||||
|
||||
When used to analyze the text
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml
|
||||
--------------------------------------------------
|
||||
|
||||
this emits the tokens: [ `import`, `static`, `org`, `apache`, `commons`,
|
||||
`lang`, `stringescapeutils`, `string`, `escape`, `utils`, `escapehtml`,
|
||||
`escape`, `html` ]
|
||||
|
||||
Another example is analyzing email addresses:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT localhost:9200/test/ -d '
|
||||
{
|
||||
"settings" : {
|
||||
"analysis" : {
|
||||
"filter" : {
|
||||
"email" : {
|
||||
"type" : "pattern_capture",
|
||||
"preserve_original" : 1,
|
||||
"patterns" : [
|
||||
"(\\w+)",
|
||||
"(\\p{L}+)",
|
||||
"(\\d+)",
|
||||
"@(.+)"
|
||||
]
|
||||
}
|
||||
},
|
||||
"analyzer" : {
|
||||
"email" : {
|
||||
"tokenizer" : "uax_url_email",
|
||||
"filter" : [ "email", "lowercase", "unique" ]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
'
|
||||
--------------------------------------------------
|
||||
|
||||
When the above analyzer is used on an email address like:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
john-smith_123@foo-bar.com
|
||||
--------------------------------------------------
|
||||
|
||||
it would produce the following tokens: [ `john-smith_123`,
|
||||
`foo-bar.com`, `john`, `smith_123`, `smith`, `123`, `foo`,
|
||||
`foo-bar.com`, `bar`, `com` ]
|
||||
|
||||
Multiple patterns are required to allow overlapping captures, but also
|
||||
means that patterns are less dense and easier to understand.
|
||||
|
||||
*Note:* All tokens are emitted in the same position, and with the same
|
||||
character offsets, so when combined with highlighting, the whole
|
||||
original token will be highlighted, not just the matching subset. For
|
||||
instance, querying the above email address for `"smith"` would
|
||||
highlight:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
<em>john-smith_123@foo-bar.com</em>
|
||||
--------------------------------------------------
|
||||
|
||||
not:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
john-<em>smith</em>_123@foo-bar.com
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,9 @@
|
|||
[[analysis-pattern_replace-tokenfilter]]
|
||||
=== Pattern Replace Token Filter
|
||||
|
||||
The `pattern_replace` token filter allows you to easily handle string
|
||||
replacements based on a regular expression. The regular expression is
|
||||
defined using the `pattern` parameter, and the replacement string can be
|
||||
provided using the `replacement` parameter (supporting referencing the
|
||||
original text, as explained
|
||||
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
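
A sketch of a `pattern_replace` token filter wired into a custom analyzer
(the names, pattern and replacement are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_pattern_replace"]
                }
            },
            "filter" : {
                "my_pattern_replace" : {
                    "type" : "pattern_replace",
                    "pattern" : "sample(.*)",
                    "replacement" : "replacedSample $1"
                }
            }
        }
    }
}
--------------------------------------------------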
|
|
@ -0,0 +1,5 @@
|
|||
[[analysis-phonetic-tokenfilter]]
|
||||
=== Phonetic Token Filter
|
||||
|
||||
The `phonetic` token filter is provided as a plugin and located
|
||||
https://github.com/elasticsearch/elasticsearch-analysis-phonetic[here].
|
|
@ -0,0 +1,15 @@
|
|||
[[analysis-porterstem-tokenfilter]]
|
||||
=== Porter Stem Token Filter
|
||||
|
||||
A token filter of type `porterStem` that transforms the token stream as
|
||||
per the Porter stemming algorithm.
|
||||
|
||||
Note, the input to the stemming filter must already be in lower case, so
|
||||
you will need to use
|
||||
<<analysis-lowercase-tokenfilter,Lower
|
||||
Case Token Filter>> or
|
||||
<<analysis-lowercase-tokenizer,Lower
|
||||
Case Tokenizer>> farther down the Tokenizer chain in order for this to
|
||||
work properly. For example, when using a custom analyzer, make sure the
|
||||
`lowercase` filter comes before the `porterStem` filter in the list of
|
||||
filters.
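
A minimal sketch of such a custom analyzer, with `lowercase` placed before
`porterStem` (the analyzer name is only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "porterStem"]
                }
            }
        }
    }
}
--------------------------------------------------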
|
|
@ -0,0 +1,4 @@
|
|||
[[analysis-reverse-tokenfilter]]
|
||||
=== Reverse Token Filter
|
||||
|
||||
A token filter of type `reverse` that simply reverses each token.
|
|
@ -0,0 +1,36 @@
|
|||
[[analysis-shingle-tokenfilter]]
|
||||
=== Shingle Token Filter
|
||||
|
||||
A token filter of type `shingle` that constructs shingles (token
|
||||
n-grams) from a token stream. In other words, it creates combinations of
|
||||
tokens as a single token. For example, the sentence "please divide this
|
||||
sentence into shingles" might be tokenized into shingles "please
|
||||
divide", "divide this", "this sentence", "sentence into", and "into
|
||||
shingles".
|
||||
|
||||
This filter handles position increments > 1 by inserting filler tokens
|
||||
(tokens with termtext "_"). It does not handle a position increment of
|
||||
0.
|
||||
|
||||
The following are settings that can be set for a `shingle` token filter
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`max_shingle_size` |The maximum shingle size. Defaults to `2`.
|
||||
|
||||
|`min_shingle_size` |The minimum shingle size. Defaults to `2`.
|
||||
|
||||
|`output_unigrams` |If `true` the output will contain the input tokens
|
||||
(unigrams) as well as the shingles. Defaults to `true`.
|
||||
|
||||
|`output_unigrams_if_no_shingles` |If `output_unigrams` is `false` the
|
||||
output will contain the input tokens (unigrams) if no shingles are
|
||||
available. Note if `output_unigrams` is set to `true` this setting has
|
||||
no effect. Defaults to `false`.
|
||||
|
||||
|`token_separator` |The string to use when joining adjacent tokens to
|
||||
form a shingle. Defaults to `" "`.
|
||||
|=======================================================================
|
||||
|
|
@ -0,0 +1,33 @@
|
|||
[[analysis-snowball-tokenfilter]]
|
||||
=== Snowball Token Filter
|
||||
|
||||
A filter that stems words using a Snowball-generated stemmer. The
|
||||
`language` parameter controls the stemmer with the following available
|
||||
values: `Armenian`, `Basque`, `Catalan`, `Danish`, `Dutch`, `English`,
|
||||
`Finnish`, `French`, `German`, `German2`, `Hungarian`, `Italian`, `Kp`,
|
||||
`Lovins`, `Norwegian`, `Porter`, `Portuguese`, `Romanian`, `Russian`,
|
||||
`Spanish`, `Swedish`, `Turkish`.
|
||||
|
||||
For example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"my_analyzer" : {
|
||||
"tokenizer" : "standard",
|
||||
"filter" : ["standard", "lowercase", "my_snow"]
|
||||
}
|
||||
},
|
||||
"filter" : {
|
||||
"my_snow" : {
|
||||
"type" : "snowball",
|
||||
"language" : "Lovins"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,7 @@
|
|||
[[analysis-standard-tokenfilter]]
|
||||
=== Standard Token Filter
|
||||
|
||||
A token filter of type `standard` that normalizes tokens extracted with
|
||||
the
|
||||
<<analysis-standard-tokenizer,Standard
|
||||
Tokenizer>>.
|
|
@ -0,0 +1,34 @@
|
|||
[[analysis-stemmer-override-tokenfilter]]
|
||||
=== Stemmer Override Token Filter
|
||||
|
||||
Overrides stemming algorithms, by applying a custom mapping, then
|
||||
protecting these terms from being modified by stemmers. Must be placed
|
||||
before any stemming filters.
|
||||
|
||||
Rules are separated by "=>"
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`rules` |A list of mapping rules to use.
|
||||
|
||||
|`rules_path` |A path (either relative to `config` location, or
|
||||
absolute) to a list of mappings.
|
||||
|=======================================================================
|
||||
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
index :
|
||||
analysis :
|
||||
analyzer :
|
||||
myAnalyzer :
|
||||
type : custom
|
||||
tokenizer : standard
|
||||
filter : [lowercase, custom_stems, porterStem]
|
||||
filter:
|
||||
custom_stems:
|
||||
type: stemmer_override
|
||||
rules_path : analysis/custom_stems.txt
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,78 @@
|
|||
[[analysis-stemmer-tokenfilter]]
|
||||
=== Stemmer Token Filter
|
||||
|
||||
A filter that stems words (similar to `snowball`, but with more
options). The `language`/`name` parameter controls the stemmer with the
following available values:
|
||||
|
||||
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Far%2FArabicStemmer.html[arabic],
|
||||
http://snowball.tartarus.org/algorithms/armenian/stemmer.html[armenian],
|
||||
http://snowball.tartarus.org/algorithms/basque/stemmer.html[basque],
|
||||
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fbr%2FBrazilianStemmer.html[brazilian],
|
||||
http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf[bulgarian],
|
||||
http://snowball.tartarus.org/algorithms/catalan/stemmer.html[catalan],
|
||||
http://portal.acm.org/citation.cfm?id=1598600[czech],
|
||||
http://snowball.tartarus.org/algorithms/danish/stemmer.html[danish],
|
||||
http://snowball.tartarus.org/algorithms/dutch/stemmer.html[dutch],
|
||||
http://snowball.tartarus.org/algorithms/english/stemmer.html[english],
|
||||
http://snowball.tartarus.org/algorithms/finnish/stemmer.html[finnish],
|
||||
http://snowball.tartarus.org/algorithms/french/stemmer.html[french],
|
||||
http://snowball.tartarus.org/algorithms/german/stemmer.html[german],
|
||||
http://snowball.tartarus.org/algorithms/german2/stemmer.html[german2],
|
||||
http://sais.se/mthprize/2007/ntais2007.pdf[greek],
|
||||
http://snowball.tartarus.org/algorithms/hungarian/stemmer.html[hungarian],
|
||||
http://snowball.tartarus.org/algorithms/italian/stemmer.html[italian],
|
||||
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html[kp],
|
||||
http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[kstem],
|
||||
http://snowball.tartarus.org/algorithms/lovins/stemmer.html[lovins],
|
||||
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Flv%2FLatvianStemmer.html[latvian],
|
||||
http://snowball.tartarus.org/algorithms/norwegian/stemmer.html[norwegian],
|
||||
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fno%2FNorwegianMinimalStemFilter.html[minimal_norwegian],
|
||||
http://snowball.tartarus.org/algorithms/porter/stemmer.html[porter],
|
||||
http://snowball.tartarus.org/algorithms/portuguese/stemmer.html[portuguese],
|
||||
http://snowball.tartarus.org/algorithms/romanian/stemmer.html[romanian],
|
||||
http://snowball.tartarus.org/algorithms/russian/stemmer.html[russian],
|
||||
http://snowball.tartarus.org/algorithms/spanish/stemmer.html[spanish],
|
||||
http://snowball.tartarus.org/algorithms/swedish/stemmer.html[swedish],
|
||||
http://snowball.tartarus.org/algorithms/turkish/stemmer.html[turkish],
|
||||
http://www.medialab.tfe.umu.se/courses/mdm0506a/material/fulltext_ID%3D10049387%26PLACEBO%3DIE.pdf[minimal_english],
|
||||
http://lucene.apache.org/core/4_3_0/analyzers-common/index.html?org%2Fapache%2Flucene%2Fanalysis%2Fen%2FEnglishPossessiveFilter.html[possessive_english],
|
||||
http://clef.isti.cnr.it/2003/WN_web/22.pdf[light_finish],
|
||||
http://dl.acm.org/citation.cfm?id=1141523[light_french],
|
||||
http://dl.acm.org/citation.cfm?id=318984[minimal_french],
|
||||
http://dl.acm.org/citation.cfm?id=1141523[light_german],
|
||||
http://members.unine.ch/jacques.savoy/clef/morpho.pdf[minimal_german],
|
||||
http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf[hindi],
|
||||
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[light_hungarian],
|
||||
http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf[indonesian],
|
||||
http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[light_italian],
|
||||
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[light_portuguese],
|
||||
http://www.inf.ufrgs.br/\~buriol/papers/Orengo_CLEF07.pdf[minimal_portuguese],
|
||||
http://www.inf.ufrgs.br/\~viviane/rslp/index.htm[portuguese],
|
||||
http://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf[light_russian],
|
||||
http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[light_spanish],
|
||||
http://clef.isti.cnr.it/2003/WN_web/22.pdf[light_swedish].
|
||||
|
||||
For example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"my_analyzer" : {
|
||||
"tokenizer" : "standard",
|
||||
"filter" : ["standard", "lowercase", "my_stemmer"]
|
||||
}
|
||||
},
|
||||
"filter" : {
|
||||
"my_stemmer" : {
|
||||
"type" : "stemmer",
|
||||
"name" : "light_german"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,33 @@
|
|||
[[analysis-stop-tokenfilter]]
|
||||
=== Stop Token Filter
|
||||
|
||||
A token filter of type `stop` that removes stop words from token
|
||||
streams.
|
||||
|
||||
The following are settings that can be set for a `stop` token filter
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`stopwords` |A list of stop words to use. Defaults to english stop
|
||||
words.
|
||||
|
||||
|`stopwords_path` |A path (either relative to `config` location, or
|
||||
absolute) to a stopwords file configuration. Each stop word should be in
|
||||
its own "line" (separated by a line break). The file must be UTF-8
|
||||
encoded.
|
||||
|
||||
|`enable_position_increments` |Set to `true` if token positions should
|
||||
record the removed stop words, `false` otherwise. Defaults to `true`.
|
||||
|
||||
|`ignore_case` |Set to `true` to lower case all words first. Defaults to
|
||||
`false`.
|
||||
|=======================================================================
|
||||
|
||||
The `stopwords` parameter allows for custom language-specific expansion of
the default stopwords. It follows the `_lang_` notation and supports: arabic,
|
||||
armenian, basque, brazilian, bulgarian, catalan, czech, danish, dutch,
|
||||
english, finnish, french, galician, german, greek, hindi, hungarian,
|
||||
indonesian, italian, norwegian, persian, portuguese, romanian, russian,
|
||||
spanish, swedish, turkish.
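
A sketch of a `stop` filter wired into a custom analyzer (the names and
stopword list are only illustrative):

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_stop"]
                }
            },
            "filter" : {
                "my_stop" : {
                    "type" : "stop",
                    "stopwords" : ["and", "is", "the"],
                    "ignore_case" : true
                }
            }
        }
    }
}
--------------------------------------------------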
|
|
@ -0,0 +1,124 @@
|
|||
[[analysis-synonym-tokenfilter]]
|
||||
=== Synonym Token Filter
|
||||
|
||||
The `synonym` token filter allows you to easily handle synonyms during the
|
||||
analysis process. Synonyms are configured using a configuration file.
|
||||
Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"index" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"synonym" : {
|
||||
"tokenizer" : "whitespace",
|
||||
"filter" : ["synonym"]
|
||||
}
|
||||
},
|
||||
"filter" : {
|
||||
"synonym" : {
|
||||
"type" : "synonym",
|
||||
"synonyms_path" : "analysis/synonym.txt"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The above configures a `synonym` filter, with a path of
|
||||
`analysis/synonym.txt` (relative to the `config` location). The
|
||||
`synonym` analyzer is then configured with the filter. Additional
|
||||
settings are: `ignore_case` (defaults to `false`), and `expand`
|
||||
(defaults to `true`).
|
||||
|
||||
The `tokenizer` parameter controls the tokenizer that will be used to
tokenize the synonyms, and defaults to the `whitespace` tokenizer.
|
||||
|
||||
As of elasticsearch 0.17.9 two synonym formats are supported: Solr and
WordNet.
|
||||
|
||||
[float]
|
||||
==== Solr synonyms
|
||||
|
||||
The following is a sample format of the file:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
# blank lines and lines starting with pound are comments.
|
||||
|
||||
#Explicit mappings match any token sequence on the LHS of "=>"
|
||||
#and replace with all alternatives on the RHS. These types of mappings
|
||||
#ignore the expand parameter in the schema.
|
||||
#Examples:
|
||||
i-pod, i pod => ipod,
|
||||
sea biscuit, sea biscit => seabiscuit
|
||||
|
||||
#Equivalent synonyms may be separated with commas and give
|
||||
#no explicit mapping. In this case the mapping behavior will
|
||||
#be taken from the expand parameter in the schema. This allows
|
||||
#the same synonym file to be used in different synonym handling strategies.
|
||||
#Examples:
|
||||
ipod, i-pod, i pod
|
||||
foozball , foosball
|
||||
universe , cosmos
|
||||
|
||||
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
|
||||
ipod, i-pod, i pod => ipod, i-pod, i pod
|
||||
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
|
||||
ipod, i-pod, i pod => ipod
|
||||
|
||||
#multiple synonym mapping entries are merged.
|
||||
foo => foo bar
|
||||
foo => baz
|
||||
#is equivalent to
|
||||
foo => foo bar, baz
|
||||
--------------------------------------------------
|
||||
|
||||
You can also define synonyms for the filter directly in the
|
||||
configuration file (note use of `synonyms` instead of `synonyms_path`):
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"filter" : {
|
||||
"synonym" : {
|
||||
"type" : "synonym",
|
||||
"synonyms" : [
|
||||
"i-pod, i pod => ipod",
|
||||
"universe, cosmos"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
However, it is recommended to define large synonym sets in a file using
`synonyms_path`.
|
||||
|
||||
[float]
|
||||
==== WordNet synonyms
|
||||
|
||||
Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
|
||||
declared using `format`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"filter" : {
|
||||
"synonym" : {
|
||||
"type" : "synonym",
|
||||
"format" : "wordnet",
|
||||
"synonyms" : [
|
||||
"s(100000001,1,'abstain',v,1,0).",
|
||||
"s(100000001,2,'refrain',v,1,0).",
|
||||
"s(100000001,3,'desist',v,1,0)."
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Using `synonyms_path` to define WordNet synonyms in a file is supported
|
||||
as well.
|
|
@ -0,0 +1,4 @@
|
|||
[[analysis-trim-tokenfilter]]
|
||||
=== Trim Token Filter
|
||||
|
||||
The `trim` token filter trims the whitespace surrounding a token.
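
For illustration, a sketch of a custom analyzer (the name
`trimmed_keyword` is made up for the example) combining the `keyword`
tokenizer with the `trim` filter:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "trimmed_keyword" : {
                    "tokenizer" : "keyword",
                    "filter" : ["trim"]
                }
            }
        }
    }
}
--------------------------------------------------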
|
|
@ -0,0 +1,10 @@
|
|||
[[analysis-truncate-tokenfilter]]
|
||||
=== Truncate Token Filter
|
||||
|
||||
The `truncate` token filter can be used to truncate tokens to a
specific length. This can come in handy with keyword (single token)
based mapped fields that are used for sorting, in order to reduce memory
usage.

It accepts a `length` parameter which controls the number of characters
to truncate to, and defaults to `10`.
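
As a sketch, a custom filter (the name `my_truncate` is made up for the
example) limiting tokens to 5 characters could be defined as:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "filter" : {
                "my_truncate" : {
                    "type" : "truncate",
                    "length" : 5
                }
            }
        }
    }
}
--------------------------------------------------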
|
|
@ -0,0 +1,7 @@
|
|||
[[analysis-unique-tokenfilter]]
|
||||
=== Unique Token Filter
|
||||
|
||||
The `unique` token filter can be used to index only unique tokens during
analysis. By default it is applied to the whole token stream. If
`only_on_same_position` is set to `true`, it will only remove duplicate
tokens at the same position.
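
For example, a sketch of a custom filter (the name `my_unique` is made
up for the example) that only removes duplicates at the same position:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "filter" : {
                "my_unique" : {
                    "type" : "unique",
                    "only_on_same_position" : true
                }
            }
        }
    }
}
--------------------------------------------------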
|
|
@ -0,0 +1,80 @@
|
|||
[[analysis-word-delimiter-tokenfilter]]
|
||||
=== Word Delimiter Token Filter
|
||||
|
||||
Named `word_delimiter`, it splits words into subwords and performs
optional transformations on subword groups. Words are split into
subwords according to the following rules:
|
||||
|
||||
* split on intra-word delimiters (by default, all non alpha-numeric
|
||||
characters).
|
||||
* "Wi-Fi" -> "Wi", "Fi"
|
||||
* split on case transitions: "PowerShot" -> "Power", "Shot"
|
||||
* split on letter-number transitions: "SD500" -> "SD", "500"
|
||||
* leading and trailing intra-word delimiters on each subword are
|
||||
ignored: "//hello---there, 'dude'" -> "hello", "there", "dude"
|
||||
* trailing "'s" are removed for each subword: "O'Neil's" -> "O", "Neil"
|
||||
|
||||
Parameters include:
|
||||
|
||||
`generate_word_parts`::
|
||||
If `true` causes parts of words to be
|
||||
generated: "PowerShot" => "Power" "Shot". Defaults to `true`.
|
||||
|
||||
`generate_number_parts`::
|
||||
If `true` causes number subwords to be
|
||||
generated: "500-42" => "500" "42". Defaults to `true`.
|
||||
|
||||
`catenate_words`::
|
||||
If `true` causes maximum runs of word parts to be
|
||||
catenated: "wi-fi" => "wifi". Defaults to `false`.
|
||||
|
||||
`catenate_numbers`::
|
||||
If `true` causes maximum runs of number parts to
|
||||
be catenated: "500-42" => "50042". Defaults to `false`.
|
||||
|
||||
`catenate_all`::
|
||||
If `true` causes all subword parts to be catenated:
|
||||
"wi-fi-4000" => "wifi4000". Defaults to `false`.
|
||||
|
||||
`split_on_case_change`::
If `true` causes "PowerShot" to be two tokens;
("Power-Shot" remains two parts regardless). Defaults to `true`.
|
||||
|
||||
`preserve_original`::
|
||||
If `true` includes original words in subwords:
|
||||
"500-42" => "500-42" "500" "42". Defaults to `false`.
|
||||
|
||||
`split_on_numerics`::
|
||||
If `true` causes "j2se" to be three tokens; "j"
|
||||
"2" "se". Defaults to `true`.
|
||||
|
||||
`stem_english_possessive`::
|
||||
If `true` causes trailing "'s" to be
|
||||
removed for each subword: "O'Neil's" => "O", "Neil". Defaults to `true`.
|
||||
|
||||
Advanced settings include:
|
||||
|
||||
`protected_words`::
A list of words protected from being delimited.
Either an array, or set `protected_words_path` instead, which resolves
to a file configured with protected words (one on each line).
It automatically resolves to a `config/` based location if it exists there.
|
||||
|
||||
`type_table`::
|
||||
A custom type mapping table, for example (when configured
|
||||
using `type_table_path`):
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
# Map the $, %, '.', and ',' characters to DIGIT
|
||||
# This might be useful for financial data.
|
||||
$ => DIGIT
|
||||
% => DIGIT
|
||||
. => DIGIT
|
||||
\\u002C => DIGIT
|
||||
|
||||
# in some cases you might not want to split on ZWJ
|
||||
# this also tests the case where we need a bigger byte[]
|
||||
# see http://en.wikipedia.org/wiki/Zero-width_joiner
|
||||
\\u200D => ALPHANUM
|
||||
--------------------------------------------------
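
To tie the above together, here is a minimal sketch (the analyzer and
filter names are made up for the example) of a custom `word_delimiter`
filter that keeps the original token alongside catenated word parts:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_word_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["my_word_delimiter"]
                }
            },
            "filter" : {
                "my_word_delimiter" : {
                    "type" : "word_delimiter",
                    "catenate_words" : true,
                    "preserve_original" : true,
                    "protected_words" : ["wi-fi"]
                }
            }
        }
    }
}
--------------------------------------------------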
|
|
@ -0,0 +1,30 @@
|
|||
[[analysis-tokenizers]]
|
||||
== Tokenizers
|
||||
|
||||
Tokenizers are used to break a string down into a stream of terms
|
||||
or tokens. A simple tokenizer might split the string up into terms
|
||||
wherever it encounters whitespace or punctuation.
|
||||
|
||||
Elasticsearch has a number of built in tokenizers which can be
|
||||
used to build <<analysis-custom-analyzer,custom analyzers>>.
|
||||
|
||||
include::tokenizers/standard-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/edgengram-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/keyword-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/letter-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/lowercase-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/ngram-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/whitespace-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/pattern-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/uaxurlemail-tokenizer.asciidoc[]
|
||||
|
||||
include::tokenizers/pathhierarchy-tokenizer.asciidoc[]
|
||||
|
|
@ -0,0 +1,80 @@
|
|||
[[analysis-edgengram-tokenizer]]
|
||||
=== Edge NGram Tokenizer
|
||||
|
||||
A tokenizer of type `edgeNGram`.
|
||||
|
||||
This tokenizer is very similar to `nGram` but only keeps n-grams which
|
||||
start at the beginning of a token.
|
||||
|
||||
The following are settings that can be set for an `edgeNGram` tokenizer
type:
|
||||
|
||||
[cols="<,<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description |Default value
|
||||
|`min_gram` |Minimum size in codepoints of a single n-gram |`1`.
|
||||
|
||||
|`max_gram` |Maximum size in codepoints of a single n-gram |`2`.
|
||||
|
||||
|`token_chars` |(Since `0.90.2`) Character classes to keep in the
tokens. Elasticsearch will split on characters that don't belong to any
of these classes. |`[]` (Keep all characters)
|
||||
|=======================================================================
|
||||
|
||||
|
||||
`token_chars` accepts the following character classes:
|
||||
|
||||
[horizontal]
|
||||
`letter`:: for example `a`, `b`, `ï` or `京`
|
||||
`digit`:: for example `3` or `7`
|
||||
`whitespace`:: for example `" "` or `"\n"`
|
||||
`punctuation`:: for example `!` or `"`
|
||||
`symbol`:: for example `$` or `â`
|
||||
|
||||
[float]
|
||||
==== Example
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT 'localhost:9200/test' -d '
|
||||
{
|
||||
"settings" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"my_edge_ngram_analyzer" : {
|
||||
"tokenizer" : "my_edge_ngram_tokenizer"
|
||||
}
|
||||
},
|
||||
"tokenizer" : {
|
||||
"my_edge_ngram_tokenizer" : {
|
||||
"type" : "edgeNGram",
|
||||
"min_gram" : "2",
|
||||
"max_gram" : "5",
|
||||
"token_chars": [ "letter", "digit" ]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}'
|
||||
|
||||
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'
|
||||
# FC, Sc, Sch, Scha, Schal, 04
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== `side` deprecated
|
||||
|
||||
There used to be a `side` parameter up to `0.90.1` but it is now deprecated. In
|
||||
order to emulate the behavior of `"side" : "BACK"` a
|
||||
<<analysis-reverse-tokenfilter,`reverse` token filter>> should be used together
|
||||
with the <<analysis-edgengram-tokenfilter,`edgeNGram` token filter>>. The
|
||||
`edgeNGram` filter must be enclosed in `reverse` filters like this:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"filter" : ["reverse", "edgeNGram", "reverse"]
|
||||
--------------------------------------------------
|
||||
|
||||
which essentially reverses the token, builds front `EdgeNGrams` and reverses
|
||||
the ngram again. This has the same effect as the previous `"side" : "BACK"` setting.
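
For example, a back edge n-gram analyzer could be sketched as follows
(the names are illustrative, and the filter settings mirror those of the
tokenizer example above):

[source,js]
--------------------------------------------------
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_back_edge_ngram_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["reverse", "my_edge_ngram", "reverse"]
                }
            },
            "filter" : {
                "my_edge_ngram" : {
                    "type" : "edgeNGram",
                    "min_gram" : 2,
                    "max_gram" : 5
                }
            }
        }
    }
}
--------------------------------------------------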
|
||||
|
|
@ -0,0 +1,15 @@
|
|||
[[analysis-keyword-tokenizer]]
|
||||
=== Keyword Tokenizer
|
||||
|
||||
A tokenizer of type `keyword` that emits the entire input as a single
token.
|
||||
|
||||
The following are settings that can be set for a `keyword` tokenizer
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================
|
||||
|Setting |Description
|
||||
|`buffer_size` |The term buffer size. Defaults to `256`.
|
||||
|=======================================================
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
[[analysis-letter-tokenizer]]
|
||||
=== Letter Tokenizer
|
||||
|
||||
A tokenizer of type `letter` that divides text at non-letters. That's to
|
||||
say, it defines tokens as maximal strings of adjacent letters. Note,
|
||||
this does a decent job for most European languages, but does a terrible
|
||||
job for some Asian languages, where words are not separated by spaces.
|
|
@ -0,0 +1,15 @@
|
|||
[[analysis-lowercase-tokenizer]]
|
||||
=== Lowercase Tokenizer
|
||||
|
||||
A tokenizer of type `lowercase` that performs the function of
|
||||
<<analysis-letter-tokenizer,Letter
|
||||
Tokenizer>> and
|
||||
<<analysis-lowercase-tokenfilter,Lower
|
||||
Case Token Filter>> together. It divides text at non-letters and converts
|
||||
them to lower case. While it is functionally equivalent to the
|
||||
combination of
|
||||
<<analysis-letter-tokenizer,Letter
|
||||
Tokenizer>> and
|
||||
<<analysis-lowercase-tokenfilter,Lower
|
||||
Case Token Filter>>, there is a performance advantage to doing the two
|
||||
tasks at once, hence this (redundant) implementation.
|
|
@ -0,0 +1,57 @@
|
|||
[[analysis-ngram-tokenizer]]
|
||||
=== NGram Tokenizer
|
||||
|
||||
A tokenizer of type `nGram`.
|
||||
|
||||
The following are settings that can be set for an `nGram` tokenizer type:
|
||||
|
||||
[cols="<,<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description |Default value
|
||||
|`min_gram` |Minimum size in codepoints of a single n-gram |`1`.
|
||||
|
||||
|`max_gram` |Maximum size in codepoints of a single n-gram |`2`.
|
||||
|
||||
|`token_chars` |(Since `0.90.2`) Character classes to keep in the
tokens. Elasticsearch will split on characters that don't belong to any
of these classes. |`[]` (Keep all characters)
|
||||
|=======================================================================
|
||||
|
||||
`token_chars` accepts the following character classes:
|
||||
|
||||
[horizontal]
|
||||
`letter`:: for example `a`, `b`, `ï` or `京`
|
||||
`digit`:: for example `3` or `7`
|
||||
`whitespace`:: for example `" "` or `"\n"`
|
||||
`punctuation`:: for example `!` or `"`
|
||||
`symbol`:: for example `$` or `â`
|
||||
|
||||
[float]
|
||||
==== Example
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT 'localhost:9200/test' -d '
|
||||
{
|
||||
"settings" : {
|
||||
"analysis" : {
|
||||
"analyzer" : {
|
||||
"my_ngram_analyzer" : {
|
||||
"tokenizer" : "my_ngram_tokenizer"
|
||||
}
|
||||
},
|
||||
"tokenizer" : {
|
||||
"my_ngram_tokenizer" : {
|
||||
"type" : "nGram",
|
||||
"min_gram" : "2",
|
||||
"max_gram" : "3",
|
||||
"token_chars": [ "letter", "digit" ]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}'
|
||||
|
||||
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
|
||||
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,32 @@
|
|||
[[analysis-pathhierarchy-tokenizer]]
|
||||
=== Path Hierarchy Tokenizer
|
||||
|
||||
The `path_hierarchy` tokenizer takes something like this:
|
||||
|
||||
-------------------------
|
||||
/something/something/else
|
||||
-------------------------
|
||||
|
||||
And produces tokens:
|
||||
|
||||
-------------------------
|
||||
/something
|
||||
/something/something
|
||||
/something/something/else
|
||||
-------------------------
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`delimiter` |The character delimiter to use, defaults to `/`.
|
||||
|
||||
|`replacement` |An optional replacement character to use. Defaults to
|
||||
the `delimiter`.
|
||||
|
||||
|`buffer_size` |The buffer size to use, defaults to `1024`.
|
||||
|
||||
|`reverse` |Generates tokens in reverse order, defaults to `false`.
|
||||
|
||||
|`skip` |Controls initial tokens to skip, defaults to `0`.
|
||||
|=======================================================================
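
For instance, here is a sketch (the names are made up for the example)
of a tokenizer that splits Windows style paths on backslashes and
replaces them with forward slashes in the emitted tokens:

[source,js]
--------------------------------------------------
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_path_analyzer" : {
                    "tokenizer" : "my_path_tokenizer"
                }
            },
            "tokenizer" : {
                "my_path_tokenizer" : {
                    "type" : "path_hierarchy",
                    "delimiter" : "\\",
                    "replacement" : "/"
                }
            }
        }
    }
}
--------------------------------------------------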
|
||||
|
|
@ -0,0 +1,29 @@
|
|||
[[analysis-pattern-tokenizer]]
|
||||
=== Pattern Tokenizer
|
||||
|
||||
A tokenizer of type `pattern` that can flexibly separate text into terms
|
||||
via a regular expression. Accepts the following settings:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|======================================================================
|
||||
|Setting |Description
|
||||
|`pattern` |The regular expression pattern, defaults to `\\W+`.
|
||||
|`flags` |The regular expression flags.
|
||||
|`group` |Which group to extract into tokens. Defaults to `-1` (split).
|
||||
|======================================================================
|
||||
|
||||
*IMPORTANT*: The regular expression should match the *token separators*,
|
||||
not the tokens themselves.
|
||||
|
||||
`group` set to `-1` (the default) is equivalent to "split". Using group
|
||||
>= 0 selects the matching group as the token. For example, if you have:
|
||||
|
||||
------------------------
|
||||
pattern = \\'([^\']+)\\'
|
||||
group = 0
|
||||
input = aaa 'bbb' 'ccc'
|
||||
------------------------
|
||||
|
||||
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks).
|
||||
With the same input but using group=1, the output would be: bbb and ccc
|
||||
(no ' marks).
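
A sketch of a custom `pattern` tokenizer using `group` (the analyzer and
tokenizer names are made up for the example):

[source,js]
--------------------------------------------------
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_quoted_analyzer" : {
                    "tokenizer" : "my_quoted_tokenizer"
                }
            },
            "tokenizer" : {
                "my_quoted_tokenizer" : {
                    "type" : "pattern",
                    "pattern" : "'([^']+)'",
                    "group" : 1
                }
            }
        }
    }
}
--------------------------------------------------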
|
|
@ -0,0 +1,18 @@
|
|||
[[analysis-standard-tokenizer]]
|
||||
=== Standard Tokenizer
|
||||
|
||||
A tokenizer of type `standard` providing grammar based tokenization that is
a good choice for most European language documents. The tokenizer
|
||||
implements the Unicode Text Segmentation algorithm, as specified in
|
||||
http://unicode.org/reports/tr29/[Unicode Standard Annex #29].
|
||||
|
||||
The following are settings that can be set for a `standard` tokenizer
|
||||
type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`max_token_length` |The maximum token length. If a token is seen that
|
||||
exceeds this length then it is discarded. Defaults to `255`.
|
||||
|=======================================================================
|
||||
|
|
@ -0,0 +1,16 @@
|
|||
[[analysis-uaxurlemail-tokenizer]]
|
||||
=== UAX Email URL Tokenizer
|
||||
|
||||
A tokenizer of type `uax_url_email` which works exactly like the
|
||||
`standard` tokenizer, but tokenizes emails and urls as single tokens.
|
||||
|
||||
The following are settings that can be set for a `uax_url_email`
|
||||
tokenizer type:
|
||||
|
||||
[cols="<,<",options="header",]
|
||||
|=======================================================================
|
||||
|Setting |Description
|
||||
|`max_token_length` |The maximum token length. If a token is seen that
|
||||
exceeds this length then it is discarded. Defaults to `255`.
|
||||
|=======================================================================
|
||||
|
|
@ -0,0 +1,4 @@
|
|||
[[analysis-whitespace-tokenizer]]
|
||||
=== Whitespace Tokenizer
|
||||
|
||||
A tokenizer of type `whitespace` that divides text at whitespace.
|
|
@ -0,0 +1,46 @@
|
|||
[[cluster]]
|
||||
= Cluster APIs
|
||||
|
||||
[partintro]
|
||||
--
|
||||
["float",id="cluster-nodes"]
|
||||
== Nodes
|
||||
|
||||
Most cluster level APIs allow specifying which nodes to execute on (for
|
||||
example, getting the node stats for a node). Nodes can be identified in
|
||||
the APIs either using their internal node id, the node name, address,
|
||||
custom attributes, or just the `_local` node receiving the request. For
|
||||
example, here are some sample executions of nodes info:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
# Local
|
||||
curl localhost:9200/_cluster/nodes/_local
|
||||
# Address
|
||||
curl localhost:9200/_cluster/nodes/10.0.0.3,10.0.0.4
|
||||
curl localhost:9200/_cluster/nodes/10.0.0.*
|
||||
# Names
|
||||
curl localhost:9200/_cluster/nodes/node_name_goes_here
|
||||
curl localhost:9200/_cluster/nodes/node_name_goes_*
|
||||
# Attributes (set something like node.rack: 2 in the config)
|
||||
curl localhost:9200/_cluster/nodes/rack:2
|
||||
curl localhost:9200/_cluster/nodes/ra*:2
|
||||
curl localhost:9200/_cluster/nodes/ra*:2*
|
||||
--------------------------------------------------
|
||||
--
|
||||
|
||||
include::cluster/health.asciidoc[]
|
||||
|
||||
include::cluster/state.asciidoc[]
|
||||
|
||||
include::cluster/reroute.asciidoc[]
|
||||
|
||||
include::cluster/update-settings.asciidoc[]
|
||||
|
||||
include::cluster/nodes-stats.asciidoc[]
|
||||
|
||||
include::cluster/nodes-info.asciidoc[]
|
||||
|
||||
include::cluster/nodes-hot-threads.asciidoc[]
|
||||
|
||||
include::cluster/nodes-shutdown.asciidoc[]
|
|
@ -0,0 +1,86 @@
|
|||
[[cluster-health]]
|
||||
== Cluster Health
|
||||
|
||||
The cluster health API allows getting a very simple status of the health
of the cluster.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
|
||||
{
|
||||
"cluster_name" : "testcluster",
|
||||
"status" : "green",
|
||||
"timed_out" : false,
|
||||
"number_of_nodes" : 2,
|
||||
"number_of_data_nodes" : 2,
|
||||
"active_primary_shards" : 5,
|
||||
"active_shards" : 10,
|
||||
"relocating_shards" : 0,
|
||||
"initializing_shards" : 0,
|
||||
"unassigned_shards" : 0
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The API can also be executed against one or more indices to get just the
|
||||
specified indices health:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/_cluster/health/test1,test2'
|
||||
--------------------------------------------------
|
||||
|
||||
The cluster health status is: `green`, `yellow` or `red`. On the shard
|
||||
level, a `red` status indicates that the specific shard is not allocated
|
||||
in the cluster, `yellow` means that the primary shard is allocated but
|
||||
replicas are not, and `green` means that all shards are allocated. The
|
||||
index level status is controlled by the worst shard status. The cluster
|
||||
status is controlled by the worst index status.
|
||||
|
||||
One of the main benefits of the API is the ability to wait until the
|
||||
cluster reaches a certain high water-mark health level. For example, the
|
||||
following will wait till the cluster reaches the `yellow` level for 50
|
||||
seconds (if it reaches the `green` or `yellow` status beforehand, it
|
||||
will return):
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s'
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Request Parameters
|
||||
|
||||
The cluster health API accepts the following request parameters:
|
||||
|
||||
`level`::
|
||||
Can be one of `cluster`, `indices` or `shards`. Controls the
|
||||
details level of the health information returned. Defaults to `cluster`.
|
||||
|
||||
`wait_for_status`::
|
||||
One of `green`, `yellow` or `red`. Will wait (until
|
||||
the timeout provided) until the status of the cluster changes to the one
|
||||
provided. By default, will not wait for any status.
|
||||
|
||||
`wait_for_relocating_shards`::
A number controlling how many relocating
shards to wait for. Usually `0`, to indicate waiting until all
relocations have finished. Defaults to not waiting.
|
||||
|
||||
`wait_for_nodes`::
|
||||
The request waits until the specified number `N` of
|
||||
nodes is available. It also accepts `>=N`, `<=N`, `>N` and `<N`.
|
||||
Alternatively, it is possible to use `ge(N)`, `le(N)`, `gt(N)` and
|
||||
`lt(N)` notation.
|
||||
|
||||
`timeout`::
|
||||
A time based parameter controlling how long to wait if one of
|
||||
the wait_for_XXX are provided. Defaults to `30s`.
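
For example, the following sketch combines several of the parameters
above, waiting for at least two nodes and a `yellow` status for up to 50
seconds (using the `ge(N)` notation to avoid escaping `>=` in the URL):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&wait_for_nodes=ge(2)&timeout=50s'
--------------------------------------------------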
|
||||
|
||||
|
||||
The following is an example of getting the cluster health at the
|
||||
`shards` level:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/_cluster/health/twitter?level=shards'
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,16 @@
|
|||
[[cluster-nodes-hot-threads]]
|
||||
== Nodes hot_threads
|
||||
|
||||
An API for retrieving the current hot threads on each node in the
cluster. Endpoints are `/_nodes/hot_threads` and
`/_nodes/{nodesIds}/hot_threads`. This API is experimental.
|
||||
|
||||
The output is plain text with a breakdown of each node's top hot
|
||||
threads. Parameters allowed are:
|
||||
|
||||
[horizontal]
|
||||
`threads`:: number of hot threads to provide, defaults to 3.
|
||||
`interval`:: the interval to do the second sampling of threads.
|
||||
Defaults to 500ms.
|
||||
`type`:: The type to sample, defaults to cpu, but supports wait and
|
||||
block to see hot threads that are in wait or block state.
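
For example, assuming the parameters are passed as query string
parameters (as with the other node APIs):

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=1s&type=wait'
--------------------------------------------------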
|
|
@ -0,0 +1,98 @@
|
|||
[[cluster-nodes-info]]
|
||||
== Nodes Info
|
||||
|
||||
The cluster nodes info API retrieves information about one or more (or
all) of the cluster nodes.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XGET 'http://localhost:9200/_cluster/nodes'
|
||||
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2'
|
||||
|
||||
# Shorter Format
|
||||
curl -XGET 'http://localhost:9200/_nodes'
|
||||
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2'
|
||||
--------------------------------------------------
|
||||
|
||||
The first command retrieves information of all the nodes in the cluster.
|
||||
The second command selectively retrieves nodes information of only
|
||||
`nodeId1` and `nodeId2`. All the nodes selective options are explained
|
||||
<<cluster-nodes,here>>.
|
||||
|
||||
By default, it just returns the attributes and core settings for a node.
It can also return information on `settings`, `os`, `process`, `jvm`,
`thread_pool`, `network`, `transport`, `http` and `plugin`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XGET 'http://localhost:9200/_nodes?os=true&process=true'
|
||||
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/?os=true&process=true'
|
||||
|
||||
# Or, specific type endpoint:
|
||||
|
||||
curl -XGET 'http://localhost:9200/_nodes/process'
|
||||
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/process'
|
||||
--------------------------------------------------
|
||||
|
||||
The `all` flag can be set to return all the information.
|
||||
|
||||
`plugin` - if set, the result will contain details about the loaded
|
||||
plugins per node:
|
||||
|
||||
* `name`: plugin name
|
||||
* `description`: plugin description if any
|
||||
* `site`: `true` if the plugin is a site plugin
|
||||
* `jvm`: `true` if the plugin is a plugin running in the JVM
|
||||
* `url`: URL if the plugin is a site plugin
|
||||
|
||||
The result will look similar to:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"ok" : true,
|
||||
"cluster_name" : "test-cluster-MacBook-Air-de-David.local",
|
||||
"nodes" : {
|
||||
"hJLXmY_NTrCytiIMbX4_1g" : {
|
||||
"name" : "node4",
|
||||
"transport_address" : "inet[/172.18.58.139:9303]",
|
||||
"hostname" : "MacBook-Air-de-David.local",
|
||||
"version" : "0.90.0.Beta2-SNAPSHOT",
|
||||
"http_address" : "inet[/172.18.58.139:9203]",
|
||||
"plugins" : [ {
|
||||
"name" : "test-plugin",
|
||||
"description" : "test-plugin description",
|
||||
"site" : true,
|
||||
"jvm" : false
|
||||
}, {
|
||||
"name" : "test-no-version-plugin",
|
||||
"description" : "test-no-version-plugin description",
|
||||
"site" : true,
|
||||
"jvm" : false
|
||||
}, {
|
||||
"name" : "dummy",
|
||||
"description" : "No description found for dummy.",
|
||||
"url" : "/_plugin/dummy/",
|
||||
"site" : false,
|
||||
"jvm" : true
|
||||
} ]
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
If your `plugin` data is subject to change, use
`plugins.info_refresh_interval` to change or disable the caching
interval:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
# Change cache to 20 seconds
|
||||
plugins.info_refresh_interval: 20s
|
||||
|
||||
# Infinite cache
|
||||
plugins.info_refresh_interval: -1
|
||||
|
||||
# Disable cache
|
||||
plugins.info_refresh_interval: 0
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,56 @@
|
|||
[[cluster-nodes-shutdown]]
|
||||
== Nodes Shutdown
|
||||
|
||||
The nodes shutdown API allows shutting down one or more (or all) nodes in
the cluster. Here is an example of shutting down the `_local` node the
request is directed to:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'
|
||||
--------------------------------------------------
|
||||
|
||||
Specific node(s) can be shut down as well using their respective node ids
(or other selective options as explained
<<cluster-nodes,here>>):
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/_shutdown'
|
||||
--------------------------------------------------
|
||||
|
||||
The master (of the cluster) can also be shutdown using:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_master/_shutdown'
|
||||
--------------------------------------------------
|
||||
|
||||
Finally, all nodes can be shutdown using one of the options below:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XPOST 'http://localhost:9200/_shutdown'
|
||||
|
||||
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'
|
||||
|
||||
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_all/_shutdown'
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Delay
|
||||
|
||||
By default, the shutdown will be executed after a 1 second delay (`1s`).
|
||||
The delay can be customized by setting the `delay` parameter in a time
|
||||
value format. For example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown?delay=10s'
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
=== Disable Shutdown
|
||||
|
||||
The shutdown API can be disabled by setting `action.disable_shutdown` in
|
||||
the node configuration.
|
|
@ -0,0 +1,100 @@
|
|||
[[cluster-nodes-stats]]
|
||||
== Nodes Stats
|
||||
|
||||
[float]
|
||||
=== Nodes statistics
|
||||
|
||||
The cluster nodes stats API retrieves statistics for one or more (or
all) of the cluster nodes.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XGET 'http://localhost:9200/_cluster/nodes/stats'
|
||||
curl -XGET 'http://localhost:9200/_cluster/nodes/nodeId1,nodeId2/stats'
|
||||
|
||||
# simplified
|
||||
curl -XGET 'http://localhost:9200/_nodes/stats'
|
||||
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/stats'
|
||||
--------------------------------------------------
|
||||
|
||||
The first command retrieves stats of all the nodes in the cluster. The
|
||||
second command selectively retrieves nodes stats of only `nodeId1` and
|
||||
`nodeId2`. All the nodes selective options are explained
|
||||
<<cluster-nodes,here>>.
|
||||
|
||||
By default, `indices` stats are returned. With options for `indices`,
|
||||
`os`, `process`, `jvm`, `network`, `transport`, `http`, `fs`, and
|
||||
`thread_pool`. For example:
|
||||
|
||||
[horizontal]
|
||||
`indices`::
|
||||
Indices stats about size, document count, indexing and
deletion times, search times, field cache size, merges and flushes
|
||||
|
||||
`fs`::
|
||||
File system information, data path, free disk space, read/write
|
||||
stats
|
||||
|
||||
`http`::
|
||||
HTTP connection information
|
||||
|
||||
`jvm`::
|
||||
JVM stats, memory pool information, garbage collection, buffer
|
||||
pools
|
||||
|
||||
`network`::
|
||||
TCP information
|
||||
|
||||
`os`::
|
||||
Operating system stats, load average, cpu, mem, swap
|
||||
|
||||
`process`::
|
||||
Process statistics, memory consumption, cpu usage, open
|
||||
file descriptors
|
||||
|
||||
`thread_pool`::
|
||||
Statistics about each thread pool, including current
|
||||
size, queue and rejected tasks
|
||||
|
||||
`transport`::
|
||||
Transport statistics about sent and received bytes in
|
||||
cluster communication
|
||||
|
||||
`clear`::
|
||||
Clears all the flags (first). Useful if you only want to
retrieve specific stats.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
# return indices and os
|
||||
curl -XGET 'http://localhost:9200/_nodes/stats?os=true'
|
||||
# return just os and process
|
||||
curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&os=true&process=true'
|
||||
# specific type endpoint
|
||||
curl -XGET 'http://localhost:9200/_nodes/process/stats'
|
||||
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/process/stats'
|
||||
# or, if you like the other way
|
||||
curl -XGET 'http://localhost:9200/_nodes/stats/process'
|
||||
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/stats/process'
|
||||
--------------------------------------------------
|
||||
|
||||
The `all` flag can be set to return all the stats.
|
||||
|
||||
[float]
|
||||
=== Field data statistics
|
||||
|
||||
From 0.90, you can get information about field data memory usage at the node
level or at the index level.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
# Node Stats
|
||||
curl localhost:9200/_nodes/stats/indices/fielddata/field1,field2?pretty
|
||||
|
||||
# Indices Stat
|
||||
curl localhost:9200/_stats/fielddata/field1,field2?pretty
|
||||
|
||||
# You can use wildcards for field names
|
||||
curl localhost:9200/_stats/fielddata/field*?pretty
|
||||
curl localhost:9200/_nodes/stats/indices/fielddata/field*?pretty
|
||||
--------------------------------------------------
|
|
@ -0,0 +1,68 @@
|
|||
[[cluster-reroute]]
|
||||
== Cluster Reroute
|
||||
|
||||
The reroute command allows explicitly executing a cluster reroute
allocation, including specific commands. For example, a shard can
be moved from one node to another explicitly, an allocation can be
canceled, or an unassigned shard can be explicitly allocated on a
specific node.
|
||||
|
||||
Here is a short example of a simple reroute API call:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
|
||||
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
|
||||
"commands" : [ {
|
||||
"move" :
|
||||
{
|
||||
"index" : "test", "shard" : 0,
|
||||
"from_node" : "node1", "to_node" : "node2"
|
||||
}
|
||||
},
|
||||
{
|
||||
"allocate" : {
|
||||
"index" : "test", "shard" : 1, "node" : "node3"
|
||||
}
|
||||
}
|
||||
]
|
||||
}'
|
||||
--------------------------------------------------
|
||||
|
||||
An important aspect to remember is that once an allocation
occurs, the cluster will aim at re-balancing its state back to an even
state. For example, if the allocation includes moving a shard from
`node1` to `node2`, in an `even` state, then another shard will be moved
from `node2` to `node1` to even things out.
||||
|
||||
The cluster can be set to disable allocations, which means that only
explicit allocations will be performed. Obviously, only once all
commands have been applied will the cluster aim to re-balance its
state.
|
||||
|
||||
Another option is to run the commands in `dry_run` (as a URI flag, or in
the request body). This will cause the commands to apply to the current
cluster state, and return the resulting cluster state after the commands (and
re-balancing) have been applied.
|
||||
|
||||
The commands supported are:
|
||||
|
||||
`move`::
|
||||
Move a started shard from one node to another node. Accepts
|
||||
`index` and `shard` for index name and shard number, `from_node` for the
|
||||
node to move the shard `from`, and `to_node` for the node to move the
|
||||
shard to.
|
||||
|
||||
`cancel`::
|
||||
Cancel allocation of a shard (or recovery). Accepts `index`
|
||||
and `shard` for index name and shard number, and `node` for the node to
|
||||
cancel the shard allocation on. It also accepts `allow_primary` flag to
|
||||
explicitly specify that it is allowed to cancel allocation for a primary
|
||||
shard.
|
||||
|
||||
`allocate`::
|
||||
Allocate an unassigned shard to a node. Accepts the
|
||||
`index` and `shard` for index name and shard number, and `node` to
|
||||
allocate the shard to. It also accepts `allow_primary` flag to
|
||||
explicitly specify that it is allowed to explicitly allocate a primary
|
||||
shard (might result in data loss).
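
As a sketch (node and index names are placeholders), cancelling a shard
allocation on one node and explicitly allocating it elsewhere could look
like:

[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
        "cancel" : {
            "index" : "test", "shard" : 1, "node" : "node2"
        }
    },
    {
        "allocate" : {
            "index" : "test", "shard" : 1, "node" : "node3",
            "allow_primary" : true
        }
    } ]
}'
--------------------------------------------------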
|
|
@ -0,0 +1,48 @@
|
|||
[[cluster-state]]
|
||||
== Cluster State
|
||||
|
||||
The cluster state API allows getting comprehensive state information about
the whole cluster.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/_cluster/state'
|
||||
--------------------------------------------------
|
||||
|
||||
By default, the cluster state request is routed to the master node, to
|
||||
ensure that the latest cluster state is returned.
|
||||
For debugging purposes, you can retrieve the cluster state local to a
|
||||
particular node by adding `local=true` to the query string.
|
||||
|
||||
[float]
|
||||
=== Response Filters
|
||||
|
||||
It is possible to filter the cluster state response using the following
|
||||
REST parameters:
|
||||
|
||||
`filter_nodes`::
|
||||
Set to `true` to filter out the `nodes` part of the
|
||||
response.
|
||||
|
||||
`filter_routing_table`::
|
||||
Set to `true` to filter out the `routing_table`
|
||||
part of the response.
|
||||
|
||||
`filter_metadata`::
|
||||
Set to `true` to filter out the `metadata` part of the
|
||||
response.
|
||||
|
||||
`filter_blocks`::
|
||||
Set to `true` to filter out the `blocks` part of the
|
||||
response.
|
||||
|
||||
`filter_indices`::
|
||||
When not filtering metadata, a comma separated list of
|
||||
indices to include in the response.
|
||||
|
||||
Example follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ curl -XGET 'http://localhost:9200/_cluster/state?filter_nodes=true'
|
||||
--------------------------------------------------
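
Filters can also be combined, for example (a sketch):

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/state?filter_routing_table=true&filter_blocks=true'
--------------------------------------------------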
|
|
@ -0,0 +1,198 @@
|
|||
[[cluster-update-settings]]
|
||||
== Cluster Update Settings
|
||||
|
||||
Allows updating specific cluster wide settings. Updated settings can
either be persistent (applied across restarts) or transient (will not
survive a full cluster restart). Here is an example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT localhost:9200/_cluster/settings -d '{
|
||||
"persistent" : {
|
||||
"discovery.zen.minimum_master_nodes" : 2
|
||||
}
|
||||
}'
|
||||
--------------------------------------------------
|
||||
|
||||
Or:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XPUT localhost:9200/_cluster/settings -d '{
|
||||
"transient" : {
|
||||
"discovery.zen.minimum_master_nodes" : 2
|
||||
}
|
||||
}'
|
||||
--------------------------------------------------
|
||||
|
||||
The cluster responds with the settings updated. So the response for the
|
||||
last example will be:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"persistent" : {},
|
||||
"transient" : {
|
||||
"discovery.zen.minimum_master_nodes" : "2"
|
||||
}
|
||||
}'
|
||||
--------------------------------------------------
|
||||
|
||||
Cluster wide settings can be returned using:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XGET localhost:9200/_cluster/settings
|
||||
--------------------------------------------------
|
||||
|
||||
There is a specific list of settings that can be updated, those include:
|
||||
|
||||
[float]
|
||||
=== Cluster settings
|
||||
|
||||
[float]
|
||||
==== Routing allocation
|
||||
|
||||
[float]
|
||||
===== Awareness
|
||||
|
||||
`cluster.routing.allocation.awareness.attributes`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
`cluster.routing.allocation.awareness.force.*`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
[float]
|
||||
===== Balanced Shards
|
||||
|
||||
`cluster.routing.allocation.balance.shard`::
Defines the weight factor for shards allocated on a node
(float). Defaults to `0.45f`.

`cluster.routing.allocation.balance.index`::
Defines a weight factor for the number of shards per index allocated
on a specific node (float). Defaults to `0.5f`.

`cluster.routing.allocation.balance.primary`::
Defines a weight factor for the number of primaries of a specific index
allocated on a node (float). Defaults to `0.05f`.

`cluster.routing.allocation.balance.threshold`::
Minimal optimization value of operations that should be performed (non
negative float). Defaults to `1.0f`.
|
||||
|
||||
[float]
|
||||
===== Concurrent Rebalance
|
||||
|
||||
`cluster.routing.allocation.cluster_concurrent_rebalance`::
Controls how many concurrent shard rebalances are
allowed cluster wide (integer). Defaults to `2`; `-1` for
unlimited. See also <<modules-cluster>>.
|
||||
|
||||
[float]
|
||||
===== Disable allocation
|
||||
|
||||
`cluster.routing.allocation.disable_allocation`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
`cluster.routing.allocation.disable_replica_allocation`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
`cluster.routing.allocation.disable_new_allocation`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
[float]
|
||||
===== Throttling allocation
|
||||
|
||||
`cluster.routing.allocation.node_initial_primaries_recoveries`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
`cluster.routing.allocation.node_concurrent_recoveries`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
[float]
|
||||
===== Filter allocation
|
||||
|
||||
`cluster.routing.allocation.include.*`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
`cluster.routing.allocation.exclude.*`::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
`cluster.routing.allocation.require.*` (from 0.90)::
|
||||
See <<modules-cluster>>.
|
||||
|
||||
[float]
|
||||
==== Metadata
|
||||
|
||||
`cluster.blocks.read_only`::
Make the whole cluster read only (indices do not accept write operations); metadata is not allowed to be modified (create or delete indices).
|
||||
|
||||
[float]
|
||||
==== Discovery
|
||||
|
||||
`discovery.zen.minimum_master_nodes`::
|
||||
See <<modules-discovery-zen>>
|
||||
|
||||
[float]
|
||||
==== Threadpools
|
||||
|
||||
`threadpool.*`::
|
||||
See <<modules-threadpool>>
|
||||
|
||||
[float]
|
||||
=== Index settings
|
||||
|
||||
[float]
|
||||
==== Index filter cache
|
||||
|
||||
`indices.cache.filter.size`::
|
||||
See <<index-modules-cache>>
|
||||
|
||||
`indices.cache.filter.expire` (time)::
|
||||
See <<index-modules-cache>>
|
||||
|
||||
[float]
|
||||
==== TTL interval
|
||||
|
||||
`indices.ttl.interval` (time)::
|
||||
See <<mapping-ttl-field>>
|
||||
|
||||
[float]
|
||||
==== Recovery
|
||||
|
||||
`indices.recovery.concurrent_streams`::
|
||||
See <<modules-indices>>
|
||||
|
||||
`indices.recovery.file_chunk_size`::
|
||||
See <<modules-indices>>
|
||||
|
||||
`indices.recovery.translog_ops`::
|
||||
See <<modules-indices>>
|
||||
|
||||
`indices.recovery.translog_size`::
|
||||
See <<modules-indices>>
|
||||
|
||||
`indices.recovery.compress`::
|
||||
See <<modules-indices>>
|
||||
|
||||
`indices.recovery.max_bytes_per_sec`::
|
||||
Since 0.90.1. See <<modules-indices>>
|
||||
|
||||
`indices.recovery.max_size_per_sec`::
|
||||
Deprecated since 0.90.1. See `max_bytes_per_sec` instead.
|
||||
|
||||
[float]
|
||||
==== Store level throttling
|
||||
|
||||
`indices.store.throttle.type`::
|
||||
See <<index-modules-store>>
|
||||
|
||||
`indices.store.throttle.max_bytes_per_sec`::
|
||||
See <<index-modules-store>>
|
||||
|
||||
[float]
|
||||
=== Logger
|
||||
|
||||
Logger values can also be updated dynamically by using the `logger.` prefix. More
settings will be allowed to be updated over time.
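
For example, a sketch of dynamically raising the log level for one
logger via a transient setting (the logger name is only illustrative):

[source,js]
--------------------------------------------------
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "logger.indices.recovery" : "DEBUG"
    }
}'
--------------------------------------------------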
|
|
@ -0,0 +1,45 @@
|
|||
[[search-common-options]]
|
||||
== Common Options
|
||||
|
||||
=== Pretty Results
|
||||
|
||||
When appending `?pretty=true` to any request made, the JSON returned
|
||||
will be pretty formatted (use it for debugging only!). Another option is
|
||||
to set `format=yaml` which will cause the result to be returned in the
|
||||
(sometimes) more readable yaml format.
|
||||
|
||||
=== Parameters
|
||||
|
||||
Rest parameters (when using HTTP, map to HTTP URL parameters) follow the
|
||||
convention of using underscore casing.
|
||||
|
||||
=== Boolean Values
|
||||
|
||||
All REST API parameters (both request parameters and JSON body) support
providing boolean "false" as the values: `false`, `0`, `no` and `off`.
All other values are considered "true". Note, this is not related to
how boolean fields within an indexed document are treated.
|
||||
|
||||
=== Number Values
|
||||
|
||||
All REST APIs support providing number parameters as a `string`, on top
of supporting the native JSON number types.
|
||||
|
||||
=== Result Casing
|
||||
|
||||
All REST APIs accept the `case` parameter. When set to `camelCase`, all
|
||||
field names in the result will be returned in camel casing, otherwise,
|
||||
underscore casing will be used. Note, this does not apply to the source
|
||||
document indexed.
|
||||
|
||||
=== JSONP
|
||||
|
||||
All REST APIs accept a `callback` parameter resulting in a
|
||||
http://en.wikipedia.org/wiki/JSONP[JSONP] result.
|
||||
|
||||
=== Request body in query string
|
||||
|
||||
For libraries that don't accept a request body for non-POST requests,
|
||||
you can pass the request body as the `source` query string parameter
|
||||
instead.
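
For example (a sketch; in practice the JSON body would typically need to
be URL encoded):

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_search?source={"query":{"match_all":{}}}'
--------------------------------------------------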
|
||||
|
|
@ -0,0 +1,31 @@
|
|||
[[docs]]
|
||||
= Document APIs
|
||||
|
||||
[partintro]
|
||||
--
|
||||
|
||||
This section describes the REST APIs *elasticsearch* provides (mainly)
|
||||
using JSON. The API is exposed using
|
||||
<<modules-http,HTTP>>,
<<modules-thrift,thrift>> and
<<modules-memcached,memcached>>.
|
||||
|
||||
--
|
||||
|
||||
include::docs/index_.asciidoc[]
|
||||
|
||||
include::docs/get.asciidoc[]
|
||||
|
||||
include::docs/delete.asciidoc[]
|
||||
|
||||
include::docs/update.asciidoc[]
|
||||
|
||||
include::docs/multi-get.asciidoc[]
|
||||
|
||||
include::docs/bulk.asciidoc[]
|
||||
|
||||
include::docs/delete-by-query.asciidoc[]
|
||||
|
||||
include::docs/bulk-udp.asciidoc[]
|
||||
|
||||
|
|
@ -0,0 +1,57 @@
|
|||
[[docs-bulk-udp]]
|
||||
== Bulk UDP API
|
||||
|
||||
A Bulk UDP service is a service listening over UDP for bulk format
requests. The idea is to provide a low latency UDP service that makes it
easy to index data that is not of a critical nature.
|
||||
|
||||
The Bulk UDP service is disabled by default, but can be enabled by
|
||||
setting `bulk.udp.enabled` to `true`.
|
||||
|
||||
The bulk UDP service performs internal bulk aggregation of the data and
|
||||
then flushes it based on several parameters:
|
||||
|
||||
`bulk.udp.bulk_actions`::
|
||||
The number of actions to flush a bulk after,
|
||||
defaults to `1000`.
|
||||
|
||||
`bulk.udp.bulk_size`::
|
||||
The size of the current bulk request to flush
|
||||
the request once exceeded, defaults to `5mb`.
|
||||
|
||||
`bulk.udp.flush_interval`::
|
||||
An interval after which the current
|
||||
request is flushed, regardless of the above limits. Defaults to `5s`.
|
||||
`bulk.udp.concurrent_requests`::
The maximum number of in flight bulk
requests allowed. Defaults to `4`.
|
||||
|
||||
The allowed network settings are:
|
||||
|
||||
`bulk.udp.host`::
|
||||
The host to bind to, defaults to `network.host`
|
||||
which defaults to any.
|
||||
|
||||
`bulk.udp.port`::
|
||||
The port to use, defaults to `9700-9800`.
|
||||
|
||||
`bulk.udp.receive_buffer_size`::
|
||||
The receive buffer size, defaults to `10mb`.
|
||||
|
||||
Here is an example of how it can be used:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
> cat bulk.txt
|
||||
{ "index" : { "_index" : "test", "_type" : "type1" } }
|
||||
{ "field1" : "value1" }
|
||||
{ "index" : { "_index" : "test", "_type" : "type1" } }
|
||||
{ "field1" : "value1" }
|
||||
--------------------------------------------------
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
> cat bulk.txt | nc -w 0 -u localhost 9700
|
||||
--------------------------------------------------
|
||||
|
||||
|
|
@ -0,0 +1,174 @@
|
|||
[[docs-bulk]]
|
||||
== Bulk API
|
||||
|
||||
The bulk API makes it possible to perform many index/delete operations
|
||||
in a single API call. This can greatly increase the indexing speed. The
|
||||
REST API endpoint is `/_bulk`, and it expects the following JSON
|
||||
structure:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
action_and_meta_data\n
|
||||
optional_source\n
|
||||
action_and_meta_data\n
|
||||
optional_source\n
|
||||
....
|
||||
action_and_meta_data\n
|
||||
optional_source\n
|
||||
--------------------------------------------------
|
||||
|
||||
*NOTE*: the final line of data must end with a newline character `\n`.
|
||||
|
||||
The possible actions are `index`, `create`, `delete` and since version
|
||||
`0.90.1` also `update`. `index` and `create` expect a source on the next
|
||||
line, and have the same semantics as the `op_type` parameter to the
|
||||
standard index API (i.e. create will fail if a document with the same
|
||||
index and type exists already, whereas index will add or replace a
|
||||
document as necessary). `delete` does not expect a source on the
|
||||
following line, and has the same semantics as the standard delete API.
|
||||
`update` expects that the partial doc, upsert and script and its options
|
||||
are specified on the next line.
|
||||
|
||||
If you're providing text file input to `curl`, you *must* use the
|
||||
`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
|
||||
newlines. Example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
$ cat requests
|
||||
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
|
||||
{ "field1" : "value1" }
|
||||
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
|
||||
{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}
|
||||
--------------------------------------------------
|
||||
|
||||
Because this format uses literal `\n`'s as delimiters, please be sure
|
||||
that the JSON actions and sources are not pretty printed. Here is an
|
||||
example of a correct sequence of bulk commands:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
|
||||
{ "field1" : "value1" }
|
||||
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
|
||||
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
|
||||
{ "field1" : "value3" }
|
||||
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
|
||||
{ "doc" : {"field2" : "value2"} }
|
||||
--------------------------------------------------
|
||||
|
||||
In the above example `doc` for the `update` action is a partial
|
||||
document, that will be merged with the already stored document.
|
||||
|
||||
The endpoints are `/_bulk`, `/{index}/_bulk`, and `{index}/type/_bulk`.
|
||||
When the index or the index/type are provided, they will be used by
|
||||
default on bulk items that don't provide them explicitly.
|
||||
|
||||
A note on the format. The idea here is to make processing of this as
fast as possible. As some of the actions will be redirected to other
shards on other nodes, only `action_and_meta_data` is parsed on the
receiving node side.
|
||||
|
||||
Client libraries using this protocol should strive to do
something similar on the client side, and reduce buffering as much as
|
||||
possible.
|
||||
|
||||
The response to a bulk action is a large JSON structure with the
|
||||
individual results of each action that was performed. The failure of a
|
||||
single action does not affect the remaining actions.
|
||||
|
||||
There is no "correct" number of actions to perform in a single bulk
|
||||
call. You should experiment with different settings to find the optimum
|
||||
size for your particular workload.
|
||||
|
||||
If using the HTTP API, make sure that the client does not send HTTP
|
||||
chunks, as this will slow things down.
|
||||
|
||||
[float]
|
||||
=== Versioning
|
||||
|
||||
Each bulk item can include the version value using the
|
||||
`_version`/`version` field. It automatically follows the behavior of the
|
||||
index / delete operation based on the `_version` mapping. It also
supports the `version_type`/`_version_type` when using `external`
versioning.
|
||||
|
||||
[float]
|
||||
=== Routing
|
||||
|
||||
Each bulk item can include the routing value using the
|
||||
`_routing`/`routing` field. It automatically follows the behavior of the
|
||||
index / delete operation based on the `_routing` mapping.
|
||||
|
||||
[float]
|
||||
=== Percolator
|
||||
|
||||
Each bulk index action can include a percolate value using the
|
||||
`_percolate`/`percolate` field.
|
||||
|
||||
[float]
|
||||
=== Parent
|
||||
|
||||
Each bulk item can include the parent value using the `_parent`/`parent`
|
||||
field. It automatically follows the behavior of the index / delete
|
||||
operation based on the `_parent` / `_routing` mapping.
|
||||
|
||||
[float]
|
||||
=== Timestamp
|
||||
|
||||
Each bulk item can include the timestamp value using the
|
||||
`_timestamp`/`timestamp` field. It automatically follows the behavior of
|
||||
the index operation based on the `_timestamp` mapping.
|
||||
|
||||
[float]
|
||||
=== TTL
|
||||
|
||||
Each bulk item can include the ttl value using the `_ttl`/`ttl` field.
|
||||
It automatically follows the behavior of the index operation based on
|
||||
the `_ttl` mapping.
|
||||
|
||||
[float]
|
||||
=== Write Consistency
|
||||
|
||||
When making bulk calls, you can require a minimum number of active
|
||||
shards in the partition through the `consistency` parameter. The values
|
||||
allowed are `one`, `quorum`, and `all`. It defaults to the node level
|
||||
setting of `action.write_consistency`, which in turn defaults to
|
||||
`quorum`.
|
||||
|
||||
For example, in an index with N shards and 2 replicas, there will have to be
at least 2 active shards within the relevant partition (`quorum`) for
the operation to succeed. In an index with N shards and 1 replica, there
will need to be a single shard active (in this case, `one` and `quorum`
are the same).
|
||||
|
||||
[float]
|
||||
=== Refresh
|
||||
|
||||
The `refresh` parameter can be set to `true` in order to refresh the
|
||||
relevant shards immediately after the bulk operation has occurred and
|
||||
make it searchable, instead of waiting for the normal refresh interval
|
||||
to expire. Setting it to `true` can trigger additional load, and may
|
||||
slow down indexing.
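
For example, a sketch of a bulk request combining the `consistency` and
`refresh` parameters (using the same `requests` file as above):

[source,js]
--------------------------------------------------
$ curl -s -XPOST 'localhost:9200/_bulk?consistency=quorum&refresh=true' --data-binary @requests; echo
--------------------------------------------------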
|
||||
|
||||
[float]
|
||||
=== Update
|
||||
|
||||
When using the `update` action, `_retry_on_conflict` can be used as a field in
the action itself (not in the extra payload line) to specify how many
times an update should be retried in the case of a version conflict.
||||
The `update` action payload supports the following options: `doc`
(partial document), `upsert`, `doc_as_upsert`, `script`, `params` (for
script), `lang` (for script). See the update documentation for details on
the options. Curl example with update actions:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
|
||||
{ "doc" : {"field" : "value"} }
|
||||
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
|
||||
{ "script" : "ctx._source.counter += param1", "lang" : "js", "params" : {"param1" : 1}, "upsert" : {"counter" : 1}}
|
||||
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
|
||||
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
|
||||
--------------------------------------------------
|