Merge pull request #12040 from clintongormley/plugin_docs

Docs: Prepare plugin and integration docs for 2.0
This commit is contained in:
Clinton Gormley 2015-08-15 18:04:43 +02:00
commit f635f7ec82
49 changed files with 4444 additions and 3503 deletions


@ -1,165 +1,46 @@
[[clients]]
== Clients
= Community Contributed Clients
:client: https://www.elastic.co/guide/en/elasticsearch/client
Besides the link:/guide[officially supported Elasticsearch clients], there are
a number of clients that have been contributed by the community for various languages:
* <<clojure>>
* <<cold-fusion>>
* <<erlang>>
* <<go>>
* <<groovy>>
* <<haskell>>
* <<java>>
* <<javascript>>
* <<dotnet>>
* <<ocaml>>
* <<perl>>
* <<php>>
* <<python>>
* <<r>>
* <<ruby>>
* <<scala>>
* <<smalltalk>>
* <<vertx>>
[[community-perl]]
=== Perl
See the {client}/perl-api/current/index.html[official Elasticsearch Perl client].
[[community-python]]
=== Python
See the {client}/python-api/current/index.html[official Elasticsearch Python client].
* http://github.com/elasticsearch/elasticsearch-dsl-py[elasticsearch-dsl-py]:
Chainable query and filter construction built on top of the official client.
* http://github.com/rhec/pyelasticsearch[pyelasticsearch]:
Python client.
* https://github.com/eriky/ESClient[ESClient]:
A lightweight and easy to use Python client for Elasticsearch.
* https://github.com/humangeo/rawes[rawes]:
Python low level client.
* https://github.com/mozilla/elasticutils/[elasticutils]:
A friendly chainable Elasticsearch interface for Python.
* http://intridea.github.io/surfiki-refine-elasticsearch/[Surfiki Refine]:
Python Map-Reduce engine targeting Elasticsearch indices.
* http://github.com/aparo/pyes[pyes]:
Python client.
[[community-ruby]]
=== Ruby
See the {client}/ruby-api/current/index.html[official Elasticsearch Ruby client].
* http://github.com/karmi/retire[Retire]:
Ruby API & DSL, with ActiveRecord/ActiveModel integration (retired since Sep 2013).
* https://github.com/PoseBiz/stretcher[stretcher]:
Ruby client.
* https://github.com/wireframe/elastic_searchable/[elastic_searchable]:
Ruby client + Rails integration.
* https://github.com/ddnexus/flex[Flex]:
Ruby Client.
* https://github.com/printercu/elastics-rb[elastics]:
Tiny client with built-in zero-downtime migrations and ActiveRecord integration.
* https://github.com/toptal/chewy[chewy]:
Chewy is an ODM and wrapper for the official Elasticsearch client.
* https://github.com/ankane/searchkick[Searchkick]:
Intelligent search made easy
[[community-php]]
=== PHP
See the {client}/php-api/current/index.html[official Elasticsearch PHP client].
* http://github.com/ruflin/Elastica[Elastica]:
PHP client.
* http://github.com/nervetattoo/elasticsearch[elasticsearch] PHP client.
* http://github.com/polyfractal/Sherlock[Sherlock]:
PHP client, one-to-one mapping with query DSL, fluid interface.
* https://github.com/nervetattoo/elasticsearch[elasticsearch]
PHP 5.3 client
[[community-java]]
=== Java
* https://github.com/searchbox-io/Jest[Jest]:
Java Rest client.
* There is of course the {client}/java-api/current/index.html[native ES Java client]
[[community-javascript]]
=== JavaScript
See the {client}/javascript-api/current/index.html[official Elasticsearch JavaScript client].
* https://github.com/fullscale/elastic.js[Elastic.js]:
A JavaScript implementation of the Elasticsearch Query DSL and Core API.
* https://github.com/phillro/node-elasticsearch-client[node-elasticsearch-client]:
A NodeJS client for Elasticsearch.
* https://github.com/ramv/node-elastical[node-elastical]:
Node.js client for the Elasticsearch REST API
* https://github.com/printercu/elastics[elastics]: Simple tiny client that just works
[[community-groovy]]
=== Groovy
See the {client}/groovy-api/current/index.html[official Elasticsearch Groovy client]
[[community-dotnet]]
=== .NET
See the {client}/net-api/current/index.html[official Elasticsearch .NET client].
* https://github.com/Yegoroff/PlainElastic.Net[PlainElastic.Net]:
.NET client.
* https://github.com/medcl/ElasticSearch.Net[ElasticSearch.NET]:
.NET client.
[[community-haskell]]
=== Haskell
* https://github.com/bitemyapp/bloodhound[bloodhound]:
Haskell client and DSL.
[[community-scala]]
=== Scala
* https://github.com/sksamuel/elastic4s[elastic4s]:
Scala DSL.
* https://github.com/scalastuff/esclient[esclient]:
Thin Scala client.
* https://github.com/bsadeh/scalastic[scalastic]:
Scala client.
* https://github.com/gphat/wabisabi[wabisabi]:
Asynchronous REST API Scala client.
[[community-clojure]]
=== Clojure
[[clojure]]
== Clojure
* http://github.com/clojurewerkz/elastisch[Elastisch]:
Clojure client.
[[cold-fusion]]
== Cold Fusion
[[community-go]]
=== Go
The following project appears to be abandoned:
* https://github.com/mattbaird/elastigo[elastigo]:
Go client.
* https://github.com/jasonfill/ColdFusion-ElasticSearch-Client[ColdFusion-Elasticsearch-Client]
Cold Fusion client for Elasticsearch
* https://github.com/belogik/goes[goes]:
Go lib.
* https://github.com/olivere/elastic[elastic]:
Elasticsearch client for Google Go.
[[community-erlang]]
=== Erlang
[[erlang]]
== Erlang
* http://github.com/tsloughter/erlastic_search[erlastic_search]:
Erlang client using HTTP.
@ -173,51 +54,181 @@ See the {client}/net-api/current/index.html[official Elasticsearch .NET client].
environment.
[[community-eventmachine]]
=== EventMachine
[[go]]
== Go
* http://github.com/vangberg/em-elasticsearch[em-elasticsearch]:
elasticsearch library for eventmachine.
* https://github.com/mattbaird/elastigo[elastigo]:
Go client.
* https://github.com/belogik/goes[goes]:
Go lib.
* https://github.com/olivere/elastic[elastic]:
Elasticsearch client for Google Go.
[[community-command-line]]
=== Command Line
[[groovy]]
== Groovy
* https://github.com/elasticsearch/es2unix[es2unix]:
Elasticsearch API consumable by the Linux command line.
See the {client}/groovy-api/current/index.html[official Elasticsearch Groovy client].
* https://github.com/javanna/elasticshell[elasticshell]:
command line shell for elasticsearch.
[[haskell]]
== Haskell
* https://github.com/bitemyapp/bloodhound[bloodhound]:
Haskell client and DSL.
[[community-ocaml]]
=== OCaml
[[java]]
== Java
Also see the {client}/java-api/current/index.html[official Elasticsearch Java client].
* https://github.com/searchbox-io/Jest[Jest]:
Java Rest client.
[[javascript]]
== JavaScript
Also see the {client}/javascript-api/current/index.html[official Elasticsearch JavaScript client].
* https://github.com/fullscale/elastic.js[Elastic.js]:
A JavaScript implementation of the Elasticsearch Query DSL and Core API.
* https://github.com/printercu/elastics[elastics]: Simple tiny client that just works
* https://github.com/roundscope/ember-data-elasticsearch-kit[ember-data-elasticsearch-kit]:
An ember-data kit for both pushing and querying objects to an Elasticsearch cluster
The following project appears to be abandoned:
* https://github.com/ramv/node-elastical[node-elastical]:
Node.js client for the Elasticsearch REST API
[[dotnet]]
== .NET
Also see the {client}/net-api/current/index.html[official Elasticsearch .NET client].
* https://github.com/Yegoroff/PlainElastic.Net[PlainElastic.Net]:
.NET client.
[[ocaml]]
== OCaml
The following project appears to be abandoned:
* https://github.com/tovbinm/ocaml-elasticsearch[ocaml-elasticsearch]:
OCaml client for Elasticsearch
[[perl]]
== Perl
[[community-smalltalk]]
=== Smalltalk
Also see the {client}/perl-api/current/index.html[official Elasticsearch Perl client].
* https://metacpan.org/pod/Elastijk[Elastijk]: A low level minimal HTTP client.
[[php]]
== PHP
Also see the {client}/php-api/current/index.html[official Elasticsearch PHP client].
* http://github.com/ruflin/Elastica[Elastica]:
PHP client.
* http://github.com/nervetattoo/elasticsearch[elasticsearch] PHP client.
[[python]]
== Python
Also see the {client}/python-api/current/index.html[official Elasticsearch Python client].
* http://github.com/elasticsearch/elasticsearch-dsl-py[elasticsearch-dsl-py]:
Chainable query and filter construction built on top of the official client.
* http://github.com/rhec/pyelasticsearch[pyelasticsearch]:
Python client.
* https://github.com/eriky/ESClient[ESClient]:
A lightweight and easy to use Python client for Elasticsearch.
* https://github.com/mozilla/elasticutils/[elasticutils]:
A friendly chainable Elasticsearch interface for Python.
* http://github.com/aparo/pyes[pyes]:
Python client.
The following projects appear to be abandoned:
* https://github.com/humangeo/rawes[rawes]:
Python low level client.
* http://intridea.github.io/surfiki-refine-elasticsearch/[Surfiki Refine]:
Python Map-Reduce engine targeting Elasticsearch indices.
[[r]]
== R
* https://github.com/Tomesch/elasticsearch[elasticsearch]
R client for Elasticsearch
* https://github.com/ropensci/elastic[elastic]:
A general purpose R client for Elasticsearch
[[ruby]]
== Ruby
Also see the {client}/ruby-api/current/index.html[official Elasticsearch Ruby client].
* https://github.com/PoseBiz/stretcher[stretcher]:
Ruby client.
* https://github.com/printercu/elastics-rb[elastics]:
Tiny client with built-in zero-downtime migrations and ActiveRecord integration.
* https://github.com/toptal/chewy[chewy]:
Chewy is an ODM and wrapper for the official Elasticsearch client.
* https://github.com/ankane/searchkick[Searchkick]:
Intelligent search made easy
The following projects appear to be abandoned:
* https://github.com/wireframe/elastic_searchable/[elastic_searchable]:
Ruby client + Rails integration.
* https://github.com/ddnexus/flex[Flex]:
Ruby Client.
[[scala]]
== Scala
* https://github.com/sksamuel/elastic4s[elastic4s]:
Scala DSL.
* https://github.com/scalastuff/esclient[esclient]:
Thin Scala client.
* https://github.com/gphat/wabisabi[wabisabi]:
Asynchronous REST API Scala client.
The following project appears to be abandoned:
* https://github.com/bsadeh/scalastic[scalastic]:
Scala client.
[[smalltalk]]
== Smalltalk
* http://ss3.gemstone.com/ss/Elasticsearch.html[Elasticsearch] -
Smalltalk client for Elasticsearch
[[community-cold-fusion]]
=== Cold Fusion
* https://github.com/jasonfill/ColdFusion-ElasticSearch-Client[ColdFusion-Elasticsearch-Client]
Cold Fusion client for Elasticsearch
[[vertx]]
== Vert.x
[[community-nodejs]]
=== NodeJS
* https://github.com/phillro/node-elasticsearch-client[Node-Elasticsearch-Client]
A node.js client for elasticsearch
[[community-r]]
=== R
* https://github.com/Tomesch/elasticsearch[elasticsearch]
R client for Elasticsearch
* https://github.com/ropensci/elastic[elastic]:
A general purpose R client for Elasticsearch
* https://github.com/goodow/realtime-search[realtime-search]:
Elasticsearch module for Vert.x


@ -1,20 +0,0 @@
[[front-ends]]
== Front Ends
* https://github.com/mobz/elasticsearch-head[elasticsearch-head]:
A web front end for an Elasticsearch cluster.
* https://github.com/OlegKunitsyn/elasticsearch-browser[browser]:
Web front-end over elasticsearch data.
* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor]:
Front-end to help debug/diagnose queries and analyzers
* http://elastichammer.exploringelasticsearch.com/[Hammer]:
Web front-end for elasticsearch
* https://github.com/romansanchez/Calaca[Calaca]:
Simple search client for Elasticsearch
* https://github.com/rdpatil4/ESClient[ESClient]:
Simple search, update, delete client for Elasticsearch


@ -1,6 +0,0 @@
[[github]]
== GitHub
GitHub is a place where a lot of development is done around
*elasticsearch*, here is a simple search for
https://github.com/search?q=elasticsearch&type=Repositories[repositories].


@ -1,17 +0,0 @@
= Community Supported Clients
:client: http://www.elastic.co/guide/en/elasticsearch/client
include::clients.asciidoc[]
include::frontends.asciidoc[]
include::integrations.asciidoc[]
include::misc.asciidoc[]
include::monitoring.asciidoc[]
include::github.asciidoc[]


@ -1,102 +0,0 @@
[[integrations]]
== Integrations
* http://grails.org/plugin/elasticsearch[Grails]:
Elasticsearch Grails plugin.
* https://github.com/carrot2/elasticsearch-carrot2[carrot2]:
Results clustering with carrot2
* https://github.com/angelf/escargot[escargot]:
Elasticsearch connector for Rails (WIP).
* https://metacpan.org/module/Catalyst::Model::Search::Elasticsearch[Catalyst]:
Elasticsearch and Catalyst integration.
* http://github.com/aparo/django-elasticsearch[django-elasticsearch]:
Django Elasticsearch Backend.
* http://github.com/Aconex/elasticflume[elasticflume]:
http://github.com/cloudera/flume[Flume] sink implementation.
* http://code.google.com/p/terrastore/wiki/Search_Integration[Terrastore Search]:
http://code.google.com/p/terrastore/[Terrastore] integration module with elasticsearch.
* https://github.com/infochimps-labs/wonderdog[Wonderdog]:
Hadoop bulk loader into elasticsearch.
* http://geeks.aretotally.in/play-framework-module-elastic-search-distributed-searching-with-json-http-rest-or-java[Play!Framework]:
Integrate with Play! Framework Application.
* https://github.com/Exercise/FOQElasticaBundle[ElasticaBundle]:
Symfony2 Bundle wrapping Elastica.
* https://drupal.org/project/elasticsearch_connector[Drupal]:
Drupal Elasticsearch integration (1.0.0 and later).
* http://drupal.org/project/search_api_elasticsearch[Drupal]:
Drupal Elasticsearch integration via Search API (1.0.0 and earlier).
* https://github.com/refuge/couch_es[couch_es]:
elasticsearch helper for couchdb based products (apache couchdb, bigcouch & refuge)
* https://github.com/sonian/elasticsearch-jetty[Jetty]:
Jetty HTTP Transport
* https://github.com/dadoonet/spring-elasticsearch[Spring Elasticsearch]:
Spring Factory for Elasticsearch
* https://github.com/spring-projects/spring-data-elasticsearch[Spring Data Elasticsearch]:
Spring Data implementation for Elasticsearch
* https://camel.apache.org/elasticsearch.html[Apache Camel Integration]:
An Apache camel component to integrate elasticsearch
* https://github.com/tlrx/elasticsearch-test[elasticsearch-test]:
Elasticsearch Java annotations for unit testing with
http://www.junit.org/[JUnit]
* http://searchbox-io.github.com/wp-elasticsearch/[Wp-Elasticsearch]:
Elasticsearch WordPress Plugin
* https://github.com/wallmanderco/elasticsearch-indexer[Elasticsearch Indexer]:
Elasticsearch WordPress Plugin
* https://github.com/OlegKunitsyn/eslogd[eslogd]:
Linux daemon that replicates events to a central Elasticsearch server in real-time
* https://github.com/drewr/elasticsearch-clojure-repl[elasticsearch-clojure-repl]:
Plugin that embeds nREPL for run-time introspective adventure! Also
serves as an nREPL transport.
* http://haystacksearch.org/[Haystack]:
Modular search for Django
* https://github.com/cleverage/play2-elasticsearch[play2-elasticsearch]:
Elasticsearch module for Play Framework 2.x
* https://github.com/goodow/realtime-search[realtime-search]:
Elasticsearch module for Vert.x
* https://github.com/fullscale/dangle[dangle]:
A set of AngularJS directives that provide common visualizations for elasticsearch based on
D3.
* https://github.com/roundscope/ember-data-elasticsearch-kit[ember-data-elasticsearch-kit]:
An ember-data kit for both pushing and querying objects to an Elasticsearch cluster
* https://github.com/kzwang/elasticsearch-osem[elasticsearch-osem]:
A Java Object Search Engine Mapping (OSEM) for Elasticsearch
* https://github.com/twitter/storehaus[Twitter Storehaus]:
Thin asynchronous Scala client for Storehaus.
* https://doc.tiki.org/Elasticsearch[Tiki Wiki CMS Groupware]:
Tiki has native support for Elasticsearch. This provides faster and better search (facets, etc.), along with some Natural Language Processing features (e.g. More Like This)
* https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer[Kafka Standalone Consumer]:
Easily scalable and extendable Kafka standalone consumer that reads messages from Kafka, processes them, and indexes them in Elasticsearch
* http://www.searchtechnologies.com/aspire-for-elasticsearch[Aspire for Elasticsearch]:
Aspire, from Search Technologies, is a powerful connector and processing framework designed for unstructured data. It has connectors to internal and external repositories including SharePoint, Documentum, Jive, RDB, file systems, websites and more, and can transform and normalize this data before indexing in Elasticsearch.


@ -1,31 +0,0 @@
[[misc]]
== Misc
* https://github.com/elasticsearch/puppet-elasticsearch[Puppet]:
Elasticsearch puppet module.
* http://github.com/elasticsearch/cookbook-elasticsearch[Chef]:
Chef cookbook for Elasticsearch
* https://github.com/medcl/salt-elasticsearch[SaltStack]:
SaltStack Module for Elasticsearch
* http://www.github.com/neogenix/daikon[daikon]:
Daikon Elasticsearch CLI
* https://github.com/Aconex/scrutineer[Scrutineer]:
A high performance consistency checker to compare what you've indexed
with your source of truth content (e.g. DB)
* https://www.wireshark.org/[Wireshark]:
Protocol dissection for Zen discovery, HTTP and the binary protocol
* https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin[Readonly REST]:
High performance access control for Elasticsearch native REST API.
* https://github.com/kodcu/pes[Pes]:
A pluggable elastic query DSL builder for Elasticsearch
* https://github.com/ozlerhakan/mongolastic[Mongolastic]:
A tool that clones data from Elasticsearch to MongoDB and vice versa


@ -1,40 +0,0 @@
[[health]]
== Health and Performance Monitoring
* https://github.com/lukas-vlcek/bigdesk[bigdesk]:
Live charts and statistics for elasticsearch cluster.
* https://github.com/lmenezes/elasticsearch-kopf/[Kopf]:
Live cluster health and shard allocation monitoring with administration toolset.
* https://github.com/karmi/elasticsearch-paramedic[paramedic]:
Live charts with cluster stats and indices/shards information.
* http://www.elastichq.org/[ElasticsearchHQ]:
Free cluster health monitoring tool
* http://sematext.com/spm/index.html[SPM for Elasticsearch]:
Performance monitoring with live charts showing cluster and node stats, integrated
alerts, email reports, etc.
* https://github.com/radu-gheorghe/check-es[check-es]:
Nagios/Shinken plugins for checking on elasticsearch
* https://github.com/anchor/nagios-plugin-elasticsearch[check_elasticsearch]:
An Elasticsearch availability and performance monitoring plugin for
Nagios.
* https://github.com/rbramley/Opsview-elasticsearch[opsview-elasticsearch]:
Opsview plugin written in Perl for monitoring Elasticsearch
* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy]:
Plugin to watch Lucene segment merges across your cluster
* https://github.com/mattweber/es2graphite[es2graphite]:
Send cluster and indices stats and status to Graphite for monitoring and graphing.
* https://scoutapp.com[Scout]: Provides plugins for monitoring Elasticsearch https://scoutapp.com/plugin_urls/1331-elasticsearch-node-status[nodes], https://scoutapp.com/plugin_urls/1321-elasticsearch-cluster-status[clusters], and https://scoutapp.com/plugin_urls/1341-elasticsearch-index-status[indices].
* https://itunes.apple.com/us/app/elasticocean/id955278030?ls=1&mt=8[ElasticOcean]:
Elasticsearch & DigitalOcean iOS Real-Time Monitoring tool to keep an eye on DigitalOcean Droplets or Elasticsearch instances or both of them on-a-go.


@ -0,0 +1,18 @@
[[alerting]]
== Alerting Plugins
Alerting plugins allow Elasticsearch to monitor indices and to trigger alerts when thresholds are breached.
[float]
=== Core alerting plugins
The core alerting plugins are:
link:/products/watcher[Watcher]::
Watcher is the alerting and notification product for Elasticsearch that lets
you take action based on changes in your data. It is designed around the
principle that if you can query something in Elasticsearch, you can alert on
it. Simply define a query, condition, schedule, and the actions to take, and
Watcher will do the rest.


@ -0,0 +1,438 @@
[[analysis-icu]]
=== ICU Analysis Plugin
The ICU Analysis plugin integrates the Lucene ICU module into elasticsearch,
adding extended Unicode support using the http://site.icu-project.org/[ICU]
libraries, including better analysis of Asian languages, Unicode
normalization, Unicode-aware case folding, collation support, and
transliteration.
[[analysis-icu-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-icu
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-icu-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-icu
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-icu-normalization-charfilter]]
==== ICU Normalization Character Filter
Normalizes characters as explained
http://userguide.icu-project.org/transforms/normalization[here].
It registers itself as the `icu_normalizer` character filter, which is
available to all indices without any further configuration. The type of
normalization can be specified with the `name` parameter, which accepts `nfc`,
`nfkc`, and `nfkc_cf` (default). Set the `mode` parameter to `decompose` to
convert `nfc` to `nfd` or `nfkc` to `nfkd` respectively.
Here are two examples, the default usage and a customised character filter:
[source,json]
--------------------------------------------------
PUT icu_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"nfkc_cf_normalized": { <1>
"tokenizer": "icu_tokenizer",
"char_filter": [
"icu_normalizer"
]
},
"nfd_normalized": { <2>
"tokenizer": "icu_tokenizer",
"char_filter": [
"nfd_normalizer"
]
}
},
"char_filter": {
"nfd_normalizer": {
"type": "icu_normalizer",
"name": "nfc",
"mode": "decompose"
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Uses the default `nfkc_cf` normalization.
<2> Uses the customized `nfd_normalizer` token filter, which is set to use `nfc` normalization with decomposition.
[[analysis-icu-tokenizer]]
==== ICU Tokenizer
Tokenizes text into words on word boundaries, as defined in
http://www.unicode.org/reports/tr29/[UAX #29: Unicode Text Segmentation].
It behaves much like the {ref}/analysis-standard-tokenizer.html[`standard` tokenizer],
but adds better support for some Asian languages by using a dictionary-based
approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and
using custom rules to break Myanmar and Khmer text into syllables.
[source,json]
--------------------------------------------------
PUT icu_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_icu_analyzer": {
"tokenizer": "icu_tokenizer"
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
[[analysis-icu-normalization]]
==== ICU Normalization Token Filter
Normalizes characters as explained
http://userguide.icu-project.org/transforms/normalization[here]. It registers
itself as the `icu_normalizer` token filter, which is available to all indices
without any further configuration. The type of normalization can be specified
with the `name` parameter, which accepts `nfc`, `nfkc`, and `nfkc_cf`
(default).
You should probably prefer the <<analysis-icu-normalization-charfilter,Normalization character filter>>.
Here are two examples, the default usage and a customised token filter:
[source,json]
--------------------------------------------------
PUT icu_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"nfkc_cf_normalized": { <1>
"tokenizer": "icu_tokenizer",
"filter": [
"icu_normalizer"
]
},
"nfc_normalized": { <2>
"tokenizer": "icu_tokenizer",
"filter": [
"nfc_normalizer"
]
}
},
"filter": {
"nfc_normalizer": {
"type": "icu_normalizer",
"name": "nfc"
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Uses the default `nfkc_cf` normalization.
<2> Uses the customized `nfc_normalizer` token filter, which is set to use `nfc` normalization.
[[analysis-icu-folding]]
==== ICU Folding Token Filter
Case folding of Unicode characters based on `UTR#30`, like the
{ref}/analysis-asciifolding-tokenfilter.html[ASCII-folding token filter]
on steroids. It registers itself as the `icu_folding` token filter and is
available to all indices:
[source,json]
--------------------------------------------------
PUT icu_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"folded": {
"tokenizer": "icu",
"filter": [
"icu_folding"
]
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
The ICU folding token filter already does Unicode normalization, so there is
no need to use the normalization character or token filter as well.
Which letters are folded can be controlled by specifying the
`unicodeSetFilter` parameter, which accepts a
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html[UnicodeSet].
The following example exempts Swedish characters from folding. It is important
to note that both upper and lowercase forms should be specified, and that
these filtered characters are not lowercased, which is why we add the
`lowercase` filter as well:
[source,json]
--------------------------------------------------
PUT icu_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"swedish_analyzer": {
"tokenizer": "icu_tokenizer",
"filter": [
"swedish_folding",
"lowercase"
]
}
},
"filter": {
"swedish_folding": {
"type": "icu_folding",
"unicodeSetFilter": "[^åäöÅÄÖ]"
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
[[analysis-icu-collation]]
==== ICU Collation Token Filter
Collations are used for sorting documents in a language-specific word order.
The `icu_collation` token filter is available to all indices and defaults to
using the
https://www.elastic.co/guide/en/elasticsearch/guide/current/sorting-collations.html#uca[DUCET collation],
which is a best-effort attempt at language-neutral sorting.
Below is an example of how to set up a field for sorting German names in
``phonebook'' order:
[source,json]
--------------------------------------------------
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"german_phonebook": {
"type": "icu_collation",
"language": "de",
"country": "DE",
"variant": "@collation=phonebook"
}
},
"analyzer": {
"german_phonebook": {
"tokenizer": "keyword",
"filter": [ "german_phonebook" ]
}
}
}
},
"mappings": {
"user": {
"properties": {
"name": { <1>
"type": "string",
"fields": {
"sort": { <2>
"type": "string",
"analyzer": "german_phonebook"
}
}
}
}
}
}
}
GET _search <3>
{
"query": {
"match": {
"name": "Fritz"
}
},
"sort": "name.sort"
}
--------------------------------------------------
// AUTOSENSE
<1> The `name` field uses the `standard` analyzer, and so supports full text queries.
<2> The `name.sort` field uses the `keyword` analyzer to preserve the name as
a single token, and applies the `german_phonebook` token filter to index
the value in German phonebook sort order.
<3> An example query which searches the `name` field and sorts on the `name.sort` field.
===== Collation options
`strength`::
The strength property determines the minimum level of difference considered
significant during comparison. Possible values are : `primary`, `secondary`,
`tertiary`, `quaternary` or `identical`. See the
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html[ICU Collation documentation]
for a more detailed explanation for each value. Defaults to `tertiary`
unless otherwise specified in the collation.
`decomposition`::
Possible values: `no` (default, but collation-dependent) or `canonical`.
Setting this decomposition property to `canonical` allows the Collator to
handle unnormalized text properly, producing the same results as if the text
were normalized. If `no` is set, it is the user's responsibility to ensure
that all text is already in the appropriate form before a comparison or before
getting a CollationKey. Adjusting decomposition mode allows the user to select
between faster and more complete collation behavior. Since a great many of the
world's languages do not require text normalization, most locales set `no` as
the default decomposition mode.
The following options are expert only:
`alternate`::
Possible values: `shifted` or `non-ignorable`. Sets the alternate handling for
strength `quaternary`: `shifted` causes punctuation and whitespace to be
ignored, while `non-ignorable` treats them as significant.
`caseLevel`::
Possible values: `true` or `false` (default). Whether case level sorting is
required. When strength is set to `primary` this will ignore accent
differences.
`caseFirst`::
Possible values: `lower` or `upper`. Useful to control which case is sorted
first when case is not ignored for strength `tertiary`. The default depends on
the collation.
`numeric`::
Possible values: `true` or `false` (default). Whether digits are sorted
according to their numeric representation. For example, the value `egg-9` is
sorted before the value `egg-21`.
`variableTop`::
Single character or contraction. Controls what is variable for `alternate`.
`hiraganaQuaternaryMode`::
Possible values: `true` or `false`. Whether to distinguish between Katakana
and Hiragana characters in `quaternary` strength.
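The options above can be combined on a single `icu_collation` filter. The
following is a minimal, untested sketch (the index, filter, and analyzer names
are made up for illustration) that ignores case differences and sorts digits
numerically:
[source,json]
--------------------------------------------------
PUT icu_sample
{
  "settings": {
    "analysis": {
      "filter": {
        "case_insensitive_numeric_sort": {
          "type": "icu_collation",
          "language": "en",
          "strength": "secondary", <1>
          "numeric": true          <2>
        }
      },
      "analyzer": {
        "case_insensitive_numeric_sort": {
          "tokenizer": "keyword",
          "filter": [ "case_insensitive_numeric_sort" ]
        }
      }
    }
  }
}
--------------------------------------------------
<1> `secondary` strength considers accent differences but ignores case differences.
<2> `numeric` sorts `egg-9` before `egg-21`, as described above.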
[[analysis-icu-transform]]
==== ICU Transform Token Filter
Transforms are used to process Unicode text in many different ways, such as
case mapping, normalization, transliteration and bidirectional text handling.
You can define which transformation you want to apply with the `id` parameter
(defaults to `Null`), and specify text direction with the `dir` parameter
which accepts `forward` (default) for LTR and `reverse` for RTL. Custom
rulesets are not yet supported.
For example:
[source,json]
--------------------------------------------------
PUT icu_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"latin": {
"tokenizer": "keyword",
"filter": [
"myLatinTransform"
]
}
},
"filter": {
"myLatinTransform": {
"type": "icu_transform",
"id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC" <1>
}
}
}
}
}
}
GET icu_sample/_analyze?analyzer=latin
{
"text": "你好" <2>
}
GET icu_sample/_analyze?analyzer=latin
{
"text": "здравствуйте" <3>
}
GET icu_sample/_analyze?analyzer=latin
{
"text": "こんにちは" <4>
}
--------------------------------------------------
// AUTOSENSE
<1> This transform transliterates characters to Latin, separates accents
from their base characters, removes the accents, and then puts the
remaining text into an unaccented form.
<2> Returns `ni hao`.
<3> Returns `zdravstvujte`.
<4> Returns `kon'nichiha`.
For more documentation, please see the http://userguide.icu-project.org/transforms/general[user guide of ICU Transform].


@ -0,0 +1,454 @@
[[analysis-kuromoji]]
=== Japanese (kuromoji) Analysis Plugin
The Japanese (kuromoji) Analysis plugin integrates the Lucene kuromoji analysis
module into elasticsearch.
[[analysis-kuromoji-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-kuromoji
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-kuromoji-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-kuromoji
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-kuromoji-analyzer]]
==== `kuromoji` analyzer
The `kuromoji` analyzer consists of the following tokenizer and token filters:
* <<analysis-kuromoji-tokenizer,`kuromoji_tokenizer`>>
* <<analysis-kuromoji-baseform,`kuromoji_baseform`>> token filter
* <<analysis-kuromoji-speech,`kuromoji_part_of_speech`>> token filter
* {ref}/analysis-cjk-width-tokenfilter.html[`cjk_width`] token filter
* <<analysis-kuromoji-stop,`ja_stop`>> token filter
* <<analysis-kuromoji-stemmer,`kuromoji_stemmer`>> token filter
* {ref}/analysis-lowercase-tokenfilter.html[`lowercase`] token filter
It supports the `mode` and `user_dictionary` settings from
<<analysis-kuromoji-tokenizer,`kuromoji_tokenizer`>>.
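As a hedged sketch of what passing those settings might look like (the index
and analyzer names are made up, and the exact behaviour should be verified
against your version of the plugin):
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_kuromoji": {
            "type": "kuromoji", <1>
            "mode": "search"    <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
<1> A `kuromoji`-type analyzer defined in the index settings.
<2> The `mode` setting documented for the `kuromoji_tokenizer` below.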
[[analysis-kuromoji-charfilter]]
==== `kuromoji_iteration_mark` character filter
The `kuromoji_iteration_mark` normalizes Japanese horizontal iteration marks
(_odoriji_) to their expanded form. It accepts the following settings:
`normalize_kanji`::
Indicates whether kanji iteration marks should be normalized. Defaults to `true`.
`normalize_kana`::
Indicates whether kana iteration marks should be normalized. Defaults to `true`.
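For example, the following sketch (the index, analyzer, and character filter
names are illustrative) defines a custom character filter that normalizes
kanji iteration marks but leaves kana iteration marks untouched:
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "iteration_mark_kanji_only": {
            "type": "kuromoji_iteration_mark",
            "normalize_kanji": true,
            "normalize_kana": false
          }
        },
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "char_filter": [
              "iteration_mark_kanji_only"
            ]
          }
        }
      }
    }
  }
}
--------------------------------------------------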
[[analysis-kuromoji-tokenizer]]
==== `kuromoji_tokenizer`
The `kuromoji_tokenizer` accepts the following settings:
`mode`::
+
--
The tokenization mode determines how the tokenizer handles compound and
unknown words. It can be set to:
`normal`::
Normal segmentation, no decomposition for compounds. Example output:
関西国際空港
アブラカダブラ
`search`::
Segmentation geared towards search. This includes a decompounding process
for long nouns, also including the full compound token as a synonym.
Example output:
関西, 関西国際空港, 国際, 空港
アブラカダブラ
`extended`::
Extended mode outputs unigrams for unknown words. Example output:
関西, 国際, 空港
ア, ブ, ラ, カ, ダ, ブ, ラ
--
`discard_punctuation`::
Whether punctuation should be discarded from the output. Defaults to `true`.
`user_dictionary`::
+
--
The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A `user_dictionary`
may be appended to the default dictionary. The dictionary should have the following CSV format:
[source,csv]
-----------------------
<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
-----------------------
--
As a demonstration of how the user dictionary can be used, save the following
dictionary to `$ES_HOME/config/userdict_ja.txt`:
[source,csv]
-----------------------
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
-----------------------
Then create an analyzer as follows:
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
"settings": {
"index": {
"analysis": {
"tokenizer": {
"kuromoji_user_dict": {
"type": "kuromoji_tokenizer",
"mode": "extended",
"discard_punctuation": "false",
"user_dictionary": "userdict_ja.txt"
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "kuromoji_user_dict"
}
}
}
}
}
}
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=東京スカイツリー
--------------------------------------------------
// AUTOSENSE
The above `analyze` request returns the following:
[source,json]
--------------------------------------------------
# Result
{
"tokens" : [ {
"token" : "東京",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "スカイツリー",
"start_offset" : 2,
"end_offset" : 8,
"type" : "word",
"position" : 2
} ]
}
--------------------------------------------------
[[analysis-kuromoji-baseform]]
==== `kuromoji_baseform` token filter
The `kuromoji_baseform` token filter replaces terms with their
BaseFormAttribute. This acts as a lemmatizer for verbs and adjectives.
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "kuromoji_tokenizer",
"filter": [
"kuromoji_baseform"
]
}
}
}
}
}
}
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=飲み
--------------------------------------------------
// AUTOSENSE
[source,text]
--------------------------------------------------
# Result
{
"tokens" : [ {
"token" : "飲む",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
} ]
}
--------------------------------------------------
[[analysis-kuromoji-speech]]
==== `kuromoji_part_of_speech` token filter
The `kuromoji_part_of_speech` token filter removes tokens that match a set of
part-of-speech tags. It accepts the following setting:
`stoptags`::
An array of part-of-speech tags that should be removed. It defaults to the
`stoptags.txt` file embedded in the `lucene-analyzer-kuromoji.jar`.
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "kuromoji_tokenizer",
"filter": [
"my_posfilter"
]
}
},
"filter": {
"my_posfilter": {
"type": "kuromoji_part_of_speech",
"stoptags": [
"助詞-格助詞-一般",
"助詞-終助詞"
]
}
}
}
}
}
}
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=寿司がおいしいね
--------------------------------------------------
// AUTOSENSE
[source,text]
--------------------------------------------------
# Result
{
"tokens" : [ {
"token" : "寿司",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "おいしい",
"start_offset" : 3,
"end_offset" : 7,
"type" : "word",
"position" : 3
} ]
}
--------------------------------------------------
[[analysis-kuromoji-readingform]]
==== `kuromoji_readingform` token filter
The `kuromoji_readingform` token filter replaces the token with its reading
form in either katakana or romaji. It accepts the following setting:
`use_romaji`::
Whether romaji reading form should be output instead of katakana. Defaults to `false`.
When using the pre-defined `kuromoji_readingform` filter, `use_romaji` is set
to `true`. The default when defining a custom `kuromoji_readingform`, however,
is `false`. The only reason to use the custom form is if you need the
katakana reading form:
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
"settings": {
"index":{
"analysis":{
"analyzer" : {
"romaji_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["romaji_readingform"]
},
"katakana_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["katakana_readingform"]
}
},
"filter" : {
"romaji_readingform" : {
"type" : "kuromoji_readingform",
"use_romaji" : true
},
"katakana_readingform" : {
"type" : "kuromoji_readingform",
"use_romaji" : false
}
}
}
}
}
}
POST kuromoji_sample/_analyze?analyzer=katakana_analyzer&text=寿司 <1>
POST kuromoji_sample/_analyze?analyzer=romaji_analyzer&text=寿司 <2>
--------------------------------------------------
// AUTOSENSE
<1> Returns `スシ`.
<2> Returns `sushi`.
[[analysis-kuromoji-stemmer]]
==== `kuromoji_stemmer` token filter
The `kuromoji_stemmer` token filter normalizes common katakana spelling
variations ending in a long sound character by removing this character
(U+30FC). Only full-width katakana characters are supported.
This token filter accepts the following setting:
`minimum_length`::
Katakana words shorter than the `minimum_length` setting are not stemmed (default
is `4`).
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "kuromoji_tokenizer",
"filter": [
"my_katakana_stemmer"
]
}
},
"filter": {
"my_katakana_stemmer": {
"type": "kuromoji_stemmer",
"minimum_length": 4
}
}
}
}
}
}
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=コピー <1>
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=サーバー <2>
--------------------------------------------------
// AUTOSENSE
<1> Returns `コピー`.
<2> Returns `サーバ`.
[[analysis-kuromoji-stop]]
===== `ja_stop` token filter
The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and
any other custom stopwords specified by the user. This filter only supports
the predefined `_japanese_` stopwords list. If you want to use a different
predefined list, then use the
{ref}/analysis-stop-tokenfilter.html[`stop` token filter] instead.
[source,json]
--------------------------------------------------
PUT kuromoji_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_with_ja_stop": {
"tokenizer": "kuromoji_tokenizer",
"filter": [
"ja_stop"
]
}
},
"filter": {
"ja_stop": {
"type": "ja_stop",
"stopwords": [
"_japanese_",
"ストップ"
]
}
}
}
}
}
}
POST kuromoji_sample/_analyze?analyzer=analyzer_with_ja_stop&text=ストップは消える
--------------------------------------------------
// AUTOSENSE
The above request returns:
[source,text]
--------------------------------------------------
# Result
{
"tokens" : [ {
"token" : "消える",
"start_offset" : 5,
"end_offset" : 8,
"type" : "word",
"position" : 3
} ]
}
--------------------------------------------------


@ -0,0 +1,120 @@
[[analysis-phonetic]]
=== Phonetic Analysis Plugin
The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.
[[analysis-phonetic-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-phonetic
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-phonetic-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-phonetic
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-phonetic-token-filter]]
==== `phonetic` token filter
The `phonetic` token filter takes the following settings:
`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beidermorse`.
`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beidermorse` encoding.
[source,json]
--------------------------------------------------
PUT phonetic_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
}
}
}
}
}
}
POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// AUTOSENSE
<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
[float]
===== Double metaphone settings
If the `double_metaphone` encoder is used, then this additional setting is
supported:
`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.
[float]
===== Beider Morse settings
If the `beider_morse` encoder is used, then these additional settings are
supported:
`rule_type`::
Whether matching should be `exact` or `approx` (default).
`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).
`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
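The following is an untested sketch of a Beider-Morse configuration combining
the settings above. The filter and analyzer names are made up, and note that
the accepted-values list earlier in this section spells the encoder
`beidermorse` while this section uses `beider_morse`, so check which spelling
your version of the plugin expects:
[source,json]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "approx",
            "name_type": "generic",
            "languageset": [ "english", "german" ]
          }
        },
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        }
      }
    }
  }
}
--------------------------------------------------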


@ -0,0 +1,48 @@
[[analysis-smartcn]]
=== Smart Chinese Analysis Plugin
The Smart Chinese Analysis plugin integrates Lucene's Smart Chinese analysis
module into elasticsearch.
It provides an analyzer for Chinese or mixed Chinese-English text. This
analyzer uses probabilistic knowledge to find the optimal word segmentation
for Simplified Chinese text. The text is first broken into sentences, then
each sentence is segmented into words.
[[analysis-smartcn-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-smartcn
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-smartcn-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-smartcn
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-smartcn-tokenizer]]
[float]
==== `smartcn` tokenizer and token filter
The plugin provides the `smartcn` analyzer and `smartcn_tokenizer` tokenizer,
which are not configurable.
NOTE: The `smartcn_word` token filter and `smartcn_sentence` tokenizer have been deprecated.
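As a short illustrative sketch (the index, type, and field names are made up),
the `smartcn` analyzer can simply be referenced from a mapping:
[source,json]
--------------------------------------------------
PUT smartcn_sample
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "smartcn"
        }
      }
    }
  }
}
--------------------------------------------------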


@ -0,0 +1,43 @@
[[analysis-stempel]]
=== Stempel Polish Analysis Plugin
The Stempel Analysis plugin integrates Lucene's Stempel analysis
module for Polish into elasticsearch.
It provides high quality stemming for Polish, based on the
http://www.egothor.org/[Egothor project].
[[analysis-stempel-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-stempel
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-stempel-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-stempel
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-stempel-tokenizer]]
[float]
==== `stempel` tokenizer and token filter
The plugin provides the `polish` analyzer and `polish_stem` token filter,
which are not configurable.
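As a short illustrative sketch (the index, type, analyzer, and field names are
made up), the `polish` analyzer can be used directly in a mapping, and the
`polish_stem` token filter can be combined with other filters in a custom
analyzer:
[source,json]
--------------------------------------------------
PUT polish_sample
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_polish_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "polish_stem"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "polish"
        }
      }
    }
  }
}
--------------------------------------------------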


@ -0,0 +1,69 @@
[[analysis]]
== Analysis Plugins
Analysis plugins extend Elasticsearch by adding new analyzers, tokenizers,
token filters, or character filters to Elasticsearch.
[float]
==== Core analysis plugins
The core analysis plugins are:
<<analysis-icu,ICU>>::
Adds extended Unicode support using the http://site.icu-project.org/[ICU]
libraries, including better analysis of Asian languages, Unicode
normalization, Unicode-aware case folding, collation support, and
transliteration.
<<analysis-kuromoji,Kuromoji>>::
Advanced analysis of Japanese using the http://www.atilika.org/[Kuromoji analyzer].
<<analysis-phonetic,Phonetic>>::
Analyzes tokens into their phonetic equivalent using Soundex, Metaphone,
Caverphone, and other codecs.
<<analysis-smartcn,SmartCN>>::
An analyzer for Chinese or mixed Chinese-English text. This analyzer uses
probabilistic knowledge to find the optimal word segmentation for Simplified
Chinese text. The text is first broken into sentences, then each sentence is
segmented into words.
<<analysis-stempel,Stempel>>::
Provides high quality stemming for Polish.
[float]
==== Community contributed analysis plugins
A number of analysis plugins have been contributed by our community:
* https://github.com/yakaz/elasticsearch-analysis-combo/[Combo Analysis Plugin] (by Olivier Favre, Yakaz)
* https://github.com/synhershko/elasticsearch-analysis-hebrew[Hebrew Analysis Plugin] (by Itamar Syn-Hershko)
* https://github.com/medcl/elasticsearch-analysis-ik[IK Analysis Plugin] (by Medcl)
* https://github.com/medcl/elasticsearch-analysis-mmseg[Mmseg Analysis Plugin] (by Medcl)
* https://github.com/chytreg/elasticsearch-analysis-morfologik[Morfologik (Polish) Analysis plugin] (by chytreg)
* https://github.com/imotov/elasticsearch-analysis-morphology[Russian and English Morphological Analysis Plugin] (by Igor Motov)
* https://github.com/medcl/elasticsearch-analysis-pinyin[Pinyin Analysis Plugin] (by Medcl)
* https://github.com/duydo/elasticsearch-analysis-vietnamese[Vietnamese Analysis Plugin] (by Duy Do)
These community plugins appear to have been abandoned:
* https://github.com/barminator/elasticsearch-analysis-annotation[Annotation Analysis Plugin] (by Michal Samek)
* https://github.com/medcl/elasticsearch-analysis-string2int[String2Integer Analysis Plugin] (by Medcl)
include::analysis-icu.asciidoc[]
include::analysis-kuromoji.asciidoc[]
include::analysis-phonetic.asciidoc[]
include::analysis-smartcn.asciidoc[]
include::analysis-stempel.asciidoc[]

65
docs/plugins/api.asciidoc Normal file

@ -0,0 +1,65 @@
[[api]]
== API Extension Plugins
API extension plugins add new functionality to Elasticsearch by adding new APIs or features, usually to do with search or mapping.
[float]
=== Core API extension plugins
The core API extension plugins are:
<<plugins-delete-by-query,Delete by Query>>::
The delete by query plugin adds support for deleting all of the documents
(from one or more indices) which match the specified query. It is a
replacement for the problematic _delete-by-query_ functionality which has been
removed from Elasticsearch core.
https://github.com/elasticsearch/elasticsearch-mapper-attachments[Mapper Attachments Type plugin]::
Integrates http://lucene.apache.org/tika/[Apache Tika] to provide a new field
type `attachment` to allow indexing of documents such as PDFs and Microsoft
Word.
[float]
=== Community contributed API extension plugins
A number of plugins have been contributed by our community:
* https://github.com/carrot2/elasticsearch-carrot2[carrot2 Plugin]:
Results clustering with http://project.carrot2.org/[carrot2] (by Dawid Weiss)
* https://github.com/wikimedia/search-extra[Elasticsearch Trigram Accelerated Regular Expression Filter]:
(by Wikimedia Foundation/Nik Everett)
* https://github.com/kzwang/elasticsearch-image[Elasticsearch Image Plugin]:
Uses https://code.google.com/p/lire/[Lire (Lucene Image Retrieval)] to allow users
to index images and search for similar images (by Kevin Wang)
* https://github.com/wikimedia/search-highlighter[Elasticsearch Experimental Highlighter]:
(by Wikimedia Foundation/Nik Everett)
* https://github.com/YannBrrd/elasticsearch-entity-resolution[Entity Resolution Plugin]:
Uses http://github.com/larsga/Duke[Duke] for duplication detection (by Yann Barraud)
* https://github.com/NLPchina/elasticsearch-sql/[SQL language Plugin]:
Allows Elasticsearch to be queried with SQL (by nlpcn)
* https://github.com/codelibs/elasticsearch-taste[Elasticsearch Taste Plugin]:
Mahout Taste-based Collaborative Filtering implementation (by CodeLibs Project)
* https://github.com/hadashiA/elasticsearch-flavor[Elasticsearch Flavor Plugin] using
http://mahout.apache.org/[Mahout] Collaboration filtering (by hadashiA)
These community plugins appear to have been abandoned:
* https://github.com/derryx/elasticsearch-changes-plugin[Elasticsearch Changes Plugin] (by Thomas Peuss)
* https://github.com/mattweber/elasticsearch-mocksolrplugin[Elasticsearch Mock Solr Plugin] (by Matt Weber)
* http://siren.solutions/siren/downloads/[Elasticsearch SIREn Plugin]: Nested data search (by SIREn Solutions)
* https://github.com/endgameinc/elasticsearch-term-plugin[Terms Component Plugin] (by Endgame Inc.)
include::delete-by-query.asciidoc[]


@ -0,0 +1,62 @@
[[plugin-authors]]
== Help for plugin authors
The Elasticsearch repository contains examples of:
* a https://github.com/elastic/elasticsearch/tree/master/plugins/site-example[site plugin]
for serving static HTML, JavaScript, and CSS.
* a https://github.com/elastic/elasticsearch/tree/master/plugins/jvm-example[Java plugin]
which contains Java code.
These examples provide the bare bones needed to get started. For more
information about how to write a plugin, we recommend looking at the plugins
listed in this documentation for inspiration.
[NOTE]
.Site plugins
====================================
The example site plugin mentioned above contains all of the scaffolding needed
for integrating with Maven builds. If you don't plan on using Maven, then all
you really need in your plugin is:
* The `plugin-descriptor.properties` file
* The `_site/` directory
* The `_site/index.html` file
====================================
[float]
=== Plugin descriptor file
All plugins, be they site or Java plugins, must contain a file called
`plugin-descriptor.properties` in the root directory. The format for this file
is described in detail here:
https://github.com/elastic/elasticsearch/blob/master/dev-tools/src/main/resources/plugin-metadata/plugin-descriptor.properties[`dev-tools/src/main/resources/plugin-metadata/plugin-descriptor.properties`].
Either fill in this template yourself (see
https://github.com/lmenezes/elasticsearch-kopf/blob/master/plugin-descriptor.properties[elasticsearch-kopf]
as an example) or, if you are using Elasticsearch's Maven build system, you
can fill in the necessary values in the `pom.xml` for your plugin. For
instance, see
https://github.com/elastic/elasticsearch/blob/master/plugins/site-example/pom.xml[`plugins/site-example/pom.xml`].
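For orientation only, a site plugin's descriptor might look something like the
following sketch. The values are illustrative, and the exact set of required
properties is defined by the template linked above, which should be treated as
authoritative:
[source,properties]
--------------------------------------------------
# plugin-descriptor.properties (illustrative values only)
description=My example site plugin
version=1.0.0
name=my-site-plugin
site=true
jvm=false
elasticsearch.version=2.0.0
--------------------------------------------------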
[float]
=== Loading plugins from the classpath
When testing a Java plugin, it will only be auto-loaded if it is in the
`plugins/` directory. If, instead, it is in your classpath, you can tell
Elasticsearch to load it with the `plugin.types` setting:
[source,java]
--------------------------
settingsBuilder()
.put("cluster.name", cluster)
.put("path.home", getHome())
.put("plugin.types", MyCustomPlugin.class.getName()) <1>
.build();
--------------------------
<1> Tells Elasticsearch to load your plugin.


@ -0,0 +1,468 @@
[[cloud-aws]]
=== AWS Cloud Plugin
The Amazon Web Service (AWS) Cloud plugin uses the
https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery, and adds
support for using S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].
[[cloud-aws-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install cloud-aws
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[cloud-aws-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove cloud-aws
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[cloud-aws-usage]]
==== Getting started with AWS
The plugin will default to using
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[IAM Role]
credentials for authentication. These can be overridden by, in increasing
order of precedence, system properties `aws.accessKeyId` and `aws.secretKey`,
environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_KEY`, or the
elasticsearch config using `cloud.aws.access_key` and `cloud.aws.secret_key`:
[source,yaml]
----
cloud:
    aws:
        access_key: AKVAIQBF2RECL7FJWGJQ
        secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br
----
[[cloud-aws-usage-security]]
===== Transport security
By default this plugin uses HTTPS for all API calls to AWS endpoints. If you wish to configure HTTP you can set
`cloud.aws.protocol` in the elasticsearch config. You can optionally override this setting per individual service
via: `cloud.aws.ec2.protocol` or `cloud.aws.s3.protocol`.
[source,yaml]
----
cloud:
    aws:
        protocol: https
        s3:
            protocol: http
        ec2:
            protocol: https
----
In addition, a proxy can be configured with the `proxy_host` and `proxy_port` settings (note that protocol can be
`http` or `https`):
[source,yaml]
----
cloud:
    aws:
        protocol: https
        proxy_host: proxy1.company.com
        proxy_port: 8083
----
You can also set different proxies for `ec2` and `s3`:
[source,yaml]
----
cloud:
    aws:
        s3:
            proxy_host: proxy1.company.com
            proxy_port: 8083
        ec2:
            proxy_host: proxy2.company.com
            proxy_port: 8083
----
[[cloud-aws-usage-region]]
===== Region
The `cloud.aws.region` can be set to a region and will automatically use the relevant settings for both `ec2` and `s3`.
The available values are:
* `us-east` (`us-east-1`)
* `us-west` (`us-west-1`)
* `us-west-1`
* `us-west-2`
* `ap-southeast` (`ap-southeast-1`)
* `ap-southeast-1`
* `ap-southeast-2`
* `ap-northeast` (`ap-northeast-1`)
* `eu-west` (`eu-west-1`)
* `eu-central` (`eu-central-1`)
* `sa-east` (`sa-east-1`)
* `cn-north` (`cn-north-1`)
[[cloud-aws-usage-signer]]
===== EC2/S3 Signer API
If you are using an EC2- or S3-compatible service, it might use an older API to sign its requests.
You can select a compatible signer API using `cloud.aws.signer` (or `cloud.aws.ec2.signer` and `cloud.aws.s3.signer`)
to specify the right signer to use. Defaults to `AWS4SignerType`.
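For example, to keep the default V4 signer for EC2 but use an older signer for
an S3-compatible service, the configuration might look like this (the
`S3SignerType` value is only an example; use whichever signer your service
requires):
[source,yaml]
----
cloud:
    aws:
        signer: AWS4SignerType
        s3:
            signer: S3SignerType
----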
[[cloud-aws-discovery]]
==== EC2 Discovery
EC2 discovery allows you to use the EC2 APIs to perform automatic discovery (similar to multicast in non-hostile multicast
environments). Here is a simple sample configuration:
[source,yaml]
----
discovery:
type: ec2
----
EC2 discovery uses the same credentials as the rest of the AWS services provided by this plugin (such as the S3 `repositories`).
See <<cloud-aws-usage>> for details.
The following settings (prefixed with `discovery.ec2`) can further control the discovery; a combined example follows the list:
`groups`::
Either a comma separated list or array based list of (security) groups.
Only instances with the provided security groups will be used in the
cluster discovery. (NOTE: You could provide either group NAME or group
ID.)
`host_type`::
The type of host to use to communicate with other instances. Can be
one of `private_ip`, `public_ip`, `private_dns`, `public_dns`. Defaults to
`private_ip`.
`availability_zones`::
Either a comma separated list or array based list of availability zones.
Only instances within the provided availability zones will be used in the
cluster discovery.
`any_group`::
If set to `false`, will require all security groups to be present for the
instance to be used for the discovery. Defaults to `true`.
`ping_timeout`::
How long to wait for existing EC2 nodes to reply during discovery.
Defaults to `3s`. If no unit like `ms`, `s` or `m` is specified,
milliseconds are used.
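A hedged sketch combining several of these settings; the security group name and availability zones are placeholders for your own values:
[source,yaml]
----
discovery:
    type: ec2
    ec2:
        # Placeholder security group name
        groups: my-security-group
        host_type: private_ip
        # Placeholder availability zones
        availability_zones: ["us-west-2a", "us-west-2b"]
        any_group: false
        ping_timeout: 5s
----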
[[cloud-aws-discovery-permissions]]
===== Recommended EC2 Permissions
EC2 discovery requires making a call to the EC2 service. You'll want to set up
an IAM policy to allow this. You can create a custom policy via the IAM
Management Console. It should look similar to this.
[source,js]
----
{
"Statement": [
{
"Action": [
"ec2:DescribeInstances"
],
"Effect": "Allow",
"Resource": [
"*"
]
}
],
"Version": "2012-10-17"
}
----
[[cloud-aws-discovery-filtering]]
===== Filtering by Tags
EC2 discovery can also filter machines to include in the cluster based on tags (and not just security groups). The settings
to use start with the `discovery.ec2.tag.` prefix. For example, setting `discovery.ec2.tag.stage` to `dev` will only
include instances with a tag key of `stage` and a value of `dev`. If several tags are set, all of them must be present
on an instance for it to be included, as in the sketch below.
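A minimal sketch of the example above, extended with a second, hypothetical `role` tag that must also be present:
[source,yaml]
----
discovery:
    type: ec2
    ec2:
        tag:
            stage: dev
            # Hypothetical second tag for illustration
            role: search
----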
One practical use for tag filtering is when an ec2 cluster contains many nodes that are not running elasticsearch. In
this case (particularly with high `ping_timeout` values) there is a risk that a new node's discovery phase will end
before it has found the cluster (which will result in it declaring itself master of a new cluster with the same name
- highly undesirable). Tagging elasticsearch ec2 nodes and then filtering by that tag will resolve this issue.
[[cloud-aws-discovery-attributes]]
===== Automatic Node Attributes
Although it does not depend on actually using `ec2` discovery (it still requires the cloud-aws plugin to be installed), the
plugin can automatically add node attributes relating to EC2, for example the availability zone, which can then be used with
the allocation awareness feature. To enable it, set `cloud.node.auto_attributes` to `true` in the settings, as in the example below.
[[cloud-aws-discovery-endpoint]]
===== Using other EC2 endpoint
If you are using an EC2 API-compatible service, you can set the endpoint to use by setting
`cloud.aws.ec2.endpoint` to your provider's URL, for example:
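A minimal sketch, using a hypothetical endpoint URL:
[source,yaml]
----
cloud:
    aws:
        ec2:
            # Hypothetical EC2-compatible endpoint
            endpoint: ec2.example.internal
----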
[[cloud-aws-repository]]
==== S3 Repository
The S3 repository uses S3 to store snapshots. The S3 repository can be registered using the following command:
[source,json]
----
PUT _snapshot/my_s3_repository
{
"type": "s3",
"settings": {
"bucket": "my_bucket_name",
"region": "us-west"
}
}
----
// AUTOSENSE
The following settings are supported; a combined example follows the list:
`bucket`::
The name of the bucket to be used for snapshots. (Mandatory)
`region`::
The region where the bucket is located. Defaults to US Standard.
`endpoint`::
The endpoint to the S3 API. Defaults to AWS's default S3 endpoint. Note
that setting a region overrides the endpoint setting.
`protocol`::
The protocol to use (`http` or `https`). Defaults to value of
`cloud.aws.protocol` or `cloud.aws.s3.protocol`.
`base_path`::
Specifies the path within bucket to repository data. Defaults to root
directory.
`access_key`::
The access key to use for authentication. Defaults to value of
`cloud.aws.access_key`.
`secret_key`::
The secret key to use for authentication. Defaults to value of
`cloud.aws.secret_key`.
`chunk_size`::
Big files can be broken down into chunks during snapshotting if needed.
The chunk size can be specified in bytes or by using size value notation,
i.e. `1g`, `10m`, `5k`. Defaults to `100m`.
`compress`::
When set to `true` metadata files are stored in compressed format. This
setting doesn't affect index files that are already compressed by default.
Defaults to `false`.
`server_side_encryption`::
When set to `true` files are encrypted on server side using AES256
algorithm. Defaults to `false`.
`buffer_size`::
Minimum threshold below which the chunk is uploaded using a single
request. Beyond this threshold, the S3 repository will use the
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS Multipart Upload API]
to split the chunk into several parts, each of `buffer_size` length, and
to upload each part in its own request. Note that setting a buffer
size lower than `5mb` is not allowed since it would prevent the use of the
Multipart API and may result in upload errors. Defaults to `5mb`.
`max_retries`::
Number of retries in case of S3 errors. Defaults to `3`.
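A hedged sketch registering a repository that combines several of these settings; the bucket name and base path are placeholders:
[source,json]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "region": "us-west",
    "base_path": "elasticsearch/snapshots",
    "chunk_size": "100m",
    "server_side_encryption": true,
    "max_retries": 5
  }
}
----
// AUTOSENSE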
The S3 repositories use the same credentials as the rest of the AWS services
provided by this plugin (such as EC2 `discovery`). See <<cloud-aws-usage>> for details.
Multiple S3 repositories can be created. If the buckets require different
credentials, then define them as part of the repository settings.
[[cloud-aws-repository-permissions]]
===== Recommended S3 Permissions
In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon
IAM in conjunction with pre-existing S3 buckets. Here is an example policy which will allow the snapshot access to an
S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console, by creating a Custom Policy,
and using a Policy Document similar to this (changing snaps.example.com to your bucket name).
[source,js]
----
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com/*"
]
}
],
"Version": "2012-10-17"
}
----
You may further restrict the permissions by specifying a prefix within the bucket, in this example, named "foo".
[source,js]
----
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Condition": {
"StringLike": {
"s3:prefix": [
"foo/*"
]
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com/foo/*"
]
}
],
"Version": "2012-10-17"
}
----
The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository
registration will fail. If you want elasticsearch to create the bucket instead, you can add the permission to create a
specific bucket like this:
[source,js]
----
{
"Action": [
"s3:CreateBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
}
----
[[cloud-aws-repository-endpoint]]
===== Using other S3 endpoint
If you are using an S3 API-compatible service, you can set a global endpoint by setting `cloud.aws.s3.endpoint`
to your provider's URL. Note that this setting will be used for all S3 repositories.
Different `endpoint`, `region` and `protocol` settings can be set on a per-repository basis.
See <<cloud-aws-repository>> for details.
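A minimal sketch, using a hypothetical S3-compatible endpoint:
[source,yaml]
----
cloud:
    aws:
        s3:
            # Hypothetical S3-compatible endpoint
            endpoint: s3.example.internal
            protocol: http
----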
[[cloud-aws-testing]]
==== Testing AWS
Integration tests in this plugin require a working AWS configuration and are therefore disabled by default. Three buckets
and two IAM users have to be created. The first IAM user needs access to two buckets in different regions, and the third
bucket is exclusive to the other IAM user. To enable the tests, prepare a config file `elasticsearch.yml` with the following
content:
[source,yaml]
----
cloud:
aws:
access_key: AKVAIQBF2RECL7FJWGJQ
secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br
repositories:
s3:
bucket: "bucket_name"
region: "us-west-2"
private-bucket:
bucket: <bucket not accessible by default key>
access_key: <access key>
secret_key: <secret key>
remote-bucket:
bucket: <bucket in other region>
region: <region>
external-bucket:
bucket: <bucket>
access_key: <access key>
secret_key: <secret key>
endpoint: <endpoint>
protocol: <protocol>
----
Replace all occurrences of `access_key`, `secret_key`, `endpoint`, `protocol`, `bucket` and `region` with your settings.
Please note that the tests will delete all snapshot/restore related files in the specified buckets.
To run the tests:
[source,sh]
----
mvn -Dtests.aws=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
----

View File

@ -0,0 +1,667 @@
[[cloud-azure]]
=== Azure Cloud Plugin
The Azure Cloud plugin uses the Azure API for unicast discovery, and adds
support for using Azure as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].
[[cloud-azure-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install cloud-azure
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[cloud-azure-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove cloud-azure
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[cloud-azure-discovery]]
==== Azure Virtual Machine Discovery
Azure VM discovery uses the Azure APIs to perform automatic discovery (similar to multicast in non-hostile
multicast environments). Here is a simple sample configuration:
[source,yaml]
----
cloud:
azure:
management:
subscription.id: XXX-XXX-XXX-XXX
cloud.service.name: es-demo-app
keystore:
path: /path/to/azurekeystore.pkcs12
password: WHATEVER
type: pkcs12
discovery:
type: azure
----
[[cloud-azure-discovery-short]]
===== How to start (short story)
* Create Azure instances
* Install Elasticsearch
* Install Azure plugin
* Modify `elasticsearch.yml` file
* Start Elasticsearch
[[cloud-azure-discovery-settings]]
===== Azure credential API settings
The following settings can further control the credential API:
[horizontal]
`cloud.azure.management.keystore.path`::
/path/to/keystore
`cloud.azure.management.keystore.type`::
`pkcs12`, `jceks` or `jks`. Defaults to `pkcs12`.
`cloud.azure.management.keystore.password`::
your_password for the keystore
`cloud.azure.management.subscription.id`::
your_azure_subscription_id
`cloud.azure.management.cloud.service.name`::
your_azure_cloud_service_name
[[cloud-azure-discovery-settings-advanced]]
===== Advanced settings
The following settings can further control the discovery:
`discovery.azure.host.type`::
Either `public_ip` or `private_ip` (default). Azure discovery will use the
one you set to ping other nodes.
`discovery.azure.endpoint.name`::
When using `public_ip` this setting is used to identify the endpoint name
used to forward requests to elasticsearch (aka transport port name).
Defaults to `elasticsearch`. In the Azure management console, you could define
an endpoint named `elasticsearch` that forwards, for example, requests on the public IP
on port 8100 to the virtual machine on port 9300.
`discovery.azure.deployment.name`::
Deployment name if any. Defaults to the value set with
`cloud.azure.management.cloud.service.name`.
`discovery.azure.deployment.slot`::
Either `staging` or `production` (default).
For example:
[source,yaml]
----
discovery:
type: azure
azure:
host:
type: private_ip
endpoint:
name: elasticsearch
deployment:
name: your_azure_cloud_service_name
slot: production
----
[[cloud-azure-discovery-long]]
==== Setup process for Azure Discovery
We describe here one strategy, which is to hide our Elasticsearch cluster from the outside.
With this strategy, only VMs behind the same virtual port can talk to each
other. That means that, with this mode, you can use elasticsearch unicast
discovery to build a cluster, using the Azure API to retrieve information
about your nodes.
[[cloud-azure-discovery-long-prerequisites]]
===== Prerequisites
Before starting, you need to have:
* A http://www.windowsazure.com/[Windows Azure account]
* OpenSSL that isn't from MacPorts, specifically `OpenSSL 1.0.1f 6 Jan
2014` doesn't seem to create a valid keypair for ssh. FWIW,
`OpenSSL 1.0.1c 10 May 2012` on Ubuntu 12.04 LTS is known to work.
* SSH keys and certificate
+
--
You should follow http://azure.microsoft.com/en-us/documentation/articles/linux-use-ssh-key/[this guide] to learn
how to create or use existing SSH keys. If you have already done so, you can skip the following.
Here is a description of how to generate SSH keys using `openssl`:
[source,sh]
----
# You may want to use another dir than /tmp
cd /tmp
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout azure-private.key -out azure-certificate.pem
chmod 600 azure-private.key azure-certificate.pem
openssl x509 -outform der -in azure-certificate.pem -out azure-certificate.cer
----
Generate a keystore which will be used by the plugin to authenticate all
Azure API calls with a certificate.
[source,sh]
----
# Generate a keystore (azurekeystore.pkcs12)
# Transform private key to PEM format
openssl pkcs8 -topk8 -nocrypt -in azure-private.key -inform PEM -out azure-pk.pem -outform PEM
# Transform certificate to PEM format
openssl x509 -inform der -in azure-certificate.cer -out azure-cert.pem
cat azure-cert.pem azure-pk.pem > azure.pem.txt
# You MUST enter a password!
openssl pkcs12 -export -in azure.pem.txt -out azurekeystore.pkcs12 -name azure -noiter -nomaciter
----
Upload the `azure-certificate.cer` file both in the elasticsearch Cloud Service (under `Manage Certificates`),
and under `Settings -> Manage Certificates`.
IMPORTANT: When prompted for a password, you need to enter a non empty one.
See this http://www.windowsazure.com/en-us/manage/linux/how-to-guides/ssh-into-linux/[guide] for
more details about how to create keys for Azure.
Once done, you need to upload your certificate in Azure:
* Go to the https://account.windowsazure.com/[management console].
* Sign in using your account.
* Click on `Portal`.
* Go to Settings (bottom of the left list)
* On the bottom bar, click on `Upload` and upload your `azure-certificate.cer` file.
You may want to use
http://www.windowsazure.com/en-us/develop/nodejs/how-to-guides/command-line-tools/[Windows Azure Command-Line Tool]:
--
* Install https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager[NodeJS], for example using
homebrew on MacOS X:
+
[source,sh]
----
brew install node
----
* Install Azure tools
+
[source,sh]
----
sudo npm install azure-cli -g
----
* Download and import your azure settings:
+
[source,sh]
----
# This will open a browser and will download a .publishsettings file
azure account download
# Import this file (we have downloaded it to /tmp)
# Note, it will create needed files in ~/.azure. You can remove azure.publishsettings when done.
azure account import /tmp/azure.publishsettings
----
[[cloud-azure-discovery-long-instance]]
===== Creating your first instance
You need to have a storage account available. Check http://www.windowsazure.com/en-us/develop/net/how-to-guides/blob-storage/#create-account[Azure Blob Storage documentation]
for more information.
You will need to choose the operating system you want to run on. To get a list of official available images, run:
[source,sh]
----
azure vm image list
----
Let's say we are going to deploy an Ubuntu image on an extra small instance in West Europe:
[horizontal]
Azure cluster name::
`azure-elasticsearch-cluster`
Image::
`b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB`
VM Name::
`myesnode1`
VM Size::
`extrasmall`
Location::
`West Europe`
Login::
`elasticsearch`
Password::
`password1234!!`
Using command line:
[source,sh]
----
azure vm create azure-elasticsearch-cluster \
b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB \
--vm-name myesnode1 \
--location "West Europe" \
--vm-size extrasmall \
--ssh 22 \
--ssh-cert /tmp/azure-certificate.pem \
elasticsearch password1234\!\!
----
You should see something like:
[source,text]
----
info: Executing command vm create
+ Looking up image
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Configuring certificate
+ Creating VM
info: vm create command OK
----
Now, your first instance is started.
[TIP]
.Working with SSH
===============================================
You need to give the private key and username each time you log on your instance:
[source,sh]
----
ssh -i ~/.ssh/azure-private.key elasticsearch@myescluster.cloudapp.net
----
But you can also define it once in `~/.ssh/config` file:
[source,text]
----
Host *.cloudapp.net
User elasticsearch
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
IdentityFile ~/.ssh/azure-private.key
----
===============================================
Next, you need to install Elasticsearch on your new instance. First, copy your
keystore to the instance, then connect to the instance using SSH:
[source,sh]
----
scp /tmp/azurekeystore.pkcs12 azure-elasticsearch-cluster.cloudapp.net:/home/elasticsearch
ssh azure-elasticsearch-cluster.cloudapp.net
----
Once connected, install Elasticsearch:
[source,sh]
----
# Install Latest Java version
# Read http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html for details
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
# If you want to install OpenJDK instead
# sudo apt-get update
# sudo apt-get install openjdk-7-jre-headless
# Download Elasticsearch
curl -s https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb -o elasticsearch-2.0.0.deb
# Prepare Elasticsearch installation
sudo dpkg -i elasticsearch-2.0.0.deb
----
Check that elasticsearch is running:
[source,sh]
----
curl http://localhost:9200/
----
This command should give you a JSON result:
[source,javascript]
----
{
"status" : 200,
"name" : "Living Colossus",
"version" : {
"number" : "2.0.0",
"build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
"build_timestamp" : "2014-02-12T16:18:34Z",
"build_snapshot" : false,
"lucene_version" : "5.1"
},
"tagline" : "You Know, for Search"
}
----
[[cloud-azure-discovery-long-plugin]]
===== Install elasticsearch cloud azure plugin
[source,sh]
----
# Stop elasticsearch
sudo service elasticsearch stop
# Install the plugin
sudo /usr/share/elasticsearch/bin/plugin install elasticsearch/elasticsearch-cloud-azure/2.6.1
# Configure it
sudo vi /etc/elasticsearch/elasticsearch.yml
----
And add the following lines:
[source,yaml]
----
# If you don't remember your account id, you may get it with `azure account list`
cloud:
azure:
management:
subscription.id: your_azure_subscription_id
cloud.service.name: your_azure_cloud_service_name
keystore:
path: /home/elasticsearch/azurekeystore.pkcs12
password: your_password_for_keystore
discovery:
type: azure
# Recommended (warning: non durable disk)
# path.data: /mnt/resource/elasticsearch/data
----
Restart elasticsearch:
[source,sh]
----
sudo service elasticsearch start
----
If anything goes wrong, check your logs in `/var/log/elasticsearch`.
[[cloud-azure-discovery-scale]]
==== Scaling Out!
You first need to create an image of your previous machine.
Disconnect from your machine and run the following commands locally:
[source,sh]
----
# Shutdown the instance
azure vm shutdown myesnode1
# Create an image from this instance (it could take some minutes)
azure vm capture myesnode1 esnode-image --delete
# Note that the previous instance has been deleted (mandatory)
# So you need to create it again and BTW create other instances.
azure vm create azure-elasticsearch-cluster \
esnode-image \
--vm-name myesnode1 \
--location "West Europe" \
--vm-size extrasmall \
--ssh 22 \
--ssh-cert /tmp/azure-certificate.pem \
elasticsearch password1234\!\!
----
[TIP]
=========================================
It could happen that Azure changes the endpoint's public IP address.
DNS propagation could take some minutes before you can connect again using the
DNS name. If needed, you can get the IP address from Azure using:
[source,sh]
----
# Look at Network `Endpoints 0 Vip`
azure vm show myesnode1
----
=========================================
Let's start more instances!
[source,sh]
----
for x in $(seq 2 10)
do
echo "Launching azure instance #$x..."
azure vm create azure-elasticsearch-cluster \
esnode-image \
--vm-name myesnode$x \
--vm-size extrasmall \
--ssh $((21 + $x)) \
--ssh-cert /tmp/azure-certificate.pem \
--connect \
elasticsearch password1234\!\!
done
----
If you want to remove your running instances:
[source,sh]
----
azure vm delete myesnode1
----
[[cloud-azure-repository]]
==== Azure Repository
To enable Azure repositories, you have first to set your azure storage settings in `elasticsearch.yml` file:
[source,yaml]
----
cloud:
azure:
storage:
account: your_azure_storage_account
key: your_azure_storage_key
----
For reference, in previous versions of the Azure plugin, the settings were:
[source,yaml]
----
cloud:
azure:
storage_account: your_azure_storage_account
storage_key: your_azure_storage_key
----
The Azure repository supports the following settings:
`container`::
Container name. Defaults to `elasticsearch-snapshots`
`base_path`::
Specifies the path within container to repository data. Defaults to empty
(root directory).
`chunk_size`::
Big files can be broken down into chunks during snapshotting if needed.
The chunk size can be specified in bytes or by using size value notation,
i.e. `1g`, `10m`, `5k`. Defaults to `64m` (64m max)
`compress`::
When set to `true` metadata files are stored in compressed format. This
setting doesn't affect index files that are already compressed by default.
Defaults to `false`.
Some examples, using scripts:
[source,json]
----
# The simplest one
PUT _snapshot/my_backup1
{
"type": "azure"
}
# With some settings
PUT _snapshot/my_backup2
{
"type": "azure",
"settings": {
"container": "backup_container",
"base_path": "backups",
"chunk_size": "32m",
"compress": true
}
}
----
// AUTOSENSE
Example using Java:
[source,java]
----
client.admin().cluster().preparePutRepository("my_backup3")
.setType("azure").setSettings(Settings.settingsBuilder()
.put(Storage.CONTAINER, "backup_container")
.put(Storage.CHUNK_SIZE, new ByteSizeValue(32, ByteSizeUnit.MB))
).get();
----
[[cloud-azure-repository-validation]]
===== Repository validation rules
According to the http://msdn.microsoft.com/en-us/library/dd135715.aspx[containers naming guide], a container name must
be a valid DNS name, conforming to the following naming rules:
* Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
* Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not
permitted in container names.
* All letters in a container name must be lowercase.
* Container names must be from 3 through 63 characters long.
[[cloud-azure-testing]]
==== Testing Azure
Integration tests in this plugin require a working Azure configuration and are therefore disabled by default.
To enable the tests, prepare a config file `elasticsearch.yml` with the following content:
[source,yaml]
----
cloud:
azure:
storage:
account: "YOUR-AZURE-STORAGE-NAME"
key: "YOUR-AZURE-STORAGE-KEY"
----
Replace `account` and `key` with your settings. Please note that the test will delete all snapshot/restore related
files in the specified container.
To run the tests:
[source,sh]
----
mvn -Dtests.azure=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
----
[[cloud-azure-smb-workaround]]
==== Working around a bug in Windows SMB and Java on windows
When using a shared file system based on the SMB protocol (like Azure File Service) to store indices, the way Lucene
opens index segment files is with a write-only flag. This is the _correct_ way to open the files, as they will only be
used for writes and allows different FS implementations to optimize for it. Sadly, in Windows with SMB, this disables
the cache manager, causing writes to be slow. This has been described in
https://issues.apache.org/jira/browse/LUCENE-6176[LUCENE-6176], but it affects each and every Java program out there!
This needs to be fixed outside of Elasticsearch and/or Lucene, either in Windows or OpenJDK. For now, we provide
experimental support for opening the files with a read flag, but this should be considered experimental and the
correct fix belongs in OpenJDK or Windows.
The Azure Cloud plugin provides two storage types optimized for SMB:
`smb_mmap_fs`::
a SMB specific implementation of the default
{ref}/index-modules-store.html#mmapfs[mmap fs]
`smb_simple_fs`::
a SMB specific implementation of the default
{ref}/index-modules-store.html#simplefs[simple fs]
To use one of these specific storage types, you need to install the Azure Cloud plugin and restart the node.
Then configure Elasticsearch to set the storage type you want.
This can be configured for all indices by adding this to the `elasticsearch.yml` file:
[source,yaml]
----
index.store.type: smb_simple_fs
----
Note that this setting will only be applied to newly created indices.
It can also be set on a per-index basis at index creation time:
[source,json]
----
PUT my_index
{
"settings": {
"index.store.type": "smb_mmap_fs"
}
}
----
// AUTOSENSE

View File

@ -0,0 +1,479 @@
[[cloud-gce]]
=== GCE Cloud Plugin
The Google Compute Engine Cloud plugin uses the GCE API for unicast discovery.
[[cloud-gce-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install cloud-gce
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[cloud-gce-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove cloud-gce
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[cloud-gce-usage-discovery]]
==== GCE Virtual Machine Discovery
Google Compute Engine VM discovery uses the Google APIs to perform automatic discovery (similar to multicast
in non-hostile multicast environments). Here is a simple sample configuration:
[source,yaml]
--------------------------------------------------
cloud:
gce:
project_id: <your-google-project-id>
zone: <your-zone>
discovery:
type: gce
--------------------------------------------------
[[cloud-gce-usage-discovery-short]]
===== How to start (short story)
* Create Google Compute Engine instance (with compute rw permissions)
* Install Elasticsearch
* Install Google Compute Engine Cloud plugin
* Modify `elasticsearch.yml` file
* Start Elasticsearch
[[cloud-gce-usage-discovery-long]]
==== Setting up GCE Discovery
[[cloud-gce-usage-discovery-long-prerequisites]]
===== Prerequisites
Before starting, you need:
* Your project ID, e.g. `es-cloud`. Get it from https://code.google.com/apis/console/[Google API Console].
* To install https://developers.google.com/cloud/sdk/[Google Cloud SDK]
If you have not set it yet, you can define the default project you will work on:
[source,sh]
--------------------------------------------------
gcloud config set project es-cloud
--------------------------------------------------
[[cloud-gce-usage-discovery-long-first-instance]]
===== Creating your first instance
[source,sh]
--------------------------------------------------
gcutil addinstance myesnode1 \
--service_account_scope=compute-rw,storage-full \
--persistent_boot_disk
--------------------------------------------------
Then follow these steps:
* You will be asked to open a link in your browser. Login and allow access to listed services.
* You will get back a verification code. Copy and paste it in your terminal.
* You should see an `Authentication successful.` message.
* Choose your zone, e.g. `europe-west1-a`.
* Choose your compute instance size, e.g. `f1-micro`.
* Choose your OS, e.g. `projects/debian-cloud/global/images/debian-7-wheezy-v20140606`.
* You may be asked to create an SSH key. Follow the instructions to create one.
When done, a report like this one should appear:
[source,text]
--------------------------------------------------
Table of resources:
+-----------+--------------+-------+---------+--------------+----------------+----------------+----------------+---------+----------------+
| name | machine-type | image | network | network-ip | external-ip | disks | zone | status | status-message |
+-----------+--------------+-------+---------+--------------+----------------+----------------+----------------+---------+----------------+
| myesnode1 | f1-micro | | default | 10.240.20.57 | 192.158.29.199 | boot-myesnode1 | europe-west1-a | RUNNING | |
+-----------+--------------+-------+---------+--------------+----------------+----------------+----------------+---------+----------------+
--------------------------------------------------
You can now connect to your instance:
[source,sh]
--------------------------------------------------
# Connect using google cloud SDK
gcloud compute ssh myesnode1 --zone europe-west1-a
# Or using SSH with external IP address
ssh -i ~/.ssh/google_compute_engine 192.158.29.199
--------------------------------------------------
[IMPORTANT]
.Service Account Permissions
==============================================
It's important when creating an instance that the correct permissions are set. At a minimum, you must ensure you have:
[source,text]
--------------------------------------------------
service_account_scope=compute-rw
--------------------------------------------------
Failing to set this will result in unauthorized messages when starting Elasticsearch.
See <<cloud-gce-usage-discovery-tips-permissions,Machine Permissions>>.
==============================================
Once connected, install Elasticsearch:
[source,sh]
--------------------------------------------------
sudo apt-get update
# Download Elasticsearch
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb
# Prepare Java installation
sudo apt-get install java7-runtime-headless
# Prepare Elasticsearch installation
sudo dpkg -i elasticsearch-2.0.0.deb
--------------------------------------------------
[[cloud-gce-usage-discovery-long-install-plugin]]
===== Install elasticsearch cloud gce plugin
Install the plugin:
[source,sh]
--------------------------------------------------
# Use Plugin Manager to install it
sudo bin/plugin install cloud-gce
--------------------------------------------------
Open the `elasticsearch.yml` file:
[source,sh]
--------------------------------------------------
sudo vi /etc/elasticsearch/elasticsearch.yml
--------------------------------------------------
And add the following lines:
[source,yaml]
--------------------------------------------------
cloud:
gce:
project_id: es-cloud
zone: europe-west1-a
discovery:
type: gce
--------------------------------------------------
Start elasticsearch:
[source,sh]
--------------------------------------------------
sudo /etc/init.d/elasticsearch start
--------------------------------------------------
If anything goes wrong, you should check logs:
[source,sh]
--------------------------------------------------
tail -f /var/log/elasticsearch/elasticsearch.log
--------------------------------------------------
If needed, you can change log level to `TRACE` by opening `logging.yml`:
[source,sh]
--------------------------------------------------
sudo vi /etc/elasticsearch/logging.yml
--------------------------------------------------
and adding the following line:
[source,yaml]
--------------------------------------------------
# discovery
discovery.gce: TRACE
--------------------------------------------------
[[cloud-gce-usage-discovery-cloning]]
==== Cloning your existing machine
In order to build a cluster on many nodes, you can clone your configured instance to new nodes.
You won't have to reinstall everything!
First create an image of your running instance and upload it to Google Cloud Storage:
[source,sh]
--------------------------------------------------
# Create an image of your current instance
sudo /usr/bin/gcimagebundle -d /dev/sda -o /tmp/
# An image has been created in `/tmp` directory:
ls /tmp
e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz
# Upload your image to Google Cloud Storage:
# Create a bucket to hold your image, let's say `esimage`:
gsutil mb gs://esimage
# Copy your image to this bucket:
gsutil cp /tmp/e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz gs://esimage
# Then add your image to images collection:
gcutil addimage elasticsearch-1-2-1 gs://esimage/e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz
# If the previous command did not work for you, logout from your instance
# and launch the same command from your local machine.
--------------------------------------------------
[[cloud-gce-usage-discovery-start-new-instances]]
===== Start new instances
As you have now an image, you can create as many instances as you need:
[source,sh]
--------------------------------------------------
# Just change node name (here myesnode2)
gcutil addinstance --image=elasticsearch-1-2-1 myesnode2
# If you want to provide all details directly, you can use:
gcutil addinstance --image=elasticsearch-1-2-1 \
--kernel=projects/google/global/kernels/gce-v20130603 myesnode2 \
--zone europe-west1-a --machine_type f1-micro --service_account_scope=compute-rw \
--persistent_boot_disk
--------------------------------------------------
[[cloud-gce-usage-discovery-remove-instance]]
===== Remove an instance (aka shut it down)
You can use https://cloud.google.com/console[Google Cloud Console] or CLI to manage your instances:
[source,sh]
--------------------------------------------------
# Stopping and removing instances
gcutil deleteinstance myesnode1 myesnode2 \
--zone=europe-west1-a
# Consider removing disk as well if you don't need them anymore
gcutil deletedisk boot-myesnode1 boot-myesnode2 \
--zone=europe-west1-a
--------------------------------------------------
[[cloud-gce-usage-discovery-zones]]
==== Using GCE zones
`cloud.gce.zone` helps to retrieve instances running in a given zone. It should be one of the
https://developers.google.com/compute/docs/zones#available[GCE supported zones].
GCE discovery supports multiple zones, although you need to be aware of network latency between zones.
To enable discovery across more than one zone, just add your list of zones to the `cloud.gce.zone` setting:
[source,yaml]
--------------------------------------------------
cloud:
gce:
project_id: <your-google-project-id>
zone: ["<your-zone1>", "<your-zone2>"]
discovery:
type: gce
--------------------------------------------------
[[cloud-gce-usage-discovery-tags]]
==== Filtering by tags
GCE discovery can also filter machines to include in the cluster based on tags using the `discovery.gce.tags` setting.
For example, setting `discovery.gce.tags` to `dev` will only include instances having a tag set to `dev`. If several tags
are set, all of them must be present on an instance for it to be included.
One practical use for tag filtering is when a GCE cluster contains many nodes that are not running
elasticsearch. In this case (particularly with high `ping_timeout` values) there is a risk that a new node's discovery
phase will end before it has found the cluster (which will result in it declaring itself master of a new cluster
with the same name - highly undesirable). Adding a tag to elasticsearch GCE nodes and then filtering by that
tag will resolve this issue.
Add your tag when building the new instance:
[source,sh]
--------------------------------------------------
gcutil --project=es-cloud addinstance myesnode1 \
--service_account_scope=compute-rw \
--persistent_boot_disk \
--tags=elasticsearch,dev
--------------------------------------------------
Then, define it in `elasticsearch.yml`:
[source,yaml]
--------------------------------------------------
cloud:
gce:
project_id: es-cloud
zone: europe-west1-a
discovery:
type: gce
gce:
tags: elasticsearch, dev
--------------------------------------------------
[[cloud-gce-usage-discovery-port]]
==== Changing default transport port
By default, the elasticsearch GCE plugin assumes that you run elasticsearch on the default transport port, 9300.
You can specify a different port with the Google Compute Engine metadata key `es_port`:
[[cloud-gce-usage-discovery-port-create]]
===== When creating instance
Add `--metadata=es_port:9301` option:
[source,sh]
--------------------------------------------------
# when creating first instance
gcutil addinstance myesnode1 \
--service_account_scope=compute-rw,storage-full \
--persistent_boot_disk \
--metadata=es_port:9301
# when creating an instance from an image
gcutil addinstance --image=elasticsearch-1-0-0-RC1 \
--kernel=projects/google/global/kernels/gce-v20130603 myesnode2 \
--zone europe-west1-a --machine_type f1-micro --service_account_scope=compute-rw \
--persistent_boot_disk --metadata=es_port:9301
--------------------------------------------------
[[cloud-gce-usage-discovery-port-run]]
===== On a running instance
[source,sh]
--------------------------------------------------
# Get metadata fingerprint
gcutil getinstance myesnode1 --zone=europe-west1-a
+------------------------+---------------------+
| property | value |
+------------------------+---------------------+
| metadata | |
| fingerprint | 42WmSpB8rSM= |
+------------------------+---------------------+
# Use that fingerprint
gcutil setinstancemetadata myesnode1 \
--zone=europe-west1-a \
--metadata=es_port:9301 \
--fingerprint=42WmSpB8rSM=
--------------------------------------------------
[[cloud-gce-usage-discovery-tips]]
==== GCE Tips
[[cloud-gce-usage-discovery-tips-projectid]]
===== Store project id locally
If you don't want to repeat the project id each time, you can save it in `~/.gcutil.flags` file using:
[source,sh]
--------------------------------------------------
gcutil getproject --project=es-cloud --cache_flag_values
--------------------------------------------------
`~/.gcutil.flags` file now contains:
[source,text]
--------------------------------------------------
--project=es-cloud
--------------------------------------------------
[[cloud-gce-usage-discovery-tips-permissions]]
===== Machine Permissions
If you have created a machine without the correct permissions, you will see `403 unauthorized` error messages. The only
way to alter these permissions is to delete the instance (NOT THE DISK). Then create another with the correct permissions.
Creating machines with gcutil::
+
--
Ensure the following flags are set:
[source,text]
--------------------------------------------------
--service_account_scope=compute-rw
--------------------------------------------------
--
Creating with console (web)::
+
--
When creating an instance using the web portal, click _Show advanced options_.
At the bottom of the page, under `PROJECT ACCESS`, choose `>> Compute >> Read Write`.
--
Creating with knife google::
+
--
Set the service account scopes when creating the machine:
[source,sh]
--------------------------------------------------
knife google server create www1 \
-m n1-standard-1 \
-I debian-7-wheezy-v20131120 \
-Z us-central1-a \
-i ~/.ssh/id_rsa \
-x jdoe \
--gce-service-account-scopes https://www.googleapis.com/auth/compute.full_control
--------------------------------------------------
Or, you may use the alias:
[source,sh]
--------------------------------------------------
--gce-service-account-scopes compute-rw
--------------------------------------------------
--
[[cloud-gce-usage-discovery-testing]]
==== Testing GCE
Integration tests in this plugin require a working GCE configuration and
are therefore disabled by default. To enable the tests, prepare a config file
`elasticsearch.yml` with the following content:
[source,yaml]
--------------------------------------------------
cloud:
gce:
project_id: es-cloud
zone: europe-west1-a
discovery:
type: gce
--------------------------------------------------
Replace `project_id` and `zone` with your settings.
To run the tests:
[source,sh]
--------------------------------------------------
mvn -Dtests.gce=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
--------------------------------------------------

View File

@ -1,89 +1,107 @@
[[plugins-delete-by-query]]
== Delete By Query Plugin
=== Delete By Query Plugin
The delete by query plugin adds support for deleting all of the documents
The delete-by-query plugin adds support for deleting all of the documents
(from one or more indices) which match the specified query. It is a
replacement for the problematic _delete-by-query_ functionality which has been
removed from Elasticsearch core.
Internally, it uses the <<scroll-scan, Scan/Scroll>> and <<docs-bulk, Bulk>>
APIs to delete documents in an efficient and safe manner. It is slower than
the old _delete-by-query_ functionality, but fixes the problems with the
previous implementation.
Internally, it uses the {ref}/search-request-scroll.html#scroll-scan[Scan/Scroll]
and {ref}/docs-bulk.html[Bulk] APIs to delete documents in an efficient and
safe manner. It is slower than the old _delete-by-query_ functionality, but
fixes the problems with the previous implementation.
TIP: Queries which match large numbers of documents may run for a long time,
To understand more about why we removed delete-by-query from core and about
the semantics of the new implementation, see
<<delete-by-query-plugin-reason>>.
[TIP]
============================================
Queries which match large numbers of documents may run for a long time,
as every document has to be deleted individually. Don't use _delete-by-query_
to clean out all or most documents in an index. Rather create a new index and
perhaps reindex the documents you want to keep.
============================================
=== Installation
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
bin/plugin install elasticsearch/elasticsearch-delete-by-query
sudo bin/plugin install delete-by-query
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
=== Removal
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
bin/plugin remove elasticsearch/elasticsearch-delete-by-query
sudo bin/plugin remove delete-by-query
----------------------------------------------------------------
The node must be stopped before removing the plugin.
=== Usage
[[delete-by-query-usage]]
==== Using Delete-by-Query
The query can either be provided using a simple query string as
a parameter:
[source,shell]
--------------------------------------------------
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?q=user:kimchy'
DELETE /twitter/tweet/_query?q=user:kimchy
--------------------------------------------------
// AUTOSENSE
or using the <<query-dsl,Query DSL>> defined within the request body:
or using the {ref}/query-dsl.html[Query DSL] defined within the request body:
[source,js]
--------------------------------------------------
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : { <1>
"term" : { "user" : "kimchy" }
DELETE /twitter/tweet/_query
{
"query": { <1>
"term": {
"user": "kimchy"
}
}
}
'
--------------------------------------------------
// AUTOSENSE
<1> The query must be passed as a value to the `query` key, in the same way as
the <<search-search,search api>>.
the {ref}/search-search.html[search api].
Both of the above examples end up doing the same thing, which is to delete all
tweets from the twitter index for the user `kimchy`.
Delete-by-query supports deletion across <<search-multi-index-type,multiple indices and multiple types>>.
Delete-by-query supports deletion across
{ref}/search-search.html#search-multi-index-type[multiple indices and multiple types].
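For example, a hedged sketch that deletes the same user's documents across two indices and two types; the index and type names are placeholders:
[source,shell]
--------------------------------------------------
DELETE /twitter,blog/tweet,post/_query?q=user:kimchy
--------------------------------------------------
// AUTOSENSE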
==== Query-string parameters
[float]
=== Query-string parameters
The following query string parameters are supported:
`q`::
Instead of using the <<query-dsl,Query DSL>> to pass a `query` in the request
Instead of using the {ref}/query-dsl.html[Query DSL] to pass a `query` in the request
body, you can use the `q` query string parameter to specify a query using
<<query-string-syntax,`query_string` syntax>>. In this case, the following
additional parameters are supported: `df`, `analyzer`, `default_operator`,
`lowercase_expanded_terms`, `analyze_wildcard` and `lenient`.
See <<search-uri-request>> for details.
{ref}/query-dsl-query-string-query.html#query-string-syntax[`query_string` syntax].
In this case, the following additional parameters are supported: `df`,
`analyzer`, `default_operator`, `lowercase_expanded_terms`,
`analyze_wildcard` and `lenient`.
See {ref}/search-uri-request.html[URI search request] for details.
`size`::
The number of hits returned *per shard* by the <<scroll-scan,scroll/scan>>
The number of hits returned *per shard* by the {ref}/search-request-scroll.html#scroll-scan[scan]
request. Defaults to 10. May also be specified in the request body.
`timeout`::
@ -97,11 +115,12 @@ A comma separated list of routing values to control which shards the delete by
query request should be executed on.
When using the `q` parameter, the following additional parameters are
supported (as explained in <<search-uri-request>>): `df`, `analyzer`,
supported (as explained in {ref}/search-uri-request.html[URI search request]): `df`, `analyzer`,
`default_operator`.
==== Response body
[float]
=== Response body
The JSON response looks like this:
@ -129,8 +148,9 @@ The JSON response looks like this:
--------------------------------------------------
Internally, the query is used to execute an initial
<<scroll-scan,scroll/scan>> request. As hits are pulled from the scroll API,
they are passed to the <<bulk,Bulk API>> for deletion.
{ref}/search-request-scroll.html#scroll-scan[scroll/scan] request. As hits are
pulled from the scroll API, they are passed to the {ref}/docs-bulk.html[Bulk
API] for deletion.
IMPORTANT: Delete by query will only delete the version of the document that
was visible to search at the time the request was executed. Any documents
@ -161,3 +181,90 @@ The number of documents that failed to be deleted for the given index. A
document may fail to be deleted if it has been updated to a new version by
another process, or if the shard containing the document has gone missing due
to hardware failure, for example.
[[delete-by-query-plugin-reason]]
==== Why Delete-By-Query is a plugin
The old delete-by-query API in Elasticsearch 1.x was fast but problematic. We
decided to remove the feature from Elasticsearch for these reasons:
Forward compatibility::
The old implementation wrote a delete-by-query request, including the
query, to the transaction log. This meant that, when upgrading to a new
version, old unsupported queries which cannot be executed might exist in
the translog, thus causing data corruption.
Consistency and correctness::
The old implementation executed the query and deleted all matching docs on
the primary first. It then repeated this procedure on each replica shard.
There was no guarantee that the queries on the primary and the replicas
matched the same document, so it was quite possible to end up with
different documents on each shard copy.
Resiliency::
The old implementation could cause out-of-memory exceptions, merge storms,
and dramatic slow downs if used incorrectly.
[float]
=== New delete-by-query implementation
The new implementation, provided by this plugin, is built internally
using {ref}/search-request-scroll.html#scroll-scan[scan and scroll] to return
the document IDs and versions of all the documents that need to be deleted.
It then uses the {ref}/docs-bulk.html[`bulk` API] to do the actual deletion.
This can have performance as well as visibility implications. Delete-by-query
now has the following semantics:
non-atomic::
A delete-by-query may fail at any time while some documents matching the
query have already been deleted.
try-once::
A delete-by-query may fail at any time and will not retry its execution.
All retry logic is left to the user.
syntactic sugar::
A delete-by-query is equivalent to a scan/scroll search and corresponding
bulk-deletes by ID.
point-in-time::
A delete-by-query will only delete the documents that are visible at the
point in time the delete-by-query was started, equivalent to the
scan/scroll API.
consistent::
A delete-by-query will yield consistent results across all replicas of a
shard.
forward-compatible::
A delete-by-query will only send IDs to the shards as deletes such that no
queries are stored in the transaction logs that might not be supported in
the future.
visibility::
The effect of a delete-by-query request will not be visible to search
until the user refreshes the index, or the index is refreshed
automatically.
The new implementation suffers from two issues, which is why we decided to
move the functionality to a plugin instead of replacing the feature in core:
* It is not as fast as the previous implementation. For most use cases, this
difference should not be noticeable but users running delete-by-query on
many matching documents may be affected.
* There is currently no way to monitor or cancel a running delete-by-query
request, except for the `timeout` parameter.
We have plans to solve both of these issues in a later version of Elasticsearch.

View File

@ -0,0 +1,45 @@
[[discovery]]
== Discovery Plugins
Discovery plugins extend Elasticsearch by adding new discovery mechanisms that
can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery].
[float]
==== Core discovery plugins
The core discovery plugins are:
<<cloud-aws,AWS Cloud>>::
The Amazon Web Service (AWS) Cloud plugin uses the
https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery, and adds
support for using S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].
<<cloud-azure,Azure Cloud>>::
The Azure Cloud plugin uses the Azure API for unicast discovery, and adds
support for using Azure as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].
<<cloud-gce,GCE Cloud>>::
The Google Compute Engine Cloud plugin uses the GCE API for unicast discovery.
[float]
==== Community contributed discovery plugins
A number of discovery plugins have been contributed by our community:
* https://github.com/grantr/elasticsearch-srv-discovery[DNS SRV Discovery Plugin] (by Grant Rodgers)
* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan)
* https://github.com/grmblfrz/elasticsearch-zookeeper[ZooKeeper Discovery Plugin] (by Sonian Inc.)
include::cloud-aws.asciidoc[]
include::cloud-azure.asciidoc[]
include::cloud-gce.asciidoc[]

View File

@ -0,0 +1,65 @@
= Elasticsearch Plugins and Integrations
:ref: https://www.elastic.co/guide/en/elasticsearch/reference/master
:guide: https://www.elastic.co/guide
[[intro]]
== Introduction to plugins
Plugins are a way to enhance the core Elasticsearch functionality in a custom
manner. They range from adding custom mapping types, custom analyzers, and native
scripts, to custom discovery mechanisms and more.
There are three types of plugins:
Java plugins::
These plugins contain only JAR files, and must be installed on every node
in the cluster. After installation, each node must be restarted before
the plugin becomes visible.
Site plugins::
+
--
These plugins contain static web content like Javascript, HTML, and CSS files,
that can be served directly from Elasticsearch. Site plugins may only need to
be installed on one node, and do not require a restart to become visible. The
content of site plugins is accessible via a URL like:
http://yournode:9200/_plugin/[plugin name]
--
Mixed plugins::
Mixed plugins contain both JAR files and web content.
For advice on writing your own plugin, see <<plugin-authors>>.
include::plugin-script.asciidoc[]
include::api.asciidoc[]
include::alerting.asciidoc[]
include::analysis.asciidoc[]
include::discovery.asciidoc[]
include::management.asciidoc[]
include::mapper.asciidoc[]
include::scripting.asciidoc[]
include::security.asciidoc[]
include::repository.asciidoc[]
include::transport.asciidoc[]
include::integrations.asciidoc[]
include::authors.asciidoc[]

View File

@ -0,0 +1,220 @@
[[integrations]]
== Integrations
Integrations are not plugins. Instead, they are external tools or modules that
make it easier to work with Elasticsearch.
[float]
[[cms-integrations]]
=== CMS integrations
[float]
==== Supported by the community:
* http://drupal.org/project/search_api_elasticsearch[Drupal]:
Drupal Elasticsearch integration via Search API.
* https://drupal.org/project/elasticsearch_connector[Drupal]:
Drupal Elasticsearch integration.
* http://searchbox-io.github.com/wp-elasticsearch/[Wp-Elasticsearch]:
Elasticsearch WordPress Plugin
* https://github.com/wallmanderco/elasticsearch-indexer[Elasticsearch Indexer]:
Elasticsearch WordPress Plugin
* https://doc.tiki.org/Elasticsearch[Tiki Wiki CMS Groupware]:
Tiki has native support for Elasticsearch. This provides faster & better
search (facets, etc), along with some Natural Language Processing features
(ex.: More like this)
[float]
[[data-integrations]]
=== Data import/export and validation
NOTE: Rivers were used to import data from external systems into
Elasticsearch, but they are no longer supported in Elasticsearch 2.0.
[float]
==== Supported by the community:
* https://github.com/jprante/elasticsearch-jdbc[JDBC importer]:
The Java Database Connection (JDBC) importer allows to fetch data from JDBC sources for indexing into Elasticsearch (by Jörg Prante)
* https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer[Kafka Standalone Consumer]:
An easily scalable and extendable Kafka standalone consumer that reads messages from Kafka, processes them and indexes them in Elasticsearch
* https://github.com/ozlerhakan/mongolastic[Mongolastic]:
A tool that clones data from Elasticsearch to MongoDB and vice versa
* https://github.com/Aconex/scrutineer[Scrutineer]:
A high performance consistency checker to compare what you've indexed
with your source of truth content (e.g. DB)
[float]
[[deployment]]
=== Deployment
[float]
==== Supported by Elasticsearch:
* https://github.com/elasticsearch/puppet-elasticsearch[Puppet]:
Elasticsearch puppet module.
[float]
==== Supported by the community:
* http://github.com/elasticsearch/cookbook-elasticsearch[Chef]:
Chef cookbook for Elasticsearch
This project appears to have been abandoned:
* https://github.com/medcl/salt-elasticsearch[SaltStack]:
SaltStack Module for Elasticsearch
[float]
[[framework-integrations]]
=== Framework integrations
[float]
==== Supported by the community:
* http://www.searchtechnologies.com/aspire-for-elasticsearch[Aspire for Elasticsearch]:
Aspire, from Search Technologies, is a powerful connector and processing
framework designed for unstructured data. It has connectors to internal and
external repositories including SharePoint, Documentum, Jive, RDB, file
systems, websites and more, and can transform and normalize this data before
indexing in Elasticsearch.
* https://camel.apache.org/elasticsearch.html[Apache Camel Integration]:
An Apache camel component to integrate elasticsearch
* https://metacpan.org/release/Catmandu-Store-ElasticSearch[Catmandu]:
An Elasticsearch backend for the Catmandu framework.
* https://github.com/tlrx/elasticsearch-test[elasticsearch-test]:
Elasticsearch Java annotations for unit testing with
http://www.junit.org/[JUnit]
* https://github.com/FriendsOfSymfony/FOSElasticaBundle[FOSElasticaBundle]:
Symfony2 Bundle wrapping Elastica.
* http://grails.org/plugin/elasticsearch[Grails]:
Elasticsearch Grails plugin.
* http://haystacksearch.org/[Haystack]:
Modular search for Django
* https://github.com/cleverage/play2-elasticsearch[play2-elasticsearch]:
Elasticsearch module for Play Framework 2.x
* https://github.com/spring-projects/spring-data-elasticsearch[Spring Data Elasticsearch]:
Spring Data implementation for Elasticsearch
* https://github.com/dadoonet/spring-elasticsearch[Spring Elasticsearch]:
Spring Factory for Elasticsearch
* https://github.com/twitter/storehaus[Twitter Storehaus]:
Thin asynchronous Scala client for Storehaus.
These projects appear to have been abandoned:
* https://metacpan.org/module/Catalyst::Model::Search::Elasticsearch[Catalyst]:
Elasticsearch and Catalyst integration.
* http://github.com/aparo/django-elasticsearch[django-elasticsearch]:
Django Elasticsearch Backend.
* https://github.com/kzwang/elasticsearch-osem[elasticsearch-osem]:
A Java Object Search Engine Mapping (OSEM) for Elasticsearch
* http://geeks.aretotally.in/play-framework-module-elastic-search-distributed-searching-with-json-http-rest-or-java[Play!Framework]:
Integrate with Play! Framework Application.
* http://code.google.com/p/terrastore/wiki/Search_Integration[Terrastore Search]:
http://code.google.com/p/terrastore/[Terrastore] integration module with elasticsearch.
[float]
[[hadoop-integrations]]
=== Hadoop integrations
[float]
==== Supported by Elasticsearch:
* link:/guide/en/elasticsearch/hadoop/current/[es-hadoop]: Elasticsearch real-time
search and analytics natively integrated with Hadoop. Supports Map/Reduce,
Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.
[float]
==== Supported by the community:
These projects appear to have been abandoned:
* http://github.com/Aconex/elasticflume[elasticflume]:
http://github.com/cloudera/flume[Flume] sink implementation.
* https://github.com/infochimps-labs/wonderdog[Wonderdog]:
Hadoop bulk loader into elasticsearch.
[float]
[[monitoring-integrations]]
=== Health and Performance Monitoring
[float]
==== Supported by the community:
* https://github.com/anchor/nagios-plugin-elasticsearch[check_elasticsearch]:
An Elasticsearch availability and performance monitoring plugin for
Nagios.
* https://github.com/radu-gheorghe/check-es[check-es]:
Nagios/Shinken plugins for checking on elasticsearch
* https://github.com/mattweber/es2graphite[es2graphite]:
Send cluster and indices stats and status to Graphite for monitoring and graphing.
* https://itunes.apple.com/us/app/elasticocean/id955278030?ls=1&mt=8[ElasticOcean]:
Elasticsearch & DigitalOcean iOS Real-Time Monitoring tool to keep an eye on DigitalOcean Droplets or Elasticsearch instances or both of them on-a-go.
* https://github.com/rbramley/Opsview-elasticsearch[opsview-elasticsearch]:
Opsview plugin written in Perl for monitoring Elasticsearch
* https://scoutapp.com[Scout]: Provides plugins for monitoring Elasticsearch https://scoutapp.com/plugin_urls/1331-elasticsearch-node-status[nodes], https://scoutapp.com/plugin_urls/1321-elasticsearch-cluster-status[clusters], and https://scoutapp.com/plugin_urls/1341-elasticsearch-index-status[indices].
* http://sematext.com/spm/index.html[SPM for Elasticsearch]:
Performance monitoring with live charts showing cluster and node stats, integrated
alerts, email reports, etc.
[[other-integrations]]
[float]
=== Other integrations
[float]
==== Supported by the community:
* https://github.com/kodcu/pes[Pes]:
A pluggable elastic Javascript query DSL builder for Elasticsearch
* https://www.wireshark.org/[Wireshark]:
Protocol dissection for Zen discovery, HTTP and the binary protocol
These projects appear to have been abandoned:
* http://www.github.com/neogenix/daikon[daikon]:
Daikon Elasticsearch CLI
* https://github.com/fullscale/dangle[dangle]:
A set of AngularJS directives that provide common visualizations for elasticsearch based on
D3.
* https://github.com/OlegKunitsyn/eslogd[eslogd]:
Linux daemon that replicates events to a central Elasticsearch server in real-time

View File

@ -0,0 +1,193 @@
[[lang-javascript]]
=== JavaScript Language Plugin
The JavaScript language plugin enables the use of JavaScript in Elasticsearch
scripts, via Mozilla's
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rhino[Rhino JavaScript] engine.
[[lang-javascript-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install lang-javascript
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[lang-javascript-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove lang-javascript
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[lang-javascript-usage]]
==== Using JavaScript in Elasticsearch
Once the plugin has been installed, JavaScript can be used as a scripting
language by setting the `lang` parameter to `javascript` or `js`.
Scripting is available in many APIs, but we will use the `function_score`
query for demonstration purposes:
[[lang-javascript-inline]]
[float]
=== Inline scripts
WARNING: Enabling inline scripting on an unprotected Elasticsearch cluster is dangerous.
See <<lang-javascript-file>> for a safer option.
If you have enabled {ref}/modules-scripting.html#enable-dynamic-scripting[inline scripts],
you can use JavaScript as follows:
[source,json]
----
DELETE test
PUT test/doc/1
{
"num": 1.0
}
PUT test/doc/2
{
"num": 2.0
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"inline": "doc[\"num\"].value * factor",
"lang": "javascript",
"params": {
"factor": 2
}
}
}
}
}
}
----
// AUTOSENSE
[[lang-javascript-indexed]]
[float]
=== Indexed scripts
WARNING: Enabling indexed scripting on an unprotected Elasticsearch cluster is dangerous.
See <<lang-javascript-file>> for a safer option.
If you have enabled {ref}/modules-scripting.html#enable-dynamic-scripting[indexed scripts],
you can use JavaScript as follows:
[source,json]
----
DELETE test
PUT test/doc/1
{
"num": 1.0
}
PUT test/doc/2
{
"num": 2.0
}
POST _scripts/javascript/my_script <1>
{
"script": "doc[\"num\"].value * factor"
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"id": "my_script", <2>
"lang": "javascript",
"params": {
"factor": 2
}
}
}
}
}
}
----
// AUTOSENSE
<1> We index the script under the id `my_script`.
<2> The function score query retrieves the script with id `my_script`.
[[lang-javascript-file]]
[float]
=== File scripts
You can save your scripts to a file in the `config/scripts/` directory on
every node. The `.javascript` file suffix identifies the script as containing
JavaScript.
First, save this file as `config/scripts/my_script.javascript` on every node
in the cluster:
[source,js]
----
doc["num"].value * factor
----
then use the script as follows:
[source,json]
----
DELETE test
PUT test/doc/1
{
"num": 1.0
}
PUT test/doc/2
{
"num": 2.0
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"file": "my_script", <1>
"lang": "javascript",
"params": {
"factor": 2
}
}
}
}
}
}
----
// AUTOSENSE
<1> The function score query retrieves the script with filename `my_script.javascript`.

View File

@ -0,0 +1,192 @@
[[lang-python]]
=== Python Language Plugin
The Python language plugin enables the use of Python in Elasticsearch
scripts, via the http://www.jython.org/[Jython] Java implementation of Python.
[[lang-python-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install lang-python
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[lang-python-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove lang-python
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[lang-python-usage]]
==== Using Python in Elasticsearch
Once the plugin has been installed, Python can be used as a scripting
language by setting the `lang` parameter to `python`.
Scripting is available in many APIs, but we will use the `function_score`
query for demonstration purposes:
[[lang-python-inline]]
[float]
=== Inline scripts
WARNING: Enabling inline scripting on an unprotected Elasticsearch cluster is dangerous.
See <<lang-python-file>> for a safer option.
If you have enabled {ref}/modules-scripting.html#enable-dynamic-scripting[inline scripts],
you can use Python as follows:
[source,json]
----
DELETE test
PUT test/doc/1
{
"num": 1.0
}
PUT test/doc/2
{
"num": 2.0
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"inline": "doc[\"num\"].value * factor",
"lang": "python",
"params": {
"factor": 2
}
}
}
}
}
}
----
// AUTOSENSE
[[lang-python-indexed]]
[float]
=== Indexed scripts
WARNING: Enabling indexed scripting on an unprotected Elasticsearch cluster is dangerous.
See <<lang-python-file>> for a safer option.
If you have enabled {ref}/modules-scripting.html#enable-dynamic-scripting[indexed scripts],
you can use Python as follows:
[source,json]
----
DELETE test
PUT test/doc/1
{
"num": 1.0
}
PUT test/doc/2
{
"num": 2.0
}
POST _scripts/python/my_script <1>
{
"script": "doc[\"num\"].value * factor"
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"id": "my_script", <2>
"lang": "python",
"params": {
"factor": 2
}
}
}
}
}
}
----
// AUTOSENSE
<1> We index the script under the id `my_script`.
<2> The function score query retrieves the script with id `my_script`.
[[lang-python-file]]
[float]
=== File scripts
You can save your scripts to a file in the `config/scripts/` directory on
every node. The `.python` file suffix identifies the script as containing
Python.
First, save this file as `config/scripts/my_script.python` on every node
in the cluster:
[source,python]
----
doc["num"].value * factor
----
then use the script as follows:
[source,json]
----
DELETE test
PUT test/doc/1
{
"num": 1.0
}
PUT test/doc/2
{
"num": 2.0
}
GET test/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"file": "my_script", <1>
"lang": "python",
"params": {
"factor": 2
}
}
}
}
}
}
----
// AUTOSENSE
<1> The function score query retrieves the script with filename `my_script.python`.

View File

@ -0,0 +1,46 @@
[[management]]
== Management and Site Plugins
Management and site plugins offer UIs for managing and interacting with
Elasticsearch.
[float]
=== Core management plugins
The core management plugins are:
link:/products/marvel[Marvel]::
Marvel is a management and monitoring product for Elasticsearch. Marvel
aggregates cluster wide statistics and events and offers a single interface to
view and analyze them. Marvel is free for development use but requires a
license to run in production.
https://github.com/elastic/elasticsearch-migration[Migration]::
This plugin will help you to check whether you can upgrade directly to
Elasticsearch version 2.x, or whether you need to make changes to your data
before doing so. It will run on Elasticsearch versions 0.90.x to 1.x.
[float]
=== Community contributed management and site plugins
A number of plugins have been contributed by our community:
* https://github.com/lukas-vlcek/bigdesk[BigDesk Plugin] (by Lukáš Vlček)
* https://github.com/spinscale/elasticsearch-graphite-plugin[Elasticsearch Graphite Plugin]:
Regularly updates a graphite host with indices stats and nodes stats (by Alexander Reelsen)
* https://github.com/mobz/elasticsearch-head[Elasticsearch Head Plugin] (by Ben Birch)
* https://github.com/royrusso/elasticsearch-HQ[Elasticsearch HQ] (by Roy Russo)
* https://github.com/andrewvc/elastic-hammer[Hammer Plugin] (by Andrew Cholakian)
* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor Plugin] (by Zachary Tong)
* https://github.com/lmenezes/elasticsearch-kopf[Kopf Plugin] (by lmenezes)
These community plugins appear to have been abandoned:
* https://github.com/karmi/elasticsearch-paramedic[Paramedic Plugin] (by Karel Minařík)
* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy Plugin] (by Zachary Tong)
* https://github.com/xyu/elasticsearch-whatson[Whatson Plugin] (by Xiao Yu)

View File

@ -1,9 +1,41 @@
[[mapping-size-field]]
=== `_size` field
[[mapper-size]]
=== Mapper Size Plugin
The `_size` field, when enabled, indexes the size in bytes of the original
<<mapping-source-field,`_source`>>. In order to enable it, set
the mapping as follows:
The mapper-size plugin provides the `_size` meta field which, when enabled,
indexes the size in bytes of the original
{ref}/mapping-source-field.html[`_source`] field.
[[mapper-size-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install mapper-size
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[mapper-size-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove mapper-size
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[mapper-size-usage]]
==== Using the `_size` field
In order to enable the `_size` field, set the mapping as follows:
[source,js]
--------------------------
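// A minimal sketch of enabling the `_size` field on a mapping type
// (the index and type names below are illustrative):
PUT my_index
{
  "mappings": {
    "my_type": {
      "_size": {
        "enabled": true
      }
    }
  }
}
--------------------------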

View File

@ -0,0 +1,18 @@
[[mapper]]
== Mapper Plugins
Mapper plugins allow new field datatypes to be added to Elasticsearch.
[float]
=== Core mapper plugins
The core mapper plugins are:
<<mapper-size>>::
The mapper-size plugin provides the `_size` meta field which, when enabled,
indexes the size in bytes of the original
{ref}/mapping-source-field.html[`_source`] field.
include::mapper-size.asciidoc[]

View File

@ -0,0 +1,240 @@
[[plugin-management]]
== Plugin Management
The `plugin` script is used to install, list, and remove plugins. It is
located in the `$ES_HOME/bin` directory by default but it may be in a
{ref}/setup-dir-layout.html[different location] if you installed Elasticsearch
with an RPM or deb package.
Run the following command to get usage instructions:
[source,shell]
-----------------------------------
sudo bin/plugin -h
-----------------------------------
[[installation]]
=== Installing Plugins
The documentation for each plugin usually includes specific installation
instructions for that plugin, but below we document the various available
options:
[float]
=== Core Elasticsearch plugins
Core Elasticsearch plugins can be installed as follows:
[source,shell]
-----------------------------------
sudo bin/plugin install [plugin_name]
-----------------------------------
For instance, to install the core <<analysis-icu,ICU plugin>>, just run the
following command:
[source,shell]
-----------------------------------
sudo bin/plugin install analysis-icu
-----------------------------------
This command will install the version of the plugin that matches your
Elasticsearch version.
[float]
=== Community and non-core plugins
Non-core plugins provided by Elasticsearch, or plugins provided by the
community, can be installed from `download.elastic.co`, from Maven (Central
and Sonatype), or from GitHub. In this case, the command is as follows:
[source,shell]
-----------------------------------
sudo bin/plugin install [org]/[user|component]/[version]
-----------------------------------
For instance, to install the https://github.com/lmenezes/elasticsearch-kopf[Kopf]
plugin from GitHub, run one of the following commands:
[source,shell]
-----------------------------------
sudo bin/plugin install lmenezes/elasticsearch-kopf <1>
sudo bin/plugin install lmenezes/elasticsearch-kopf/1.x <2>
-----------------------------------
<1> Installs the latest version from GitHub.
<2> Installs the 1.x version from GitHub.
When installing from Maven Central/Sonatype, `[org]` should be replaced by
the artifact `groupId`, and `[user|component]` by the `artifactId`. For
instance, to install the
https://github.com/elastic/elasticsearch-mapper-attachments[mapper attachment]
plugin from Sonatype, run:
[source,shell]
-----------------------------------
sudo bin/plugin install org.elasticsearch/elasticsearch-mapper-attachments/2.6.0 <1>
-----------------------------------
<1> When installing from `download.elastic.co` or from Maven Central/Sonatype, the
version is required.
[float]
=== Custom URL or file system
A plugin can also be downloaded directly from a custom location by specifying the URL:
[source,shell]
-----------------------------------
sudo bin/plugin install [plugin-name] --url [url] <1>
-----------------------------------
<1> Both the URL and the plugin name must be specified.
For instance, to install a plugin from your local file system, you could run:
[source,shell]
-----------------------------------
sudo bin/plugin install my_plugin --url file:/path/to/plugin.zip
-----------------------------------
[[listing-removing]]
=== Listing and Removing Installed Plugins
[float]
=== Listing plugins
A list of the currently loaded plugins can be retrieved with the `list` option:
[source,shell]
-----------------------------------
sudo bin/plugin list
-----------------------------------
Alternatively, use the {ref}/cluster-nodes-info.html[node-info API] to find
out which plugins are installed on each node in the cluster.
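For example, a request like the following minimal sketch (assuming Elasticsearch is listening on `localhost:9200`) lists the plugins loaded on each node:
[source,shell]
-----------------------------------
curl -XGET 'localhost:9200/_nodes/plugins?pretty'
-----------------------------------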
[float]
=== Removing plugins
Plugins can be removed manually, by deleting the appropriate directory under
`plugins/`, or using the `plugin` script:
[source,shell]
-----------------------------------
sudo bin/plugin remove [pluginname]
-----------------------------------
=== Other command line parameters
The `plugin` script supports a number of other command line parameters:
[float]
=== Silent/Verbose mode
The `--verbose` parameter outputs more debug information, while the `--silent`
parameter turns off all output. The script may return the following exit
codes:
[horizontal]
`0`:: everything was OK
`64`:: unknown command or incorrect option parameter
`74`:: IO error
`70`:: any other error
[float]
=== Custom config directory
If your `elasticsearch.yml` config file is in a custom location, you will need
to specify the path to the config file when using the `plugin` script. You
can do this as follows:
[source,sh]
---------------------
sudo bin/plugin -Des.path.conf=/path/to/custom/config/dir install <plugin name>
---------------------
You can also set the `CONF_DIR` environment variable to the custom config
directory path.
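For example, a minimal sketch (the config directory path and plugin name are illustrative):
[source,sh]
---------------------
sudo CONF_DIR=/path/to/custom/config/dir bin/plugin install <plugin name>
---------------------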
[float]
=== Timeout settings
By default, the `plugin` script will wait indefinitely when downloading before
failing. The timeout parameter can be used to explicitly specify how long it
waits. Here are some examples of setting it to different values:
[source,shell]
-----------------------------------
# Wait for 30 seconds before failing
sudo bin/plugin install mobz/elasticsearch-head --timeout 30s
# Wait for 1 minute before failing
sudo bin/plugin install mobz/elasticsearch-head --timeout 1m
# Wait forever (default)
sudo bin/plugin install mobz/elasticsearch-head --timeout 0
-----------------------------------
[float]
=== Proxy settings
To install a plugin via a proxy, you can pass the proxy details in with the
Java settings `proxyHost` and `proxyPort`. On Unix-based systems, these
options can be set on the command line:
[source,shell]
-----------------------------------
sudo bin/plugin install mobz/elasticsearch-head -DproxyHost=host_name -DproxyPort=port_number
-----------------------------------
On Windows, they need to be added to the `JAVA_OPTS` environment variable:
[source,shell]
-----------------------------------
set JAVA_OPTS="-DproxyHost=host_name -DproxyPort=port_number"
bin/plugin install mobz/elasticsearch-head
-----------------------------------
=== Settings related to plugins
[float]
=== Custom plugins directory
The `plugins` directory can be changed from the default by adding the
following to the `elasticsearch.yml` config file:
[source,yml]
---------------------
path.plugins: /path/to/custom/plugins/dir
---------------------
The default location of the `plugins` directory depends on
{ref}/setup-dir-layout.html[which package you install].
[float]
=== Mandatory Plugins
If you rely on some plugins, you can define mandatory plugins by adding
`plugin.mandatory` setting to the `config/elasticsearch.yml` file, for
example:
[source,yaml]
--------------------------------------------------
plugin.mandatory: mapper-attachments,lang-groovy
--------------------------------------------------
For safety reasons, a node will not start if it is missing a mandatory plugin.
[float]
=== Lucene version dependent plugins
For some plugins, such as analysis plugins, a specific major Lucene version is
required to run. In that case, the plugin provides the Lucene version for
which it was built in its `es-plugin.properties` file.
If this file is present, the node will check the Lucene version at startup
before loading the plugin. You can disable that check using the following setting:
[source,yaml]
--------------------------------------------------
plugins.check_lucene: false
--------------------------------------------------

View File

@ -0,0 +1,37 @@
[[repository]]
== Snapshot/Restore Repository Plugins
Repository plugins extend the {ref}/modules-snapshots.html[Snapshot/Restore]
functionality in Elasticsearch by adding repositories backed by the cloud or
by distributed file systems:
[float]
=== Core repository plugins
The core repository plugins are:
<<cloud-aws,AWS Cloud>>::
The Amazon Web Service (AWS) Cloud plugin adds support for using S3 as a
repository.
<<cloud-azure,Azure Cloud>>::
The Azure Cloud plugin adds support for using Azure as a repository.
https://github.com/elastic/elasticsearch-hadoop/tree/master/repository-hdfs[Hadoop HDFS Repository]::
The Hadoop HDFS Repository plugin adds support for using an HDFS file system
as a repository.
[float]
=== Community contributed repository plugins
The following plugin has been contributed by our community:
* https://github.com/wikimedia/search-repository-swift[Openstack Swift] (by Wikimedia Foundation)
This community plugin appears to have been abandoned:
* https://github.com/kzwang/elasticsearch-repository-gridfs[GridFS] Repository (by Kevin Wang)

View File

@ -0,0 +1,32 @@
[[scripting]]
== Scripting Plugins
Scripting plugins extend the scripting functionality in Elasticsearch to allow
the use of other scripting languages.
[float]
=== Core scripting plugins
The core scripting plugins are:
<<lang-javascript,JavaScript Language>>::
The JavaScript language plugin enables the use of JavaScript in Elasticsearch
scripts, via Mozilla's
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rhino[Rhino JavaScript] engine.
<<lang-python,Python Language>>::
The Python language plugin enables the use of Python in Elasticsearch
scripts, via the http://www.jython.org/[Jython] Java implementation of Python.
[float]
=== Abandoned community scripting plugins
This plugin has been contributed by our community, but appears to be abandoned:
* https://github.com/hiredman/elasticsearch-lang-clojure[Clojure Language Plugin] (by Kevin Downey)
include::lang-javascript.asciidoc[]
include::lang-python.asciidoc[]

View File

@ -0,0 +1,29 @@
[[security]]
== Security Plugins
Security plugins add a security layer to Elasticsearch.
[float]
=== Core security plugins
The core security plugins are:
link:/products/shield[Shield]::
Shield is the Elastic product that makes it easy for anyone to add
enterprise-grade security to their ELK stack. Designed to address the growing security
needs of thousands of enterprises using ELK today, Shield provides peace of
mind when it comes to protecting your data.
[float]
=== Community contributed security plugins
The following plugin has been contributed by our community:
* https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin[Readonly REST]:
High performance access control for Elasticsearch native REST API (by Simone Scarduzio)
This community plugin appears to have been abandoned:
* https://github.com/sonian/elasticsearch-jetty[Jetty HTTP transport plugin]:
Uses Jetty to provide SSL connections, basic authentication, and request logging (by Sonian Inc.)

View File

@ -0,0 +1,22 @@
[[transport]]
== Transport Plugins
Transport plugins offer alternatives to HTTP.
[float]
=== Core transport plugins
The core transport plugins are:
https://github.com/elasticsearch/elasticsearch-transport-wares[Servlet transport]::
Use the REST interface over servlets.
[float]
=== Community contributed transport plugins
The following community plugins appear to have been abandoned:
* https://github.com/kzwang/elasticsearch-transport-redis[Redis transport plugin] (by Kevin Wang)
* https://github.com/tlrx/transport-zeromq[ØMQ transport plugin] (by Tanguy Leroux)

View File

@ -5,6 +5,7 @@
:branch: 2.0
:jdk: 1.8.0_25
:defguide: https://www.elastic.co/guide/en/elasticsearch/guide/current
:plugins: https://www.elastic.co/guide/en/elasticsearch/plugins/master
include::getting-started.asciidoc[]

View File

@ -32,9 +32,10 @@ can be customised when a mapping type is created.
The original JSON representing the body of the document.
<<mapping-size-field,`_size`>>::
{plugins}/mapping-size.html[`_size`]::
The size of the `_source` field in bytes.
The size of the `_source` field in bytes, provided by the
{plugins}/mapping-size.html[`mapper-size` plugin].
[float]
=== Indexing meta-fields

View File

@ -9,288 +9,4 @@ custom manner. They range from adding custom mapping types, custom
analyzers (in a more built in fashion), native scripts, custom discovery
and more.
[float]
[[installing]]
==== Installing plugins
Installing plugins can either be done manually by placing them under the
`plugins` directory, or using the `plugin` script.
Installing a plugin typically takes the following form:
[source,sh]
-----------------------------------
bin/plugin install plugin_name
-----------------------------------
The plugin will be automatically downloaded in this case from the `download.elastic.co` download service, using the
same version as your Elasticsearch version.
For older versions of Elasticsearch (prior to 2.0.0), or for community plugins, you would use the following form:
[source,sh]
-----------------------------------
bin/plugin install <org>/<user/component>/<version>
-----------------------------------
The plugins will be automatically downloaded in this case from `download.elastic.co` (for older plugins),
and in case they don't exist there, from Maven (Central and Sonatype).
Note that when the plugin is located in the Maven Central or Sonatype
repository, `<org>` is the artifact `groupId` and `<user/component>` is
the `artifactId`.
A plugin can also be installed directly by specifying the URL for it,
for example:
[source,sh]
-----------------------------------
bin/plugin install plugin-name --url file:///path/to/plugin
-----------------------------------
You can run `bin/plugin -h` or `bin/plugin install -h` for help on the install command,
as well as `bin/plugin remove -h` for help on the remove command.
[float]
[[site-plugins]]
==== Site Plugins
Plugins can have "sites" in them, any plugin that exists under the
`plugins` directory with a `_site` directory, its content will be
statically served when hitting `/_plugin/[plugin_name]/` url. Those can
be added even after the process has started.
Installed plugins that do not contain any java related content, will
automatically be detected as site plugins, and their content will be
moved under `_site`.
The ability to install plugins from GitHub makes it easy to install site
plugins hosted there by downloading the actual repository. For example,
running:
[source,js]
--------------------------------------------------
bin/plugin install mobz/elasticsearch-head
bin/plugin install lukas-vlcek/bigdesk
--------------------------------------------------
will install both of those site plugins, with `elasticsearch-head`
available under `http://localhost:9200/_plugin/head/` and `bigdesk`
available under `http://localhost:9200/_plugin/bigdesk/`.
[float]
==== Mandatory Plugins
If you rely on some plugins, you can define mandatory plugins using the
`plugin.mandatory` setting. For example, here is a sample config:
[source,js]
--------------------------------------------------
plugin.mandatory: mapper-attachments,lang-groovy
--------------------------------------------------
For safety reasons, if a mandatory plugin is not installed, the node
will not start.
[float]
==== Installed Plugins
A list of the currently loaded plugins can be retrieved using the
<<cluster-nodes-info,Nodes Info API>>.
[float]
==== Removing plugins
Removing plugins can either be done manually by removing them under the
`plugins` directory, or using the `plugin` script.
Removing a plugin typically takes the following form:
[source,sh]
-----------------------------------
plugin remove <pluginname>
-----------------------------------
[float]
==== Silent/Verbose mode
When running the `plugin` script, you can get more information (debug mode) using `--verbose`.
Conversely, if you want the `plugin` script to be silent, use the `--silent` option.
Note that the exit codes can be:
* `0`: everything was OK
* `64`: unknown command or incorrect option parameter
* `74`: IO error
* `70`: other errors
[source,sh]
-----------------------------------
bin/plugin install mobz/elasticsearch-head --verbose
plugin remove head --silent
-----------------------------------
[float]
==== Timeout settings
By default, the `plugin` script will wait indefinitely when downloading before failing.
The timeout parameter can be used to explicitly specify how long it waits. Here are some examples of setting it to
different values:
[source,sh]
-----------------------------------
# Wait for 30 seconds before failing
bin/plugin install mobz/elasticsearch-head --timeout 30s
# Wait for 1 minute before failing
bin/plugin install mobz/elasticsearch-head --timeout 1m
# Wait forever (default)
bin/plugin install mobz/elasticsearch-head --timeout 0
-----------------------------------
[float]
==== Proxy settings
To install a plugin via a proxy, you can pass the proxy details using the Java properties `proxyHost` and `proxyPort`, set via the `JAVA_OPTS` environment variable:
[source,sh]
-----------------------------------
set JAVA_OPTS="-DproxyHost=host_name -DproxyPort=port_number"
bin/plugin install mobz/elasticsearch-head
-----------------------------------
[float]
==== Lucene version dependent plugins
For some plugins, such as analysis plugins, a specific major Lucene version is
required to run. In that case, the plugin provides the Lucene version for which it
was built in its `es-plugin.properties` file.
If this file is present, the node will check the Lucene version at startup before loading the plugin.
You can disable that check using `plugins.check_lucene: false`.
[float]
[[known-plugins]]
=== Known Plugins
[float]
[[analysis-plugins]]
==== Analysis Plugins
.Supported by Elasticsearch
* https://github.com/elasticsearch/elasticsearch-analysis-icu[ICU Analysis plugin]
* https://github.com/elasticsearch/elasticsearch-analysis-kuromoji[Japanese (Kuromoji) Analysis plugin].
* https://github.com/elasticsearch/elasticsearch-analysis-smartcn[Smart Chinese Analysis Plugin]
* https://github.com/elasticsearch/elasticsearch-analysis-stempel[Stempel (Polish) Analysis plugin]
.Supported by the community
* https://github.com/barminator/elasticsearch-analysis-annotation[Annotation Analysis Plugin] (by Michal Samek)
* https://github.com/yakaz/elasticsearch-analysis-combo/[Combo Analysis Plugin] (by Olivier Favre, Yakaz)
* https://github.com/jprante/elasticsearch-analysis-hunspell[Hunspell Analysis Plugin] (by Jörg Prante)
* https://github.com/medcl/elasticsearch-analysis-ik[IK Analysis Plugin] (by Medcl)
* https://github.com/suguru/elasticsearch-analysis-japanese[Japanese Analysis plugin] (by suguru).
* https://github.com/medcl/elasticsearch-analysis-mmseg[Mmseg Analysis Plugin] (by Medcl)
* https://github.com/chytreg/elasticsearch-analysis-morfologik[Morfologik (Polish) Analysis plugin] (by chytreg)
* https://github.com/imotov/elasticsearch-analysis-morphology[Russian and English Morphological Analysis Plugin] (by Igor Motov)
* https://github.com/synhershko/elasticsearch-analysis-hebrew[Hebrew Analysis Plugin] (by Itamar Syn-Hershko)
* https://github.com/medcl/elasticsearch-analysis-pinyin[Pinyin Analysis Plugin] (by Medcl)
* https://github.com/medcl/elasticsearch-analysis-string2int[String2Integer Analysis Plugin] (by Medcl)
* https://github.com/duydo/elasticsearch-analysis-vietnamese[Vietnamese Analysis Plugin] (by Duy Do)
[float]
[[discovery-plugins]]
==== Discovery Plugins
.Supported by Elasticsearch
* https://github.com/elasticsearch/elasticsearch-cloud-aws[AWS Cloud Plugin] - EC2 discovery and S3 Repository
* https://github.com/elasticsearch/elasticsearch-cloud-azure[Azure Cloud Plugin] - Azure discovery
* https://github.com/elasticsearch/elasticsearch-cloud-gce[Google Compute Engine Cloud Plugin] - GCE discovery
.Supported by the community
* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan)
* https://github.com/grantr/elasticsearch-srv-discovery[DNS SRV Discovery Plugin] (by Grant Rodgers)
[float]
[[transport]]
==== Transport Plugins
.Supported by Elasticsearch
* https://github.com/elasticsearch/elasticsearch-transport-wares[Servlet transport]
.Supported by the community
* https://github.com/tlrx/transport-zeromq[ZeroMQ transport layer plugin] (by Tanguy Leroux)
* https://github.com/sonian/elasticsearch-jetty[Jetty HTTP transport plugin] (by Sonian Inc.)
* https://github.com/kzwang/elasticsearch-transport-redis[Redis transport plugin] (by Kevin Wang)
[float]
[[scripting]]
==== Scripting Plugins
.Supported by Elasticsearch
* https://github.com/elasticsearch/elasticsearch-lang-groovy[Groovy lang Plugin]
* https://github.com/elasticsearch/elasticsearch-lang-javascript[JavaScript language Plugin]
* https://github.com/elasticsearch/elasticsearch-lang-python[Python language Plugin]
.Supported by the community
* https://github.com/hiredman/elasticsearch-lang-clojure[Clojure Language Plugin] (by Kevin Downey)
* https://github.com/NLPchina/elasticsearch-sql/[SQL language Plugin] (by nlpcn)
[float]
[[site]]
==== Site Plugins
.Supported by the community
* https://github.com/lukas-vlcek/bigdesk[BigDesk Plugin] (by Lukáš Vlček)
* https://github.com/mobz/elasticsearch-head[Elasticsearch Head Plugin] (by Ben Birch)
* https://github.com/royrusso/elasticsearch-HQ[Elasticsearch HQ] (by Roy Russo)
* https://github.com/andrewvc/elastic-hammer[Hammer Plugin] (by Andrew Cholakian)
* https://github.com/polyfractal/elasticsearch-inquisitor[Inquisitor Plugin] (by Zachary Tong)
* https://github.com/karmi/elasticsearch-paramedic[Paramedic Plugin] (by Karel Minařík)
* https://github.com/polyfractal/elasticsearch-segmentspy[SegmentSpy Plugin] (by Zachary Tong)
* https://github.com/xyu/elasticsearch-whatson[Whatson Plugin] (by Xiao Yu)
* https://github.com/lmenezes/elasticsearch-kopf[Kopf Plugin] (by lmenezes)
[float]
[[repository-plugins]]
==== Snapshot/Restore Repository Plugins
.Supported by Elasticsearch
* https://github.com/elasticsearch/elasticsearch-hadoop/tree/master/repository-hdfs[Hadoop HDFS] Repository
* https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository[AWS S3] Repository
.Supported by the community
* https://github.com/kzwang/elasticsearch-repository-gridfs[GridFS] Repository (by Kevin Wang)
* https://github.com/wikimedia/search-repository-swift[Openstack Swift]
[float]
[[misc]]
==== Misc Plugins
.Supported by Elasticsearch
* https://github.com/elasticsearch/elasticsearch-mapper-attachments[Mapper Attachments Type plugin]
.Supported by the community
* https://github.com/carrot2/elasticsearch-carrot2[carrot2 Plugin]: Results clustering with carrot2 (by Dawid Weiss)
* https://github.com/derryx/elasticsearch-changes-plugin[Elasticsearch Changes Plugin] (by Thomas Peuss)
* https://github.com/johtani/elasticsearch-extended-analyze[Extended Analyze Plugin] (by Jun Ohtani)
* https://github.com/YannBrrd/elasticsearch-entity-resolution[Entity Resolution Plugin] using http://github.com/larsga/Duke[Duke] for duplication detection (by Yann Barraud)
* https://github.com/spinscale/elasticsearch-graphite-plugin[Elasticsearch Graphite Plugin] (by Alexander Reelsen)
* https://github.com/mattweber/elasticsearch-mocksolrplugin[Elasticsearch Mock Solr Plugin] (by Matt Weber)
* https://github.com/viniciusccarvalho/elasticsearch-newrelic[Elasticsearch New Relic Plugin] (by Vinicius Carvalho)
* https://github.com/swoop-inc/elasticsearch-statsd-plugin[Elasticsearch Statsd Plugin] (by Swoop Inc.)
* https://github.com/endgameinc/elasticsearch-term-plugin[Terms Component Plugin] (by Endgame Inc.)
* http://tlrx.github.com/elasticsearch-view-plugin[Elasticsearch View Plugin] (by Tanguy Leroux)
* https://github.com/sonian/elasticsearch-zookeeper[ZooKeeper Discovery Plugin] (by Sonian Inc.)
* https://github.com/kzwang/elasticsearch-image[Elasticsearch Image Plugin] (by Kevin Wang)
* https://github.com/wikimedia/search-highlighter[Elasticsearch Experimental Highlighter] (by Wikimedia Foundation/Nik Everett)
* https://github.com/wikimedia/search-extra[Elasticsearch Trigram Accelerated Regular Expression Filter] (by Wikimedia Foundation/Nik Everett)
* https://github.com/salyh/elasticsearch-security-plugin[Elasticsearch Security Plugin] (by Hendrik Saly)
* https://github.com/codelibs/elasticsearch-taste[Elasticsearch Taste Plugin] (by CodeLibs Project)
* http://siren.solutions/siren/downloads/[Elasticsearch SIREn Plugin]: Nested data search (by SIREn Solutions)
See the {plugins}/index.html[Plugins documentation] for more.

View File

@ -1,290 +0,0 @@
ICU Analysis for Elasticsearch
==================================
The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.
In order to install the plugin, simply run:
```sh
bin/plugin install elasticsearch/elasticsearch-analysis-icu/2.5.0
```
You need to install a version matching your Elasticsearch version:
| elasticsearch | ICU Analysis Plugin | Docs |
|---------------|-----------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elastic/elasticsearch-analysis-icu/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.3 | [2.4.3](https://github.com/elasticsearch/elasticsearch-analysis-icu/tree/v2.4.3/#version-243-for-elasticsearch-14) |
| < 1.4.5 | 2.4.2 | [2.4.2](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| < 1.4.3 | 2.4.1 | [2.4.1](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.3.0/#icu-analysis-for-elasticsearch) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.2.0/#icu-analysis-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.1.0/#icu-analysis-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elastic/elasticsearch-analysis-icu/tree/v2.0.0/#icu-analysis-for-elasticsearch) |
| es-0.90 | 1.13.0 | [1.13.0](https://github.com/elastic/elasticsearch-analysis-icu/tree/v1.13.0/#icu-analysis-for-elasticsearch) |
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install analysis-icu \
--url file:target/releases/elasticsearch-analysis-icu-X.X.X-SNAPSHOT.zip
```
ICU Normalization
-----------------
Normalizes characters as explained [here](http://userguide.icu-project.org/transforms/normalization). It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings. Allows for a name parameter to be provided, which can take the following values: `nfc`, `nfkc`, and `nfkc_cf`. Here are sample settings:
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"normalized" : {
"tokenizer" : "keyword",
"filter" : ["icu_normalizer"]
}
}
}
}
}
```
ICU Folding
-----------
Folding of unicode characters based on `UTR#30`. It registers itself under `icu_folding` and `icuFolding` names. Sample setting:
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"folded" : {
"tokenizer" : "keyword",
"filter" : ["icu_folding"]
}
}
}
}
}
```
ICU Filtering
-------------
The folding can be filtered by a set of unicode characters with the parameter `unicodeSetFilter`. This is useful for a
non-internationalized search engine where you want to retain a set of national characters which are primary letters in a specific
language. See the syntax for the UnicodeSet [here](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html).
The following example exempts Swedish characters from the folding. Note that the filtered characters are NOT lowercased, which is why we add the lowercase filter below.
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"folding" : {
"tokenizer" : "standard",
"filter" : ["my_icu_folding", "lowercase"]
}
},
"filter" : {
"my_icu_folding" : {
"type" : "icu_folding"
"unicodeSetFilter" : "[^åäöÅÄÖ]"
}
}
}
}
}
```
ICU Collation
-------------
Uses the collation token filter. Allows you to either specify the rules for collation
(defined [here](http://www.icu-project.org/userguide/Collate_Customization.html)) using the `rules` parameter
(which can point to a file location or be expressed in the settings; the location can be relative to the config location), or use the
`language` parameter (further specialized by country and variant). By default it registers under `icu_collation` or
`icuCollation` and uses the default locale.
Here are sample settings:
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"collation" : {
"tokenizer" : "keyword",
"filter" : ["icu_collation"]
}
}
}
}
}
```
And here is a sample of custom collation:
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"collation" : {
"tokenizer" : "keyword",
"filter" : ["myCollator"]
}
},
"filter" : {
"myCollator" : {
"type" : "icu_collation",
"language" : "en"
}
}
}
}
}
```
Optional settings:
* `strength` - The strength property determines the minimum level of difference considered significant during comparison.
The default strength for the Collator is `tertiary`, unless specified otherwise by the locale used to create the Collator.
Possible values: `primary`, `secondary`, `tertiary`, `quaternary` or `identical`.
See [ICU Collation](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html) documentation for a more detailed
explanation for the specific values.
* `decomposition` - Possible values: `no` or `canonical`. Defaults to `no`. Setting this decomposition property with
`canonical` allows the Collator to handle un-normalized text properly, producing the same results as if the text were
normalized. If `no` is set, it is the user's responsibility to ensure that all text is already in the appropriate form
before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between
faster and more complete collation behavior. Since a great many of the world's languages do not require text
normalization, most locales set `no` as the default decomposition mode.
Expert options:
* `alternate` - Possible values: `shifted` or `non-ignorable`. Sets the alternate handling for strength `quaternary`
to be either shifted or non-ignorable. This boils down to ignoring punctuation and whitespace.
* `caseLevel` - Possible values: `true` or `false`. Default is `false`. Whether case level sorting is required. When
strength is set to `primary` this will ignore accent differences.
* `caseFirst` - Possible values: `lower` or `upper`. Useful to control which case is sorted first when case is not ignored
for strength `tertiary`.
* `numeric` - Possible values: `true` or `false`. Whether digits are sorted according to numeric representation. For
example the value `egg-9` is sorted before the value `egg-21`. Defaults to `false`.
* `variableTop` - Single character or contraction. Controls what is variable for `alternate`.
* `hiraganaQuaternaryMode` - Possible values: `true` or `false`. Defaults to `false`. Distinguishes between Katakana
and Hiragana characters at `quaternary` strength.
ICU Tokenizer
-------------
Breaks text into words according to [UAX #29: Unicode Text Segmentation](http://www.unicode.org/reports/tr29/).
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"tokenized" : {
"tokenizer" : "icu_tokenizer",
}
}
}
}
}
```
ICU Normalization CharFilter
-----------------
Normalizes characters as explained [here](http://userguide.icu-project.org/transforms/normalization).
It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings.
Allows for the name parameter to be provided which can include the following values: `nfc`, `nfkc`, and `nfkc_cf`.
Allows for the mode parameter to be provided which can include the following values: `compose` and `decompose`.
Use `decompose` with `nfc` or `nfkc`, to get `nfd` or `nfkd`, respectively.
Here are sample settings:
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"normalized" : {
"tokenizer" : "keyword",
"char_filter" : ["icu_normalizer"]
}
}
}
}
}
```
ICU Transform
-------------
Transforms are used to process Unicode text in many different ways. Some include case mapping, normalization,
transliteration and bidirectional text handling.
You can define transliterator identifiers using the `id` property, and specify the direction as `forward` or `reverse` using the
`dir` property. The default values of the two properties are `Null` and `forward`, respectively.
For example:
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"latin" : {
"tokenizer" : "keyword",
"filter" : ["myLatinTransform"]
}
},
"filter" : {
"myLatinTransform" : {
"type" : "icu_transform",
"id" : "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
}
}
}
}
}
```
This transform transliterates characters to Latin, separates accents from their base characters, removes the accents,
and then puts the remaining text into an unaccented form.
The results are:
`你好` to `ni hao`
`здравствуйте` to `zdravstvujte`
`こんにちは` to `kon'nichiha`
Currently the filter only supports identifier and direction; custom rulesets are not yet supported.
For more documentation, please see the [user guide of ICU Transform](http://userguide.icu-project.org/transforms/general).
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,552 +0,0 @@
Japanese (kuromoji) Analysis for Elasticsearch
==================================
The Japanese (kuromoji) Analysis plugin integrates the Lucene kuromoji analysis module into Elasticsearch.
In order to install the plugin, run:
```sh
bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.5.0
```
You need to install a version matching your Elasticsearch version:
| elasticsearch | Kuromoji Analysis Plugin | Docs |
|---------------|-----------------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-analysis-kuromoji/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.3 | [2.4.3](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.3/#version-243-for-elasticsearch-14) |
| < 1.4.5 | 2.4.2 | [2.4.2](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| < 1.4.3 | 2.4.1 | [2.4.1](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.3.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.2.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.1.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.0.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-0.90 | 1.8.0 | [1.8.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v1.8.0/#japanese-kuromoji-analysis-for-elasticsearch) |
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install analysis-kuromoji \
--url file:target/releases/elasticsearch-analysis-kuromoji-X.X.X-SNAPSHOT.zip
```
Includes Analyzer, Tokenizer, TokenFilter, CharFilter
-----------------------------------------------
The plugin includes the following analyzer, char filter, tokenizer, and token filters:
| name | type |
|-------------------------|-------------|
| kuromoji_iteration_mark | charfilter |
| kuromoji | analyzer |
| kuromoji_tokenizer | tokenizer |
| kuromoji_baseform | tokenfilter |
| kuromoji_part_of_speech | tokenfilter |
| kuromoji_readingform | tokenfilter |
| kuromoji_stemmer | tokenfilter |
| ja_stop | tokenfilter |
Usage
-----
## Analyzer : kuromoji
An analyzer of type `kuromoji`.
This analyzer combines the following tokenizer and token filters; a usage sketch follows the list.
* `kuromoji_tokenizer` : Kuromoji Tokenizer
* `kuromoji_baseform` : Kuromoji BasicFormFilter (TokenFilter)
* `kuromoji_part_of_speech` : Kuromoji Part of Speech Stop Filter (TokenFilter)
* `cjk_width` : CJK Width Filter (TokenFilter)
* `stop` : Stop Filter (TokenFilter)
* `kuromoji_stemmer` : Kuromoji Katakana Stemmer Filter(TokenFilter)
* `lowercase` : LowerCase Filter (TokenFilter)
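_Example Request using `_analyze` API (a minimal sketch analyzing text with the `kuromoji` analyzer registered by the plugin; the response is omitted here):_
```sh
curl -XPOST 'http://localhost:9200/_analyze?analyzer=kuromoji&pretty' -d '関西国際空港'
```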
## CharFilter : kuromoji_iteration_mark
A charfilter of type `kuromoji_iteration_mark`.
This charfilter normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
The following are settings that can be set for a `kuromoji_iteration_mark` charfilter type:
| **Setting** | **Description** | **Default value** |
|:----------------|:-------------------------------------------------------------|:------------------|
| normalize_kanji | indicates whether kanji iteration marks should be normalized | `true` |
| normalize_kana | indicates whether kana iteration marks should be normalized | `true` |
## Tokenizer : kuromoji_tokenizer
A tokenizer of type `kuromoji_tokenizer`.
The following are settings that can be set for a `kuromoji_tokenizer` tokenizer type:
| **Setting** | **Description** | **Default value** |
|:--------------------|:--------------------------------------------------------------------------------------------------------------------------|:------------------|
| mode | Tokenization mode: this determines how the tokenizer handles compound and unknown words. Can be `normal`, `search`, or `extended`. | `search` |
| discard_punctuation | `true` if punctuation tokens should be dropped from the output. | `true` |
| user_dictionary | set User Dictionary file | |
### Tokenization mode
There are three tokenization modes:
* `normal` : Ordinary segmentation: no decomposition for compounds
* `search` : Segmentation geared towards search: this includes a decompounding process for long nouns, also including the full compound token as a synonym.
* `extended` : Extended mode outputs unigrams for unknown words.
#### Differences between tokenization mode outputs
Input text is `関西国際空港` and `アブラカダブラ`.
| **mode** | `関西国際空港` | `アブラカダブラ` |
|:-----------|:-------------|:-------|
| `normal` | `関西国際空港` | `アブラカダブラ` |
| `search` | `関西` `関西国際空港` `国際` `空港` | `アブラカダブラ` |
| `extended` | `関西` `国際` `空港` | `ア` `ブ` `ラ` `カ` `ダ` `ブ` `ラ` |
### User Dictionary
The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A User Dictionary
allows you to add your own entries to the dictionary.
User Dictionary entries are defined using the following CSV format:
```
<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
```
Dictionary Example
```
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
```
To use a User Dictionary, set the file path in the `user_dictionary` attribute.
The User Dictionary file must be placed in the `ES_HOME/config` directory.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
"settings": {
"index":{
"analysis":{
"tokenizer" : {
"kuromoji_user_dict" : {
"type" : "kuromoji_tokenizer",
"mode" : "extended",
"discard_punctuation" : "false",
"user_dictionary" : "userdict_ja.txt"
}
},
"analyzer" : {
"my_analyzer" : {
"type" : "custom",
"tokenizer" : "kuromoji_user_dict"
}
}
}
}
}
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '東京スカイツリー'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "東京",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "スカイツリー",
"start_offset" : 2,
"end_offset" : 8,
"type" : "word",
"position" : 2
} ]
}
```
## TokenFilter : kuromoji_baseform
A token filter of type `kuromoji_baseform` that replaces term text with BaseFormAttribute.
This acts as a lemmatizer for verbs and adjectives.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
"settings": {
"index":{
"analysis":{
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["kuromoji_baseform"]
}
}
}
}
}
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '飲み'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "飲む",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
} ]
}
```
## TokenFilter : kuromoji_part_of_speech
A token filter of type `kuromoji_part_of_speech` that removes tokens that match a set of part-of-speech tags.
The following are settings that can be set for a `kuromoji_part_of_speech` token filter type:
| **Setting** | **Description** |
|:------------|:-----------------------------------------------------|
| stoptags | A list of part-of-speech tags that should be removed |
Note that the default stoptags are defined in the stoptags.txt file included in lucene-analyzer-kuromoji.jar.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
"settings": {
"index":{
"analysis":{
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["my_posfilter"]
}
},
"filter" : {
"my_posfilter" : {
"type" : "kuromoji_part_of_speech",
"stoptags" : [
"助詞-格助詞-一般",
"助詞-終助詞"
]
}
}
}
}
}
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '寿司がおいしいね'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "寿司",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "おいしい",
"start_offset" : 3,
"end_offset" : 7,
"type" : "word",
"position" : 3
} ]
}
```
## TokenFilter : kuromoji_readingform
A token filter of type `kuromoji_readingform` that replaces the term attribute with the reading of a token in either katakana or romaji form.
The default reading form is katakana.
The following are settings that can be set for a `kuromoji_readingform` token filter type:
| **Setting** | **Description** | **Default value** |
|:------------|:----------------------------------------------------------|:------------------|
| use_romaji | `true` if romaji reading form output instead of katakana. | `false` |
Note that the pre-configured `kuromoji_readingform` filter shipped with the elasticsearch-analysis-kuromoji plugin sets `use_romaji` to `true` by default.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
"settings": {
"index":{
"analysis":{
"analyzer" : {
"romaji_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["romaji_readingform"]
},
"katakana_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["katakana_readingform"]
}
},
"filter" : {
"romaji_readingform" : {
"type" : "kuromoji_readingform",
"use_romaji" : true
},
"katakana_readingform" : {
"type" : "kuromoji_readingform",
"use_romaji" : false
}
}
}
}
}
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=katakana_analyzer&pretty' -d '寿司'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "スシ",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
} ]
}
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=romaji_analyzer&pretty' -d '寿司'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "sushi",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
} ]
}
```
## TokenFilter : kuromoji_stemmer
A token filter of type `kuromoji_stemmer` that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
Only katakana words longer than a minimum length are stemmed (default is four).
Note that only full-width katakana characters are supported.
The following are settings that can be set for a `kuromoji_stemmer` token filter type:
| **Setting** | **Description** | **Default value** |
|:----------------|:---------------------------|:------------------|
| minimum_length | The minimum length to stem | `4` |
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
"settings": {
"index":{
"analysis":{
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["my_katakana_stemmer"]
}
},
"filter" : {
"my_katakana_stemmer" : {
"type" : "kuromoji_stemmer",
"minimum_length" : 4
}
}
}
}
}
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d 'コピー'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "コピー",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
} ]
}
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d 'サーバー'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "サーバ",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 1
} ]
}
```
## TokenFilter : ja_stop
A token filter of type `ja_stop` that provides the predefined `_japanese_` stop word list.
*Note: only the `_japanese_` list is provided. If you want to use other predefined stop word lists, use the `stop` token filter instead.*
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
"settings": {
"index":{
"analysis":{
"analyzer" : {
"analyzer_with_ja_stop" : {
"tokenizer" : "kuromoji_tokenizer",
"filter" : ["ja_stop"]
}
},
"filter" : {
"ja_stop" : {
"type" : "ja_stop",
"stopwords" : ["_japanese_", "ストップ"]
}
}
}
}
}
}'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=analyzer_with_ja_stop&pretty' -d 'ストップは消える'
```
_Response :_
```json
{
"tokens" : [ {
"token" : "消える",
"start_offset" : 5,
"end_offset" : 8,
"type" : "word",
"position" : 3
} ]
}
```
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,93 +0,0 @@
Phonetic Analysis for Elasticsearch
===================================
The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.
In order to install the plugin, simply run:
```sh
bin/plugin install elasticsearch/elasticsearch-analysis-phonetic/2.5.0
```
| elasticsearch |Phonetic Analysis Plugin| Docs |
|---------------|-----------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.3 | [2.4.3](https://github.com/elasticsearch/elasticsearch-analysis-phonetic/tree/v2.4.3/#version-243-for-elasticsearch-14) |
| < 1.4.5 | 2.4.2 | [2.4.2](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| < 1.4.3 | 2.4.1 | [2.4.1](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.3.0/#phonetic-analysis-for-elasticsearch) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.2.0/#phonetic-analysis-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.1.0/#phonetic-analysis-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v2.0.0/#phonetic-analysis-for-elasticsearch) |
| es-0.90 | 1.8.0 | [1.8.0](https://github.com/elastic/elasticsearch-analysis-phonetic/tree/v1.8.0/#phonetic-analysis-for-elasticsearch) |
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install analysis-phonetic \
--url file:target/releases/elasticsearch-analysis-phonetic-X.X.X-SNAPSHOT.zip
```
## User guide
A `phonetic` token filter that can be configured with different `encoder` types:
`metaphone`, `doublemetaphone`, `soundex`, `refinedsoundex`,
`caverphone1`, `caverphone2`, `cologne`, `nysiis`,
`koelnerphonetik`, `haasephonetik`, `beidermorse`
The `replace` parameter (defaults to `true`) controls whether the original token should be replaced
by the encoded one (set it to `true`), or whether the encoded token should be added alongside it (set it to `false`).
```js
{
"index" : {
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "standard",
"filter" : ["standard", "lowercase", "my_metaphone"]
}
},
"filter" : {
"my_metaphone" : {
"type" : "phonetic",
"encoder" : "metaphone",
"replace" : false
}
}
}
}
}
```
Note that `beidermorse` does not support the `replace` parameter.
Questions
---------
If you have questions or comments, please use the [mailing list](https://groups.google.com/group/elasticsearch) instead
of the GitHub issues tracker.
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,58 +0,0 @@
Smart Chinese Analysis for Elasticsearch
==================================
The Smart Chinese Analysis plugin integrates the Lucene Smart Chinese analysis module into Elasticsearch.
In order to install the plugin, simply run:
```sh
bin/plugin install elasticsearch/elasticsearch-analysis-smartcn/2.5.0
```
| elasticsearch | Smart Chinese Analysis Plugin | Docs |
|---------------|-----------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.4 | [2.4.4](https://github.com/elasticsearch/elasticsearch-analysis-smartcn/tree/v2.4.4/#version-244-for-elasticsearch-14) |
| < 1.4.5 | 2.4.3 | [2.4.3](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.4.3/#version-243-for-elasticsearch-14) |
| < 1.4.3 | 2.4.2 | [2.4.2](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| es-1.3 | 2.3.1 | [2.3.1](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.3.1/#version-231-for-elasticsearch-13) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.2.0/#smart-chinese-analysis-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.1.0/#smart-chinese-analysis-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v2.0.0/#smart-chinese-analysis-for-elasticsearch) |
| es-0.90 | 1.8.0 | [1.8.0](https://github.com/elastic/elasticsearch-analysis-smartcn/tree/v1.8.0/#smart-chinese-analysis-for-elasticsearch) |
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install analysis-smartcn \
--url file:target/releases/elasticsearch-analysis-smartcn-X.X.X-SNAPSHOT.zip
```
## User guide
The plugin includes the `smartcn` analyzer and `smartcn_tokenizer` tokenizer.
Note that the `smartcn_word` token filter and the `smartcn_sentence` tokenizer have been deprecated.
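As a minimal sketch of how to exercise the analyzer (the index name `smartcn_sample` and the sample sentence are only illustrative), you can create an index and call the `_analyze` API against it:
```sh
# Create a test index; no special settings are needed because the smartcn analyzer is registered by the plugin
curl -XPUT 'http://localhost:9200/smartcn_sample/'
# Analyze a sample sentence with the smartcn analyzer
curl -XPOST 'http://localhost:9200/smartcn_sample/_analyze?analyzer=smartcn&pretty' -d '我爱北京天安门'
```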
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,56 +0,0 @@
Stempel (Polish) Analysis for Elasticsearch
==================================
The Stempel (Polish) Analysis plugin integrates the Lucene Stempel (Polish) analysis module into Elasticsearch.
In order to install the plugin, simply run:
```sh
bin/plugin install elasticsearch/elasticsearch-analysis-stempel/2.4.3
```
| elasticsearch | Stempel Analysis Plugin | Docs |
|---------------|-----------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elastic/elasticsearch-analysis-stempel/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.3 | [2.4.3](https://github.com/elasticsearch/elasticsearch-analysis-stempel/tree/v2.4.3/#version-243-for-elasticsearch-14) |
| < 1.4.5 | 2.4.2 | [2.4.2](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| < 1.4.3 | 2.4.1 | [2.4.1](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.3.0/#stempel-polish-analysis-for-elasticsearch) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.2.0/#stempel-polish-analysis-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.1.0/#stempel-polish-analysis-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v2.0.0/#stempel-polish-analysis-for-elasticsearch) |
| es-0.90 | 1.13.0 | [1.13.0](https://github.com/elastic/elasticsearch-analysis-stempel/tree/v1.13.0/#stempel-polish-analysis-for-elasticsearch) |
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install analysis-stempel \
--url file:target/releases/elasticsearch-analysis-stempel-X.X.X-SNAPSHOT.zip
```
Stempel Plugin
-----------------
The plugin includes the `polish` analyzer and `polish_stem` token filter.
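As a minimal sketch of how to exercise the analyzer (the index name `stempel_sample` and the sample word are only illustrative), you can create an index and call the `_analyze` API against it:
```sh
# Create a test index; the polish analyzer is registered by the plugin
curl -XPUT 'http://localhost:9200/stempel_sample/'
# Analyze a sample Polish word with the polish analyzer
curl -XPOST 'http://localhost:9200/stempel_sample/_analyze?analyzer=polish&pretty' -d 'studenci'
```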
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,361 +0,0 @@
AWS Cloud Plugin for Elasticsearch
==================================
The Amazon Web Services (AWS) Cloud plugin allows you to use the [AWS API](https://github.com/aws/aws-sdk-java)
for the unicast discovery mechanism and to add S3 repositories.
In order to install the plugin, run:
```sh
bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.5.1
```
You need to install a version matching your Elasticsearch version:
| Elasticsearch | AWS Cloud Plugin | Docs |
|------------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.1 | [2.5.1](https://github.com/elastic/elasticsearch-cloud-aws/tree/v2.5.1/#version-251-for-elasticsearch-15) |
| es-1.4 | 2.4.2 | [2.4.2](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/v2.3.0/#version-230-for-elasticsearch-13) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/v2.2.0/#aws-cloud-plugin-for-elasticsearch) |
| es-1.1 | 2.1.1 | [2.1.1](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/v2.1.1/#aws-cloud-plugin-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/v2.0.0/#aws-cloud-plugin-for-elasticsearch) |
| es-0.90 | 1.16.0 | [1.16.0](https://github.com/elasticsearch/elasticsearch-cloud-aws/tree/v1.16.0/#aws-cloud-plugin-for-elasticsearch) |
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install cloud-aws \
--url file:target/releases/elasticsearch-cloud-aws-X.X.X-SNAPSHOT.zip
```
## Generic Configuration
The plugin will default to using [IAM Role](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html) credentials
for authentication. These can be overridden by, in increasing order of precedence, system properties `aws.accessKeyId` and `aws.secretKey`,
environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_KEY`, or the elasticsearch config using `cloud.aws.access_key` and `cloud.aws.secret_key`:
```
cloud:
aws:
access_key: AKVAIQBF2RECL7FJWGJQ
secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br
```
### Transport security
By default this plugin uses HTTPS for all API calls to AWS endpoints. If you wish to configure HTTP you can set
`cloud.aws.protocol` in the elasticsearch config. You can optionally override this setting per individual service
via: `cloud.aws.ec2.protocol` or `cloud.aws.s3.protocol`.
```
cloud:
aws:
protocol: https
s3:
protocol: http
ec2:
protocol: https
```
In addition, a proxy can be configured with the `proxy_host` and `proxy_port` settings (note that protocol can be `http` or `https`):
```
cloud:
aws:
protocol: https
proxy_host: proxy1.company.com
proxy_port: 8083
```
You can also set different proxies for `ec2` and `s3`:
```
cloud:
aws:
s3:
proxy_host: proxy1.company.com
proxy_port: 8083
ec2:
proxy_host: proxy2.company.com
proxy_port: 8083
```
### Region
The `cloud.aws.region` can be set to a region and will automatically use the relevant settings for both `ec2` and `s3`. The available values are:
* `us-east` (`us-east-1`)
* `us-west` (`us-west-1`)
* `us-west-1`
* `us-west-2`
* `ap-southeast` (`ap-southeast-1`)
* `ap-southeast-1`
* `ap-southeast-2`
* `ap-northeast` (`ap-northeast-1`)
* `eu-west` (`eu-west-1`)
* `eu-central` (`eu-central-1`)
* `sa-east` (`sa-east-1`)
* `cn-north` (`cn-north-1`)
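For example, a minimal `elasticsearch.yml` snippet that sets a region (the value below is only an illustration; pick the region matching your deployment) would look like:
```
cloud:
    aws:
        region: us-west-2
```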
### EC2/S3 Signer API
If you are using an EC2- or S3-compatible service, it might require an older API to sign requests.
You can select the signer to use with `cloud.aws.signer` (or `cloud.aws.ec2.signer` and `cloud.aws.s3.signer`).
Defaults to `AWS4SignerType`.
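As a sketch (the signer names below are only illustrative: `AWS4SignerType` is the documented default, and the S3 override assumes your S3-compatible service expects an older signer), the setting can be placed in `elasticsearch.yml` like this:
```
cloud:
    aws:
        signer: AWS4SignerType
        s3:
            signer: S3SignerType
```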
## EC2 Discovery
EC2 discovery allows you to use the EC2 APIs to perform automatic discovery (similar to multicast in non-hostile multicast environments). Here is a simple sample configuration:
```
discovery:
type: ec2
```
EC2 discovery uses the same credentials as the rest of the AWS services provided by this plugin (`repositories`).
See [Generic Configuration](#generic-configuration) for details.
The following settings (prefixed with `discovery.ec2`) can further control the discovery (a sample configuration follows the list):
* `groups`: Either a comma separated list or array based list of (security) groups. Only instances with the provided security groups will be used in the cluster discovery. (NOTE: You could provide either group NAME or group ID.)
* `host_type`: The type of host to use to communicate with other instances. Can be one of `private_ip`, `public_ip`, `private_dns`, `public_dns`. Defaults to `private_ip`.
* `availability_zones`: Either a comma separated list or array based list of availability zones. Only instances within the provided availability zones will be used in the cluster discovery.
* `any_group`: If set to `false`, will require all security groups to be present for the instance to be used for the discovery. Defaults to `true`.
* `ping_timeout`: How long to wait for existing EC2 nodes to reply during discovery. Defaults to `3s`. If no unit like `ms`, `s` or `m` is specified, milliseconds are used.
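For example, a sample configuration combining several of these settings (the security group name and availability zones are placeholders) might look like:
```
discovery:
    type: ec2
    ec2:
        groups: "my-security-group"
        host_type: private_ip
        availability_zones: ["us-east-1a", "us-east-1b"]
        ping_timeout: 30s
```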
### Recommended EC2 Permissions
EC2 discovery requires making a call to the EC2 service. You'll want to setup an IAM policy to allow this. You can create a custom policy via the IAM Management Console. It should look similar to this.
```js
{
"Statement": [
{
"Action": [
"ec2:DescribeInstances"
],
"Effect": "Allow",
"Resource": [
"*"
]
}
],
"Version": "2012-10-17"
}
```
### Filtering by Tags
EC2 discovery can also filter the machines to include in the cluster based on tags (and not just groups). The settings to use have the `discovery.ec2.tag.` prefix. For example, setting `discovery.ec2.tag.stage` to `dev` will only include instances with a tag key of `stage` and a value of `dev`. If several tags are set, all of them must be present on an instance for it to be included.
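For example, a configuration that only includes instances tagged `stage: dev` (matching the example above) could look like:
```
discovery:
    type: ec2
    ec2:
        tag:
            stage: dev
```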
One practical use for tag filtering is when an ec2 cluster contains many nodes that are not running elasticsearch. In this case (particularly with high `ping_timeout` values) there is a risk that a new node's discovery phase will end before it has found the cluster (which will result in it declaring itself master of a new cluster with the same name - highly undesirable). Tagging elasticsearch ec2 nodes and then filtering by that tag will resolve this issue.
### Automatic Node Attributes
Even if you do not use `ec2` discovery (the cloud-aws plugin must still be installed), the plugin can automatically add node attributes relating to EC2, for example the availability zone, which can be used with the allocation awareness feature. To enable this, set `cloud.node.auto_attributes` to `true` in the settings.
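A minimal `elasticsearch.yml` snippet enabling this would be:
```
cloud:
    node:
        auto_attributes: true
```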
### Using another EC2 endpoint
If you are using an EC2 API-compatible service, you can set the endpoint to use by setting `cloud.aws.ec2.endpoint`
to your provider's URL.
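For example (the endpoint below is a placeholder; use the URL of your EC2-compatible service):
```
cloud:
    aws:
        ec2:
            endpoint: ec2.example.com
```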
## S3 Repository
The S3 repository uses S3 to store snapshots. An S3 repository can be created using the following command:
```sh
$ curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{
"type": "s3",
"settings": {
"bucket": "my_bucket_name",
"region": "us-west"
}
}'
```
The following settings are supported:
* `bucket`: The name of the bucket to be used for snapshots. (Mandatory)
* `region`: The region where the bucket is located. Defaults to US Standard.
* `endpoint`: The endpoint to the S3 API. Defaults to AWS's default S3 endpoint. Note that setting a region overrides the endpoint setting.
* `protocol`: The protocol to use (`http` or `https`). Defaults to value of `cloud.aws.protocol` or `cloud.aws.s3.protocol`.
* `base_path`: Specifies the path within bucket to repository data. Defaults to value of `repositories.s3.base_path` or to root directory if not set.
* `access_key`: The access key to use for authentication. Defaults to value of `cloud.aws.access_key`.
* `secret_key`: The secret key to use for authentication. Defaults to value of `cloud.aws.secret_key`.
* `chunk_size`: Big files can be broken down into chunks during snapshotting if needed. The chunk size can be specified in bytes or by using size value notation, i.e. `1g`, `10m`, `5k`. Defaults to `100m`.
* `compress`: When set to `true` metadata files are stored in compressed format. This setting doesn't affect index files that are already compressed by default. Defaults to `false`.
* `server_side_encryption`: When set to `true` files are encrypted on server side using AES256 algorithm. Defaults to `false`.
* `buffer_size`: Minimum threshold below which the chunk is uploaded using a single request. Beyond this threshold, the S3 repository will use the [AWS Multipart Upload API](http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html) to split the chunk into several parts, each of `buffer_size` length, and to upload each part in its own request. Note that setting a buffer size lower than `5mb` is not allowed, since it would prevent the use of the Multipart API and may result in upload errors. Defaults to `5mb`.
* `max_retries`: Number of retries in case of S3 errors. Defaults to `3`.
S3 repositories use the same credentials as the rest of the AWS services provided by this plugin (`discovery`).
See [Generic Configuration](#generic-configuration) for details.
Multiple S3 repositories can be created. If the buckets require different credentials, then define them as part of the repository settings.
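As a sketch (the repository name, bucket, region and credentials below are placeholders), a repository with its own credentials could be registered like this:
```sh
$ curl -XPUT 'http://localhost:9200/_snapshot/my_other_s3_repository' -d '{
    "type": "s3",
    "settings": {
        "bucket": "my_other_bucket_name",
        "region": "eu-west",
        "access_key": "ANOTHER_ACCESS_KEY",
        "secret_key": "ANOTHER_SECRET_KEY"
    }
}'
```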
### Recommended S3 Permissions
In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon IAM in conjunction with pre-existing S3 buckets. Here is an example policy which will allow the snapshot access to an S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console, by creating a Custom Policy, and using a Policy Document similar to this (changing snaps.example.com to your bucket name).
```js
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com/*"
]
}
],
"Version": "2012-10-17"
}
```
You may further restrict the permissions by specifying a prefix within the bucket, in this example, named "foo".
```js
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Condition": {
"StringLike": {
"s3:prefix": [
"foo/*"
]
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com/foo/*"
]
}
],
"Version": "2012-10-17"
}
```
The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository registration will fail. If you want elasticsearch to create the bucket instead, you can add the permission to create a specific bucket like this:
```js
{
"Action": [
"s3:CreateBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
}
```
### Using another S3 endpoint
If you are using an S3 API-compatible service, you can set a global endpoint by setting `cloud.aws.s3.endpoint`
to your provider's URL. Note that this setting will be used for all S3 repositories.
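For example (the endpoint below is a placeholder; use the URL of your S3-compatible service):
```
cloud:
    aws:
        s3:
            endpoint: s3.example.com
```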
Different `endpoint`, `region` and `protocol` settings can be set on a per-repository basis (see [S3 Repository](#s3-repository) section for detail).
## Testing
Integration tests in this plugin require a working AWS configuration and are therefore disabled by default. Three buckets and two IAM users have to be created. The first IAM user needs access to two buckets in different regions, and the final bucket is exclusive to the other IAM user. To enable the tests, prepare a config file `elasticsearch.yml` with the following content:
```
cloud:
aws:
access_key: AKVAIQBF2RECL7FJWGJQ
secret_key: vExyMThREXeRMm/b/LRzEB8jWwvzQeXgjqMX+6br
repositories:
s3:
bucket: "bucket_name"
region: "us-west-2"
private-bucket:
bucket: <bucket not accessible by default key>
access_key: <access key>
secret_key: <secret key>
remote-bucket:
bucket: <bucket in other region>
region: <region>
external-bucket:
bucket: <bucket>
access_key: <access key>
secret_key: <secret key>
endpoint: <endpoint>
protocol: <protocol>
```
Replace all occurrences of `access_key`, `secret_key`, `endpoint`, `protocol`, `bucket` and `region` with your settings. Please note that the tests will delete all snapshot/restore-related files in the specified buckets.
To run the tests:
```sh
mvn -Dtests.aws=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
```
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,14 +0,0 @@
Elasticsearch
Copyright 2009-2015 Elasticsearch
This product includes software developed by The Apache Software
Foundation (http://www.apache.org/).
activation-*.jar, javax.inject-*.jar, and jaxb-*.jar are under the CDDL license,
the original source code for these can be found at http://www.oracle.com/.
jersey-*.jar are under the CDDL license, the original source code for these
can be found at https://jersey.java.net/.
The LICENSE and NOTICE files for all dependencies may be found in the licenses/
directory.

View File

@ -1,568 +0,0 @@
Azure Cloud Plugin for Elasticsearch
====================================
The Azure Cloud plugin allows you to use the Azure API for the unicast discovery mechanism.
In order to install the plugin, run:
```sh
bin/plugin install elasticsearch/elasticsearch-cloud-azure/2.6.1
```
You need to install a version matching your Elasticsearch version:
| Elasticsearch | Azure Cloud Plugin| Docs |
|------------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.7.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/es-1.x/#version-270-snapshot-for-elasticsearch-1x)|
| es-1.5 | 2.6.1 | [2.6.1](https://github.com/elastic/elasticsearch-cloud-azure/tree/v2.6.1/#version-261-for-elasticsearch-15) |
| es-1.4 | 2.5.2 | [2.5.2](https://github.com/elastic/elasticsearch-cloud-azure/tree/v2.5.2/#version-252-for-elasticsearch-14) |
| es-1.3 | 2.4.0 | [2.4.0](https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/v2.4.0/#version-240-for-elasticsearch-13) |
| es-1.2 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/v2.3.0/#azure-cloud-plugin-for-elasticsearch) |
| es-1.1 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/v2.2.0/#azure-cloud-plugin-for-elasticsearch) |
| es-1.0 | 2.1.0 | [2.1.0](https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/v2.1.0/#azure-cloud-plugin-for-elasticsearch) |
| es-0.90 | 1.0.0.alpha1 | [1.0.0.alpha1](https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/v1.0.0.alpha1/#azure-cloud-plugin-for-elasticsearch)|
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install cloud-azure \
--url file:target/releases/elasticsearch-cloud-azure-X.X.X-SNAPSHOT.zip
```
Azure Virtual Machine Discovery
===============================
Azure VM discovery allows you to use the Azure APIs to perform automatic discovery (similar to multicast in non-hostile
multicast environments). Here is a simple sample configuration:
```
cloud:
azure:
management:
subscription.id: XXX-XXX-XXX-XXX
cloud.service.name: es-demo-app
keystore:
path: /path/to/azurekeystore.pkcs12
password: WHATEVER
type: pkcs12
discovery:
type: azure
```
How to start (short story)
--------------------------
* Create Azure instances
* Install Elasticsearch
* Install Azure plugin
* Modify `elasticsearch.yml` file
* Start Elasticsearch
Azure credential API settings
-----------------------------
The following settings can further control the credential API:
* `cloud.azure.management.keystore.path`: /path/to/keystore
* `cloud.azure.management.keystore.type`: `pkcs12`, `jceks` or `jks`. Defaults to `pkcs12`.
* `cloud.azure.management.keystore.password`: your_password for the keystore
* `cloud.azure.management.subscription.id`: your_azure_subscription_id
* `cloud.azure.management.cloud.service.name`: your_azure_cloud_service_name
Note that in previous versions, it was:
```
cloud:
azure:
keystore: /path/to/keystore
password: your_password_for_keystore
subscription_id: your_azure_subscription_id
service_name: your_azure_cloud_service_name
```
Advanced settings
-----------------
The following settings can further control the discovery:
* `discovery.azure.host.type`: either `public_ip` or `private_ip` (default). Azure discovery will use the one you set to ping
other nodes. This feature was previously undocumented and existed under `cloud.azure.host_type`.
* `discovery.azure.endpoint.name`: when using `public_ip`, this setting identifies the endpoint name used to forward requests
to elasticsearch (i.e. the transport port name). Defaults to `elasticsearch`. In the Azure management console, you could define
an endpoint `elasticsearch` that forwards, for example, requests on the public IP on port 8100 to the virtual machine on port 9300.
This feature was previously undocumented and existed under `cloud.azure.port_name`.
* `discovery.azure.deployment.name`: deployment name if any. Defaults to the value set with `cloud.azure.management.cloud.service.name`.
* `discovery.azure.deployment.slot`: either `staging` or `production` (default).
For example:
```
discovery:
type: azure
azure:
host:
type: private_ip
endpoint:
name: elasticsearch
deployment:
name: your_azure_cloud_service_name
slot: production
```
How to start (long story)
--------------------------
We will describe here one strategy, which is to hide the Elasticsearch cluster from the outside.
With this strategy, only VMs behind the same virtual port can talk to each other.
That means that in this mode, you can use elasticsearch unicast discovery to build a cluster.
Better still, you can use the `elasticsearch-cloud-azure` plugin to let it fetch information about your nodes using the
Azure API.
### Prerequisites
Before starting, you need to have:
* A [Windows Azure account](http://www.windowsazure.com/)
* SSH keys and certificate
* OpenSSL that isn't from MacPorts, specifically `OpenSSL 1.0.1f 6 Jan
2014` doesn't seem to create a valid keypair for ssh. FWIW,
`OpenSSL 1.0.1c 10 May 2012` on Ubuntu 12.04 LTS is known to work.
You should follow [this guide](http://azure.microsoft.com/en-us/documentation/articles/linux-use-ssh-key/) to learn
how to create or use existing SSH keys. If you have already done so, you can skip the following.
Here is a description on how to generate SSH keys using `openssl`:
```sh
# You may want to use another dir than /tmp
cd /tmp
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout azure-private.key -out azure-certificate.pem
chmod 600 azure-private.key azure-certificate.pem
openssl x509 -outform der -in azure-certificate.pem -out azure-certificate.cer
```
Generate a keystore, which will be used by the plugin to authenticate all Azure API calls with a certificate.
```sh
# Generate a keystore (azurekeystore.pkcs12)
# Transform private key to PEM format
openssl pkcs8 -topk8 -nocrypt -in azure-private.key -inform PEM -out azure-pk.pem -outform PEM
# Transform certificate to PEM format
openssl x509 -inform der -in azure-certificate.cer -out azure-cert.pem
cat azure-cert.pem azure-pk.pem > azure.pem.txt
# You MUST enter a password!
openssl pkcs12 -export -in azure.pem.txt -out azurekeystore.pkcs12 -name azure -noiter -nomaciter
```
Upload the `azure-certificate.cer` file both in the elasticsearch Cloud Service (under `Manage Certificates`),
and under `Settings -> Manage Certificates`.
**Important**: when prompted for a password, you need to enter a non-empty one.
See this [guide](http://www.windowsazure.com/en-us/manage/linux/how-to-guides/ssh-into-linux/) for
more details on how to create keys for Azure.
Once done, you need to upload your certificate in Azure:
* Go to the [management console](https://account.windowsazure.com/).
* Sign in using your account.
* Click on `Portal`.
* Go to Settings (bottom of the left list)
* On the bottom bar, click on `Upload` and upload your `azure-certificate.cer` file.
You may want to use [Windows Azure Command-Line Tool](http://www.windowsazure.com/en-us/develop/nodejs/how-to-guides/command-line-tools/):
* Install [NodeJS](https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager), for example using
homebrew on MacOS X:
```sh
brew install node
```
* Install Azure tools:
```sh
sudo npm install azure-cli -g
```
* Download and import your azure settings:
```sh
# This will open a browser and will download a .publishsettings file
azure account download
# Import this file (we have downloaded it to /tmp)
# Note, it will create needed files in ~/.azure. You can remove azure.publishsettings when done.
azure account import /tmp/azure.publishsettings
```
### Creating your first instance
You need to have a storage account available. Check [Azure Blob Storage documentation](http://www.windowsazure.com/en-us/develop/net/how-to-guides/blob-storage/#create-account)
for more information.
You will need to choose the operating system you want to run on. To get a list of official available images, run:
```sh
azure vm image list
```
Let's say we are going to deploy an Ubuntu image on an extra small instance in West Europe:
* Azure cluster name: `azure-elasticsearch-cluster`
* Image: `b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB`
* VM Name: `myesnode1`
* VM Size: `extrasmall`
* Location: `West Europe`
* Login: `elasticsearch`
* Password: `password1234!!`
Using command line:
```sh
azure vm create azure-elasticsearch-cluster \
b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB \
--vm-name myesnode1 \
--location "West Europe" \
--vm-size extrasmall \
--ssh 22 \
--ssh-cert /tmp/azure-certificate.pem \
elasticsearch password1234\!\!
```
You should see something like:
```
info: Executing command vm create
+ Looking up image
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Configuring certificate
+ Creating VM
info: vm create command OK
```
Now, your first instance is started. You need to install Elasticsearch on it.
> **Note on SSH**
>
> You need to give the private key and username each time you log on your instance:
>
>```sh
>ssh -i ~/.ssh/azure-private.key elasticsearch@myescluster.cloudapp.net
>```
>
> But you can also define it once in `~/.ssh/config` file:
>
>```
>Host *.cloudapp.net
> User elasticsearch
> StrictHostKeyChecking no
> UserKnownHostsFile=/dev/null
> IdentityFile ~/.ssh/azure-private.key
>```
```sh
# First, copy your keystore on this machine
scp /tmp/azurekeystore.pkcs12 azure-elasticsearch-cluster.cloudapp.net:/home/elasticsearch
# Then, connect to your instance using SSH
ssh azure-elasticsearch-cluster.cloudapp.net
```
Once connected, install Elasticsearch:
```sh
# Install Latest Java version
# Read http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html for details
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
# If you want to install OpenJDK instead
# sudo apt-get update
# sudo apt-get install openjdk-7-jre-headless
# Download Elasticsearch
curl -s https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.0.deb -o elasticsearch-1.0.0.deb
# Prepare Elasticsearch installation
sudo dpkg -i elasticsearch-1.0.0.deb
```
Check that elasticsearch is running:
```sh
curl http://localhost:9200/
```
This command should give you a JSON result:
```javascript
{
"status" : 200,
"name" : "Living Colossus",
"version" : {
"number" : "1.0.0",
"build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
"build_timestamp" : "2014-02-12T16:18:34Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
},
"tagline" : "You Know, for Search"
}
```
### Install elasticsearch cloud azure plugin
```sh
# Stop elasticsearch
sudo service elasticsearch stop
# Install the plugin
sudo /usr/share/elasticsearch/bin/plugin install elasticsearch/elasticsearch-cloud-azure/2.6.1
# Configure it
sudo vi /etc/elasticsearch/elasticsearch.yml
```
And add the following lines:
```yaml
# If you don't remember your account id, you may get it with `azure account list`
cloud:
azure:
management:
subscription.id: your_azure_subscription_id
cloud.service.name: your_azure_cloud_service_name
keystore:
path: /home/elasticsearch/azurekeystore.pkcs12
password: your_password_for_keystore
discovery:
type: azure
# Recommended (warning: non durable disk)
# path.data: /mnt/resource/elasticsearch/data
```
Restart elasticsearch:
```sh
sudo service elasticsearch start
```
If anything goes wrong, check your logs in `/var/log/elasticsearch`.
Scaling Out!
------------
First, you need to create an image of your previous machine.
Disconnect from your machine and run the following commands locally:
```sh
# Shutdown the instance
azure vm shutdown myesnode1
# Create an image from this instance (it could take some minutes)
azure vm capture myesnode1 esnode-image --delete
# Note that the previous instance has been deleted (mandatory)
# So you need to create it again and BTW create other instances.
azure vm create azure-elasticsearch-cluster \
esnode-image \
--vm-name myesnode1 \
--location "West Europe" \
--vm-size extrasmall \
--ssh 22 \
--ssh-cert /tmp/azure-certificate.pem \
elasticsearch password1234\!\!
```
> **Note:** It could happen that Azure changes the endpoint's public IP address.
> DNS propagation could take some minutes before you can connect again using
> the hostname. If needed, you can get the IP address from Azure using:
>
> ```sh
> # Look at Network `Endpoints 0 Vip`
> azure vm show myesnode1
> ```
Let's start more instances!
```sh
for x in $(seq 2 10)
do
echo "Launching azure instance #$x..."
azure vm create azure-elasticsearch-cluster \
esnode-image \
--vm-name myesnode$x \
--vm-size extrasmall \
--ssh $((21 + $x)) \
--ssh-cert /tmp/azure-certificate.pem \
--connect \
elasticsearch password1234\!\!
done
```
If you want to remove your running instances:
```
azure vm delete myesnode1
```
Azure Repository
================
To enable Azure repositories, you first have to set your Azure storage settings in the `elasticsearch.yml` file:
```
cloud:
azure:
storage:
account: your_azure_storage_account
key: your_azure_storage_key
```
For reference, in previous versions of the Azure plugin, the settings were:
```
cloud:
azure:
storage_account: your_azure_storage_account
storage_key: your_azure_storage_key
```
The Azure repository supports the following settings:
* `container`: Container name. Defaults to `elasticsearch-snapshots`
* `base_path`: Specifies the path within container to repository data. Defaults to empty (root directory).
* `chunk_size`: Big files can be broken down into chunks during snapshotting if needed. The chunk size can be specified
in bytes or by using size value notation, i.e. `1g`, `10m`, `5k`. Defaults to `64m` (64m max)
* `compress`: When set to `true` metadata files are stored in compressed format. This setting doesn't affect index
files that are already compressed by default. Defaults to `false`.
Some examples, using scripts:
```sh
# The simplest one
$ curl -XPUT 'http://localhost:9200/_snapshot/my_backup1' -d '{
"type": "azure"
}'
# With some settings
$ curl -XPUT 'http://localhost:9200/_snapshot/my_backup2' -d '{
"type": "azure",
"settings": {
"container": "backup_container",
"base_path": "backups",
"chunk_size": "32m",
"compress": true
}
}'
```
Example using Java:
```java
client.admin().cluster().preparePutRepository("my_backup3")
.setType("azure").setSettings(Settings.settingsBuilder()
.put(Storage.CONTAINER, "backup_container")
.put(Storage.CHUNK_SIZE, new ByteSizeValue(32, ByteSizeUnit.MB))
).get();
```
Repository validation rules
---------------------------
According to the [containers naming guide](http://msdn.microsoft.com/en-us/library/dd135715.aspx), a container name must
be a valid DNS name, conforming to the following naming rules:
* Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
* Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not
permitted in container names.
* All letters in a container name must be lowercase.
* Container names must be from 3 through 63 characters long.
Testing
=======
Integration tests in this plugin require a working Azure configuration and are therefore disabled by default.
To enable tests prepare a config file `elasticsearch.yml` with the following content:
```
cloud:
azure:
storage:
account: "YOUR-AZURE-STORAGE-NAME"
key: "YOUR-AZURE-STORAGE-KEY"
```
Replace `account` and `key` with your settings. Please note that the tests will delete all snapshot/restore-related files in the specified bucket.
To run the tests:
```sh
mvn -Dtests.azure=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
```
Working around a bug in Windows SMB and Java on windows
=======================================================
When using a shared file system based on the SMB protocol (like Azure File Service) to store indices, Lucene opens index segment files with a write-only flag. This is the *correct* way to open the files, as they will only be used for writes, and it allows different FS implementations to optimize for it. Sadly, on Windows with SMB, this disables the cache manager, causing writes to be slow. This has been described in [LUCENE-6176](https://issues.apache.org/jira/browse/LUCENE-6176), but it affects each and every Java program out there! This needs to be fixed outside of ES and/or Lucene, either in Windows or in OpenJDK. For now, we provide experimental support for opening the files with a read flag, but this should be considered experimental and the correct place to fix it is in OpenJDK or Windows.
The Azure Cloud plugin provides two storage types optimized for SMB:
- `smb_mmap_fs`: a SMB specific implementation of the default [mmap fs](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#mmapfs)
- `smb_simple_fs`: a SMB specific implementation of the default [simple fs](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#simplefs)
To use one of these specific storage types, you need to install the Azure Cloud plugin and restart the node.
Then configure Elasticsearch to set the storage type you want.
This can be configured for all indices by adding this to the `elasticsearch.yml` file:
```yaml
index.store.type: smb_simple_fs
```
Note that this setting will only be applied to newly created indices.
It can also be set on a per-index basis at index creation time:
```sh
curl -XPUT localhost:9200/my_index -d '{
"settings": {
"index.store.type": "smb_mmap_fs"
}
}'
```
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,8 +0,0 @@
Elasticsearch
Copyright 2009-2015 Elasticsearch
This product includes software developed by The Apache Software
Foundation (http://www.apache.org/).
The LICENSE and NOTICE files for all dependencies may be found in the licenses/
directory.

View File

@ -1,421 +0,0 @@
Google Compute Engine Cloud Plugin for Elasticsearch
====================================================
The GCE Cloud plugin allows you to use the GCE API for the unicast discovery mechanism.
In order to install the plugin, run:
```sh
bin/plugin install elasticsearch/elasticsearch-cloud-gce/2.5.0
```
You need to install a version matching your Elasticsearch version:
| Elasticsearch | GCE Cloud Plugin | Docs |
|------------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/es-1.x/#google-compute-engine-cloud-plugin-for-elasticsearch)|
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-cloud-gce/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.1 | [2.4.1](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/v2.3.0/#version-230-for-elasticsearch-13) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/v2.2.0/#google-compute-engine-cloud-plugin-for-elasticsearch)|
| es-1.1 | 2.1.2 | [2.1.2](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/v2.1.2/#google-compute-engine-cloud-plugin-for-elasticsearch)|
| es-1.0 | 2.0.1 | [2.0.1](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/v2.0.1/#google-compute-engine-cloud-plugin-for-elasticsearch)|
| es-0.90 | 1.3.0 | [1.3.0](https://github.com/elasticsearch/elasticsearch-cloud-gce/tree/v1.3.0/#google-compute-engine-cloud-plugin-for-elasticsearch)|
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin install cloud-gce \
--url file:target/releases/elasticsearch-cloud-gce-X.X.X-SNAPSHOT.zip
```
Google Compute Engine Virtual Machine Discovery
===============================
Google Compute Engine VM discovery allows you to use the Google APIs to perform automatic discovery (similar to multicast in non-hostile
multicast environments). Here is a simple sample configuration:
```yaml
cloud:
gce:
project_id: <your-google-project-id>
zone: <your-zone>
discovery:
type: gce
```
How to start (short story)
--------------------------
* Create Google Compute Engine instance (with compute rw permissions)
* Install Elasticsearch
* Install Google Compute Engine Cloud plugin
* Modify `elasticsearch.yml` file
* Start Elasticsearch
How to start (long story)
--------------------------
### Prerequisites
Before starting, you should have:
* Your project ID. Let's say here `es-cloud`. Get it from [Google APIS Console](https://code.google.com/apis/console/).
* [Google Cloud SDK](https://developers.google.com/cloud/sdk/)
If you have not set it yet, you can define the default project you will work on:
```sh
gcloud config set project es-cloud
```
### Creating your first instance
```sh
gcutil addinstance myesnode1 \
--service_account_scope=compute-rw,storage-full \
--persistent_boot_disk
```
You will be asked to open a link in your browser. Log in and allow access to the listed services.
You will get back a verification code. Copy and paste it into your terminal.
You should get an `Authentication successful.` message.
Then, choose your zone. Let's say here that we choose `europe-west1-a`.
Choose your compute instance size. Let's say `f1-micro`.
Choose your OS. Let's say `projects/debian-cloud/global/images/debian-7-wheezy-v20140606`.
You may be asked to create an SSH key. Follow the instructions to create one.
When done, a report like this one should appear:
```sh
Table of resources:
+-----------+--------------+-------+---------+--------------+----------------+----------------+----------------+---------+----------------+
| name | machine-type | image | network | network-ip | external-ip | disks | zone | status | status-message |
+-----------+--------------+-------+---------+--------------+----------------+----------------+----------------+---------+----------------+
| myesnode1 | f1-micro | | default | 10.240.20.57 | 192.158.29.199 | boot-myesnode1 | europe-west1-a | RUNNING | |
+-----------+--------------+-------+---------+--------------+----------------+----------------+----------------+---------+----------------+
```
You can now connect to your instance:
```
# Connect using google cloud SDK
gcloud compute ssh myesnode1 --zone europe-west1-a
# Or using SSH with external IP address
ssh -i ~/.ssh/google_compute_engine 192.158.29.199
```
*Note Regarding Service Account Permissions*
It's important when creating an instance that the correct permissions are set. At a minimum, you must ensure you have:
```
service_account_scope=compute-rw
```
Failing to set this will result in unauthorized messages when starting Elasticsearch.
See [Machine Permissions](#machine-permissions).
Once connected, install Elasticsearch:
```sh
sudo apt-get update
# Download Elasticsearch
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb
# Prepare Java installation
sudo apt-get install java7-runtime-headless
# Prepare Elasticsearch installation
sudo dpkg -i elasticsearch-1.2.1.deb
```
### Install elasticsearch cloud gce plugin
Install the plugin:
```sh
# Use Plugin Manager to install it
sudo /usr/share/elasticsearch/bin/plugin install elasticsearch/elasticsearch-cloud-gce/2.2.0
# Configure it:
sudo vi /etc/elasticsearch/elasticsearch.yml
```
And add the following lines:
```yaml
cloud:
gce:
project_id: es-cloud
zone: europe-west1-a
discovery:
type: gce
```
Start elasticsearch:
```sh
sudo /etc/init.d/elasticsearch start
```
If anything goes wrong, you should check logs:
```sh
tail -f /var/log/elasticsearch/elasticsearch.log
```
If needed, you can change the log level to `TRACE` by editing `/etc/elasticsearch/logging.yml` (e.g. with `sudo vi /etc/elasticsearch/logging.yml`):
```yaml
# discovery
discovery.gce: TRACE
```
### Cloning your existing machine
In order to build a cluster on many nodes, you can clone your configured instance to new nodes.
You won't have to reinstall everything!
First create an image of your running instance and upload it to Google Cloud Storage:
```sh
# Create an image of your current instance
sudo /usr/bin/gcimagebundle -d /dev/sda -o /tmp/
# An image has been created in `/tmp` directory:
ls /tmp
e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz
# Upload your image to Google Cloud Storage:
# Create a bucket to hold your image, let's say `esimage`:
gsutil mb gs://esimage
# Copy your image to this bucket:
gsutil cp /tmp/e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz gs://esimage
# Then add your image to images collection:
gcutil addimage elasticsearch-1-2-1 gs://esimage/e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz
# If the previous command did not work for you, logout from your instance
# and launch the same command from your local machine.
```
### Start new instances
Now that you have an image, you can create as many instances as you need:
```sh
# Just change node name (here myesnode2)
gcutil addinstance --image=elasticsearch-1-2-1 myesnode2
# If you want to provide all details directly, you can use:
gcutil addinstance --image=elasticsearch-1-2-1 \
--kernel=projects/google/global/kernels/gce-v20130603 myesnode2 \
--zone europe-west1-a --machine_type f1-micro --service_account_scope=compute-rw \
--persistent_boot_disk
```
### Remove an instance (aka shut it down)
You can use [Google Cloud Console](https://cloud.google.com/console) or CLI to manage your instances:
```sh
# Stopping and removing instances
gcutil deleteinstance myesnode1 myesnode2 \
--zone=europe-west1-a
# Consider removing disk as well if you don't need them anymore
gcutil deletedisk boot-myesnode1 boot-myesnode2 \
--zone=europe-west1-a
```
Using zones
-----------
`cloud.gce.zone` helps to retrieve instances running in a given zone. It should be one of the
[GCE supported zones](https://developers.google.com/compute/docs/zones#available).
GCE discovery supports multiple zones, although you need to be aware of the network latency between zones.
To enable discovery across more than one zone, just add your list of zones to the `cloud.gce.zone` setting:
```yaml
cloud:
gce:
project_id: <your-google-project-id>
zone: ["<your-zone1>", "<your-zone2>"]
discovery:
type: gce
```
Filtering by tags
-----------------
GCE discovery can also filter the machines to include in the cluster based on tags, using the `discovery.gce.tags` setting.
For example, setting `discovery.gce.tags` to `dev` will only include instances that have a tag set to `dev`. If several tags
are set, all of them must be present on an instance for it to be included.
One practical use for tag filtering is when a GCE cluster contains many nodes that are not running
elasticsearch. In this case (particularly with high `ping_timeout` values) there is a risk that a new node's discovery
phase will end before it has found the cluster (which will result in it declaring itself master of a new cluster
with the same name - highly undesirable). Adding a tag to the elasticsearch GCE nodes and then filtering by that
tag will resolve this issue.
Add your tag when building the new instance:
```sh
gcutil --project=es-cloud addinstance myesnode1 \
--service_account_scope=compute-rw \
--persistent_boot_disk \
--tags=elasticsearch,dev
```
Then, define it in `elasticsearch.yml`:
```yaml
cloud:
gce:
project_id: es-cloud
zone: europe-west1-a
discovery:
type: gce
gce:
tags: elasticsearch, dev
```
Changing default transport port
-------------------------------
By default, the elasticsearch GCE plugin assumes that you run elasticsearch on the default port 9300.
But you can specify the port elasticsearch should use via the Google Compute Engine metadata key `es_port`:
### When creating instance
Add `--metadata=es_port:9301` option:
```sh
# when creating first instance
gcutil addinstance myesnode1 \
--service_account_scope=compute-rw,storage-full \
--persistent_boot_disk \
--metadata=es_port:9301
# when creating an instance from an image
gcutil addinstance --image=elasticsearch-1-0-0-RC1 \
--kernel=projects/google/global/kernels/gce-v20130603 myesnode2 \
--zone europe-west1-a --machine_type f1-micro --service_account_scope=compute-rw \
--persistent_boot_disk --metadata=es_port:9301
```
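The `es_port` metadata tells the discovery plugin which transport port to contact on other instances; elasticsearch itself still has to listen on that port. A minimal sketch of the matching setting in `elasticsearch.yml` (assuming the standard `transport.tcp.port` option):
```yaml
transport:
  tcp:
    port: 9301
```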
### On a running instance
```sh
# Get metadata fingerprint
gcutil getinstance myesnode1 --zone=europe-west1-a
+------------------------+---------------------------------------------------------------------------------------------------------+
| property | value |
+------------------------+---------------------------------------------------------------------------------------------------------+
| metadata | |
| fingerprint | 42WmSpB8rSM= |
+------------------------+---------------------------------------------------------------------------------------------------------+
# Use that fingerprint
gcutil setinstancemetadata myesnode1 \
--zone=europe-west1-a \
--metadata=es_port:9301 \
--fingerprint=42WmSpB8rSM=
```
Tips
----
### Store project id locally
If you don't want to repeat the project id each time, you can save it in the `~/.gcutil.flags` file using:
```sh
gcutil getproject --project=es-cloud --cache_flag_values
```
The `~/.gcutil.flags` file now contains:
```
--project=es-cloud
```
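With the project id cached, subsequent gcutil commands no longer need the `--project` flag. For example (reusing the image created earlier; `myesnode3` is just an illustrative name):
```sh
# No --project flag needed any more
gcutil addinstance --image=elasticsearch-1-2-1 myesnode3
```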
### Machine Permissions
**Creating machines with gcutil**
Ensure the following flags are set:
```
--service_account_scope=compute-rw
```
**Creating with console (web)**
When creating an instance using the web portal, click **Show advanced options**.
At the bottom of the page, under `PROJECT ACCESS`, choose `>> Compute >> Read Write`.
**Creating with knife google**
Set the service account scopes when creating the machine:
```
$ knife google server create www1 \
-m n1-standard-1 \
-I debian-7-wheezy-v20131120 \
-Z us-central1-a \
-i ~/.ssh/id_rsa \
-x jdoe \
--gce-service-account-scopes https://www.googleapis.com/auth/compute.full_control
```
Or, you may use the alias:
```
--gce-service-account-scopes compute-rw
```
If you have created a machine without the correct permissions, you will see `403 unauthorized` error messages. The only
way to alter these permissions is to delete the instance (NOT THE DISK) and create a new one with the correct permissions.
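If you are unsure which scopes an existing instance was created with, you can inspect it with the same `getinstance` command shown earlier; the service account scopes appear in its output (the exact layout depends on your gcutil version):
```sh
gcutil getinstance myesnode1 --zone=europe-west1-a
```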
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,5 +0,0 @@
Example JVM Plugin for Elasticsearch
==================================
Leniency is the root of all evil

View File

@ -1,177 +0,0 @@
JavaScript lang Plugin for Elasticsearch
==================================
The JavaScript language plugin allows `javascript` (or `js`) to be used as the language of scripts to execute.
In order to install the plugin, simply run:
```sh
bin/plugin install elasticsearch/elasticsearch-lang-javascript/2.5.0
```
You need to install a version matching your Elasticsearch version:
| elasticsearch | JavaScript Plugin | Docs |
|---------------|-----------------------|------------|
| master | Build from source | See below |
| es-1.x        | Build from source     | [2.6.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-lang-javascript/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.1 | [2.4.1](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.1 | [2.3.1](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/v2.3.1/#version-231-for-elasticsearch-13) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/v2.2.0/#javascript-lang-plugin-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/v2.1.0/#javascript-lang-plugin-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/v2.0.0/#javascript-lang-plugin-for-elasticsearch) |
| es-0.90 | 1.4.0 | [1.4.0](https://github.com/elasticsearch/elasticsearch-lang-javascript/tree/v1.4.0/#javascript-lang-plugin-for-elasticsearch) |
To use a `SNAPSHOT` version, you need to build it yourself with Maven:
```bash
mvn clean install
plugin install lang-javascript \
--url file:target/releases/elasticsearch-lang-javascript-X.X.X-SNAPSHOT.zip
```
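After installing the plugin on every node and restarting them, you can check that it was picked up. A quick sanity check (a sketch, assuming an elasticsearch 1.x node listening on `localhost:9200`):
```sh
# List the plugins loaded by each node
curl -XGET "http://localhost:9200/_nodes/plugins?pretty"
```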
Using javascript with function_score
------------------------------------
Let's say you want to use the `function_score` API with `javascript`. Here is
one way of doing it:
```sh
curl -XDELETE "http://localhost:9200/test"
curl -XPUT "http://localhost:9200/test/doc/1" -d '{
"num": 1.0
}'
curl -XPUT "http://localhost:9200/test/doc/2?refresh" -d '{
"num": 2.0
}'
curl -XGET "http://localhost:9200/test/_search?pretty" -d '
{
"query": {
"function_score": {
"script_score": {
"script": "doc[\"num\"].value",
"lang": "javascript"
}
}
}
}'
```
gives
```javascript
{
  // ...
  "hits": {
    "total": 2,
    "max_score": 2,
    "hits": [
      {
        // ...
        "_score": 2
      },
      {
        // ...
        "_score": 1
      }
    ]
  }
}
```
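Scripts can also take parameters. A hedged variation of the query above, passing an illustrative `factor` parameter that the script multiplies with the field value:
```sh
curl -XGET "http://localhost:9200/test/_search?pretty" -d '
{
  "query": {
    "function_score": {
      "script_score": {
        "script": "doc[\"num\"].value * factor",
        "params": {
          "factor": 5
        },
        "lang": "javascript"
      }
    }
  }
}'
```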
Using javascript with script_fields
-----------------------------------
```sh
curl -XDELETE "http://localhost:9200/test"
curl -XPUT "http://localhost:9200/test/doc/1?refresh" -d'
{
"obj1": {
"test": "something"
},
"obj2": {
"arr2": [ "arr_value1", "arr_value2" ]
}
}'
curl -XGET "http://localhost:9200/test/_search" -d'
{
"script_fields": {
"s_obj1": {
"script": "_source.obj1", "lang": "js"
},
"s_obj1_test": {
"script": "_source.obj1.test", "lang": "js"
},
"s_obj2": {
"script": "_source.obj2", "lang": "js"
},
"s_obj2_arr2": {
"script": "_source.obj2.arr2", "lang": "js"
}
}
}'
```
gives
```javascript
{
  // ...
  "hits": [
    {
      // ...
      "fields": {
        "s_obj2_arr2": [
          [
            "arr_value1",
            "arr_value2"
          ]
        ],
        "s_obj1_test": [
          "something"
        ],
        "s_obj2": [
          {
            "arr2": [
              "arr_value1",
              "arr_value2"
            ]
          }
        ],
        "s_obj1": [
          {
            "test": "something"
          }
        ]
      }
    }
  ]
}
```
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -1,178 +0,0 @@
Python lang Plugin for Elasticsearch
==================================
The Python (jython) language plugin allows `python` to be used as the language of scripts to execute.
In order to install the plugin, simply run:
```sh
bin/plugin install elasticsearch/elasticsearch-lang-python/2.5.0
```
You need to install a version matching your Elasticsearch version:
| elasticsearch | Python Lang Plugin | Docs |
|---------------|-----------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-lang-python/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-lang-python/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.1 | [2.4.1](https://github.com/elasticsearch/elasticsearch-lang-python/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.1 | [2.3.1](https://github.com/elasticsearch/elasticsearch-lang-python/tree/v2.3.1/#version-231-for-elasticsearch-13) |
| < 1.3.5 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-lang-python/tree/v2.3.0/#version-230-for-elasticsearch-13) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-lang-python/tree/v2.2.0/#python-lang-plugin-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-lang-python/tree/v2.0.0/#python-lang-plugin-for-elasticsearch) |
| es-0.90 | 1.0.0 | [1.0.0](https://github.com/elasticsearch/elasticsearch-lang-python/tree/v1.0.0/#python-lang-plugin-for-elasticsearch) |
To use a `SNAPSHOT` version, you need to build it yourself with Maven:
```bash
mvn clean install
plugin install lang-python \
--url file:target/releases/elasticsearch-lang-python-X.X.X-SNAPSHOT.zip
```
User Guide
----------
Using python with function_score
--------------------------------
Let's say you want to use the `function_score` API with `python`. Here is
one way of doing it:
```sh
curl -XDELETE "http://localhost:9200/test"
curl -XPUT "http://localhost:9200/test/doc/1" -d '{
"num": 1.0
}'
curl -XPUT "http://localhost:9200/test/doc/2?refresh" -d '{
"num": 2.0
}'
curl -XGET "http://localhost:9200/test/_search?pretty" -d'
{
"query": {
"function_score": {
"script_score": {
"script": "doc[\"num\"].value * _score",
"lang": "python"
}
}
}
}'
```
gives
```javascript
{
  // ...
  "hits": {
    "total": 2,
    "max_score": 2,
    "hits": [
      {
        // ...
        "_score": 2
      },
      {
        // ...
        "_score": 1
      }
    ]
  }
}
```
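Parameters can be passed to the Python script through `params` as well - a sketch of the query above with an illustrative `factor` parameter:
```sh
curl -XGET "http://localhost:9200/test/_search?pretty" -d'
{
  "query": {
    "function_score": {
      "script_score": {
        "script": "doc[\"num\"].value * factor",
        "params": { "factor": 3 },
        "lang": "python"
      }
    }
  }
}'
```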
Using python with script_fields
-------------------------------
```sh
curl -XDELETE "http://localhost:9200/test"
curl -XPUT "http://localhost:9200/test/doc/1?refresh" -d'
{
"obj1": {
"test": "something"
},
"obj2": {
"arr2": [ "arr_value1", "arr_value2" ]
}
}'
curl -XGET "http://localhost:9200/test/_search" -d'
{
"script_fields": {
"s_obj1": {
"script": "_source[\"obj1\"]", "lang": "python"
},
"s_obj1_test": {
"script": "_source[\"obj1\"][\"test\"]", "lang": "python"
},
"s_obj2": {
"script": "_source[\"obj2\"]", "lang": "python"
},
"s_obj2_arr2": {
"script": "_source[\"obj2\"][\"arr2\"]", "lang": "python"
}
}
}'
```
gives
```javascript
{
  // ...
  "hits": [
    {
      // ...
      "fields": {
        "s_obj2_arr2": [
          [
            "arr_value1",
            "arr_value2"
          ]
        ],
        "s_obj1_test": [
          "something"
        ],
        "s_obj2": [
          {
            "arr2": [
              "arr_value1",
              "arr_value2"
            ]
          }
        ],
        "s_obj1": [
          {
            "test": "something"
          }
        ]
      }
    }
  ]
}
```
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.