Docs: Removed all the added/deprecated tags from 1.x
parent e85e07941d
commit cb00d4a542
@@ -14,8 +14,7 @@ type:
|`pattern` |The regular expression pattern, defaults to `\W+`.
|`flags` |The regular expression flags.
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to an 'empty' stopword list added[1.0.0.RC1, Previously
defaulted to the English stopwords list]. Check
Defaults to an 'empty' stopword list. Check
<<analysis-stop-analyzer,Stop Analyzer>> for more details.
|===================================================================
@@ -18,8 +18,7 @@ type:
|=======================================================================
|Setting |Description
|`stopwords` |A list of stopwords to initialize the stop filter with.
Defaults to an 'empty' stopword list added[1.0.0.Beta1, Previously
defaulted to the English stopwords list]. Check
Defaults to an 'empty' stopword list. Check
<<analysis-stop-analyzer,Stop Analyzer>> for more details.
|`max_token_length` |The maximum token length. If a token is seen that
exceeds this length then it is discarded. Defaults to `255`.
@@ -1,7 +1,5 @@
[[analysis-apostrophe-tokenfilter]]
=== Apostrophe Token Filter

added[1.3.0]

The `apostrophe` token filter strips all characters after an apostrophe,
including the apostrophe itself.
@@ -20,7 +20,6 @@ equivalents, if one exists. Example:
}
--------------------------------------------------

added[1.1.0]
Accepts `preserve_original` setting which defaults to false but if true
will keep the original token as well as emit the folded token. For
example:
@@ -1,8 +1,6 @@
[[analysis-classic-tokenfilter]]
=== Classic Token Filter

added[1.3.0]

The `classic` token filter does optional post-processing of
terms that are generated by the <<analysis-classic-tokenizer,`classic` tokenizer>>.
@@ -1,8 +1,6 @@
[[analysis-keep-types-tokenfilter]]
=== Keep Types Token Filter

coming[1.4.0]

A token filter of type `keep_types` that only keeps tokens with a token type
contained in a predefined set.
@@ -4,7 +4,7 @@
A token filter of type `lowercase` that normalizes token text to lower
case.

Lowercase token filter supports Greek, Irish added[1.3.0], and Turkish lowercase token
Lowercase token filter supports Greek, Irish, and Turkish lowercase token
filters through the `language` parameter. Below is a usage example in a
custom analyzer

@@ -11,19 +11,19 @@ http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/

German::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html[`german_normalization`] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html[`german_normalization`]

Hindi::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizer.html[`hindi_normalization`] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizer.html[`hindi_normalization`]

Indic::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizer.html[`indic_normalization`] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizer.html[`indic_normalization`]

Kurdish (Sorani)::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizer.html[`sorani_normalization`] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizer.html[`sorani_normalization`]

Persian::
@@ -31,6 +31,6 @@ http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/

Scandinavian::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html[`scandinavian_normalization`] added[1.3.0],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html[`scandinavian_folding`] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html[`scandinavian_normalization`],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html[`scandinavian_folding`]

@@ -65,15 +65,15 @@ http://snowball.tartarus.org/algorithms/danish/stemmer.html[*`danish`*]
Dutch::

http://snowball.tartarus.org/algorithms/dutch/stemmer.html[*`dutch`*],
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html[`dutch_kp`] added[1.3.0,Renamed from `kp`]
http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html[`dutch_kp`]

English::

http://snowball.tartarus.org/algorithms/porter/stemmer.html[*`english`*] added[1.3.0,Returns the <<analysis-porterstem-tokenfilter,`porter_stem`>> instead of the <<analysis-snowball-tokenfilter,`english` Snowball token filter>>],
http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[`light_english`] added[1.3.0,Returns the <<analysis-kstem-tokenfilter,`kstem` token filter>>],
http://snowball.tartarus.org/algorithms/porter/stemmer.html[*`english`*],
http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[`light_english`],
http://www.researchgate.net/publication/220433848_How_effective_is_suffixing[`minimal_english`],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilter.html[`possessive_english`],
http://snowball.tartarus.org/algorithms/english/stemmer.html[`porter2`] added[1.3.0,Returns the <<analysis-snowball-tokenfilter,`english` Snowball token filter>> instead of the <<analysis-snowball-tokenfilter,`porter` Snowball token filter>>],
http://snowball.tartarus.org/algorithms/english/stemmer.html[`porter2`],
http://snowball.tartarus.org/algorithms/lovins/stemmer.html[`lovins`]

Finnish::
@@ -89,8 +89,8 @@ http://dl.acm.org/citation.cfm?id=318984[`minimal_french`]

Galician::

http://bvg.udc.es/recursos_lingua/stemming.jsp[*`galician`*] added[1.3.0],
http://bvg.udc.es/recursos_lingua/stemming.jsp[`minimal_galician`] (Plural step only) added[1.3.0]
http://bvg.udc.es/recursos_lingua/stemming.jsp[*`galician`*],
http://bvg.udc.es/recursos_lingua/stemming.jsp[`minimal_galician`] (Plural step only)

German::

@@ -127,7 +127,7 @@ http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[*`light_italian`*

Kurdish (Sorani)::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ckb/SoraniStemmer.html[*`sorani`*] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ckb/SoraniStemmer.html[*`sorani`*]

Latvian::

@@ -136,20 +136,20 @@ http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/
Norwegian (Bokmål)::

http://snowball.tartarus.org/algorithms/norwegian/stemmer.html[*`norwegian`*],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html[*`light_norwegian`*] added[1.3.0],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html[*`light_norwegian`*],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html[`minimal_norwegian`]

Norwegian (Nynorsk)::

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html[*`light_nynorsk`*] added[1.3.0],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html[`minimal_nynorsk`] added[1.3.0]
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianLightStemmer.html[*`light_nynorsk`*],
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.html[`minimal_nynorsk`]

Portuguese::

http://snowball.tartarus.org/algorithms/portuguese/stemmer.html[`portuguese`],
http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[*`light_portuguese`*],
http://www.inf.ufrgs.br/\~buriol/papers/Orengo_CLEF07.pdf[`minimal_portuguese`],
http://www.inf.ufrgs.br/\~viviane/rslp/index.htm[`portuguese_rslp`] added[1.3.0]
http://www.inf.ufrgs.br/\~viviane/rslp/index.htm[`portuguese_rslp`]

Romanian::
@@ -1,7 +1,5 @@
[[analysis-uppercase-tokenfilter]]
=== Uppercase Token Filter

added[1.2.0]

A token filter of type `uppercase` that normalizes token text to upper
case.

@@ -1,8 +1,6 @@
[[analysis-classic-tokenizer]]
=== Classic Tokenizer

added[1.3.0]

A tokenizer of type `classic` providing grammar based tokenizer that is
a good tokenizer for English language documents. This tokenizer has
heuristics for special treatment of acronyms, company names, email addresses,

@@ -1,8 +1,6 @@
[[analysis-thai-tokenizer]]
=== Thai Tokenizer

added[1.3.0]

A tokenizer of type `thai` that segments Thai text into words. This tokenizer
uses the built-in Thai segmentation algorithm included with Java to divide
up Thai text. Text in other languages in general will be treated the same
@@ -51,7 +51,7 @@ specified to expand to all indices.
+
If `none` is specified then wildcard expansion will be disabled and if `all`
is specified, wildcard expressions will expand to all indices (this is equivalent
to specifying `open,closed`). coming[1.4.0]
to specifying `open,closed`).

The defaults settings for the above parameters depend on the api being used.
@@ -82,7 +82,7 @@ The human readable values can be turned off by adding `?human=false`
to the query string. This makes sense when the stats results are
being consumed by a monitoring tool, rather than intended for human
consumption. The default for the `human` flag is
`false`. added[1.00.Beta,Previously defaulted to `true`]
`false`.

[float]
=== Flat Settings

@@ -246,7 +246,7 @@ document indexed.
[float]
=== JSONP

By default JSONP responses are disabled. coming[1.3,Previously JSONP was enabled by default]
By default JSONP responses are disabled.

When enabled, all REST APIs accept a `callback` parameter
resulting in a http://en.wikipedia.org/wiki/JSONP[JSONP] result. You can enable
@@ -1,8 +1,6 @@
[[cat-fielddata]]
== cat fielddata

added[1.2.0]

`fielddata` shows information about currently loaded fielddata on a per-node
basis.
@@ -103,8 +103,6 @@ due to forced awareness or allocation filtering.
[float]
===== Disable allocation

added[1.0.0.RC1]

All the disable allocation settings have been deprecated in favour for
`cluster.routing.allocation.enable` setting.

@@ -156,7 +154,7 @@ All the disable allocation settings have been deprecated in favour for
`discovery.zen.minimum_master_nodes`::
See <<modules-discovery-zen>>

`discovery.zen.publish_timeout` added[1.1.0, The setting existed before but wasn't dynamic]::
`discovery.zen.publish_timeout`::
See <<modules-discovery-zen>>

[float]
@@ -20,7 +20,7 @@ $ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
--------------------------------------------------

NOTE: The query being sent in the body must be nested in a `query` key, same as
the <<search-search,search api>> works added[1.0.0.RC1,The query was previously the top-level object].
the <<search-search,search api>> works.

Both above examples end up doing the same thing, which is delete all
tweets from the twitter index for a certain user. The result of the
@@ -73,8 +73,6 @@ to fetch the first document matching the id across all types.
[[get-source-filtering]]
=== Source filtering

added[1.0.0.Beta1]

By default, the get operation returns the contents of the `_source` field unless
you have used the `fields` parameter or if the `_source` field is disabled.
You can turn off `_source` retrieval by using the `_source` parameter:
@@ -127,8 +125,6 @@ will fail.
[float]
[[generated-fields]]
=== Generated fields
coming[1.4.0]

If no refresh occurred between indexing and refresh, GET will access the transaction log to fetch the document. However, some fields are generated only when indexing.
If you try to access a field that is only generated when indexing, you will get an exception (default). You can choose to ignore field that are generated if the transaction log is accessed by setting `ignore_errors_on_generated_fields=true`.
@@ -113,8 +113,6 @@ GET /test/_mget/
[[mget-source-filtering]]
=== Source filtering

added[1.0.0.Beta1]

By default, the `_source` field will be returned for every document (if stored).
Similar to the <<get-source-filtering,get>> API, you can retrieve only parts of
the `_source` (or not at all) by using the `_source` parameter. You can also use

@@ -183,8 +181,6 @@ curl 'localhost:9200/_mget' -d '{
[float]
=== Generated fields

coming[1.4.0]

See <<generated-fields>> for fields are generated only when indexing.

[float]
@@ -3,7 +3,7 @@

Multi termvectors API allows to get multiple termvectors at once. The
documents from which to retrieve the term vectors are specified by an index,
type and id. But the documents could also be artificially provided coming[1.4.0].
type and id. But the documents could also be artificially provided.
The response includes a `docs`
array with all the fetched termvectors, each element having the structure
provided by the <<docs-termvectors,termvectors>>
@@ -92,7 +92,7 @@ curl 'localhost:9200/testidx/test/_mtermvectors' -d '{
}'
--------------------------------------------------

Additionally coming[1.4.0], just like for the <<docs-termvectors,termvectors>>
Additionally, just like for the <<docs-termvectors,termvectors>>
API, term vectors could be generated for user provided documents. The syntax
is similar to the <<search-percolate,percolator>> API. The mapping used is
determined by `_index` and `_type`.
@@ -1,11 +1,9 @@
[[docs-termvectors]]
== Term Vectors

added[1.0.0.Beta1]

Returns information and statistics on terms in the fields of a particular
document. The document could be stored in the index or artificially provided
by the user coming[1.4.0]. Note that for documents stored in the index, this
by the user. Note that for documents stored in the index, this
is a near realtime API as the term vectors are not available until the next
refresh.
@@ -24,7 +22,7 @@ curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?fields=text,...'

or by adding the requested fields in the request body (see
example below). Fields can also be specified with wildcards
in similar way to the <<query-dsl-multi-match-query,multi match query>> coming[1.4.0].
in similar way to the <<query-dsl-multi-match-query,multi match query>>.

[float]
=== Return values
@@ -45,8 +43,6 @@ If the requested information wasn't stored in the index, it will be
computed on the fly if possible. Additionally, term vectors could be computed
for documents not even existing in the index, but instead provided by the user.

coming[1.4.0,The ability to computed term vectors on the fly as well as support for artificial documents is only available from 1.4.0 onwards (see below example 2 and 3 respectively)]

[WARNING]
======
Start and end offsets assume UTF-16 encoding is being used. If you want to use

@@ -232,7 +228,7 @@ Response:
--------------------------------------------------

[float]
=== Example 2 coming[1.4.0]
=== Example 2

Term vectors which are not explicitly stored in the index are automatically
computed on the fly. The following request returns all information and statistics for the

@@ -251,7 +247,7 @@ curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' -d '{
--------------------------------------------------

[float]
=== Example 3 coming[1.4.0]
=== Example 3

Additionally, term vectors can also be generated for artificial documents,
that is for documents not present in the index. The syntax is similar to the
@@ -145,8 +145,6 @@ curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
}
}'
--------------------------------------------------
coming[1.4.0]

If the document does not exist you may want your update script to
run anyway in order to initialize the document contents using
business logic unknown to the client. In this case pass the
@@ -100,7 +100,7 @@ settings API.
[[disk]]
=== Disk-based Shard Allocation

added[1.3.0] disk based shard allocation is enabled from version 1.3.0 onward
Disk based shard allocation is enabled from version 1.3.0 onward.

Elasticsearch can be configured to prevent shard
allocation on nodes depending on disk usage for the node. This
@@ -28,8 +28,6 @@ example, can be set to `5m` for a 5 minute expiry.
[[circuit-breaker]]
=== Circuit Breaker

coming[1.4.0,Prior to 1.4.0 there was only a single circuit breaker for fielddata]

Elasticsearch contains multiple circuit breakers used to prevent operations from
causing an OutOfMemoryError. Each breaker specifies a limit for how much memory
it can use. Additionally, there is a parent-level breaker that specifies the

@@ -59,18 +57,10 @@ parameters:
A constant that all field data estimations are multiplied with to determine a
final estimation. Defaults to 1.03

`indices.fielddata.breaker.limit`::
deprecated[1.4.0,Replaced by `indices.breaker.fielddata.limit`]

`indices.fielddata.breaker.overhead`::
deprecated[1.4.0,Replaced by `indices.breaker.fielddata.overhead`]

[float]
[[request-circuit-breaker]]
==== Request circuit breaker

coming[1.4.0]

The request circuit breaker allows Elasticsearch to prevent per-request data
structures (for example, memory used for calculating aggregations during a
request) from exceeding a certain amount of memory.
@@ -162,8 +152,6 @@ field data format.
[float]
==== Global ordinals

added[1.2.0]

Global ordinals is a data-structure on top of field data, that maintains an
incremental numbering for all the terms in field data in a lexicographic order.
Each term has a unique number and the number of term 'A' is lower than the number

@@ -1,8 +1,6 @@
[[index-modules-shard-query-cache]]
== Shard query cache

coming[1.4.0]

When a search request is run against an index or against many indices, each
involved shard executes the search locally and returns its local results to
the _coordinating node_, which combines these shard-level results into a
@@ -113,7 +113,7 @@ See <<vm-max-map-count>>

[[default_fs]]
[float]
==== Hybrid MMap / NIO FS added[1.3.0]
==== Hybrid MMap / NIO FS

The `default` type stores the shard index on the file system depending on
the file type by mapping a file into memory (mmap) or using Java NIO. Currently

@@ -74,8 +74,6 @@ the same index. The filter can be defined using Query DSL and is applied
to all Search, Count, Delete By Query and More Like This operations with
this alias.

coming[1.4.0,Fields referred to in alias filters must exist in the mappings of the index/indices pointed to by the alias]

To create a filtered alias, first we need to ensure that the fields already
exist in the mapping:
@@ -242,8 +240,6 @@ curl -XPUT 'localhost:9200/users/_alias/user_12' -d '{
[[alias-index-creation]]
=== Aliases during index creation

added[1.1.0]

Aliases can also be specified during <<create-index-aliases,index creation>>:

[source,js]

@@ -314,8 +310,6 @@ Possible options:

The rest endpoint is: `/{index}/_alias/{alias}`.

coming[1.4.0,The API will always include an `aliases` section, even if there aren't any aliases. Previous versions would not return the `aliases` section]

[float]
==== Examples:
@@ -10,7 +10,7 @@ $ curl -XPOST 'http://localhost:9200/twitter/_cache/clear'
--------------------------------------------------

The API, by default, will clear all caches. Specific caches can be cleaned
explicitly by setting `filter`, `fielddata`, `query_cache` coming[1.4.0],
explicitly by setting `filter`, `fielddata`, `query_cache`,
or `id_cache` to `true`.

All caches relating to a specific field(s) can also be cleared by

@@ -111,8 +111,6 @@ curl -XPUT localhost:9200/test -d '{
[[create-index-aliases]]
=== Aliases

added[1.1.0]

The create index API allows also to provide a set of <<indices-aliases,aliases>>:

[source,js]
@@ -133,8 +131,6 @@ curl -XPUT localhost:9200/test -d '{
[float]
=== Creation Date

coming[1.4.0]

When an index is created, a timestamp is stored in the index metadata for the creation date. By
default this it is automatically generated but it can also be specified using the
`creation_date` parameter on the create index API:

@@ -23,7 +23,7 @@ The flush API accepts the following request parameters:
`wait_if_ongoing`:: If set to `true` the flush operation will block until the
flush can be executed if another flush operation is already executing.
The default is `false` and will cause an exception to be thrown on
the shard level if another flush operation is already running. coming[1.4.0]
the shard level if another flush operation is already running.

`full`:: If set to `true` a new index writer is created and settings that have
been changed related to the index writer will be refreshed. Note: if a full flush
@@ -29,8 +29,6 @@ curl -XGET 'http://localhost:9200/_all/_mapping/tweet,book'
If you want to get mappings of all indices and types then the following
two examples are equivalent:

coming[1.4.0,The API will always include a `mappings` section, even if there aren't any mappings. Previous versions would not return the `mappings` section]

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_all/_mapping'

@@ -38,7 +38,7 @@ to `true`. Note, a merge can potentially be a very heavy operation, so
it might make sense to run it set to `false`.

`force`:: Force a merge operation, even if there is a single segment in the
shard with no deletions. added[1.1.0]
shard with no deletions.

[float]
[[optimize-multi-index]]
@@ -43,7 +43,7 @@ specified as well in the URI. Those stats can be any of:
`fielddata`:: Fielddata statistics.
`flush`:: Flush statistics.
`merge`:: Merge statistics.
`query_cache`:: <<index-modules-shard-query-cache,Shard query cache>> statistics. coming[1.4.0]
`query_cache`:: <<index-modules-shard-query-cache,Shard query cache>> statistics.
`refresh`:: Refresh statistics.
`suggest`:: Suggest statistics.
`warmer`:: Warmer statistics.

@@ -27,8 +27,6 @@ Defines a template named template_1, with a template pattern of `te*`.
The settings and mappings will be applied to any index name that matches
the `te*` template.

added[1.1.0]

It is also possible to include aliases in an index template as follows:

[source,js]
@@ -111,8 +111,6 @@ settings API:
`index.routing.allocation.disable_replica_allocation`::
Disable replica allocation. Defaults to `false`. Deprecated in favour for `index.routing.allocation.enable`.

added[1.0.0.RC1]

`index.routing.allocation.enable`::
Enables shard allocation for a specific index. It can be set to:
* `all` (default) - Allows shard allocation for all shards.

@@ -66,8 +66,6 @@ curl -XPUT localhost:9200/_template/template_1 -d '
}'
--------------------------------------------------

coming[1.4.0]

On the same level as `types` and `source`, the `query_cache` flag is supported
to enable query caching for the warmed search request. If not specified, it will
use the index level configuration of query caching.
@@ -142,8 +140,6 @@ where

Instead of `_warmer` you can also use the plural `_warmers`.

coming[1.4.0]

The `query_cache` parameter can be used to enable query caching for
the search request. If not specified, it will use the index level configuration
of query caching.

@@ -182,8 +178,6 @@ Getting a warmer for specific index (or alias, or several indices) based
on its name. The provided name can be a simple wildcard expression or
omitted to get all warmers.

coming[1.4.0,The API will always include a `warmers` section, even if there aren't any warmers. Previous versions would not return the `warmers` section]

Some examples:

[source,js]

@@ -67,8 +67,6 @@ root and inner object types:
[float]
=== Unmapped fields in queries

coming[1.4.0]

Queries and filters can refer to fields which don't exist in a mapping, except
when registering a new <<search-percolate,percolator query>> or when creating
a <<filtered,filtered alias>>. In these two cases, any fields referred to in
@@ -17,8 +17,6 @@ include::fields/all-field.asciidoc[]

include::fields/analyzer-field.asciidoc[]

include::fields/boost-field.asciidoc[]

include::fields/parent-field.asciidoc[]

include::fields/field-names-field.asciidoc[]
@@ -1,72 +0,0 @@
[[mapping-boost-field]]
=== `_boost`

deprecated[1.0.0.RC1,See <<function-score-instead-of-boost>>]

Boosting is the process of enhancing the relevancy of a document or
field. Field level mapping allows to define an explicit boost level on a
specific field. The boost field mapping (applied on the
<<mapping-root-object-type,root object>>) allows
to define a boost field mapping where *its content will control the
boost level of the document*. For example, consider the following
mapping:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "_boost" : {"name" : "my_boost", "null_value" : 1.0}
    }
}
--------------------------------------------------

The above mapping defines a mapping for a field named `my_boost`. If the
`my_boost` field exists within the JSON document indexed, its value will
control the boost level of the document indexed. For example, the
following JSON document will be indexed with a boost value of `2.2`:

[source,js]
--------------------------------------------------
{
    "my_boost" : 2.2,
    "message" : "This is a tweet!"
}
--------------------------------------------------

[[function-score-instead-of-boost]]
==== Function score instead of boost

Support for document boosting via the `_boost` field has been removed
from Lucene and is deprecated in Elasticsearch as of v1.0.0.RC1. The
implementation in Lucene resulted in unpredictable result when
used with multiple fields or multi-value fields.

Instead, the <<query-dsl-function-score-query>> can be used to achieve
the desired functionality by boosting each document by the value in
any field of the document:

[source,js]
--------------------------------------------------
{
    "query": {
        "function_score": {
            "query": { <1>
                "match": {
                    "title": "your main query"
                }
            },
            "functions": [{
                "field_value_factor": { <2>
                    "field": "my_boost_field"
                }
            }],
            "score_mode": "multiply"
        }
    }
}
--------------------------------------------------
<1> The original query, now wrapped in a `function_score` query.
<2> This function returns the value in `my_boost_field`, which is then
multiplied by the query `_score` for each document.

Note, that `field_value_factor` is a 1.2.x feature.
@@ -1,8 +1,6 @@
[[mapping-field-names-field]]
=== `_field_names`

added[1.3.0]

The `_field_names` field indexes the field names of a document, which can later
be used to search for documents based on the fields that they contain typically
using the `exists` and `missing` filters.

@@ -1,7 +1,5 @@
[[mapping-transform]]
== Transform
added[1.3.0]

The document can be transformed before it is indexed by registering a
script in the `transform` element of the mapping. The result of the
transform is indexed but the original source is stored in the `_source`

@@ -181,7 +181,6 @@ you don't need scoring on a specific field, it is highly recommended to disable
norms on it. In particular, this is the case for fields that are used solely
for filtering or aggregations.

added[1.2.0]
In case you would like to disable norms after the fact, it is possible to do so
by using the <<indices-put-mapping,PUT mapping API>>. Please however note that
norms won't be removed instantly, but as your index will receive new insertions
@@ -556,8 +555,6 @@ The following Similarities are configured out-of-box:
[float]
===== Copy to field

added[1.0.0.RC2]

Adding `copy_to` parameter to any field mapping will cause all values of this field to be copied to fields specified in
the parameter. In the following example all values from fields `title` and `abstract` will be copied to the field
`meta_data`.

@@ -590,8 +587,6 @@ Multiple fields are also supported:
[float]
===== Multi fields

added[1.0.0.RC1]

The `fields` options allows to map several core types fields into a single
json source field. This can be useful if a single field need to be
used in different ways. For example a single field is to be used for both
@@ -38,23 +38,13 @@ The following settings may be used:
`cluster.routing.allocation.enable`::
Controls shard allocation for all indices, by allowing specific
kinds of shard to be allocated.
added[1.0.0.RC1,Replaces `cluster.routing.allocation.disable*`]

Can be set to:
* `all` (default) - Allows shard allocation for all kinds of shards.
* `primaries` - Allows shard allocation only for primary shards.
* `new_primaries` - Allows shard allocation only for primary shards for new indices.
* `none` - No shard allocations of any kind are allowed for all indices.

`cluster.routing.allocation.disable_new_allocation`::
deprecated[1.0.0.RC1,Replaced by `cluster.routing.allocation.enable`]

`cluster.routing.allocation.disable_allocation`::
deprecated[1.0.0.RC1,Replaced by `cluster.routing.allocation.enable`]

`cluster.routing.allocation.disable_replica_allocation`::
deprecated[1.0.0.RC1,Replaced by `cluster.routing.allocation.enable`]

`cluster.routing.allocation.same_shard.host`::
Allows to perform a check to prevent allocation of multiple instances
of the same shard on a single host, based on host name and host address.
@@ -75,8 +75,6 @@ configure the election to handle cases of slow or congested networks
(higher values assure less chance of failure). Once a node joins, it
will send a join request to the master (`discovery.zen.join_timeout`)
with a timeout defaulting at 20 times the ping timeout.
added[1.3.0,Previously defaulted to 10 times the ping timeout].

Nodes can be excluded from becoming a master by setting `node.master` to
`false`. Note, once a node is a client node (`node.client` set to
`true`), it will not be allowed to become a master (`node.master` is
@@ -161,5 +159,4 @@ updates its own cluster state and replies to the master node, which waits for
all nodes to respond, up to a timeout, before going ahead processing the next
updates in the queue. The `discovery.zen.publish_timeout` is set by default
to 30 seconds and can be changed dynamically through the
<<cluster-update-settings,cluster update settings api>> added[1.1.0, The
setting existed before but wasn't dynamic].
<<cluster-update-settings,cluster update settings api>>.
@@ -43,8 +43,7 @@ once all `gateway.recover_after...nodes` conditions are met.
The `gateway.expected_nodes` allows to set how many data and master
eligible nodes are expected to be in the cluster, and once met, the
`gateway.recover_after_time` is ignored and recovery starts.
Setting `gateway.expected_nodes` also defaults `gateway.recovery_after_time` to `5m` added[1.3.0, before `expected_nodes`
required `recovery_after_time` to be set]. The `gateway.expected_data_nodes` and `gateway.expected_master_nodes`
Setting `gateway.expected_nodes` also defaults `gateway.recovery_after_time` to `5m`. The `gateway.expected_data_nodes` and `gateway.expected_master_nodes`
settings are also supported. For example setting:

[source,js]
@@ -72,10 +72,10 @@ share the following allowed settings:
|=======================================================================
|Setting |Description
|`network.tcp.no_delay` |Enable or disable tcp no delay setting.
Defaults to `true`. coming[1.4,Can be set to `default` to not be set at all.]
Defaults to `true`.

|`network.tcp.keep_alive` |Enable or disable tcp keep alive. Defaults
to `true`. coming[1.4,Can be set to `default` to not be set at all].
to `true`.

|`network.tcp.reuse_address` |Should an address be reused or not.
Defaults to `true` on non-windows machines.

@@ -165,8 +165,6 @@ bin/plugin --install mobz/elasticsearch-head
[float]
==== Lucene version dependent plugins

added[1.2.0]

For some plugins, such as analysis plugins, a specific major Lucene version is
required to run. In that case, the plugin provides in its `es-plugin.properties`
file the Lucene version for which the plugin was built for.
@@ -6,10 +6,6 @@ expressions. For example, scripts can be used to return "script fields"
as part of a search request, or can be used to evaluate a custom score
for a query and so on.

deprecated[1.3.0,Mvel has been deprecated and will be removed in 1.4.0]

added[1.3.0,Groovy scripting support]

The scripting module uses by default http://groovy.codehaus.org/[groovy]
(previously http://mvel.codehaus.org/[mvel] in 1.3.x and earlier) as the
scripting language with some extensions. Groovy is used since it is extremely

@@ -23,8 +19,6 @@ All places where a `script` parameter can be used, a `lang` parameter
script. The `lang` options are `groovy`, `js`, `mvel`, `python`,
`expression` and `native`.

added[1.2.0, Dynamic scripting is disabled for non-sandboxed languages by default since version 1.2.0]

To increase security, Elasticsearch does not allow you to specify scripts for
non-sandboxed languages with a request. Instead, scripts must be placed in the
`scripts` directory inside the configuration directory (the directory where
@@ -189,7 +189,7 @@ should be restored as well as prevent global cluster state from being restored b
<<search-multi-index-type,multi index syntax>>. The `rename_pattern` and `rename_replacement` options can be also used to
rename index on restore using regular expression that supports referencing the original text as explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here].
Set `include_aliases` to `false` to prevent aliases from being restored together with associated indices added[1.3.0].
Set `include_aliases` to `false` to prevent aliases from being restored together with associated indices.

[source,js]
-----------------------------------
@@ -211,8 +211,6 @@ persistent settings are added to the existing persistent settings.
[float]
=== Partial restore

added[1.3.0]

By default, entire restore operation will fail if one or more indices participating in the operation don't have
snapshots of all shards available. It can occur if some shards failed to snapshot for example. It is still possible to
restore such indices by setting `partial` to `true`. Please note, that only successfully snapshotted shards will be

@@ -222,8 +220,6 @@ restored in this case and all missing shards will be recreated empty.
[float]
=== Snapshot status

added[1.1.0]

A list of currently running snapshots with their detailed status information can be obtained using the following command:

[source,shell]

@@ -56,8 +56,6 @@ tribe:
metadata: true
--------------------------------

added[1.2.0]

The tribe node can also configure blocks on indices explicitly:

[source,yaml]
@@ -67,8 +65,6 @@ tribe:
indices.write: hk*,ldn*
--------------------------------

added[1.2.0]

When there is a conflict and multiple clusters hold the same index, by default
the tribe node will pick one of them. This can be configured using the `tribe.on_conflict`
setting. It defaults to `any`, but can be set to `drop` (drop indices that have

@@ -64,8 +64,6 @@ next to the given cell.
[float]
==== Caching

added[1.3.0]

The result of the filter is not cached by default. The
`_cache` parameter can be set to `true` to turn caching on.
By default the filter uses the resulting geohash cells as a cache key.
@@ -45,8 +45,6 @@ The `has_child` filter also accepts a filter instead of a query:
[float]
==== Min/Max Children

added[1.3.0]

The `has_child` filter allows you to specify that a minimum and/or maximum
number of children are required to match for the parent doc to be considered
a match:

@@ -30,8 +30,6 @@ The `range` filter accepts the following parameters:
`lte`:: Less-than or equal to
`lt`:: Less-than

coming[1.4.0]

When applied on `date` fields the `range` filter accepts also a `time_zone` parameter.
The `time_zone` parameter will be applied to your input lower and upper bounds and will
move them to UTC time based date:
@@ -6,11 +6,6 @@ retrieved by a query. This can be useful if, for example, a score
function is computationally expensive and it is sufficient to compute
the score on a filtered set of documents.

`function_score` provides the same functionality that
`custom_boost_factor`, `custom_score` and
`custom_filters_score` provided
but with additional capabilities such as distance and recency scoring (see description below).

==== Using function score

To use `function_score`, the user has to define a query and one or
@@ -73,7 +68,7 @@ First, each document is scored by the defined functions. The parameter
`max`:: maximum score is used
`min`:: minimum score is used

Because scores can be on different scales (for example, between 0 and 1 for decay functions but arbitrary for `field_value_factor`) and also because sometimes a different impact of functions on the score is desirable, the score of each function can be adjusted with a user defined `weight` (coming[1.4.0]). The `weight` can be defined per function in the `functions` array (example above) and is multiplied with the score computed by the respective function.
Because scores can be on different scales (for example, between 0 and 1 for decay functions but arbitrary for `field_value_factor`) and also because sometimes a different impact of functions on the score is desirable, the score of each function can be adjusted with a user defined `weight`. The `weight` can be defined per function in the `functions` array (example above) and is multiplied with the score computed by the respective function.
If weight is given without any other function declaration, `weight` acts as a function that simply returns the `weight`.

The new score can be restricted to not exceed a certain limit by setting
@@ -135,8 +130,6 @@ you wish to inhibit this, set `"boost_mode": "replace"`

===== Weight

coming[1.4.0]

The `weight` score allows you to multiply the score by the provided
`weight`. This can sometimes be desired since boost value set on
specific queries gets normalized, while for this score function it does

@@ -147,13 +140,6 @@ not.
"weight" : number
--------------------------------------------------

===== Boost factor

deprecated[1.4.0]

Same as `weight`. Use `weight` instead.

===== Random

The `random_score` generates scores using a hash of the `_uid` field,

@@ -172,8 +158,6 @@ be a memory intensive operation since the values are unique.

===== Field Value factor

added[1.2.0]

The `field_value_factor` function allows you to use a field from a document to
influence the score. It's similar to using the `script_score` function, however,
it avoids the overhead of scripting. If used on a multi-valued field, only the
@@ -489,105 +473,3 @@ are supported.
If the numeric field is missing in the document, the function will
return 1.

==== Relation to `custom_boost`, `custom_score` and `custom_filters_score`

The `custom_boost_factor` query

[source,js]
--------------------------------------------------
"custom_boost_factor": {
    "boost_factor": 5.2,
    "query": {...}
}
--------------------------------------------------

becomes

[source,js]
--------------------------------------------------
"function_score": {
    "weight": 5.2,
    "query": {...}
}
--------------------------------------------------

The `custom_score` query

[source,js]
--------------------------------------------------
"custom_score": {
    "params": {
        "param1": 2,
        "param2": 3.1
    },
    "query": {...},
    "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
}
--------------------------------------------------

becomes

[source,js]
--------------------------------------------------
"function_score": {
    "boost_mode": "replace",
    "query": {...},
    "script_score": {
        "params": {
            "param1": 2,
            "param2": 3.1
        },
        "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
    }
}
--------------------------------------------------

and the `custom_filters_score`

[source,js]
--------------------------------------------------
"custom_filters_score": {
    "filters": [
        {
            "boost": "3",
            "filter": {...}
        },
        {
            "filter": {...},
            "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
        }
    ],
    "params": {
        "param1": 2,
        "param2": 3.1
    },
    "query": {...},
    "score_mode": "first"
}
--------------------------------------------------

becomes:

[source,js]
--------------------------------------------------
"function_score": {
    "functions": [
        {
            "weight": "3",
            "filter": {...}
        },
        {
            "filter": {...},
            "script_score": {
                "params": {
                    "param1": 2,
                    "param2": 3.1
                },
                "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
            }
        }
    ],
    "query": {...},
    "score_mode": "first"
}
--------------------------------------------------
@@ -56,8 +56,6 @@ inside the `has_child` query:
[float]
==== Min/Max Children

added[1.3.0]

The `has_child` query allows you to specify that a minimum and/or maximum
number of children are required to match for the parent doc to be considered
a match:

@@ -45,21 +45,10 @@ Individual fields can be boosted with the caret (`^`) notation:
--------------------------------------------------
<1> The `subject` field is three times as important as the `message` field.

[float]
=== `use_dis_max`

deprecated[1.1.0,Use `type:best_fields` or `type:most_fields` instead. See <<multi-match-types>>]

By default, the `multi_match` query generates a `match` clause per field, then wraps them
in a `dis_max` query. By setting `use_dis_max` to `false`, they will be wrapped in a
`bool` query instead.

[[multi-match-types]]
[float]
=== Types of `multi_match` query:

added[1.1.0]

The way the `multi_match` query is executed internally depends on the `type`
parameter, which can be set to:
@@ -72,7 +72,7 @@ both>>.
|`lenient` |If set to `true` will cause format based failures (like
providing text to a numeric field) to be ignored.

|`locale` | added[1.1.0] Locale that should be used for string conversions.
|`locale` | Locale that should be used for string conversions.
Defaults to `ROOT`.
|=======================================================================

@@ -29,8 +29,6 @@ The `range` query accepts the following parameters:
`lt`:: Less-than
`boost`:: Sets the boost value of the query, defaults to `1.0`

coming[1.4.0]

When applied on `date` fields the `range` filter accepts also a `time_zone` parameter.
The `time_zone` parameter will be applied to your input lower and upper bounds and will
move them to UTC time based date:

@@ -44,10 +44,10 @@ enable. Defaults to `ALL`.
be automatically lower-cased or not (since they are not analyzed). Defaults to
true.

|`locale` | added[1.1.0] Locale that should be used for string conversions.
|`locale` | Locale that should be used for string conversions.
Defaults to `ROOT`.

|`lenient` | added[1.1.0] If set to `true` will cause format based failures
|`lenient` | If set to `true` will cause format based failures
(like providing text to a numeric field) to be ignored.
|=======================================================================
@@ -1,8 +1,6 @@
[[query-dsl-template-query]]
=== Template Query

added[1.1.0]

A query that accepts a query template and a map of key/value pairs to fill in
template parameters.

@@ -95,8 +93,6 @@ which is then turned into:
}
------------------------------------------

added[1.3.0]

You can register a template by storing it in the elasticsearch index `.scripts` or by using the REST API. (See <<search-template>> for more details)
In order to execute the stored template, reference it by name in the `query`
parameter:

@@ -118,8 +118,6 @@ define fixed number of multiple buckets, and others dynamically create the bucke
[float]
=== Caching heavy aggregations

coming[1.4.0]

Frequently used aggregations (e.g. for display on the home page of a website)
can be cached for faster responses. These cached results are the same results
that would be returned by an uncached aggregation -- you will never get stale
@@ -1,8 +1,6 @@
[[search-aggregations-bucket-children-aggregation]]
=== Children Aggregation

coming[1.4.0]

A special single bucket aggregation that enables aggregating from buckets on parent document types to buckets on child documents.

This aggregation relies on the <<mapping-parent-field,_parent field>> in the mapping. This aggregation has a single option:
@@ -1,8 +1,6 @@
[[search-aggregations-bucket-filters-aggregation]]
=== Filters Aggregation

coming[1.4.0]

Defines a multi bucket aggregations where each bucket is associated with a
filter. Each bucket will collect all documents that match its associated
filter.
@@ -117,7 +117,7 @@ precision:: Optional. The string length of the geohashes used to define
size:: Optional. The maximum number of geohash buckets to return
(defaults to 10,000). When results are trimmed, buckets are
prioritised based on the volumes of documents they contain.
added[1.1.0] A value of `0` will return all buckets that
A value of `0` will return all buckets that
contain a hit, use with caution as this could use a lot of CPU
and network bandwith if there are many buckets.

@@ -126,6 +126,6 @@ shard_size:: Optional. To allow for more accurate counting of the top cells
returning `max(10,(size x number-of-shards))` buckets from each
shard. If this heuristic is undesirable, the number considered
from each shard can be over-ridden using this parameter.
added[1.1.0] A value of `0` makes the shard size unlimited.
A value of `0` makes the shard size unlimited.

@@ -1,8 +1,6 @@
[[search-aggregations-bucket-reverse-nested-aggregation]]
=== Reverse nested Aggregation

added[1.2.0]

A special single bucket aggregation that enables aggregating on parent docs from nested documents. Effectively this
aggregation can break out of the nested block structure and link to other nested structures or the root document,
which allows nesting other aggregations that aren't part of the nested object in a nested aggregation.
@@ -10,8 +10,6 @@ This feature is marked as experimental, and may be subject to change in the
future. If you use this feature, please let us know your experience with it!
=====

added[1.1.0]

.Example use cases:
* Suggesting "H5N1" when users search for "bird flu" in text

@@ -231,8 +229,6 @@ are presented unstemmed, highlighted, with the right case, in the right order an
============

==== Custom background sets
added[1.2.0]

Ordinarily, the foreground set of documents is "diffed" against a background set of all the documents in your index.
However, sometimes it may prove useful to use a narrower background set as the basis for comparisons.

@@ -284,8 +280,6 @@ However, the `size` and `shard size` settings covered in the next section provid
The scores are derived from the doc frequencies in _foreground_ and _background_ sets. The _absolute_ change in popularity (foregroundPercent - backgroundPercent) would favor common terms whereas the _relative_ change in popularity (foregroundPercent/ backgroundPercent) would favor rare terms. Rare vs common is essentially a precision vs recall balance and so the absolute and relative changes are multiplied to provide a sweet spot between precision and recall.

===== mutual information
added[1.3.0]

Mutual information as described in "Information Retrieval", Manning et al., Chapter 13.5.1 can be used as significance score by adding the parameter

[source,js]

@@ -308,8 +302,6 @@ Per default, the assumption is that the documents in the bucket are also contain

===== Chi square
coming[1.4.0]

Chi square as described in "Information Retrieval", Manning et al., Chapter 13.5.2 can be used as significance score by adding the parameter

[source,js]
@@ -323,8 +315,6 @@ Chi square behaves like mutual information and can be configured with the same p

===== google normalized distance
coming[1.4.0]

Google normalized distance as described in "The Google Similarity Distance", Cilibrasi and Vitanyi, 2007 (http://arxiv.org/pdf/cs/0412098v3.pdf) can be used as significance score by adding the parameter

[source,js]

@@ -355,7 +345,7 @@ If the number of unique terms is greater than `size`, the returned list can be s
(it could be that the term counts are slightly off and it could even be that a term that should have been in the top
size buckets was not returned).

added[1.2.0] If set to `0`, the `size` will be set to `Integer.MAX_VALUE`.
If set to `0`, the `size` will be set to `Integer.MAX_VALUE`.

To ensure better accuracy a multiple of the final `size` is used as the number of terms to request from each shard
using a heuristic based on the number of shards. To take manual control of this setting the `shard_size` parameter

@@ -368,7 +358,7 @@ a consolidated review by the reducing node before the final selection. Obviously
will cause extra network traffic and RAM usage so this is quality/cost trade off that needs to be balanced. If `shard_size` is set to -1 (the default) then `shard_size` will be automatically estimated based on the number of shards and the `size` parameter.

added[1.2.0] If set to `0`, the `shard_size` will be set to `Integer.MAX_VALUE`.
If set to `0`, the `shard_size` will be set to `Integer.MAX_VALUE`.

NOTE: `shard_size` cannot be smaller than `size` (as it doesn't make much sense). When it is, elasticsearch will
@ -399,7 +389,7 @@ The above aggregation would only return tags which have been found in 10 hits or
|
|||
|
||||
Terms that score highly will be collected on a shard level and merged with the terms collected from other shards in a second step. However, the shard does not have the information about the global term frequencies available. The decision if a term is added to a candidate list depends only on the score computed on the shard using local shard frequencies, not the global frequencies of the word. The `min_doc_count` criterion is only applied after merging local terms statistics of all shards. In a way the decision to add the term as a candidate is made without being very _certain_ about if the term will actually reach the required `min_doc_count`. This might cause many (globally) high frequent terms to be missing in the final result if low frequent but high scoring terms populated the candidate lists. To avoid this, the `shard_size` parameter can be increased to allow more candidate terms on the shards. However, this increases memory consumption and network traffic.
|
||||
|
||||
added[1.2.0] `shard_min_doc_count` parameter
|
||||
`shard_min_doc_count` parameter
|
||||
|
||||
The parameter `shard_min_doc_count` regulates the _certainty_ a shard has if the term should actually be added to the candidate list or not with respect to the `min_doc_count`. Terms will only be considered if their local shard frequency within the set is higher than the `shard_min_doc_count`. If your dictionary contains many low frequent words and you are not interested in these (for example misspellings), then you can set the `shard_min_doc_count` parameter to filter out candidate terms on a shard level that will with a reasonable certainty not reach the required `min_doc_count` even after merging the local frequencies. `shard_min_doc_count` is set to `1` per default and has no effect unless you explicitly set it.
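As an illustrative sketch (the field name and the thresholds are made up), both thresholds could be set on a significant terms aggregation like this:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "tags" : {
            "significant_terms" : {
                "field" : "tag",
                "min_doc_count" : 10,
                "shard_min_doc_count" : 5
            }
        }
    }
}
--------------------------------------------------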
@ -54,19 +54,19 @@ size buckets was not returned). If set to `0`, the `size` will be set to `Intege

==== Document counts are approximate

As described above, the document counts (and the results of any sub aggregations) in the terms aggregation are not always
accurate. This is because each shard provides its own view of what the ordered list of terms should be and these are
combined to give a final view. Consider the following scenario:

A request is made to obtain the top 5 terms in the field product, ordered by descending document count from an index with
3 shards. In this case each shard is asked to give its top 5 terms.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "products" : {
            "terms" : {
                "field" : "product",
                "size" : 5
            }

@ -75,23 +75,23 @@ A request is made to obtain the top 5 terms in the field product, ordered by des
}
--------------------------------------------------

The terms for each of the three shards are shown below with their
respective document counts in brackets:

[width="100%",cols="^2,^2,^2,^2",options="header"]
|=========================================================
| | Shard A | Shard B | Shard C

| 1 | Product A (25) | Product A (30) | Product A (45)
| 2 | Product B (18) | Product B (25) | Product C (44)
| 3 | Product C (6) | Product F (17) | Product Z (36)
| 4 | Product D (3) | Product Z (16) | Product G (30)
| 5 | Product E (2) | Product G (15) | Product E (29)
| 6 | Product F (2) | Product H (14) | Product H (28)
| 7 | Product G (2) | Product I (10) | Product Q (2)
| 8 | Product H (2) | Product Q (6) | Product D (1)
| 9 | Product I (1) | Product J (8) |
| 10 | Product J (1) | Product C (4) |

|=========================================================

@ -102,41 +102,41 @@ The shards will return their top 5 terms so the results from the shards will be:
|=========================================================
| | Shard A | Shard B | Shard C

| 1 | Product A (25) | Product A (30) | Product A (45)
| 2 | Product B (18) | Product B (25) | Product C (44)
| 3 | Product C (6) | Product F (17) | Product Z (36)
| 4 | Product D (3) | Product Z (16) | Product G (30)
| 5 | Product E (2) | Product G (15) | Product E (29)

|=========================================================

Taking the top 5 results from each of the shards (as requested) and combining them to make a final top 5 list produces
the following:

[width="40%",cols="^2,^2"]
|=========================================================

| 1 | Product A (100)
| 2 | Product Z (52)
| 3 | Product C (50)
| 4 | Product G (45)
| 5 | Product B (43)

|=========================================================

Because Product A was returned from all shards we know that its document count value is accurate. Product C was only
returned by shards A and C so its document count is shown as 50 but this is not an accurate count. Product C exists on
shard B, but its count of 4 was not high enough to put Product C into the top 5 list for that shard. Product Z was also
returned only by 2 shards but the third shard does not contain the term. There is no way of knowing, at the point of
combining the results to produce the final list of terms, that there is an error in the document count for Product C and
not for Product Z. Product H has a document count of 44 across all 3 shards but was not included in the final list of
terms because it did not make it into the top five terms on any of the shards.

==== Shard Size

The higher the requested `size` is, the more accurate the results will be, but also, the more expensive it will be to
compute the final results (both due to bigger priority queues that are managed on a shard level and due to bigger data
transfers between the nodes and the client).

The `shard_size` parameter can be used to minimize the extra work that comes with bigger requested `size`. When defined,
it will determine how many terms the coordinating node will request from each shard. Once all the shards responded, the

@ -148,17 +148,15 @@ the client. If set to `0`, the `shard_size` will be set to `Integer.MAX_VALUE`.
NOTE: `shard_size` cannot be smaller than `size` (as it doesn't make much sense). When it is, elasticsearch will
override it and reset it to be equal to `size`.

added[1.1.0] It is possible to not limit the number of terms that are returned by setting `size` to `0`. Don't use this
It is possible to not limit the number of terms that are returned by setting `size` to `0`. Don't use this
on high-cardinality fields as this will kill both your CPU, since terms need to be returned sorted, and your network.
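A minimal sketch of such an unbounded request (the field name is illustrative) would be:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "products" : {
            "terms" : {
                "field" : "product",
                "size" : 0
            }
        }
    }
}
--------------------------------------------------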
==== Calculating Document Count Error

coming[1.4.0]

There are two error values which can be shown on the terms aggregation. The first gives a value for the aggregation as
a whole which represents the maximum potential document count for a term which did not make it into the final list of
terms. This is calculated as the sum of the document count from the last term returned from each shard. For the example
given above the value would be 46 (2 + 15 + 29). This means that in the worst case scenario a term which was not returned
could have the 4th highest document count.

[source,js]

@ -185,13 +183,13 @@ could have the 4th highest document count.
}
--------------------------------------------------

The second error value can be enabled by setting the `show_term_doc_count_error` parameter to true. This shows an error value
for each term returned by the aggregation which represents the 'worst case' error in the document count and can be useful when
deciding on a value for the `shard_size` parameter. This is calculated by summing the document counts for the last term returned
by all shards which did not return the term. In the example above the error in the document count for Product C would be 15 as
Shard B was the only shard not to return the term and the document count of the last term it did return was 15. The actual document
count of Product C was 54 so the document count was only actually off by 4 even though the worst case was that it would be off by
15. Product A, however, has an error of 0 for its document count; since every shard returned it we can be confident that the count
returned is accurate.

[source,js]

@ -220,10 +218,10 @@ returned is accurate.
}
--------------------------------------------------

These errors can only be calculated in this way when the terms are ordered by descending document count. When the aggregation is
ordered by the terms values themselves (either ascending or descending) there is no error in the document count since if a shard
does not return a particular term which appears in the results from another shard, it must not have that term in its index. When the
aggregation is either sorted by a sub aggregation or in order of ascending document count, the error in the document counts cannot be
determined and is given a value of -1 to indicate this.

==== Order

@ -342,8 +340,6 @@ PATH := <AGG_NAME>[<AGG_SEPARATOR><AGG_NAME>]*[<METRIC_SEPARATOR

The above will sort the countries buckets based on the average height among the female population.

coming[1.4.0]

Multiple criteria can be used to order the buckets by providing an array of order criteria such as the following:

[source,js]

@ -368,13 +364,13 @@ Multiple criteria can be used to order the buckets by providing an array of orde
}
--------------------------------------------------

The above will sort the countries buckets based on the average height among the female population and then by
their `doc_count` in descending order.

NOTE: In the event that two buckets share the same values for all order criteria the bucket's term value is used as a
tie-breaker in ascending alphabetical order to prevent non-deterministic ordering of buckets.

==== Minimum document count

It is possible to only return terms that match more than a configured number of hits using the `min_doc_count` option:
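A sketch consistent with the surrounding text (a `tags` aggregation keeping only terms found in at least 10 hits) is:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "min_doc_count" : 10
            }
        }
    }
}
--------------------------------------------------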
@ -397,7 +393,7 @@ The above aggregation would only return tags which have been found in 10 hits or

Terms are collected and ordered on a shard level and merged with the terms collected from other shards in a second step. However, the shard does not have the information about the global document count available. The decision whether a term is added to a candidate list depends only on the order computed on the shard using local shard frequencies. The `min_doc_count` criterion is only applied after merging local terms statistics of all shards. In a way the decision to add the term as a candidate is made without being very _certain_ about whether the term will actually reach the required `min_doc_count`. This might cause many (globally) high-frequency terms to be missing in the final result if low-frequency terms populated the candidate lists. To avoid this, the `shard_size` parameter can be increased to allow more candidate terms on the shards. However, this increases memory consumption and network traffic.

added[1.2.0] `shard_min_doc_count` parameter
`shard_min_doc_count` parameter

The parameter `shard_min_doc_count` regulates the _certainty_ a shard has whether the term should actually be added to the candidate list or not with respect to the `min_doc_count`. Terms will only be considered if their local shard frequency within the set is higher than the `shard_min_doc_count`. If your dictionary contains many low-frequency terms and you are not interested in those (for example misspellings), then you can set the `shard_min_doc_count` parameter to filter out candidate terms on a shard level that will with a reasonable certainty not reach the required `min_doc_count` even after merging the local counts. `shard_min_doc_count` is set to `0` per default and has no effect unless you explicitly set it.

@ -530,7 +526,7 @@ strings that represent the terms as they are found in the index:
        }
    }
}
--------------------------------------------------

==== Multi-field terms aggregation

@ -561,12 +557,12 @@ this single field, which will benefit from the global ordinals optimization.

==== Collect mode

added[1.3.0] Deferring calculation of child aggregations
Deferring calculation of child aggregations

For fields with many unique terms and a small number of required results it can be more efficient to delay the calculation
of child aggregations until the top parent-level aggs have been pruned. Ordinarily, all branches of the aggregation tree
are expanded in one depth-first pass and only then any pruning occurs. In some rare scenarios this can be very wasteful and can hit memory constraints.
An example problem scenario is querying a movie database for the 10 most popular actors and their 5 most common co-stars:

[source,js]
--------------------------------------------------

@ -590,11 +586,11 @@ An example problem scenario is querying a movie database for the 10 most popular
}
--------------------------------------------------

Even though the number of movies may be comparatively small and we want only 50 result buckets there is a combinatorial explosion of buckets
during calculation - a single movie will produce n² buckets where n is the number of actors. The sane option would be to first determine
the 10 most popular actors and only then examine the top co-stars for these 10 actors. This alternative strategy is what we call the `breadth_first` collection
mode as opposed to the default `depth_first` mode:

[source,js]
--------------------------------------------------
{

@ -620,24 +616,20 @@ mode as opposed to the default `depth_first` mode:

When using `breadth_first` mode the documents that fall into the uppermost buckets are
cached for subsequent replay so there is a memory overhead in doing this which is linear with the number of matching documents.
In most requests the volume of buckets generated is smaller than the number of documents that fall into them so the default `depth_first`
collection mode is normally the best bet but occasionally the `breadth_first` strategy can be significantly more efficient. Currently
elasticsearch will always use the `depth_first` collect_mode unless explicitly instructed to use `breadth_first` as in the above example.
Note that the `order` parameter can still be used to refer to data from a child aggregation when using the `breadth_first` setting - the parent
aggregation understands that this child aggregation will need to be called first before any of the other child aggregations.

WARNING: It is not possible to nest aggregations such as `top_hits` which require access to match score information under an aggregation that uses
the `breadth_first` collection mode. This is because this would require a RAM buffer to hold the float score value for every document and
this would typically be too costly in terms of RAM.

[[search-aggregations-bucket-terms-aggregation-execution-hint]]
==== Execution hint

added[1.2.0] Added the `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality` execution modes

deprecated[1.3.0] Removed the `ordinals` execution mode

There are different mechanisms by which terms aggregations can be executed:

- by using field values directly in order to aggregate data per-bucket (`map`)

@ -1,8 +1,6 @@
[[search-aggregations-metrics-cardinality-aggregation]]
=== Cardinality Aggregation

added[1.1.0]

A `single-value` metrics aggregation that calculates an approximate count of
distinct values. Values can be extracted either from specific fields in the
document or generated by a script.
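For illustration, a sketch of a cardinality request (the `author` field is made up) looks like:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "author_count" : {
            "cardinality" : {
                "field" : "author"
            }
        }
    }
}
--------------------------------------------------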
@ -1,8 +1,6 @@
[[search-aggregations-metrics-geobounds-aggregation]]
=== Geo Bounds Aggregation

added[1.3.0]

A metric aggregation that computes the bounding box containing all geo_point values for a field.

.Experimental!

@ -1,20 +1,11 @@
[[search-aggregations-metrics-percentile-aggregation]]
=== Percentiles Aggregation

added[1.1.0]

A `multi-value` metrics aggregation that calculates one or more percentiles
over numeric values extracted from the aggregated documents. These values
can be extracted either from specific numeric fields in the documents, or
be generated by a provided script.

.Experimental!
[IMPORTANT]
=====
This feature is marked as experimental, and may be subject to change in the
future. If you use this feature, please let us know your experience with it!
=====

Percentiles show the point at which a certain percentage of observed values
occur. For example, the 95th percentile is the value which is greater than 95%
of the observed values.
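A sketch of a basic percentiles request (the `load_time` field is illustrative) looks like:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time"
            }
        }
    }
}
--------------------------------------------------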
@ -71,9 +62,6 @@ percentiles: `[ 1, 5, 25, 50, 75, 95, 99 ]`. The response will look like this:
}
--------------------------------------------------

WARNING: added[1.2.0] The above response structure applies for `1.2.0` and above. Pre `1.2.0` release, the `values` object was
missing and all the percentiles were placed directly under the aggregation name object

As you can see, the aggregation will return a calculated value for each percentile
in the default range. If we assume response times are in milliseconds, it is
immediately obvious that the webpage normally loads in 15-30ms, but occasionally

@ -1,8 +1,6 @@
[[search-aggregations-metrics-percentile-rank-aggregation]]
=== Percentile Ranks Aggregation

added[1.3.0]

A `multi-value` metrics aggregation that calculates one or more percentile ranks
over numeric values extracted from the aggregated documents. These values
can be extracted either from specific numeric fields in the documents, or

@ -1,8 +1,6 @@
[[search-aggregations-metrics-scripted-metric-aggregation]]
=== Scripted Metric Aggregation

coming[1.4.0]

A metric aggregation that executes using scripts to provide a metric output.

.Experimental!

@ -1,8 +1,6 @@
[[search-aggregations-metrics-top-hits-aggregation]]
=== Top hits Aggregation

added[1.3.0]

A `top_hits` metric aggregator keeps track of the most relevant document being aggregated. This aggregator is intended
to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.
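For illustration, a sketch of a terms aggregation with a `top_hits` sub aggregation (field names are made up) looks like:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "top-tags" : {
            "terms" : {
                "field" : "tags",
                "size" : 3
            },
            "aggs" : {
                "top_tag_hits" : {
                    "top_hits" : {
                        "sort" : [
                            { "last_activity_date" : { "order" : "desc" } }
                        ],
                        "size" : 1
                    }
                }
            }
        }
    }
}
--------------------------------------------------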
@ -34,8 +34,6 @@ The name of the aggregation (`grades_count` above) also serves as the key by whi
retrieved from the returned response.

==== Script
added[1.1.0]

Counting the values generated by a script:

[source,js]

@ -1,8 +1,6 @@
[[search-benchmark]]
== Benchmark

coming[1.4.0]

.Experimental!
[IMPORTANT]
=====

@ -21,7 +21,7 @@ $ curl -XGET 'http://localhost:9200/twitter/tweet/_count' -d '
--------------------------------------------------

NOTE: The query being sent in the body must be nested in a `query` key, same as
the <<search-search,search api>> works added[1.0.0.RC1,The query was previously the top-level object].
the <<search-search,search api>> works.

Both examples above do the same thing, which is count the number of
tweets from the twitter index for a certain user. The result is:

@ -64,7 +64,7 @@ query.
|default_operator |The default operator to be used, can be `AND` or
`OR`. Defaults to `OR`.

|coming[1.4.0] terminate_after |The maximum count for each shard, upon
|terminate_after |The maximum count for each shard, upon
reaching which the query execution will terminate early.
If set, the response will have a boolean field `terminated_early` to
indicate whether the query execution has actually terminated_early.

@ -63,7 +63,7 @@ This will yield the same result as the previous request.
[horizontal]
`_source`::

added[1.0.0.Beta1] Set to `true` to retrieve the `_source` of the document explained. You can also
Set to `true` to retrieve the `_source` of the document explained. You can also
retrieve part of the document by using `_source_include` & `_source_exclude` (see <<get-source-filtering,Get API>> for more details)

`fields`::

@ -1,8 +1,6 @@
[[search-percolate]]
== Percolator

added[1.0.0.Beta1]

Traditionally you design documents based on your data and store them into an index and then define queries via the search api
in order to retrieve these documents. The percolator works in the opposite direction: first you store queries into an
index and then via the percolate api you define documents in order to retrieve these queries.
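As an illustrative sketch (index, type and field names are made up), registering a query and then percolating a document looks roughly like this:

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
    "query" : {
        "match" : {
            "message" : "bonsai tree"
        }
    }
}'

curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office"
    }
}'
--------------------------------------------------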
@ -20,7 +18,6 @@ in the percolate api.

Fields referred to in a percolator query must *already* exist in the mapping
associated with the index used for percolation.
coming[1.4.0,Applies to indices created in 1.4.0 or later]
There are two ways to make sure that a field mapping exists:

* Add or update a mapping via the <<indices-create-index,create index>> or

@ -70,13 +70,13 @@ And here is a sample response:

`query_cache`::

coming[1.4.0] Set to `true` or `false` to enable or disable the caching
Set to `true` or `false` to enable or disable the caching
of search results for requests where `?search_type=count`, i.e.
aggregations and suggestions. See <<index-modules-shard-query-cache>>.

`terminate_after`::

coming[1.4.0] The maximum number of documents to collect for each shard,
The maximum number of documents to collect for each shard,
upon reaching which the query execution will terminate early. If set, the
response will have a boolean field `terminated_early` to indicate whether
the query execution has actually terminated_early. Defaults to no

@ -44,60 +44,3 @@ Script fields can also be automatically detected and used as fields, so
things like `_source.obj1.field1` can be used, though not recommended, as
`obj1.field1` will work as well.

[[partial]]
==== Partial

deprecated[1.0.0Beta1,Replaced by <<search-request-source-filtering>>]

When loading data from `_source`, partial fields can use
wildcards to control what part of the `_source` will be loaded based on
`include` and `exclude` patterns. For example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "partial_fields" : {
        "partial1" : {
            "include" : "obj1.obj2.*"
        }
    }
}
--------------------------------------------------

And one that will also exclude `obj1.obj3`:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "partial_fields" : {
        "partial1" : {
            "include" : "obj1.obj2.*",
            "exclude" : "obj1.obj3.*"
        }
    }
}
--------------------------------------------------

Both `include` and `exclude` support multiple patterns:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "partial_fields" : {
        "partial1" : {
            "include" : ["obj1.obj2.*", "obj1.obj4.*"],
            "exclude" : "obj1.obj3.*"
        }
    }
}
--------------------------------------------------

@ -124,8 +124,6 @@ The following is an example that forces the use of the plain highlighter:

==== Force highlighting on source

added[1.0.0.RC1]

Forces the highlighting to highlight fields based on the source even if fields are
stored separately. Defaults to `false`.
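A sketch of a request using this option (field name and query value are illustrative) looks like:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match" : { "content" : "kimchy" }
    },
    "highlight" : {
        "fields" : {
            "content" : { "force_source" : true }
        }
    }
}
--------------------------------------------------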
@ -82,8 +82,6 @@ for <<query-dsl-function-score-query,`function query`>> rescores.

==== Multiple Rescores

added[1.1.0]

It is also possible to execute multiple rescores in sequence:
[source,js]
--------------------------------------------------

@ -131,12 +131,6 @@ the `nested_filter` then a missing value is used.

==== Ignoring Unmapped Fields

coming[1.4.0] Before 1.4.0 there was the `ignore_unmapped` boolean
parameter, which was not enough information to decide on the sort
values to emit, and didn't work for cross-index search. It is still
supported but users are encouraged to migrate to the new
`unmapped_type` instead.

By default, the search request will fail if there is no mapping
associated with a field. The `unmapped_type` option allows you to ignore
fields that have no mapping and not sort by them. The value of this

@ -285,8 +279,6 @@ conform with http://geojson.org/[GeoJSON].

==== Multiple reference points

coming[1.4.0]

Multiple geo points can be passed as an array containing any `geo_point` format, for example

[source,js]

@ -1,8 +1,6 @@
[[search-request-source-filtering]]
=== Source filtering

added[1.0.0.Beta1]

Allows you to control how the `_source` field is returned with every hit.
@ -1,8 +1,6 @@
[[search-template]]
== Search Template

added[1.1.0]

The `/_search/template` endpoint allows you to use the mustache language to pre-render search requests,
filling existing templates with template parameters before they are executed.
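A minimal sketch of such a templated request (the parameter name and value are illustrative) looks like:

[source,js]
--------------------------------------------------
GET /_search/template
{
    "template" : {
        "query" : { "match" : { "text" : "{{query_string}}" } }
    },
    "params" : {
        "query_string" : "search for these words"
    }
}
--------------------------------------------------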
@ -224,8 +222,6 @@ GET /_search/template

<1> Name of the query template in `config/scripts/`, i.e., `storedTemplate.mustache`.

added[1.3.0]

You can also register search templates by storing them in the elasticsearch cluster in a special index named `.scripts`.
There are REST APIs to manage these indexed templates.
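As an illustrative sketch (the template id and body are made up), indexing a template looks roughly like this:

[source,js]
--------------------------------------------------
POST /_search/template/templateName
{
    "template" : {
        "query" : {
            "match" : {
                "title" : "{{query_string}}"
            }
        }
    }
}
--------------------------------------------------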
@ -1,8 +1,6 @@
[[suggester-context]]
=== Context Suggester

added[1.2.0]

The context suggester is an extension to the suggest API of Elasticsearch. Namely the
suggester system provides a very fast way of searching documents by handling these
entirely in memory. But this special treatment does not allow the handling of

@ -62,7 +62,7 @@ query.
|`explain` |For each hit, contains an explanation of how scoring of the
hits was computed.

|`_source`| added[1.0.0.Beta1]Set to `false` to disable retrieval of the `_source` field. You can also retrieve
|`_source`|Set to `false` to disable retrieval of the `_source` field. You can also retrieve
part of the document by using `_source_include` & `_source_exclude` (see the <<search-request-source-filtering, request body>>
documentation for more details)

@ -82,7 +82,7 @@ scores and return them as part of each hit.
within the specified time value and bail with the hits accumulated up to
that point when expired. Defaults to no timeout.

|coming[1.4.0] `terminate_after` |The maximum number of documents to collect for
|`terminate_after` |The maximum number of documents to collect for
each shard, upon reaching which the query execution will terminate early.
If set, the response will have a boolean field `terminated_early` to
indicate whether the query execution has actually terminated_early.

@ -43,7 +43,7 @@ curl -XGET 'http://localhost:9200/twitter/tweet/_validate/query' -d '{
--------------------------------------------------

NOTE: The query being sent in the body must be nested in a `query` key, same as
the <<search-search,search api>> works added[1.0.0.RC1,The query was previously the top-level object].
the <<search-search,search api>> works.

If the query is invalid, `valid` will be `false`. Here the query is
invalid because Elasticsearch knows the post_date field should be a date

@ -52,7 +52,7 @@ $ bin/elasticsearch -Xmx2g -Xms2g -Des.index.store.type=memory --node.name=my-no

Elasticsearch is built using Java, and requires at least
http://www.oracle.com/technetwork/java/javase/downloads/index.html[Java 7] in
order to run added[1.2.0,Was at least Java 6 before]. Only Oracle's Java and
order to run. Only Oracle's Java and
the OpenJDK are supported.

We recommend installing the *Java 8 update 20 or later*, or *Java 7 update 55

@ -1,8 +1,6 @@
[[testing-framework]]
== Java Testing Framework

added[1.0.0.RC1]

[[testing-intro]]

Testing is a crucial part of your application, and as information retrieval itself is already a complex topic, there should not be any additional complexity in setting up a testing infrastructure that uses elasticsearch. This is the main reason why we decided to release an additional artifact with each release, which allows you to use the same testing infrastructure we do in the elasticsearch core. The testing framework allows you to set up clusters with multiple nodes in order to check if your code covers everything needed to run in a cluster. The framework prevents you from writing complex code yourself to start, stop or manage several test nodes in a cluster. In addition there is another very important feature called randomized testing, which you get for free as it is part of the elasticsearch infrastructure.