Previously node.max_local_storage_nodes defaulted to fifty, and this
permitted users to start multiple instances of Elasticsearch sharing the
same data folder. This can be dangerous, and usually it does not make
sense to run more than one instance of Elasticsearch on a single
server. Because of this, we had a note in the important settings docs
advising users to set this setting to one. However, we have since
changed the default value of this setting to one so this advise is no
longer needed.
Relates #21305
This query is deprecated from 5.0 on. Similar to IndicesQueryBuilder we should
log a deprecation warning whenever this query is used.
Relates to #15760
Lucene 6.2 introduces the new `Analyzer.normalize` API, which allows to apply
only character-level normalization such as lowercasing or accent folding, which
is exactly what is needed to process queries that operate on partial terms such
as `prefix`, `wildcard` or `fuzzy` queries. As a consequence, the
`lowercase_expanded_terms` option is not necessary anymore. Furthermore, the
`locale` option was only needed in order to know how to perform the lowercasing,
so this one can be removed as well.
Closes#9978
This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.
For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted.
Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:
* A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined.
* Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false.
* Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
* Add docs with up to date instructions on updating default similarity
The default similarity can no longer be set in the configuration file
(you will get an error on startup). Update the docs with the method
that works.
* Add instructions for changing similarity on index creation
It was 10mb and that was causing trouble when folks reindex-from-remoted
with large documents.
We also improve the error reporting so it tells folks to use a smaller
batch size if they hit a buffer size exception. Finally, adds some docs
to reindex-from-remote mentioning the buffer and giving an example of
lowering the size.
Closes#21185
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to match the indexed scripts being replaced by stored scripts.
Adds support for indexing into lists and arrays with negative
indexes meaning "counting from the back". So for if
`x = ["cat", "dog", "chicken"]` then `x[-1] == "chicken"`.
This adds an extra branch to every array and list access but
some performance testing makes it look like the branch predictor
successfully predicts the branch every time so there isn't a
in execution time for this feature when the index is positive.
When the index is negative performance testing showed the runtime
is the same as writing `x[x.length - 1]`, again, presumably thanks
to the branch predictor.
Those performance metrics were calculated for lists and arrays but
`def`s get roughly the same treatment though instead of inlining
the test they need to make a invoke dynamic so we don't screw up
maps.
Closes#20870
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
Converts docs for `_cat/segments`, `_cat/plugins` and `_cat/repositories`
from `curl` to `// CONSOLE` so they are tested as part of the build and
are cleaner to use in Console. They should work fine with `curl` with
the `COPY AS CURL` link.
Also swaps the `source` type of the response from `js` to `txt` because
that is more correct. The syntax highlighter doesn't care. It looks at
the text to figure out the language. So it looks a little funny for `_cat`
responses regardless.
Relates to #18160
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.
Relates #21094
This change adds a TypesQuery that checks if the disjunction of types should be rewritten to a MatchAllDocs query. The check is done only if the number of terms is below a threshold (16 by default and configurable via max_boolean_clause).
This allows you to whitelist `localhost:*` or `127.0.10.*:9200`.
It explicitly checks for patterns like `*` in the whitelist and
refuses to start if the whitelist would match everything. Beyond
that the user is on their own designing a secure whitelist.
This commit fixes two issues with the slow log docs:
- clarifies that these settings are per index
- updates index slow log configuration for Log4j 2
Relates #20976
It is important that folks understand that snapshot/restore isn't
for archiving. It is appropriate for backup and disaster recovery
but not for archival over long periods of time because of version
incompatibility.
Closes#20866
Relates to #18160
Uses the new sorting (#20658) in the `_cat` API to support all use
cases natively. We can still resort to piping things through `sort`
if we need to, but we don't have to for basic stuff like sorting!
* Adding built-in sorting capability to _cat apis.
Closes#16975
* addressing pr comments
* changing value types back to original implementation and fixing cosmetic issues
* Changing compareTo, hashCode of value types to a better implementation
* Changed value compareTos to use Double.compare instead of if statements + fixed some failed unit tests
The shards preference on a search request enables specifying a list of
shards to hit, and then a secondary preference (e.g., "_primary") can be
added. Today, the separator between the shards list and the secondary
preference is ';'. Unfortunately, this is also a valid separtor for URL
query parameters. This means that a preference like "_shards:0;_primary"
will be parsed into two URL parameters: "_shards:0" and "_primary". With
the recent change to strict URL parsing, the second parameter will be
rejected, "_primary" is not a valid URL parameter on a search
request. This means that this feature has never worked (unless the ';'
is escaped, but no one does that because our docs do not that, and there
was no indication from Elasticsearch that this did not work). This
commit changes the separator to '|'.
Relates #20786
This change proposes the removal of all non-tcp transport implementations. The
mock transport can be used by default to run tests instead of local transport that has
roughly the same performance compared to TCP or at least not noticeably slower.
This is a master only change, deprecation notice in 5.x will be committed as a
separate change.
Previously, this doc was using a field called "content". This is
confusing, especially when the doc starts talking about the content of
the content field. This change makes the field name "comment" which
is less ambiguous and also changes some related field names in the doc
to make a consistent example theme of editing docs around blog posts.
This causes the snippets to be tested during the build and gives
helpful links to the reader to open the docs in console or copy them
as curl commands.
Relates to #18160
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
On Windows the JDK uses `CreateFileW` which has a stupidly high
limit for the number of `Handle`s it can make - `16 * 1024 * 1024`.
So this isn't really a problem on Windows at all.
Closes#20732
today it's not possible to use date-math efficiently with the `_rollover`
API. This change adds support for date-math in the target index as well as
support for preserving the math logic when an existing index that was created with
a date math expression all subsequent indices are created with the same expression.
this change adds a hard limit to `index.number_of_shard` that prevents
indices from being created that have more than 1024 shards. This is still
a huge limit and can only be changed via settings a system property.
* master: (1199 commits)
[DOCS] Remove non-valid link to mapping migration document
Revert "Default `include_in_all` for numeric-like types to false"
test: add a test with ipv6 address
docs: clearify that both ip4 and ip6 addresses are supported
Include complex settings in settings requests
Add production warning for pre-release builds
Clean up confusing error message on unhandled endpoint
[TEST] Increase logging level in testDelayShards()
change health from string to enum (#20661)
Provide error message when plugin id is missing
Document that sliced scroll works for reindex
Make reindex-from-remote ignore unknown fields
Remove NoopGatewayAllocator in favor of a more realistic mock (#20637)
Remove Marvel character reference from guide
Fix documentation for setting Java I/O temp dir
Update client benchmarks to log4j2
Changes the API of GatewayAllocator#applyStartedShards and (#20642)
Removes FailedRerouteAllocation and StartedRerouteAllocation
IndexRoutingTable.initializeEmpty shouldn't override supplied primary RecoverySource (#20638)
Smoke tester: Adjust to latest changes (#20611)
...
Surprise! You can use sliced scroll to easily parallelize reindex
and friend. They support it because they use the same infrastructure
as a regular search to parse the search request. While we would like
to make an "automatic" option for parallelizing reindex, this manual
option works right now and is pretty convenient!
This commit fixes the documentation for configuring the Java I/O temp
dir which incorrectly suggested using the -D flag as a parameter on the
command line; these flags have been removed and should now be specified
as arguments to the JVM using either the ES_JAVA_OPTS environment
variable or using the jvm.options configuration file.
Closes#20652
* plugins/discovery-azure-class.asciidoc
* reference/cluster.asciidoc
* reference/modules/cluster/misc.asciidoc
* reference/modules/indices/request_cache.asciidoc
After this is merged there will be no unconvereted snippets outside
of `reference`.
Related to #18160
Adds a cat api endpoint: /_cat/templates and its more specific version, /_cat/templates/{name}.
It looks something like:
$ curl "localhost:9200/_cat/templates?v"
name template order version
sushi_california_roll *avocado* 1 1
pizza_hawaiian *pineapples* 1
pizza_pepperoni *pepperoni* 1
The specified version (only allows * globs) looks like:
$ curl "localhost:9200/_cat/templates/pizza*"
name template order version
pizza_hawaiian *pineapples* 1
pizza_pepperoni *pepperoni* 1
Partially specified columns:
$ curl "localhost:9200/_cat/templates/pizza*?v=true&h=name,template"
name template
pizza_hawaiian *pineapples*
pizza_pepperoni *pepperoni*
The help text:
$ curl "localhost:9200/_cat/templates/pizza*?help"
name | n | template name
template | t | template pattern string
order | o | template application order number
version | v | version
Closes#20467
With the unified release process across the elastic stack, download
links for all products are changing. This change updates docs referring
to the old download and packages urls.
Note that this change also updates the plugin installation command as
the url for downloads is being changed to be consistent with that for
packages (both plural).
The serial collector is not suitable for running with a server
application like Elasticsearch and can decimate performance and lead to
cluster instability. This commit adds a bootstrap check to prevent usage
of the serial collector when Elasticsearch is running in production
mode.
Relates #20558
This tracks the snippets that probably should be converted to
`// CONSOLE` or `// TESTRESPONSE` and fails the build if the list
of files with such snippets doesn't match the list in `docs/build.gradle`.
Setting the file looks like
```
/* List of files that have snippets that probably should be converted to
* `// CONSOLE` and `// TESTRESPONSE` but have yet to be converted. Try and
* only remove entries from this list. When it is empty we'll remove it
* entirely and have a party! There will be cake and everything.... */
buildRestTests.expectedUnconvertedCandidates = [
'plugins/discovery-azure-classic.asciidoc',
...
'reference/search/suggesters/completion-suggest.asciidoc',
]
```
This list is in `build.gradle` because we expect it to be fairly
temporary. In a few months we'll have converted all of the docs and won't
ned it any more.
From now on if you add now docs that contain a snippet that shows an
interaction with elasticsearch you have three choices:
1. Stick `// CONSOLE` on the interactions and `// TESTRESPONSE` on the
responses. The build (specifically (`gradle docs:check`) will test that
these interactions "work". If there isn't a `// TESTRESPONSE` snippet
then "work" just means "Elasticsearch responds with a 200-level response
code and no `WARNING` headers. This is way better than nothing.
2. Add `// NOTCONSOLE` if the snippet isn't actually interacting with
Elasticsearch. This should only be required for stuff like javascript
source code or `curl` against an external service like AWS or GCE. The
snippet will not get "OPEN IN CONSOLE" or "COPY AS CURL" buttons or be
tested.
3. Add `// TEST[skip:reason]` under the snippet. This will just skip the
snippet in the test phase. This should really be reserved for snippets
where we can't test them because they require an external service that
we don't have at testing time.
Please, please, please, please don't add more things to the list. After
all, it sais there'll be cake when we remove it entirely!
Relates to #18160
We can now run templates using `explain` and/or `profile` parameters.
Which is interesting when you have defined a complicated profile but want to debug it in an easier way than running the full query again.
You can use `explain` parameter when running a template:
```js
GET /_search/template
{
"file": "my_template",
"params": {
"status": [ "pending", "published" ]
},
"explain": true
}
```
You can use `profile` parameter when running a template:
```js
GET /_search/template
{
"file": "my_template",
"params": {
"status": [ "pending", "published" ]
},
"profile": true
}
```
Funny node names have been removed in #19456 and replaced by UUID. This commit removes these obsolete node names and replace them by real UUIDs in the documentation.
closes#20065
`gender.keyword` should be used instead of just `gender` or we
get an error `Fielddata is disabled on text fields by default. Set
fielddata=true on [gender] in order to load fielddata in memory by
uninverting the inverted index. Note that this can however use
Closes#20535
significant memory.`
`cluster.routing.allocation.cluster_concurrent_rebalance` setting,
clarifying in which shard allocation situations the rebalance limit
takes effect.
Closes#20529
`profile.asciidoc` now runs all of its command but it doesn't validate
all of the results. Writing the validation is time consuming so I only
did some of it.
* Update rescoring docs in respect to sort
If sort is present in a query the rescore query is not executed. As long as this feature is neither implemented (see discussion in #6788) nor the combination of sort and rescoring raises an error, we should warn the user in the documentation about this.
* Missed a dot
In 7560101ec7, the Elasticsearch logger
names were modified to be their fully-qualified class name (with some
exceptions for special loggers like the slow logs and the transport
tracer). This commit updates the docs accordingly.
Relates #20475
With the cut over to LatLonPoint the geohash, geohash_precision, lat_lon, and geohash_prefix parameters have been removed. This commit fixes the doc build by removing the remaining dangling references to these removed parameters.
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes#20155
This commit adds a -q/--quiet option to Elasticsearch so that it does not log anything in the console and closes stdout & stderr streams. This is useful for SystemD to avoid duplicate logs in both journalctl and /var/log/elasticsearch/elasticsearch.log while still allows the JVM to print error messages in stdout/stderr if needed.
closes#17220
This commit adds a health status parameter to the cat indices API for
filtering on indices that match the specified status (green|yellow|red).
Relates #20393
Add docs to template support for _msearch
Relates to #10885
Relates to #15674
* Reference those docs from the rest api spec for _msearch/template support.
In 5.x we allowed this with a deprecation warning. This removes the code
added for that deprecation, requiring the cluster name to not be in the
data path.
Resolves#20391
This was an error-prone version type that allowed overriding previous
version semantics. It could cause primaries and replicas to be out of
sync however, so it has been removed.
Resolves#19769
Previous versions of Elasticsearch permitted unquoted JSON field names even though this is against the JSON spec. This leniency was disabled by default in the 5.x series of Elasticsearch but a backwards compatibility layer was added via a system property with the intention of removing this layer in 6.0.0. This commit removes this backwards compatibility layer.
Relates #20388
This includes:
- All regular numeric types such as int, long, scaled-float, double, etc
- IP addresses
- Dates
- Geopoints and Geoshapes
Relates to #19784
The collect_payloads parameter of the span_near query was previously
deprecated with the intention to be removed. This commit removes this
parameter.
Relates #20385
This was an error-prone version type that allowed overriding previous
version semantics. It could cause primaries and replicas to be out of
sync however, so it has been removed.
Resolves#19769
Hi all,
I was trying to run the percolate examples, but I figured that because of the "type":"keyword" , the code wasn't working.
In the saerch query the "message" : "A new bonsai tree in the office" is a pure string.
I changed it to "text".
Exposing lucene 6.x minhash tokenfilter
Generate min hash tokens from an incoming stream of tokens that can
be used to estimate document similarity.
Closes#20149
** The default script language is now maintained in `Script` class.
* Added `script.legacy.default_lang` setting that controls the default language for scripts that are stored inside documents (for example percolator queries). This defaults to groovy.
** Added `QueryParseContext#getDefaultScriptLanguage()` that manages the default scripting language. Returns always `painless`, unless loading query/search request in legacy mode then the returns what is configured in `script.legacy.default_lang` setting.
** In the aggregation parsing code added `ParserContext` that also holds the default scripting language like `QueryParseContext`. Most parser don't have access to `QueryParseContext`. This is for scripts in aggregations.
* The `lang` script field is always serialized (toXContent).
Closes#20122
and be much more stingy about what we consider a console candidate.
* Add `// CONSOLE` to check-running
* Fix version in some snippets
* Mark groovy snippets as groovy
* Fix versions in plugins
* Fix language marker errors
* Fix language parsing in snippets
This adds support for snippets who's language is written like
`[source, txt]` and `["source","js",subs="attributes,callouts"]`.
This also makes language required for snippets which is nice because
then we can be sure we can grep for snippets in a particular language.
- Using log() to indicate natural log can add some confusion when trying to further adjust/tweak scores. Other parts of the API (field_value_factor on this same page) use 'ln' and 'log', so this change should be more consistent
- Fixes#20027
- I generated the images using http://latex2png.com/ at a resolution of 150 which seemed to be about the same size as before
This commit configures the deprecation logs to be size-limited to 1 GB,
and compress these logs when they roll. The default configuration will
preserve up to four rolled logs.
Relates #20287
The mem section was buggy in cluster stats and removed. It is now added back with the same structure as in node stats, containing total memory, available memory, used memory and percentages. All the values are the sum of all the nodes across the cluster (or at least the ones that we were able to get the values from).
If elasticsearch controls the ID values as well as the documents
version we can optimize the code that adds / appends the documents
to the index. Essentially we an skip the version lookup for all
documents unless the same document is delivered more than once.
On the lucene level we can simply call IndexWriter#addDocument instead
of #updateDocument but on the Engine level we need to ensure that we deoptimize
the case once we see the same document more than once.
This is done as follows:
1. Mark every request with a timestamp. This is done once on the first node that
receives a request and is fixed for this request. This can be even the
machine local time (see why later). The important part is that retry
requests will have the same value as the original one.
2. In the engine we make sure we keep the highest seen time stamp of "retry" requests.
This is updated while the retry request has its doc id lock. Call this `maxUnsafeAutoIdTimestamp`
3. When the engine runs an "optimized" request comes, it compares it's timestamp with the
current `maxUnsafeAutoIdTimestamp` (but doesn't update it). If the the request
timestamp is higher it is safe to execute it as optimized (no retry request with the same
timestamp has been run before). If not we fall back to "non-optimzed" mode and run the request as a retry one
and update the `maxUnsafeAutoIdTimestamp` unless it's been updated already to a higher value
Relates to #19813
* master:
Avoid NPE in LoggingListener
Randomly use Netty 3 plugin in some tests
Skip smoke test client on JDK 9
Revert "Don't allow XContentBuilder#writeValue(TimeValue)"
[docs] Remove coming in 2.0.0
Don't allow XContentBuilder#writeValue(TimeValue)
[doc] Remove leftover from CONSOLE conversion
Parameter improvements to Cluster Health API wait for shards (#20223)
Add 2.4.0 to packaging tests list
Docs: clarify scale is applied at origin+offest (#20242)
* Params improvements to Cluster Health API wait for shards
Previously, the cluster health API used a strictly numeric value
for `wait_for_active_shards`. However, with the introduction of
ActiveShardCount and the removal of write consistency level for
replication operations, `wait_for_active_shards` is used for
write operations to represent values for ActiveShardCount. This
commit moves the cluster health API's usage of `wait_for_active_shards`
to be consistent with its usage in the write operation APIs.
This commit also changes `wait_for_relocating_shards` from a
numeric value to a simple boolean value `wait_for_no_relocating_shards`
to set whether the cluster health operation should wait for
all relocating shards to complete relocation.
* Addresses code review comments
* Don't be lenient if `wait_for_relocating_shards` is set
* master:
Increase visibility of deprecation logger
Skip transport client plugin installed on JDK 9
Explicitly disable Netty key set replacement
percolator: Fail indexing percolator queries containing either a has_child or has_parent query.
Make it possible for Ingest Processors to access AnalysisRegistry
Allow RestClient to send array-based headers
Silence rest util tests until the bogusness can be simplified
Remove unknown HttpContext-based test as it fails unpredictably on different JVMs
Tests: Improve rest suite names and generated test names for docs tests
Add support for a RestClient base path
The deprecation logger is an important way to make visible features of
Elasticsearch that are deprecated. Yet, the default logging makes the
log messages for the deprecation logger invisible. We want these log
messages to be visible, so the default logging for the deprecation
logger should enable these log messages. This commit changes the log
level of deprecation log message to warn, and configures the deprecation
logger so that these log messages are visible out of the box.
Relates #20254
While removing an index isn't actually an alias action, if we add
an alias action that deletes an index then we can delete and index
and add an alias with the same name as the index atomically, in
the same cluster state update.
Closes#20064
This commit adds the support for exclusion filter to the response filtering (filter_path) feature. It changes the XContentBuilder APIs so that it now accepts two types of filters: inclusive and exclusive. Filters are no more String arrays but sets of String instead.
This changes Elasticsearch to automatically downgrade `text` and
`keyword` fields into appropriate `string` fields when changing the
mapping of indexes imported from 2.x. This allows users to use the
modern, documented syntax against 2.x indexes. It also makes it clear
that reindexing in order to recreate the index in 5.0 is required for
any long lived indexes. This change is useful for the times when you
can't (cluster is just starting, not stable enough for reindex) or
shouldn't (index will only live 90 days or something).
This change adds a special field named _none_ that allows to disable the retrieval of the stored fields in a search request or in a TopHitsAggregation.
To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this:
````
POST _search
{
"stored_fields": "_none_"
}
````
Today we do a lot of accounting inside the engine to maintain locations
of documents inside the transaction log. This is only needed to ensure
we can return the documents source from the engine if it hasn't been refreshed.
Aside of the added complexity to be able to read from the currently writing translog,
maintainance of pointers into the translog this also caused inconsistencies like different values
of the `_ttl` field if it was read from the tlog or not. TermVectors are totally different if
the document is fetched from the tranlog since copy fields are ignored etc.
This chance will simply call `refresh` if the documents latest version is not in the index. This
streamlines the semantics of the `_get` API and allows for more optimizations inside the engine
and on the transaction log. Note: `_refresh` is only called iff the requested document is not refreshed
yet but has recently been updated or added.
#Relates to #19787
Deprecates the optimize_bbox parameter on geodistance queries. This has no longer been needed since version 2.2 because lucene geo distance queries (postings and LatLonPoint) already optimize by bounding box.
Fix field examples to make documents actually visible
This commit adds refresh calls to field examples an removes not working
`_routing` and `_field_names` script access.
Closes#20118
This includes:
- All regular numeric types such as int, long, scaled-float, double, etc
- IP addresses
- Dates
- Geopoints and Geoshapes
Relates to #19784
Previously this was possible, which was problematic when issuing a
request like `DELETE /-myindex`, which was interpretted as "delete
everything except for myindex".
Resolves#19800
Most of the examples in the pipeline aggregation docs use a small
"sales" test data set and I converted all of the examples that use
it to `// CONSOLE`. There are still a bunch of snippets in the pipeline
aggregation docs that aren't `// CONSOLE` so they aren't tested. Most
of them are "this is the most basic form of this aggregation" so they
are more immune to errors and bit rot then the examples that I converted.
I'd like to do something with them as well but I'm not sure what.
Also, the moving average docs and serial diff docs didn't get a lot of
love from this pass because they don't use the test data set or follow
the same general layout.
Relates to #18160
Currently both `PUT` and `POST` can be used to create indices. This commit
removes support for `POST index_name` so that we can use it to index documents
with auto-generated ids once types are removed.
Relates #15613
In the example there was a alias removed and then a different alias created for the same index, but I think actually swapping a index by another one for the same alias would make more sense as an example here.
This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greather than one is dangerous
as users sometimes end up unknowingly starting a second node and start
thinking that they have encountered data loss.
Relates #19964
This commit rewords the expect header bug notice to provide the precise
details for the bug arising. In particular, the bug does not impact any
request over 1024 bytes, but instead impacts any request with a body
that is sent in two requests, the first with an Expect: 100-continue
header. The size is irrelevant, and requests with bodies larger than
1024 bytes are okay as long as the Expect: 100-continue header is not
also sent.
Relates #19911
>However, the version of the new cluster should be the same or newer than the cluster that was
Afaik, you can't restore a snapshot to a newer cluster that is not consecutively newer (i.e. can't restore 1.x snapshot to a 5.x cluster). This is to clarify the statement above moving forward.
When compiling many dynamically changing scripts, parameterized
scripts (<https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params>)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
`script.max_compilations_per_minute`, which defaults to 15.
If more dynamic scripts are sent, a user will get the following
exception:
```json
{
"error" : {
"root_cause" : [
{
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "i",
"node" : "a5V1eXcZRYiIk8lecjZ4Jw",
"reason" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
}
],
"caused_by" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
},
"status" : 500
}
```
This also fixes a bug in `ScriptService` where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.
Relates to #19396
The payload option was introduced with the new completion
suggester implementation in v5, as a stop gap solution
to return additional metadata with suggestions.
Now we can return associated documents with suggestions
(#19536) through fetch phase using stored field (_source).
The additional fetch phase ensures that we only fetch
the _source for the global top-N suggestions instead of
fetching _source of top results for each shard.
This note in the delete api about broadcasting to all shards is a leftover that should have been removed when the broadcasting feature was removed
Relates to #10136
GeoDistance is implemented using a crazy enum that causes issues with the scripting modules. This commit moves all distance calculations to arcDistance and planeDistance static methods in GeoUtils. It also removes unnecessary distance helper methods from ScriptDocValues.GeoPoints.
This commit enables completion suggester to return documents
associated with suggestions. Now the document source is returned
with every suggestion, which respects source filtering options.
In case of suggest queries spanning more than one shard, the
suggest is executed in two phases, where the last phase fetches
the relevant documents from shards, implying executing suggest
requests against a single shard is more performant due to the
document fetch overhead when the suggest spans multiple shards.
Adds `warnings` syntax to the yaml test that allows you to expect
a `Warning` header that looks like:
```
- do:
warnings:
- '[index] is deprecated'
- quotes are not required because yaml
- but this argument is always a list, never a single string
- no matter how many warnings you expect
get:
index: test
type: test
id: 1
```
These are accessible from the docs with:
```
// TEST[warning:some warning]
```
This should help to force you to update the docs if you deprecate
something. You *must* add the warnings marker to the docs or the build
will fail. While you are there you *should* update the docs to add
deprecation warnings visible in the rendered results.
Today, when listing thread pools via the cat thread pool API, thread
pools are listed in a column-delimited format. This is unfriendly to
command-line tools, and inconsistent with other cat APIs. Instead,
thread pools should be listed in a row-delimited format.
Additionally, the cat thread pool API is limited to a fixed list of
thread pools that excludes certain built-in thread pools as well as all
custom thread pools. These thread pools should be available via the cat
thread pool API.
This commit improves the cat thread pool API by listing all thread pools
(built-in or custom), and by listing them in a row-delimited
format. Finally, for each node, the output thread pools are sorted by
thread pool name.
Relates #19721
Currently both aggregations really share the same implementation. This commit
splits the implementations so that regular histograms can support decimal
intervals/offsets and compute correct buckets for negative decimal values.
However the response API is still the same. So for intance both regular
histograms and date histograms will produce an
`org.elasticsearch.search.aggregations.bucket.histogram.Histogram`
aggregation.
The optimization to compute an identifier of the rounded value and the
rounded value itself has been removed since it was only used by regular
histograms, which now do the rounding themselves instead of relying on the
Rounding abstraction.
Closes#8082Closes#4847
* Rename operation to result and reworking responses
* Rename DocWriteResponse.Operation enum to DocWriteResponse.Result
These are just easier to interpret names.
Closes#19664
The current heuristic to compute a default shard size is pretty aggressive,
it returns `max(10, number_of_shards * size)` as a value for the shard size.
I think making it less aggressive has the benefit that it would reduce the
likelyness of running into OOME when there are many shards (yearly
aggregations with time-based indices can make numbers of shards in the
thousands) and make the use of breadth-first more likely/efficient.
This commit replaces the heuristic with `size * 1.5 + 10`, which is enough
to have good accuracy on zipfian distributions.
* Update gateway.asciidoc
Added a note to clarify that, in cases where nodes in a cluster have different setting, the node that is the elected master takes precedence over anything else.
* Update gateway.asciidoc
Updated as per @bleskes's comments
Performing the bulk request shown in #19267 now results in the following:
```
{"_index":"test","_type":"test","_id":"1","_version":1,"_operation":"create","forced_refresh":false,"_shards":{"total":2,"successful":1,"failed":0},"status":201}
{"_index":"test","_type":"test","_id":"1","_version":1,"_operation":"noop","forced_refresh":false,"_shards":{"total":2,"successful":1,"failed":0},"status":200}
```
This change adds a new special path to the buckets_path syntax
`_bucket_count`. This new option will return the number of buckets for a
multi-bucket aggregation, which can then be used in pipeline
aggregations.
Closes#19553
This adds new circuit breaking with the "request" breaker, which adds
circuit breaks based on the number of buckets created during
aggregations. It consists of incrementing during AggregatorBase creation
This also bumps the REQUEST breaker to 60% of the JVM heap now.
The output when circuit breaking an aggregation looks like:
```json
{
"shard" : 0,
"index" : "i",
"node" : "a5AvjUn_TKeTNYl0FyBW2g",
"reason" : {
"type" : "exception",
"reason" : "java.util.concurrent.ExecutionException: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: CircuitBreakingException[[request] Data too large, data for [<agg [otherthings]>] would be larger than limit of [104857600/100mb]];",
"caused_by" : {
"type" : "execution_exception",
"reason" : "QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: CircuitBreakingException[[request] Data too large, data for [<agg [myagg]>] would be larger than limit of [104857600/100mb]];",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[request] Data too large, data for [<agg [otherthings]>] would be larger than limit of [104857600/100mb]",
"bytes_wanted" : 104860781,
"bytes_limit" : 104857600
}
}
}
}
```
Relates to #14046
With #19140 we started persisting the node ID across node restarts. Now that we have a "stable" anchor, we can use it to generate a stable default node name and make it easier to track nodes over a restarts. Sadly, this means we will not have those random fun Marvel characters but we feel this is the right tradeoff.
On the implementation side, this requires a bit of juggling because we now need to read the node id from disk before we can log as the node node is part of each log message. The PR move the initialization of NodeEnvironment as high up in the starting sequence as possible, with only one logging message before it to indicate we are initializing. Things look now like this:
```
[2016-07-15 19:38:39,742][INFO ][node ] [_unset_] initializing ...
[2016-07-15 19:38:39,826][INFO ][node ] [aAmiW40] node name set to [aAmiW40] by default. set the [node.name] settings to change it
[2016-07-15 19:38:39,829][INFO ][env ] [aAmiW40] using [1] data paths, mounts [[ /(/dev/disk1)]], net usable_space [5.5gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2016-07-15 19:38:39,830][INFO ][env ] [aAmiW40] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-07-15 19:38:39,837][INFO ][node ] [aAmiW40] version[5.0.0-alpha5-SNAPSHOT], pid[46048], build[473d3c0/2016-07-15T17:38:06.771Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_51/25.51-b03]
[2016-07-15 19:38:40,980][INFO ][plugins ] [aAmiW40] modules [percolator, lang-mustache, lang-painless, reindex, aggs-matrix-stats, lang-expression, ingest-common, lang-groovy, transport-netty], plugins []
[2016-07-15 19:38:43,218][INFO ][node ] [aAmiW40] initialized
```
Needless to say, settings `node.name` explicitly still works as before.
The commit also contains some clean ups to the relationship between Environment, Settings and Plugins. The previous code suggested the path related settings could be changed after the initial Environment was changed. This did not have any effect as the security manager already locked things down.
Add parser for anonymous char_filters/tokenizer/token_filters
Using Settings in AnalyzeRequest for anonymous definition
Add breaking changes document
Closed#8878
Remove `ParseField` constants used for names where there are no deprecated
names and just use the `String` version of the registration method instead.
This is step 2 in cleaning up the plugin interface for extending
search time actions. Aggregations are next.
This is breaking for plugins because those that register a new query should
now implement `SearchPlugin` rather than `onModule(SearchModule)`.
Previously if the size of the search request was greater than zero we would not cache the request in the request cache.
This change retains the default behaviour of not caching requests with size > 0 but also allows the `request_cache=true` query parameter
to enable the cache for requests with size > 0
This is a tentative to revive #15939 motivated by elastic/beats#1941.
Half-floats are a pretty bad option for storing percentages. They would likely
require 2 bytes all the time while they don't need more than one byte.
So this PR exposes a new `scaled_float` type that requires a `scaling_factor`
and internally indexes `value*scaling_factor` in a long field. Compared to the
original PR it exposes a lower-level API so that the trade-offs are clearer and
avoids any reference to fixed precision that might imply that this type is more
accurate (actually it is *less* accurate).
In addition to being more space-efficient for some use-cases that beats is
interested in, this is also faster that `half_float` unless we can improve the
efficiency of decoding half-float bits (which is currently done using software)
or until Java gets first-class support for half-floats.
Today the default precision for the cardinality aggregation depends on how many
parent bucket aggregations it had. The reasoning was that the more parent bucket
aggregations, the more buckets the cardinality had to be computed on. And this
number could be huge depending on what the parent aggregations actually are.
However now that we run terms aggregations in breadth-first mode by default when
there are sub aggregations, it is less likely that we have to run the cardinality
aggregation on kagilions of buckets. So we could use a static default, which will
be less confusing to users.
* Removed `Template` class and unified script & template parsing logic. Templates are scripts, so they should be defined as a script. Unless there will be separate template infrastructure, templates should share as much code as possible with scripts.
* Removed ScriptParseException in favour for ElasticsearchParseException
* Moved TemplateQueryBuilder to lang-mustache module because this query is hard coded to work with mustache only
This should make them easier to read and adds them to the test suite
I changed the example from a two node cluster to a single node cluster
because that is what we have running in the integration tests. It is also
what a user just starting out is likely to see so I think that is ok.
Invocation counts can be used to help judge the selectivity of individual query components in the context of the entire query. E.g. a query may not look selective when run by itself (matches most of the index), but when run in context of a full search request, is evaluated only rarely due to execution order
Since this is modifying the base timing class, it'll enrich both query and agg profiles (as well as future profile results)
Today `node.mode` and `node.local` serve almost the same purpose, they
are a shortcut for `discovery.type` and `transport.type`. If `node.local: true`
or `node.mode: local` is set elasticsearch will start in _local_ mode which means
only nodes within the same JVM are discovered and a non-network based transport
is used. The _local_ mode it only really used in tests or if nodes are embedded.
For both, embedding and tests explicit configuration via `discovery.type` and `transport.type`
should be preferred.
This change removes all the usage of these settings and by-default doesn't
configure a default transport implemenation since netty is now a module. Yet, to make
the user expericence flawless, plugins or modules can set a `http.type.default` and
`transport.type.default`. Plugins set this via `PluginService#additionalSettings()`
which enforces _set-once_ which prevents node startup if set multiple times. This means
that our distributions will just startup with netty transport since it's packaged as a
module unless `transport.type` or `http.transport.type` is explicitly set.
This change also found a bunch of bugs since several NamedWriteables were not registered if a
transport client is used. Now that we don't rely on the `node.mode` leniency which is inherited
instead of using explicit settings, `TransportClient` uses `AssertingLocalTransport` which detects these problems since it serializes all messages.
Closes#16234
This commit removes support for properties syntax and config files:
- removed support for elasticsearch.properties
- removed support for logging.properties
- removed support for properties content detection in REST APIs
- removed support for properties content detection in Java API
Relates #19398
Switches most search behavior extensions from push (`onModule(SearchModule)`)
to pull (`implements SearchPlugin`). This effort in general gives plugin
authors a much cleaner view of how to extend Elasticsearch and starts to
set up portions of Elasticsearch as "the plugin API". This commit in
particular does that for search-time behavior like customized suggesters,
highlighters, score functions, and significance heuristics.
It also switches most such customization to being done at search module
construction time which is much, much easier to reason about from a testing
perspective. It also helps significantly in the process of de-guice-ing
Elasticsearch's startup.
There are at least two major search time extensions that aren't covered in
this commit that will simply have to wait for the next commit on the topic
because this one has already grown large: custom aggregations and custom
queries. These will likely live in the same SearchPlugin interface as well.
If there are percolator queries containing `range` queries with ranges based on the current time then this can lead to incorrect results if the `percolate` query gets cached. These ranges are changing each time the `percolate` query gets executed and if this query gets cached then the results will be based on how the range was at the time when the `percolate` query got cached.
The ExtractQueryTermsService has been renamed `QueryAnalyzer` and now only deals with analyzing the query (extracting terms and deciding if the entire query is a verified match) . The `PercolatorFieldMapper` is responsible for adding the right fields based on the analysis the `QueryAnalyzer` has performed, because this is highly dependent on the field mappings. Also the `PercolatorFieldMapper` is responsible for creating the percolate query.
Today when a thread encounters a fatal unrecoverable error that
threatens the stability of the JVM, Elasticsearch marches on. This
includes out of memory errors, stack overflow errors and other errors
that leave the JVM in a questionable state. Instead, the Elasticsearch
JVM should die when these errors are encountered. This commit causes
this to be the case.
Relates #19272
* master: (192 commits)
[TEST] Fix rare OBOE in AbstractBytesReferenceTestCase
Reindex from remote
Rename writeThrowable to writeException
Start transport client round-robin randomly
Reword Refresh API reference (#19270)
Update fielddata.asciidoc
Fix stored_fields message
Add missing footer notes in mapper size docs
Remote BucketStreams
Add doc values support to the _size field in the mapper-size plugin
Bump version to 5.0.0-alpha5.
Update refresh.asciidoc
Update shrink-index.asciidoc
Change Debian repository for Vagrant debian-8 box
[TEST] fix test to account for internal empyt reference optimization
Upgrade to netty 3.10.6.Final (#19235)
[TEST] fix histogram test when extended bounds overlaps data
Remove redundant modifier
Simplify TcpTransport interface by reducing send code to a single send method (#19223)
Fix style violation in InstallPluginCommand.java
...
This adds a remote option to reindex that looks like
```
curl -POST 'localhost:9200/_reindex?pretty' -d'{
"source": {
"remote": {
"host": "http://otherhost:9200"
},
"index": "target",
"query": {
"match": {
"foo": "bar"
}
}
},
"dest": {
"index": "target"
}
}'
```
This reindex has all of the features of local reindex:
* Using queries to filter what is copied
* Retry on rejection
* Throttle/rethottle
The big advantage of this version is that it goes over the HTTP API
which can be made backwards compatible.
Some things are different:
The query field is sent directly to the other node rather than parsed
on the coordinating node. This should allow it to support constructs
that are invalid on the coordinating node but are valid on the target
node. Mostly, that means old syntax.
This change activates the doc_values on the _size field for indices created after 5.0.0-alpha4.
It also adds a note in the breaking changes that explain the situation and how to get around it.
Closes#18334
Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API results in creating new fields due to node ID being used as keys.
The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and it works for probably most cases, it is not perfect. For example, if after a node restart, the node is not able to bind to the same port (because it's not yet freed by the OS), it will cause the node to still change identity. Also in environments where the host IP can change due to a host restart, identity will not be the same.
Due to those limitation, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the nodes data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM). I
It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join.
Last, and most importantly, this change requires that *all* nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking.
Other less important notes:
- DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead.
- I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic.
- I removed the `version` paramater from `MetaDataStateFormat#write` , it wasn't really used and was just in the way :)
- TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
We introduced a special response_body assertion to test our docs snippets. The match assertion does the same job though and can be reused and adapted where needed. ResponseBodyAssertion contains provides much better and accurate errors though, which can be now utilized in MatchAssertion so that many more REST tests can benefit from readable error messages.
Each response body gets always stashed and can be retrieved for later evaluations already. Instead of providing the response body as strings that get parsed to json objects separately, then converted to maps as ResponseBodyAssertion did, we parse everything once, the json is part of the yaml test, which is supported. The only downside is that json comments cannot be used, rather yaml comments should be used (// C style vs # ). There were only two docs tests that were using comments in ingest-node.asciidoc where I went ahead and remove the comments which didn't seem that useful anyways.
Update-By-Query and Delete-By-Query use internal versioning to update/delete documents. But documents can have a version number equal to zero using the external versioning... making the UBQ/DBQ request fail because zero is not a valid version number and they only support internal versioning for now. Sequence numbers might help to solve this issue in the future.
As discussed at https://github.com/elastic/elasticsearch-cloud-azure/issues/91#issuecomment-229113595, we know that the current `discovery-azure` plugin only works with Azure Classic VMs / Services (which is somehow Legacy now).
The proposal here is to rename `discovery-azure` to `discovery-azure-classic` in case some users are using it.
And deprecate it for 5.0.
Closes#19144.
`RestHandler`s are highly tied to actions so registering them in the
same place makes sense.
Removes the need to for plugins to check if they are in transport client
mode before registering a RestHandler - `getRestHandlers` isn't called
at all in transport client mode.
This caused guice to throw a massive fit about the circular dependency
between NodeClient and the allocation deciders. I broke the circular
dependency by registering the actions map with the node client after
instantiation.
This pull request adds two util functions to the Mustache templating engine:
- {{#toJson}}my_map{{/toJson}} to render a Map parameter as a JSON string
- {{#join}}my_iterable{{/join}} to render any iterable (including arrays) as a comma separated list of values like `1, 2, 3`. It's also possible de change the default delimiter (comma) to something else.
closes#18970
These are useful methods in groovy that give you control over
the replacements used:
```
'the quick brown fox'.replaceAll(/[aeiou]/,
m -> m.group().toUpperCase(Locale.ROOT))
```
Instead of implementing onModule(ActionModule) to register actions,
this has plugins implement ActionPlugin to declare actions. This is
yet another step in cleaning up the plugin infrastructure.
While I was in there I switched AutoCreateIndex and DestructiveOperations
to be eagerly constructed which makes them easier to use when
de-guice-ing the code base.
This commit modifies TimeValue parsing to keep the input time unit. This
enables round-trip parsing from instances of String to instances of
TimeValue and vice-versa. With this, this commit removes support for the
unit "w" representing weeks, and also removes support for fractional
values of units (e.g., 0.5s).
Relates #19102