Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.
This change automatically sets the number of routinng shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.
NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificually changing the number of buckets in the consistent hashign space
document might be hashed into different shards compared to previous versions.
This is a 7.0 only change.
Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that.
This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory.
Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile.
The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped:
"_clusters" : {
"total" : 3,
"successful" : 2,
"skipped" : 1
}
Such section won't be part of the response if no clusters have been skipped.
The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
The current script service has a script compilation limit for a one
minute window. This is set to a small default value of 15. Instead of
increasing that default value, this commit introduces a new setting
that allows to configure a rate per time unit, so that the script service can deal with bursts better.
The new setting is named `script.max_compilations_rate`,
requires a nonnegative number and a positive time value.
The default is `75/5m`, which is equivalent to the existing 15 per minute.
* Migrate migration docs from 6.0 to 7.0
Since we only keep one version of migration docs and master is now on 7.0, we
should migrate these so breaking changes can be added in the right place.
* Remove release notes as well
They link to the migration guides, so they have to go.
* Add placeholder notes for 7.0 so doc build is happy
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
* Adds nodes usage API to monitor usages of actions
The nodes usage API has 2 main endpoints
/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
Still to do:
* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation
* Rafactors UsageService so counts are done by the handlers
* Fixing up docs tests
* Adds a name to all rest actions
* Addresses review comments
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended used with `sampler` agg to limit expense of tokenizing docs and takes optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.
Closes#23674
Adds CONSOLE to cross-cluster-search docs but skips them for testing
because we don't have a second cluster set up. This gets us the
`VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they
are valid yaml (not json, technically) but doesn't get testing.
Which is better than we had before.
Adds CONSOLE to the dynamic templates docs and ingest-node docs.
The ingest-node docs contain a *ton* of non-console snippets. We
might want to convert them to full examples later, but that can be
a separate thing.
Relates to #18160
Rewrites most of the snippets in the `innert_hits` docs to be
complete examples and enables `VIEW IN CONSOLE`, `COPY AS CURL`,
and automatic testing of the snippets.
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
They needed to be updated now that Painless is the default and
the non-sandboxed scripting languages are going away or gone.
I dropped the entire section about customizing the classloader
whitelists. In master this barely does anything (exposes more
things to expressions).
Converts the analysis docs to that were marked as json into `CONSOLE`
format. A few of them were in yaml but marked as json for historical
reasons. I added more complete examples for a few of the less obvious
sounding ones.
Relates to #18160
The pattern-analyzer docs contained a snippet that was an expanded
regex that was marked as `[source,js]`. This changes it to
`[source,regex]`.
The htmlstrip-charfilter and pattern-replace-charfilter docs had
examples that were actually a list of tokens but marked `[source,js]`.
This marks them as `[source,text]` so they don't count as unconverted
CONSOLE snippets.
The pattern-replace-charfilter also had a doc who's test was
skipped because of funny interaction with the test framework. This
fixes the test.
Three more down, eighty-two to go.
Relates to #18160
CONSOLEifies the lang-analyzer docs and replaces the (invalid)
empty `keyword_marker` setups that were on the page with one
that contains the word "example" translated into the appropriate
language.
Relates to #18160
All the docs for the `exists` query that aren't marked as `CONSOLE`
aren't actually `CONSOLE`-worthy so this marks them as `NOTCONSOLE`.
It also rewrites the text around `missing` query. Since it was
removed in 5.0 we don't need to talk about it in the 6.0 docs.
Relates to #18160
Turns the top example in each of the geo aggregation docs into a working
example that can be opened in CONSOLE. Subsequent examples can all also
be opened in console and will work after you've run the first example.
All examples are tested as part of the build.
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
This change exposes the new Lucene graph based word delimiter token filter in the analysis filters.
Unlike the `word_delimiter` this token filter named `word_delimiter_graph` correctly handles multi terms expansion at query time.
Closes#23104
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
This commit brings the snapshot documentation in conformity
with the CONSOLE format, and fixes the docs so that the documentation
tests can be run against them.
These need to be CONSOLEified *now* because we're starting to
require Content-Type headers and they didn't have any.
* cluster/reroute: Marked as CONSOLE but skipped because the docs
build runs with a single node.
* docs/bulk: Marked as NOTCONSOLE because the snippets describe
either examples or `curl` commands. Fixed the `curl` command to
include the `Content-Type` header.
* query-dsl/terms-query: Marked as CONSOLE.
* search/request/rescore: Marked as CONSOLE. Fixed deprecated
syntax.
Relates #23001
Relates #18160
This adds the `COPY AS CURL` and `VIEW IN CONSOLE` links to the docs
and causes the snippets to be tested during Elasticsearch's build.
Relates to #18160
This adds the `COPY AS CURL` and `VIEW IN CONSOLE` buttons to the
docs and makes the build execute the snippets as part of `docs:check`.
Relates to #18160
This change also removes the reference to the difference bewteen full name and index name.
They are always the same since 2.x and `name` does not refer anymore to `author.name` automatically.
A simple pattern must be used instead.
Remove redundant code that checks the field name twice.
Added missing CONSOLE scripts to documentation for sampler and diversified_sampler aggs.
Includes new StackOverflow index setup in build.gradle
Closes#22746
* Formatting tweaks
* Add top hits collapsing to search request
The field collapsing is done with a custom top docs collector that "collapse" search hits with same field value.
The distributed aspect is resolve using the two passes that the regular search uses. The first pass "collapse" the top hits, then the coordinating node merge/collapse the top hits from each shard.
```
GET _search
{
"collapse": {
"field": "category",
}
}
```
This change also adds an ExpandCollapseSearchResponseListener that intercepts the search response and expands collapsed hits using the CollapseBuilder#innerHit} options.
The retrieval of each inner_hits is done by sending a query to all shards filtered by the collapse key.
```
GET _search
{
"collapse": {
"field": "category",
"inner_hits": {
"size": 2
}
}
}
```
Adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the snippets
in the `value_count` docs and causes the build to execute the snippets
for testing.
Release #18160
This adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the
snippets in the docs for the `date_range` aggregation and tests
those snippets as part of the build.
Relates to #18160
This adds the `VIEW IN SENSE` and `COPY AS CURL` links and has
the build automatically execute the snippets and verify that they
work.
Relates to #18160
Adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the example
`global` aggregation. Also improves the example by adding a
non-`global` aggregation to compare it to.
Relates to #18160
This change allows specifying alias/wildcard expression in indices_boost.
And added another format for specifying indices_boost. It accepts array of index name and boost pair.
If an index is included in multiple aliases/wildcard expressions, the first match will be used.
With new format, old format is marked as deprecated.
Closes#4756
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.
Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.
When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
Integrate the patch from LUCENE-6664 into elasticsearch and
add support for handling a graph token stream in match/multi-match
queries.
This fixes longstanding bugs with multi-token synonyms returning
incorrect results with proximity queries.
We plan to deprecate `_suggest` during 5.0 so it isn't worth fixing
it to support the `_source` parameter for `_source` filtering. But we
should fix the docs so they are accurate.
Since this removes the last non-`// CONSOLE` line in
`completion-suggest.asciidoc` this also removes it from the list of
files that have non-`// CONSOLE` docs.
Closes#20482
Converts docs for `_cat/segments`, `_cat/plugins` and `_cat/repositories`
from `curl` to `// CONSOLE` so they are tested as part of the build and
are cleaner to use in Console. They should work fine with `curl` with
the `COPY AS CURL` link.
Also swaps the `source` type of the response from `js` to `txt` because
that is more correct. The syntax highlighter doesn't care. It looks at
the text to figure out the language. So it looks a little funny for `_cat`
responses regardless.
Relates to #18160
This allows you to whitelist `localhost:*` or `127.0.10.*:9200`.
It explicitly checks for patterns like `*` in the whitelist and
refuses to start if the whitelist would match everything. Beyond
that the user is on their own designing a secure whitelist.
Relates to #18160
Uses the new sorting (#20658) in the `_cat` API to support all use
cases natively. We can still resort to piping things through `sort`
if we need to, but we don't have to for basic stuff like sorting!
Removes a non-existant file from the non-`// CONSOLE` list. If you
remove all the non-`// CONSOLE` snippets from a file it must be
removed from the whitelist for the build to pass. Removing the file
entirely counts as removing the snippets.
Related to #18160
This causes the snippets to be tested during the build and gives
helpful links to the reader to open the docs in console or copy them
as curl commands.
Relates to #18160
* plugins/discovery-azure-class.asciidoc
* reference/cluster.asciidoc
* reference/modules/cluster/misc.asciidoc
* reference/modules/indices/request_cache.asciidoc
After this is merged there will be no unconvereted snippets outside
of `reference`.
Related to #18160
This tracks the snippets that probably should be converted to
`// CONSOLE` or `// TESTRESPONSE` and fails the build if the list
of files with such snippets doesn't match the list in `docs/build.gradle`.
Setting the file looks like
```
/* List of files that have snippets that probably should be converted to
* `// CONSOLE` and `// TESTRESPONSE` but have yet to be converted. Try and
* only remove entries from this list. When it is empty we'll remove it
* entirely and have a party! There will be cake and everything.... */
buildRestTests.expectedUnconvertedCandidates = [
'plugins/discovery-azure-classic.asciidoc',
...
'reference/search/suggesters/completion-suggest.asciidoc',
]
```
This list is in `build.gradle` because we expect it to be fairly
temporary. In a few months we'll have converted all of the docs and won't
ned it any more.
From now on if you add now docs that contain a snippet that shows an
interaction with elasticsearch you have three choices:
1. Stick `// CONSOLE` on the interactions and `// TESTRESPONSE` on the
responses. The build (specifically (`gradle docs:check`) will test that
these interactions "work". If there isn't a `// TESTRESPONSE` snippet
then "work" just means "Elasticsearch responds with a 200-level response
code and no `WARNING` headers. This is way better than nothing.
2. Add `// NOTCONSOLE` if the snippet isn't actually interacting with
Elasticsearch. This should only be required for stuff like javascript
source code or `curl` against an external service like AWS or GCE. The
snippet will not get "OPEN IN CONSOLE" or "COPY AS CURL" buttons or be
tested.
3. Add `// TEST[skip:reason]` under the snippet. This will just skip the
snippet in the test phase. This should really be reserved for snippets
where we can't test them because they require an external service that
we don't have at testing time.
Please, please, please, please don't add more things to the list. After
all, it sais there'll be cake when we remove it entirely!
Relates to #18160
Most of the examples in the pipeline aggregation docs use a small
"sales" test data set and I converted all of the examples that use
it to `// CONSOLE`. There are still a bunch of snippets in the pipeline
aggregation docs that aren't `// CONSOLE` so they aren't tested. Most
of them are "this is the most basic form of this aggregation" so they
are more immune to errors and bit rot then the examples that I converted.
I'd like to do something with them as well but I'm not sure what.
Also, the moving average docs and serial diff docs didn't get a lot of
love from this pass because they don't use the test data set or follow
the same general layout.
Relates to #18160