Upgrade to lucene-7.4.0-snapshot-1ed95c097b
This version contains:
* An Analyzer for Korean
* An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries.
* A new API to retrieve matches (offsets and positions) of a query for a single document.
* Support for soft deletes in the index writer.
* A fixed shingle filter that handles index time synonyms.
* Support for emoji sequence in ICUTokenizer (with an upgrade to icu 61.1)
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
This new snapshot mostly brings a change to TopFieldCollector which can now
early terminate collection when trackTotalHits is `false`.
As a follow-up, we should replace our usage of
`EarlyTerminatingSortingCollector` with this new option.
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
The current script service has a script compilation limit for a one
minute window. This is set to a small default value of 15. Instead of
increasing that default value, this commit introduces a new setting
that allows to configure a rate per time unit, so that the script service can deal with bursts better.
The new setting is named `script.max_compilations_rate`,
requires a nonnegative number and a positive time value.
The default is `75/5m`, which is equivalent to the existing 15 per minute.
Today we expose `IndexFieldDataService` outside of IndexService to do maintenance
or lookup field data in different ways. Yet, we have a streamlined way to access IndexFieldData
via `QueryShardContext` that should encapsulate all access to it. This also ensures that we control all other functionality like cache clearing etc.
This change also removes the `recycler` option from `ClearIndicesCacheRequest` this option is a no-op and should have been removed long ago.
Today when we aggregate on the `_index` field the cross cluster search
alias is not taken into account. Neither is it respected when we search
on the field. This change adds support for cluster alias when the cluster
alias is present on the `_index` field.
Closes#25606
Also has updates to ScriptMetaData for allowing the old namespace format to be loaded all the way back through 5.0; however, it will throw an exception if two scripts share the same id but different languages.
Requests that execute a stored script will no longer be allowed to specify the lang of the script. This information is stored in the cluster state making only an id necessary to execute against. Putting a stored script will still require a lang.
Most notable changes:
- better update concurrency: LUCENE-7868
- TopDocs.totalHits is now a long: LUCENE-7872
- QueryBuilder does not remove the boolean query around multi-term synonyms:
LUCENE-7878
- removal of Fields: LUCENE-7500
For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding
of vInts and vLongs are compatible: you can write and read with any of them as
long as the value can be represented by a positive int.
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
This commit renames the needsScores method so as to make it
automatically generatable, based on the name of the `_score` variable
which is available in search scripts. It also adds documentation to
ScriptContext to explain the naming and signature of such methods.
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
ScriptContexts currently understand a FactoryType that can produce
instances of the script InstanceType. However, for search scripts, this
does not work as we have the concept of LeafSearchScript that is created
per lucene segment. This commit effectively renames the existing
SearchScript class into SearchScript.LeafFactory, which is a new,
optional, class that can be defined within a ScriptContext.
LeafSearchScript is effectively renamed back into SearchScript. This
change allows the model of stateless factory -> stateful factory ->
script instance to continue, but in a generic way that any script
context may take advantage of.
relates #20426
This commit renames the concept of the "compiled type" to a "factory
type", along with all implementations of this class to be named Factory.
This brings it inline with the classes purpose.
This commit adds collection of all contexts to the parameters of
getScriptEngine. This will allow script engines like painless to
precache extra information about the contexts.
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
This commit changes the compile method of ScriptEngine to be generic in
the same way it is on ScriptService. This moves the shim of handling the
two existing context classes into each script engine, so that each
engine can be worked on independently to convert to real handling of
contexts.
This commit modifies the compile method of ScriptService to be context
aware. The ScriptContext is now a generic class which contains both the
instance type and compiled type for a script. Instance type may be
stateful (for example, pre loading field information for the index a
script will execute on, like in expressions), while the compiled type is
stateless and used to construct instance type instances. This change is
only a first step to cutover ScriptService to the new paradigm. It only
converts callers to the script service, and has a small shim to wrap
compilation from the script engines to support the current two fixed
instance types, SearchScript and ExecutableScript.
Since groovy was removed, we no longer have any ScriptEngines with
resources to release. We may want to keep the option open for a script
engine to close resources, but this would not be common. This commit
adds a default implementation to ScriptEngine for `close()` to reduce
the boiler plate that must be added for a ScriptEngine implementation.
ScriptEngine implementations have an overridable method to indicate they
are safe to use as inline scripts. Since groovy was removed fro 6.0,
there are no longer any implementations which used the default false
value. Furthermore, the value was not actually read anywhere. This
commit removes the method. The ScriptEngineRegistry was also no longer
necessary as it only was used to build a map from language to engine.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
This commit renames ScriptEngineService to ScriptEngine. It is often
confusing because we have the ScriptService, and then
ScriptEngineService implementations, but the latter are not services as
we see in other places in elasticsearch.
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
The getProperty method is an internal method needed to run pipeline aggregations and retrieve info by path from the aggs tree. It is not needed in the MultiBucketsAggregation.Bucket interface, which is returned to users running aggregations from the transport client. The method is moved to the InternalMultiBucketAggregation class as that's where it belongs.
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
In order to support the evolving GeoPoint encodings in Lucene 5 and 6, ES 2.x and 5.x implements an abstraction layer to the GeoPointFieldMapper classes. As of 5.x the geo_point field mapper settled on using Lucene's more performant LatLonPoint field type and deprecated all other encodings. In 6.0 all encodings except LatLonPoint have been removed rendering this abstraction layer useless. This commit removes the abstraction layer and renames the LatLonPointFieldMapper back to GeoPointFieldMapper to mantain consistency with ES field naming.