This new snapshot contains the following JIRAs that we're interested in:
- [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525)
Better handling of small documents. This should improve retrieval times
when documents are less than ~1kB.
- [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510)
Faster flushes when index sorting is enabled by not compressing the
temporary files that store stored fields and term vectors.
This commit removes `integTest` task from all es-plugins.
Most relevant projects have been converted to use yamlRestTest, javaRestTest,
or internalClusterTest in prior PRs.
A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)
related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.
To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.
As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.
With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.
Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.
This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.
Relates to #59332
Co-authored-by: Nik Everett <nik9000@gmail.com>
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without changing the names, the testing
conventions task would fail since now that the YAML
tests are no longer present pacify the convention.
These tests have moved to the internalClusterTest
source set.
related: #56841
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.
The remaining cases in modules, plugins, and x-pack will be handled in followups.
This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.
The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.
Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).
As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.
Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
After commit #53661 converted the lang-expressions module to using
DoubleValuesSource, we've seen a performance regression for expressions
that use geopoints. Some investigation suggests that this may be due to
GeoLatitudeValueSource and GeoLongitudeValueSource wrapping their
per-document values in a DoubleValues.withDefault() class. Values exposed
via expressions already have a '0' default value, so this extra wrapping is
unnecessary, and is directly on the hot path. This commit removes the extra
wrapping.
DoubleValuesSource is the type-safe replacement for ValueSource in the lucene
core. Most of elasticsearch has moved to use these, but lang-expressions is still
using the old version. This commit migrates lang-expressions as well.
Lucene 8.5.0 release candidates are imminent. This commit upgrades master to use
the latest snapshot to check that there are no last-minute bugs or regressions.
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory. Factories may optionally implement
ScriptFactory. If so, then they can indicate determinism and thus
cacheability.
**Backport**
Relates: #49466
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic. This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.
Script results were never cached and that does not change here. Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.
Refs: #49466
**Backport**
Adds `GET /_script_language` to support Kibana dynamic scripting
language selection.
Response contains whether `inline` and/or `stored` scripts are
enabled as determined by the `script.allowed_types` settings.
For each scripting language registered, such as `painless`,
`expression`, `mustache` or custom, available contexts for the language
are included as determined by the `script.allowed_contexts` setting.
Response format:
```
{
"types_allowed": [
"inline",
"stored"
],
"language_contexts": [
{
"language": "expression",
"contexts": [
"aggregation_selector",
"aggs"
...
]
},
{
"language": "painless",
"contexts": [
"aggregation_selector",
"aggs",
"aggs_combine",
...
]
}
...
]
}
```
Fixes: #49463
**Backport**
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.
Closes#43599
While function scores using scripts do allow explanations, they are only
creatable with an expert plugin. This commit improves the situation for
the newer script score query by adding the ability to set the
explanation from the script itself.
To set the explanation, a user would check for `explanation != null` to
indicate an explanation is needed, and then call
`explanation.set("some description")`.
_score was removed from use in aggregations using expressions
unintentionally when script contexts were added. This allows _score to once
again be used.
Backport of #39350
Contains the following:
* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.
Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.
The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
Includes the following:
* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this
has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
Includes:
LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates
Relates #36286
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
This change fixes#35351. Users were no longer able to return types of numbers other than doubles for bucket aggregation scripts. This change reverts to the previous behavior of being able to return any type of number and having it converted to a double outside of the script.
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
* SCRIPTING: Add Expr. Compile for TermSetQuery Ctx.
* Follow up to #33602 adding the ability to compile TermsSetQuery
scripts with the expressions engine in the same way we support
SearchScript in Expressions
* Duplicated the code here for now to make the change less complex,
the only difference to SearchScript is that `_score` and `_value` are not handled for TermsSetQuery
* remove redundant check
* MINOR: Drop Redundant Ctx. Check in ScriptService
* This check is completely redundant, the expression script
engine will throw anyway (and with a similar message) for
those contexts that it cannot compile. Moreover, the update context
is not the only context that is not suported by the expression engine
at this point so handling the update context separately here makes
no sense.
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
The main highlight is the removal of the reclaim_deletes_weight in the TieredMergePolicy.
The es setting index.merge.policy.reclaim_deletes_weight is deprecated in this commit and the value is ignored. The new merge policy setting setDeletesPctAllowed should be added in a follow up.
* Add basic support for field aliases in index mappings. (#31287)
* Allow for aliases when fetching stored fields. (#31411)
* Add tests around accessing field aliases in scripts. (#31417)
* Add documentation around field aliases. (#31538)
* Add validation for field alias mappings. (#31518)
* Return both concrete fields and aliases in DocumentFieldMappers#getMapper. (#31671)
* Make sure that field-level security is enforced when using field aliases. (#31807)
* Add more comprehensive tests for field aliases in queries + aggregations. (#31565)
* Remove the deprecated method DocumentFieldMappers#getFieldMapper. (#32148)
* remove left-over comment
* make sure of the property for plugins
* skip installing modules if these exist in the distribution
* Log the distrbution being ran
* Don't allow running with integ-tests-zip passed externally
* top level x-pack/qa can't run with oss distro
* Add support for matching objects in lists
Makes it possible to have a key that points to a list and assert that a
certain object is present in the list. All keys have to be present and
values have to match. The objects in the source list may have additional
fields.
example:
```
match: { 'nodes.$master.plugins': { name: ingest-attachment } }
```
* Update plugin and module tests to work with other distributions
Some of the tests expected that the integration tests will always be ran
with the `integ-test-zip` distribution so that there will be no other
plugins loaded.
With this change, we check for the presence of the plugin without
assuming exclusivity.
* Allow modules to run on other distros as well
To match the behavior of tets.distributions
* Add and use a new `contains` assertion
Replaces the previus changes that caused `match` to do a partial match.
* Implement PR review comments
Currently this class takes care of moth selecting the relevant value, and
replacing missing values if any. This is fine for sorting, which always needs
to do both at the same time, but we also have a number of aggregations and
script utils that need to retain information about missing values so this change
proposes to decouple selection of the relevant value and replacement of missing
values.
This snapshot includes:
- LUCENE-8341: Record soft deletes in SegmentCommitInfo which will resolve#30851
- LUCENE-8335: Enforce soft-deletes field up-front