This commit adds the SPDX Apache-2.0 license header along with an additional
copyright header for all modifications.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit refactors the remaining o.e.index and o.e.test packages in the
test/fixtures module. References throughout the codebase are also refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
This commit refactors the following test packages from the o.e namespace:
* o.e.action
* o.e.bootstrap
* o.e.cli
* o.e.client
* o.e.cluster
* o.e.common
to the o.opensearch namespace. Any references throughout the codebase are also
refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Refactor the code in the `libs/x-content` module and any references to those in the entire code base. The refactoring is done as part of the renaming to OpenSearch work.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors o.e.common.settings package to the
o.opensearch.common.setttings namespace. All references throughout the codebase
are refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* Rename org.elasticsearch.gateway to org.opensearch.gateway
Signed-off-by: Harold Wang <harowang@amazon.com>
* Rename org.elasticsearch.http to org.opensearch.http
Signed-off-by: Harold Wang <harowang@amazon.com>
* Renames org.elasticsearch.plugins to org.opensearch.plugins
Signed-off-by: Harold Wang <harowang@amazon.com>
Refactor the remaining classes in the `org.elasticsearch.search` package in the server module,
- Rename `org.elasticserach.search.aggregations` to `org.opensearch.search.aggregations`
- Rename instances of `org.elasticsearch.search` `org.opensearch.search`
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors the o.e.common.util package to the
o.opensearch.common.util namespace. All references throughout the codebase have
been refactored.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Rename `org.elasticserach.search` to `org.opensearch.search` in package names and references for top level classes in the search package.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
Rename `org.elasticserach.search.aggregations.support` to `org.opensearch.search.aggregations.support` in package names and references.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
Rename `org.elasticserach.search.aggregations.metrics` to `org.opensearch.search.aggregations.metrics` in package names and references.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
* [Rename] o.e.common subpackages round 1
This commit refactors the following subpackages of o.e.common:
* o.e.common.joda
* o.e.common.lease
* o.e.common.metrics
* o.e.common.network
* o.e.common.path
* o.e.common.recycling
* o.e.common.regex
* o.e.common.rounding
* o.e.common.text
* o.e.common.time
* o.e.common.transport
to the o.opensearch namespace. All references throughout the codebase have been
refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
* fix imports 1
Signed-off-by: Nicholas Knize <nknize@amazon.com>
Rename `org.elasticserach.search.aggregations.pipeline` to `org.opensearch.search.aggregations.pipeline` in package names and references.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors the following packages:
* o.e.common.geo
* o.e.common.hash
* o.e.common.io
into the o.opensearch.common namespace. All references throughout the codebase
have been refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
As part of this commit we refactor the following in the o.e.search package:
- rename `org.elasticsearch.search.fetch` to `org.opensearch.search.fetch`
- rename `org.elasticsearch.search.internal` to `org.opensearch.search.internal`
- rename `org.elasticsearch.search.profile` to `org.opensearch.search.profile`
- rename `org.elasticsearch.search.query` to `org.opensearch.search.query`
- rename `org.elasticsearch.search.suggest` to `org.opensearch.search.suggest`
- rename other instances of Elasticsearch to OpenSearch in these packages.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors classes under o.e.common to o.opensearch.common. All
references throughout the codebase have also been refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit refactors all OpenSearch classes in the root server package to
o.opensearch. All references throughout the codebase are also refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
Refactor the package`org.elasticsearch.script` in server module to rename it to`org.opensearch.script`.
Signed-off-by: Rabi Panda <adnapibar@gmail.com>
This commit refactors the ElasticsearchException class located in the server module
to OpenSearchException. References and usages throughout the rest of the
codebase are fully refactored.
Signed-off-by: Nicholas Knize <nknize@amazon.com>
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without changing the names, the testing
conventions task would fail since now that the YAML
tests are no longer present pacify the convention.
These tests have moved to the internalClusterTest
source set.
related: #56841
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
sub-aggregations live under multi-bucket aggregations. This is
normally true but `parentCardinality` is properly carried forward
for single bucket aggregations like `filter` and for multi-bucket
aggregations configured in single-bucket for like `range` with a
single range.
While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.
Relates to #56487
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).
This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class. Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
* Add ValuesSource Registry and associated logic (#54281)
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)
* ValuesSourceRegistry Prototype (#48758)
* Remove generics from ValuesSource related classes (#49606)
* fix percentile aggregation tests (#50712)
* Basic thread safety for ValuesSourceRegistry (#50340)
* Remove target value type from ValuesSourceAggregationBuilder (#49943)
* Cleanup default values source type (#50992)
* CoreValuesSourceType no longer implements Writable (#51276)
* Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131)
* Put values source types on fields (#51503)
* Remove VST Any (#51539)
* Rewire terms agg to use new VS registry (#51182)
Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)
* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)
* Wire Percentiles aggregator into new VS framework (#51639)
This required a bit of a refactor to percentiles itself. Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory. This doesn't work (or at least, would be
difficult) in the new VS framework.
This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory. This object
is then used when deciding which kind of aggregator to create
Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.
* Remove generics and target value type from MultiVSAB (#51647)
* fix checkstyle after merge (#52008)
* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)
* Convert RareTerms to new VS registry (#52166)
* Wire up Value Count (#52225)
* Wire up Max & Min aggregations (#52219)
* ValuesSource refactoring: Wire up Sum aggregation (#52571)
* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)
* Soft immutability for VSConfig (#52729)
* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)
Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)
* VS refactoring: Wire up stats aggregation (#52891)
* ValuesSource refactoring: Wire up string_stats aggregation (#52875)
* VS refactoring: Wire up median (MAD) aggregation (#52945)
* fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java
this commit implements `getValuesSourceType` for
the ConstantKeyword field type.
master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.
* ValuesSource refactoring: Wire up Avg aggregation (#52752)
* Wire PercentileRanks aggregator into new VS framework (#51693)
* Add a VSConfig resolver for aggregations not using the registry (#53038)
* Vs refactor wire up ranges and date ranges (#52918)
* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)
This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry
relates #42949.
* VS refactoring: convert Boxplot to new registry (#53132)
* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)
This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry
relates to the values-source refactoring meta issue #42949.
* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)
This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.
relates to the values-source refactoring meta issue #42949.
* Fix type tests for Missing aggregation (#53501)
* ValuesSource Refactor: move histo VSType into XPack module (#53298)
- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs
* Wire up DateHistogram to the ValuesSourceRegistry (#53484)
* Vs refactor parser cleanup (#53198)
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* First batch of easy fixes
* Remove List.of from ValuesSourceRegistry
Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.
* More compiler fixes
* More compiler fixes
* More compiler fixes
* Precommit is happy and so am I
* Add new Core VSTs to tests
* Disabled supported type test on SigTerms until we can backport it's fix
* fix checkstyle
* Fix test failure from semantic merge issue
* Fix some metaData->metadata replacements that got lost
* Fix list of supported types for MinAggregator
* Fix list of supported types for Avg
* remove unused import
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
This removes pipeline aggregators from the aggregation result tree
except for a single field used for backwards compatibility with pre-7.8
versions of Elasticsearch. That field isn't populated unless we are
serializing to pre-7.8 Elasticsearch. So, good news! We no longer build
pipeline aggregators on the data node. Most of the time.
Removes pipeline aggregations from the aggregation result tree as they
are no longer used. This stops us from building the pipeline aggregators
at all on data nodes except for backwards compatibility serialization.
This will save a tiny bit of space in the aggregation tree which is
lovely, but the biggest benefit is that it is a step towards simplifying
pipeline aggregators.
This only does about half of the work to remove the pipeline aggs from
the tree. Removing all of it would, well, double the size of the change
and make it harder to review.
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.
This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.
This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.
Closes#53215
This begins to clean up how `PipelineAggregator`s and executed.
Previously, we would create the `PipelineAggregator`s on the data nodes
and embed them in the aggregation tree. When it came time to execute the
pipeline aggregation we'd use the `PipelineAggregator`s that were on the
first shard's results. This is inefficient because:
1. The data node needs to make the `PipelineAggregator` only to
serialize it and then throw it away.
2. The coordinating node needs to deserialize all of the
`PipelineAggregator`s even though it only needs one of them.
3. You end up with many `PipelineAggregator` instances when you only
really *need* one per pipeline.
4. `PipelineAggregator` needs to implement serialization.
This begins to undo these by building the `PipelineAggregator`s directly
on the coordinating node and using those instead of the
`PipelineAggregator`s in the aggregtion tree. In a follow up change
we'll stop serializing the `PipelineAggregator`s to node versions that
support this behavior. And, one day, we'll be able to remove
`PipelineAggregator` from the aggregation result tree entirely.
Importantly, this doesn't change how pipeline aggregations are declared
or parsed or requested. They are still part of the `AggregationBuilder`
tree because *that* makes sense.
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to these
parsers.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to a bunch
of these parsers, mostly the ones in xpack and their "paired" parsers in
the high level rest client. I picked these just to have somewhere to
break the up the change so it wouldn't be huge.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
Historically only two things happened in the final reduction:
empty buckets were filled, and pipeline aggs were reduced (since it
was the final reduction, this was safe). Usage of the final reduction
is growing however. Auto-date-histo might need to perform
many reductions on final-reduce to merge down buckets, CCS
may need to side-step the final reduction if sending to a
different cluster, etc
Having pipelines generate their output in the final reduce was
convenient, but is becoming increasingly difficult to manage
as the rest of the agg framework advances.
This commit decouples pipeline aggs from the final reduction by
introducing a new "top level" reduce, which should be called
at the beginning of the reduce cycle (e.g. from the SearchPhaseController).
This will only reduce pipeline aggs on the final reduce after
the non-pipeline agg tree has been fully reduced.
By separating pipeline reduction into their own set of methods,
aggregations are free to use the final reduction for whatever
purpose without worrying about generating pipeline results
which are non-reducible
This is a pure code rearrangement refactor. Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility). ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.
This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.
Relates #46523
AggregatorFactory was generic over itself, but it doesn't appear we
use this functionality anywhere (e.g. to allow the super class
to declare arguments/return types generically for subclasses to
override). Most places use a wildcard constraint, and even when a
concrete type is specified it wasn't used.
But since AggFactories are widely used, this led to
the generic touching many pieces of code and making type signatures
fairly complex
This PR is a backport a of #43214 from v8.0.0
A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).
Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.
This PR removes the do versions and just use equals/hashCode ensuring the super when necessary.
MatrixStatsResults is the "final" result object, and runs an additional
computation in it's ctor to calculate covariance, etc. This means
it should only run on the final reduction instead of on every reduce.
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.
Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.
The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
This adds a set of helper classes to determine if an agg "has a value".
This is needed because InternalAggs represent "empty" in different
manners according to convention. Some use `NaN`, `+/- Inf`, `0.0`, etc.
A user can pass the Internal agg type to one of these helper methods
and it will report if the agg contains a value or not, which allows the
user to differentiate "empty" from a real `NaN`.
These helpers are best-effort in some cases. For example, several
pipeline aggs share a single return class but use different conventions
to mark "empty", so the helper uses the loosest definition that applies
to all the aggs that use the class.
Sums in particular are unreliable. The InternalSum simply returns 0.0
if the agg is empty (which is correct, no values == sum of zero). But this
also means the helper cannot differentiate from "empty" and `+1 + -1`.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028