* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-buckets aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
* A `terms` source, values are extracted from a field or a script.
* A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation in composite buckets.
A composite buckets is composed of one value per source and is built for each document as the combinations of values in the provided sources.
For instance the following aggregation:
````
"test_agg": {
"terms": {
"field": "field1"
},
"aggs": {
"nested_test_agg":
"terms": {
"field": "field2"
}
}
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:
````
"composite_agg": {
"composite": {
"sources": [
{
"field1": {
"terms": {
"field": "field1"
}
}
},
{
"field2": {
"terms": {
"field": "field2"
}
}
},
}
}
````
The response of the aggregation looks like this:
````
"aggregations": {
"composite_agg": {
"buckets": [
{
"key": {
"field1": "alabama",
"field2": "almanach"
},
"doc_count": 100
},
{
"key": {
"field1": "alabama",
"field2": "calendar"
},
"doc_count": 1
},
{
"key": {
"field1": "arizona",
"field2": "calendar"
},
"doc_count": 1
}
]
}
}
````
By default this aggregation returns 10 buckets sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values, the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sorts after `arizona, calendar`:
````
"composite_agg": {
"composite": {
"after": {"field1": "alabama", "field2": "calendar"},
"size": 100,
"sources": [
{
"field1": {
"terms": {
"field": "field1"
}
}
},
{
"field2": {
"terms": {
"field": "field2"
}
}
}
}
}
````
This aggregation is optimized for indices that set an index sorting that match the composite source definition.
For instance the aggregation above could run faster on indices that defines an index sorting like this:
````
"settings": {
"index.sort.field": ["field1", "field2"]
}
````
In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued field but disables early termination for these fields even if index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
Today if nested docs are used in an index that is split the operation
will only work correctly if the index is not routing partitioned or
unless routing is used. This change fixes the query that selectes the docs
to delete to also select all parents nested docs as well.
Closes#27378
This section was removed to hide this ability to new users.
This change restores the section and adds a warning regarding the expected performance.
Closes#27336
Right now our different transport implementations must duplicate
functionality in order to stay compliant with the requirements of
TcpTransport. They must all implement common logic to open channels,
close channels, keep track of channels for eventual shutdown, etc.
Additionally, there is a weird and complicated relationship between
Transport and TransportService. We eventually want to start merging
some of the functionality between these classes.
This commit starts moving towards a world where TransportService retains
all the application logic and channel state. Transport implementations
in this world will only be tasked with returning a channel when one is
requested, calling transport service when a channel is accepted from
a server, and starting / stopping itself.
Specifically this commit changes how channels are opened and closed. All
Transport implementations now return a channel type that must comply with
the new TcpChannel interface. This interface has the methods necessary
for TcpTransport to completely manage the lifecycle of a channel. This
includes setting the channel up, waiting for connection, adding close
listeners, and eventually closing.
This commit adds a note regarding not storing a plugin distribution in
the plugins directory during installation or instllation will fail.
Relates #27400
Right now we have unnecessary implementations of `TransportChannel`.
Additionally, there are methods on the interface that are not used. This
commit removes unnecessary implementations and methods.
The build currently does not work with Gradle 4.3.1 as the Gradle team stopped publishing the gradle-logging dependency to jcenter, starting with 4.3.1 (not sure why). There are two options:
- Add the repository managed by Gradle team (https://repo.gradle.org/gradle/libs-releases-local) to our build
- Use an older version (4.3) of the dependency when running with Gradle 4.3.1.
Not to depend on another external repo, I've chosen solution 2. Note that this solution could break on future versions, but as this is a compileOnly dependency, and the interface we use has been super stable since forever, I don't envision this to be an issue (and easily detected by a breaking build).
When the vagrant box is very very slow, the elasticsearch service can
take more than 60 sec to start. This commit changes the timeout to 120.
closes#27372
The existing log rotation configuration allowed the index
and search slow log to grow unbounded. This commit removes the
date based rotation and adds the same size based rotation, that
the depreciation log already has.
This commit changes the DefaultHttpRequestInitializer in order to make
it create new HttpIOExceptionHandler and HttpUnsuccessfulResponseHandler
for every new HTTP request instead of reusing the same two handlers for
all requests.
Closes#27092
* REST: Rename ingest.processor.grok to ingest.processor_grok
* REST: Rename remote.info to cluster.remote_info
* REST: Fixed bad YAML comments
* REST: Force dummy scripts to be strings, not numbers
* REST: Fix bad YAML in search/110_field_collapsing.yml
* REST: Adjust percentile tests to work with Perl number handling
The field data cache can come under heavy contention in cases when lots
of search threads are hitting it for doc values. This commit reduces the
amount of contention here by using a double-checked locking strategy to
only lock when the cache needs to be initialized.
Relates #27365
The Json Processor originally only supported parsing field values into Maps even
though the JSON spec specifies that strings, null-values, numbers, booleans, and arrays
are also valid JSON types. This commit enables parsing these values now.
response to #25972.
The toXContent method for IndexGraveYard (which is a collection of tombstones for explicitly marking indices as deleted in the cluster state) confused timeValue with dateField, resulting in output of the form "delete_date" : "23424.3d" instead of "delete_date":"2017-11-13T15:50:51.614Z".
The AWS SDK has a transitive dependency on Jackson Databind. While the
AWS SDK was recently upgraded, the Jackson Databind dependency was not
pulled along with it to the version that the AWS SDK depends on. This
commit upgrades the dependencies for discovery-ec2 and repository-s3
plugins to match versions on the AWS SDK transitive dependencies.
Relates #27361
When the current master node is shutting down, it sends a leave request to the other nodes so that they can eagerly start a fresh master election. Unfortunately, it was still possible for the master node that was shutting down to respond to ping requests, possibly influencing the election decision as it still appeared as an active master in the ping responses. This commit ensures that UnicastZenPing does not respond to ping requests once it's been closed. ZenDiscovery.doStop() continues to ensure that the pinging component is first closed before it triggers a master election.
Closes#27328
We use affix settings to group settings / values under a certain namespace.
In some cases like login information for instance a setting is only valid if
one or more other settings are present. For instance `x.test.user` is only valid
if there is an `x.test.passwd` present and vice versa. This change allows to specify
such a dependency to prevent settings updates that leave settings in an inconsistent
state.
This commit removes an unnecessary logger instance creation from the
constructor for doc values field data. This construction is expensive
for this oft-created class because of a synchronized block in the
constructor for the logger.
Relates #27349
This is the first step to supporting WKT (and other future) format(s). The ShapeBuilders are quite messy and can be simplified by decoupling the parse logic from the build logic. This commit refactors the parsing logic into its own package separate from the Shape builders. It also decouples the GeoShapeType into a standalone enumerator that is responsible for validating the parsed data and providing the appropriate builder. This future-proofs the code making it easier to maintain and add new shape types.
This is a followup to #26521. This commit expands the alias added for
the elasticsearch client codebase to all codebases. The original full
jar name property is left intact. This only adds an alias without the
version, which should help ease the pain in updating any versions (ES
itself or dependencies).
Queries that create a scroll context cannot use the cache.
They modify the search context during their execution so using the cache
can lead to duplicate result for the next scroll query.
This change fails the entire request if the request_cache option is explictely set
on a query that creates a scroll context (`scroll=1m`) and make sure internally that we never
use the cache for these queries when the option is not explicitely used.
For 6.x a deprecation log will be printed instead of failing the entire request and the request_cache hint
will be ignored (forced to false).
PR #26911 set minimum_master_nodes from number_of_nodes to (number_of_nodes / 2) + 1 in our REST tests. This has led to test failures (see #27233) as the REST tests only configure the first node in its unicast.hosts pinging list (see explanation here: #27233 (comment)). Until we have a proper fix for this, I'm reverting the change in #26911.
The order in which double values are added in java can give different results
for the sum, so we need to allow a certain delta in the test assertions. The
current value was still a bit too low, which manifested itself in occasional
test failures.
Now the blob size information is available before writing anything,
the repository implementation can know upfront what will be the
more suitable API to upload the blob to S3.
This commit removes the DefaultS3OutputStream and S3OutputStream
classes and moves the implementation of the upload logic directly in the
S3BlobContainer.
related #26993closes#26969
the query would be marked as verified candidate match. This is wrong as it can only marked as verified candidate match
on indices created on or after 6.1, due to the use of the CoveringQuery.
extract all clauses from a conjunction query.
When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.
Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clauses that was rarest in order
(based on term length) to attempt less candidate queries to be selected
in the first place. However this still method there is still a very high
chance that candidate query matches are false positives.
This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extractionCloses#26307
This commit adds a parent pipeline aggregation that allows
sorting the buckets of a parent multi-bucket aggregation.
The aggregation also offers [from] and [size] parameters
in order to truncate the result as desired.
Closes#14928
Sometimes systems like Beats would want to extract the date's timezone and/or locale
from a value in a field of the document. This PR adds support for mustache templating
to extract these values.
Closes#24024.
We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.
This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
In order to avoid churn when applying a new `ClusterState`, there are some checks that compare parts of the old and new states and, if equal, the new object is discarded and the old one reused. Since `ClusterState` updates are now largely diff-based, this code is unnecessary: applying a diff also reuses any old objects if unchanged. Moreover, the code compares the parts of the `ClusterState` using their `version()` values which is not guaranteed to be correct, because of a lack of consensus.
This change removes this optimisation, and tests that objects are still reused as expected via the diff mechanism.
* Decouple `ChannelFactory` from Tcp classes
This is related to #27260. Currently `ChannelFactory` is tightly coupled
to classes related to the elasticsearch Tcp binary protocol. This commit
modifies the factory to be able to construct http or other protocol
channels.
We look for the remote by scanning the output of "git remote -v" but we
were not actually looking at the output since standard output was not
redirected anywhere. This commit fixes this issue.
Relates #27308