Moves the logic that `ParentJoinAggregator` uses to collect global ordinals
into a strategy object. This makes it a little easier to reason about, and
will make it easier to save memory by removing `asMultiBucketAggregator` in
the future.
Relates to #56487
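To illustrate what "a strategy object" means here, the following sketch puts ordinal collection behind a small interface with a dense, bitset-backed implementation and a sparse, map-backed one. It is not the actual `ParentJoinAggregator` code. The names `OrdCollectionStrategy`, `DenseStrategy`, `SparseStrategy`, `OrdStrategies`, and the rule used to pick between them are all hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical strategy: records which global ordinals were seen for each owning bucket. */
interface OrdCollectionStrategy {
    void add(long owningBucketOrd, int globalOrd);
    boolean exists(long owningBucketOrd, int globalOrd);
}

/** Dense: one bit per (bucket, ordinal) pair, cheap when there are few buckets. */
final class DenseStrategy implements OrdCollectionStrategy {
    private final BitSet bits = new BitSet();
    private final int maxOrd; // number of distinct global ordinals

    DenseStrategy(int maxOrd) {
        this.maxOrd = maxOrd;
    }

    private int index(long bucket, int ord) {
        return Math.toIntExact(bucket * maxOrd + ord);
    }

    public void add(long bucket, int ord) {
        bits.set(index(bucket, ord));
    }

    public boolean exists(long bucket, int ord) {
        return bits.get(index(bucket, ord));
    }
}

/** Sparse: only the (bucket, ordinal) pairs actually seen are stored. */
final class SparseStrategy implements OrdCollectionStrategy {
    private final Map<Long, Set<Integer>> seen = new HashMap<>();

    public void add(long bucket, int ord) {
        seen.computeIfAbsent(bucket, b -> new HashSet<>()).add(ord);
    }

    public boolean exists(long bucket, int ord) {
        Set<Integer> ords = seen.get(bucket);
        return ords != null && ords.contains(ord);
    }
}

final class OrdStrategies {
    /** Assumed selection rule (hypothetical): dense for a single owning bucket, sparse otherwise. */
    static OrdCollectionStrategy choose(boolean singleOwningBucket, int maxOrd) {
        return singleOwningBucket ? new DenseStrategy(maxOrd) : new SparseStrategy();
    }
}
```

The aggregator only talks to the interface, so the memory trade-off is isolated in one place. That isolation is what should make removing `asMultiBucketAggregator` easier later.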
SigTerms cannot run on fields that are not searchable, and SigText
cannot run on fields that do not have analyzers. Both of these
situations fail today with an esoteric exception, so this just formalizes
the constraint by throwing an IllegalArgumentException up front.
In practice, the only affected field seems to be the `binary` field,
which is neither searchable nor has a default analyzer (even numeric
and keyword fields have a default analyzer, despite not being tokenized).
Adds supported-type tests and makes some changes to the test itself
to allow testing sigtext (by indexing `_source`).
Also makes a few tweaks to the test to avoid bad randomization (negative
numbers, etc.).
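The up-front check amounts to validating the field before the aggregator is built. In this minimal sketch, `FieldInfo` is a hypothetical stand-in for a mapped field type, and the exception messages are illustrative, not the ones Elasticsearch emits.

```java
/** Hypothetical, simplified view of a mapped field (not Elasticsearch's MappedFieldType). */
final class FieldInfo {
    final String name;
    final boolean searchable;
    final boolean hasAnalyzer;

    FieldInfo(String name, boolean searchable, boolean hasAnalyzer) {
        this.name = name;
        this.searchable = searchable;
        this.hasAnalyzer = hasAnalyzer;
    }
}

final class SignificantAggValidation {
    /** significant_terms needs to query the field, so it must be searchable. */
    static void validateSigTerms(FieldInfo field) {
        if (!field.searchable) {
            throw new IllegalArgumentException(
                "significant_terms cannot run on field [" + field.name + "]: it is not searchable");
        }
    }

    /** significant_text re-analyzes text, so the field must have an analyzer. */
    static void validateSigText(FieldInfo field) {
        if (!field.hasAnalyzer) {
            throw new IllegalArgumentException(
                "significant_text cannot run on field [" + field.name + "]: it has no analyzer");
        }
    }
}
```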
This commit highlights the ability for geo_point fields to be
used in geo_shape queries. It also adds an explicit geo_point
example in the geo_shape query documentation.
Closes #56927.
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.
We also had a specialization for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241, which resulted
in some slowdown as well. This adds it back, but in a clearer way.
And, hopefully, a way that is marginally faster when there *is* a
filter.
Closes #57407
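The following toy model shows the shape of the two specializations and the per-segment debug counters. Segments are modeled as plain `int[][]` arrays of ordinals per document. In Lucene, single-valuedness would instead be detected with something like `DocValues.unwrapSingleton`. The class `OrdCollector`, its fields, and the null-means-no-filter convention are hypothetical. Here both paths do the same arithmetic; the point is only the dispatch and the counters.

```java
import java.util.function.IntPredicate;

/** Hypothetical collector counting docs per global ordinal, with debug counters. */
final class OrdCollector {
    final long[] counts;          // doc count per global ordinal
    final IntPredicate filter;    // null means "no include/exclude filter"
    int segmentsWithSingleValues; // debug: segments collected on the single-valued path
    int segmentsWithMultiValues;  // debug: segments collected on the multi-valued path

    OrdCollector(int maxOrd, IntPredicate filter) {
        this.counts = new long[maxOrd];
        this.filter = filter;
    }

    void collectSegment(int[][] docOrds) {
        boolean singleValued = true;
        for (int[] ords : docOrds) {
            if (ords.length > 1) {
                singleValued = false;
                break;
            }
        }
        if (singleValued) {
            segmentsWithSingleValues++;
            for (int[] ords : docOrds) {
                if (ords.length == 1) {
                    increment(ords[0]); // at most one value per doc: no inner loop needed
                }
            }
        } else {
            segmentsWithMultiValues++;
            for (int[] ords : docOrds) {
                for (int ord : ords) {
                    increment(ord);
                }
            }
        }
    }

    private void increment(int ord) {
        // No-filter specialization: skip the predicate call entirely when there is no filter.
        if (filter == null || filter.test(ord)) {
            counts[ord]++;
        }
    }
}
```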
Several APIs support options that can be specified as a query parameter or a
request body parameter.
Currently, this is documented using notes, which can get rather lengthy. This
replaces those multiple notes with a single note and a footnote.
Almost every outbound message is serialized to buffers with a 16k page size.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue them on the IO loop to be dealt with as soon as the
channel was ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but 16k of buffers would also be reserved for each message until it was physically written and flushed.
With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change drops the reference to the concrete outbound message as soon as it has been serialized, to save some more on GC.
In experiments (3 nodes, 2G heap each, loopback), this reduces GC time for a default PMC run by about 50%. The obvious caveat is that, with recent changes, GC isn't that heavy in the first place, but this is still a measurable gain.
I also expect it to help master node stability by reducing spikes when, for example, the master is hit by a large number of requests that are processed in batches (e.g. shard snapshot status updates) and responded to all at once in a short time frame.
Obviously, the downside to this change is that it adds serialization latency on the IO loop. But since we already read all of these messages on the IO loop, I don't see this as much of a qualitative change, and the more predictable buffer use seems far more valuable.
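This single-threaded sketch contrasts with the old behaviour. Messages are queued as objects, and a buffer is produced only when the simulated channel is flushed on the event loop. `OutboundQueue`, its `written` list (a stand-in for the socket), and the `serializer` function are hypothetical and not part of the Elasticsearch transport layer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical outbound queue owned by one event-loop thread. Messages stay as
 * objects until the channel is writable, so no buffers are reserved while they wait.
 */
final class OutboundQueue<M> {
    private final ArrayDeque<M> pending = new ArrayDeque<>();
    private final Function<M, byte[]> serializer;
    final List<byte[]> written = new ArrayList<>(); // stands in for the socket
    int serializedCount;

    OutboundQueue(Function<M, byte[]> serializer) {
        this.serializer = serializer;
    }

    /** Called on the event loop: cheap, allocates no buffer yet. */
    void enqueue(M message) {
        pending.add(message);
    }

    /** Called on the event loop when the channel becomes writable. */
    void flush() {
        M message;
        while ((message = pending.poll()) != null) {
            // poll() removed the message from the queue, so after serialization
            // the queue no longer references it; only the bytes are kept.
            written.add(serializer.apply(message));
            serializedCount++;
        }
    }
}
```

With this shape, the number of live buffers is bounded by what the event-loop threads are actively writing rather than by how many messages are queued.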
* Move classes from build scripts to buildSrc
- move Run task
- move duplicate SanEvaluator
* Remove :run workaround
* Some minor cleanup of build scripts along the way
As the data stream information is stored in the `ClusterState.Metadata`, we exposed
the `Metadata` to the `AsyncWaitStep#evaluateCondition` method so that
the steps can identify when a managed index is part of a data stream.
If a managed index is part of a data stream, the rollover target is the data stream
name and the highest-generation index is the write index (i.e. the rolled index).
(cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
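The rollover-target rule above can be sketched as a lookup over data streams and their backing indices. In this sketch, data streams are modeled as a plain map from name to backing indices ordered by generation. `RolloverTargets` and its methods are hypothetical and do not mirror the real `Metadata` API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class RolloverTargets {
    /** Returns the data stream that has {@code index} as a backing index, if any. */
    static Optional<String> dataStreamOf(String index, Map<String, List<String>> dataStreams) {
        for (Map.Entry<String, List<String>> entry : dataStreams.entrySet()) {
            if (entry.getValue().contains(index)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    /** Data stream members roll over the data stream; other indices use their rollover alias. */
    static String rolloverTarget(String index, String rolloverAlias, Map<String, List<String>> dataStreams) {
        return dataStreamOf(index, dataStreams).orElse(rolloverAlias);
    }

    /** Assumes backing indices are ordered by generation, so the last one is the write index. */
    static String writeIndex(String dataStream, Map<String, List<String>> dataStreams) {
        List<String> backing = dataStreams.get(dataStream);
        return backing.get(backing.size() - 1);
    }
}
```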
Allow for a fairer distribution of snapshot and restore operations
to enable parallel snapshots and improve behaviour for parallel snapshot + restore.
Closes #55803
Some BI tools (e.g. Tableau) try to cast strings where the time
part is separated from the date part by a space instead of `T`.
Adjust the type conversion used by CAST to support this.
(cherry picked from commit 0e18321e7ad9f779c42855efbf93f171b9128a5e)
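A minimal way to accept both separators with `java.time` is to normalize a space at the date/time boundary to `T` before parsing. This sketch assumes a four-digit-year `yyyy-MM-dd` date prefix and parses to a `LocalDateTime` with no time zone handling. `LenientDateTime` is a hypothetical name, not the actual ES SQL conversion code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LenientDateTime {
    /**
     * Parses "yyyy-MM-ddTHH:mm:ss[.fraction]" and the same value written with a
     * single space instead of the 'T' separator.
     */
    static LocalDateTime parse(String text) {
        // ISO_LOCAL_DATE_TIME requires 'T'; index 10 is right after a "yyyy-MM-dd" date.
        if (text.length() > 10 && text.charAt(10) == ' ') {
            text = text.substring(0, 10) + 'T' + text.substring(11);
        }
        return LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }
}
```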
Add basic support for `TOP X` as a synonym for `LIMIT X`, which is used
by [MS SQL Server](https://docs.microsoft.com/en-us/sql/t-sql/queries/top-transact-sql?view=sql-server-ver15),
e.g.:
```sql
SELECT TOP 5 a, b, c FROM test
```
In SQL Server, TOP also supports the `PERCENT` and `WITH TIES`
keywords, which this implementation does not.
Using both TOP and LIMIT in the same query is not allowed.
Refers to #41195
(cherry picked from commit 2f5ab81b9ad884434d1faa60f4391f966ede73e8)
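The TOP/LIMIT rule reduces to a small resolution step: take whichever one is present and reject the query if both are. `LimitClause.resolve` and its error message are hypothetical. The actual parser raises its own parsing exception.

```java
import java.util.OptionalInt;

final class LimitClause {
    /**
     * Resolves the effective row limit from an optional TOP and an optional LIMIT
     * (null means "not specified"). Specifying both is rejected.
     */
    static OptionalInt resolve(Integer top, Integer limit) {
        if (top != null && limit != null) {
            throw new IllegalArgumentException("TOP and LIMIT cannot be used in the same query");
        }
        Integer n = top != null ? top : limit;
        return n == null ? OptionalInt.empty() : OptionalInt.of(n);
    }
}
```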
In #51459 DEBUG-level logging was removed from the default log4j
configuration. However, our docker build has its own log4j configuration
which was missed in that change. This commit removes the DEBUG-level
logging from the docker log4j configuration as well.
relates #51459
relates #51198
Generally we don't advocate for using `stored_fields`, and we're interested in
eventually removing the need for this parameter. So it's best to avoid using
stored fields in our docs examples when it's not actually necessary.
Individual changes:
* Avoid using 'stored_fields' in our docs.
* When defining script fields in top-hits, de-emphasize stored fields.
There are several mapping settings that are currently re-parsed every
time they are read. This can happen quite frequently, for example on every
document ingestion. This commit stores the parsed versions of these
mapping settings in IndexSettings, as is already done for other index
settings.
closes #57395
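This sketch shows the pattern of parsing on construction or update and returning the cached value on the hot path. It uses the real `index.mapping.total_fields.limit` setting (default 1000) only as an example. `CachedIndexSettings` and its `parseCalls` counter are hypothetical and not the `IndexSettings` API.

```java
import java.util.Map;

final class CachedIndexSettings {
    int parseCalls; // illustration only: how often the raw settings were parsed
    private volatile long totalFieldsLimit;

    CachedIndexSettings(Map<String, String> raw) {
        update(raw);
    }

    /** Parses once per settings change instead of once per read. */
    void update(Map<String, String> raw) {
        parseCalls++;
        totalFieldsLimit = Long.parseLong(raw.getOrDefault("index.mapping.total_fields.limit", "1000"));
    }

    /** Hot path, e.g. called for every ingested document: returns the pre-parsed value. */
    long totalFieldsLimit() {
        return totalFieldsLimit;
    }
}
```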
The shared buildResources task is a catch-all for resources that need to
be copied from the build-tools jar at runtime. Using it for all
resources means that any task consuming resources from it is re-run
whenever any of those files changes. This commit creates separate export
tasks per usage and removes the buildResources task.
Reworks the `from / size` content into `Paginate search results`.
Moves those docs from the request body search API page (slated for
deletion) to the `Run a search` tutorial docs.
Also adds some notes to the `from` and `size` param docs.
Co-authored-by: debadair <debadair@elastic.co>
The old description mentions a setting that we ended up not merging.
The periodic real-memory checks are automatic and do not require
the user to configure any setting.
**Goal**
Create a top-level search section. This will let us clean up our search
API reference docs, particularly content from [`Request body search`][0].
**Changes**
* Creates a top-level `Search your data` page. This page is designed to
house concept and tutorial docs related to search.
* Creates a `Run a search` page under `Search your data`. For now, this
page contains a basic search tutorial. The goal is to add content from
[`Request body search`][0] to it in the future.
* Relocates `Long-running searches` and `Search across clusters` under
`Search your data`. Increments several headings in that content.
* Reorders the top-level TOC to move `Search your data` higher. Also
moves the `Query DSL`, `EQL`, and `SQL access` chapters immediately
after.
Relates to #48194
[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-body.html
The `routingNodes` variable is unused. Replace `clusterState.getRoutingNodes()` with `routingNodes`.
Co-authored-by: Boice Huang <boicehuang@tencent.com>
* Moves `Discovery and cluster formation` content from `Modules` to
`Set up Elasticsearch`.
* Combines `Adding and removing nodes` with `Adding nodes to your
cluster`. Adds related redirect.
* Removes and redirects the `Modules` page.
* Rewrites parts of `Discovery and cluster formation` to remove `module`
references and meta references to the section.
This commit moves the configuration of all test JVMs for FIPS to a
script plugin. FIPS testing is something very specific to the
Elasticsearch build and does not need to be passed on to plugin authors.
At some point, we changed the supported-type test to also catch
assertion errors. This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.
This modifies the test to throw after the try-catch block, so that it
can no longer accidentally catch its own failure.
Catching the AssertionError is still convenient because other code paths
hit an assertion in tests before the expected exception is thrown, so I
think we should keep it.
Also includes a variety of fixes to other tests which were failing
but being silently smothered.
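The following sketch reproduces the bug and the fix in isolation. It uses a hand-rolled `AssertionError` in place of JUnit's `fail()`, which throws the same error type. `SupportedTypeChecks` and its methods are hypothetical names, not the actual test code.

```java
final class SupportedTypeChecks {
    /** Buggy: the AssertionError meant to signal "nothing was thrown" is caught by our own catch. */
    static void assertRejectsBuggy(Runnable aggregation) {
        try {
            aggregation.run();
            throw new AssertionError("expected the field type to be rejected");
        } catch (IllegalArgumentException | AssertionError e) {
            // Also swallows the AssertionError above, so this check can never fail.
        }
    }

    /** Fixed: record the outcome inside the try and throw after the try-catch. */
    static void assertRejects(Runnable aggregation) {
        boolean rejected = false;
        try {
            aggregation.run();
        } catch (IllegalArgumentException | AssertionError e) {
            rejected = true; // an assertion tripped before the exception still counts
        }
        if (rejected == false) {
            throw new AssertionError("expected the field type to be rejected");
        }
    }
}
```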
Relocates the "Remote Clusters" documentation from the "Modules" section to the "Set up Elasticsearch" section.
Supporting changes:
* Reorders the "Bootstrap checks for X-Pack" section to immediately follow the "Bootstrap checks" chapter.
* Removes an outdated X-Pack `ifdef` from the "Remote Clusters" intro.