Adds an `allow_partial_search_results` flag to search requests, with a default setting of `true`.
When set to `false`, the request will return an error if the search times out, has partial failures,
or has missing shards, rather than returning partial search results. A cluster-level setting provides the default for search requests that do not set the flag.
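For illustration, a minimal sketch of both knobs; the index name is a placeholder and the cluster setting name `search.default_allow_partial_results` is an assumption based on the description above:
````
POST /my-index/_search?allow_partial_search_results=false
{
  "query": { "match_all": {} }
}

PUT /_cluster/settings
{
  "persistent": {
    "search.default_allow_partial_results": false
  }
}
````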
Closes #27435
This is related to #27933. It introduces a jar named elasticsearch-core
in the lib directory. This commit moves the JarHell class from server to
elasticsearch-core. Additionally, PathUtils and parts of Loggers are
moved, as JarHell depends on them.
We have agreed to introduce the Gradle wrapper to simplify workflows for
developers and for managing infrastructure (e.g., CI, release builds),
as well as in consideration of the fact that other projects in our stack
use Gradle and do not necessarily want to be tied to our Gradle version.
Relates #28065
This commit adds the infrastructure to plugin building and loading to
allow one plugin to extend another. That is, one plugin may extend
another by the "parent" plugin allowing itself to be extended through
Java SPI. When all plugins extending a plugin are finished loading, the
"parent" plugin has a callback (through the ExtensiblePlugin interface)
allowing it to reload SPI.
This commit also adds an example plugin which makes use of this
extensibility (adding to the painless whitelist).
This is related to #27802. This commit adds a jar called
elasticsearch-nio that contains the base nio classes that will be used
for the tcp nio transport and eventually the http nio transport.
The jar does not depend on elasticsearch:core, so all references to core
have been removed.
This change removes the module named aggs-composite and adds the `composite`
aggregation as a core aggregation. This allows other plugins to use this new
aggregation and simplifies its integration in the high-level REST client.
Projects that depend on the CLI currently depend on core. This should not
always be the case. The EnvironmentAwareCommand will remain in :core,
but the rest of the CLI components have been moved into their own
subproject of :core, :core:cli.
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-bucket aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
* A `terms` source, values are extracted from a field or a script.
* A `date_histogram` source, values are extracted from a date field and rounded to the provided interval (a sketch follows this list).
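For instance, a `date_histogram` source might look like this (a sketch; the source name `date` and the `timestamp` field are placeholders):
````
{
  "date": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "1d"
    }
  }
}
````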
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation in composite buckets.
A composite bucket is composed of one value per source and is built for each document from the combinations of values in the provided sources.
For instance the following aggregation:
````
"test_agg": {
"terms": {
"field": "field1"
},
"aggs": {
"nested_test_agg":
"terms": {
"field": "field2"
}
}
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:
````
"composite_agg": {
"composite": {
"sources": [
{
"field1": {
"terms": {
"field": "field1"
}
}
},
{
"field2": {
"terms": {
"field": "field2"
}
}
},
}
}
````
The response of the aggregation looks like this:
````
"aggregations": {
"composite_agg": {
"buckets": [
{
"key": {
"field1": "alabama",
"field2": "almanach"
},
"doc_count": 100
},
{
"key": {
"field1": "alabama",
"field2": "calendar"
},
"doc_count": 1
},
{
"key": {
"field1": "arizona",
"field2": "calendar"
},
"doc_count": 1
}
]
}
}
````
By default this aggregation returns 10 buckets, sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values: the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sort after `arizona, calendar`:
````
"composite_agg": {
"composite": {
"after": {"field1": "alabama", "field2": "calendar"},
"size": 100,
"sources": [
{
"field1": {
"terms": {
"field": "field1"
}
}
},
{
"field2": {
"terms": {
"field": "field2"
}
}
}
}
}
````
This aggregation is optimized for indices that define an index sort matching the composite source definition.
For instance, the aggregation above could run faster on indices that define an index sort like this:
````
"settings": {
"index.sort.field": ["field1", "field2"]
}
````
In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued fields but disables early termination for these fields, even if the index sort matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
It is the exciting return of the global checkpoint background
sync. Long, long ago, in a snapshot version far, far away, we had and
only had a global checkpoint background sync. This sync would fire
periodically and send the global checkpoint from the primary shard to
the replicas so that they could update their local knowledge of the
global checkpoint. Later in time, as we sped ahead towards finalizing
the initial version of sequence IDs, we realized that we needed the global
checkpoint updates to be inline. This means that on a replication
operation, the primary shard would piggyback the global checkpoint with
the replication operation to the replicas. The replicas would update
their local knowledge of the global checkpoint and reply with their
local checkpoint. However, this could allow the global checkpoint on the
primary to advance again and the replicas would fall behind in their
local knowledge of the global checkpoint. If another replication
operation never fired, then the replicas would be permanently behind. To
account for this, we added one more sync that would fire when the
primary shard fell idle. However, this has problems:
- the shard idle timer defaults to five minutes, a long time to wait
for the replicas to learn of the new global checkpoint
- if a replica missed the sync, there was no follow-up sync to catch
them up
- there is an inherent race condition where the primary shard could
fall idle mid-operation (after having sent the replication request to
the replicas); in this case, there would never be a background sync
after the operation completes
- tying the global checkpoint sync to the idle timer was never natural
To fix this, we add two additional changes for the global checkpoint to
be synced to the replicas. The first is that we add a post-operation
sync that only fires if there are no operations in flight and there is a
lagging replica. This gives us a chance to sync the global checkpoint to
the replicas immediately after an operation so that they are always kept
up to date. The second is that we add back a global checkpoint
background sync that fires on a timer. This timer fires every thirty
seconds, and is not configurable (for simplicity). This background sync
is smarter than what we had previously in the sense that it only sends a
sync if the global checkpoint on at least one replica is lagging that of
the primary. When the timer fires, we can compare the global checkpoint
on the primary to its knowledge of the global checkpoint on the replicas
and only send a sync if there is a shard behind.
Relates #26591
This commit reenables the BWC tests after they were disabled for
backporting the change to track global checkpoints of shard copies on
the primary.
Relates #26666
This commit adds local tracking of the global checkpoints on all shard
copies when a global checkpoint tracker is operating in primary
mode. With this, we relay the global checkpoint on a shard copy back to
the primary shard during replication operations. This serves as another
step towards adding a background sync of the global checkpoint to the
shard copies.
Relates #26666
The new ops-based recovery, introduced as part of #10708, is based on the assumption that all operations below the global checkpoint known to the replica do not need to be synced with the primary. This is based on the guarantee that all ops below it are available on the primary and that they are equal. Under normal operations this guarantee holds. Sadly, it can be violated when a primary is restored from an old snapshot. At that point the restored primary can miss operations below the replica's global checkpoint, or even worse, may have totally different operations at the same spot. This PR introduces the notion of a history UUID to be able to capture the difference with the restored primary (in a follow-up PR).
The history UUID is generated by a primary when it is first created and is synced to the replicas which are recovered via a file-based recovery. The PR adds a requirement to ops-based recovery to make sure that the history UUID of the source and the target are equal. Under normal operations, all shard copies will stay with that history UUID for the rest of the index lifetime and thus this is a noop. However, it gives us a place to guarantee we fall back to file-based syncing in special events like a restore from snapshot (to be done as a follow-up) and when someone calls the truncate translog command, which can go wrong when combined with primary recovery (this is done in this PR).
We considered in the past using the translog UUID for this function (i.e., syncing it across copies) and thus avoiding adding an extra identifier. This idea was rejected as it removes the ability to verify that a specific translog really belongs to a specific lucene index. We also feel that having a history UUID will serve us well in the future.
Javadoc linking between projects currently relies on
projectSubstitutions. However, that is an extension variable that is not
part of BuildPlugin. This commit moves the javadoc linking into the root
build.gradle, alongside where projectSubstitutions are defined.
Adds support for bulk items to be aborted before they are processed by the TransportShardBulkAction.
This can be used by an ActionFilter to reject a subset of the items in a bulk action without rejecting the whole action (or all the items for a shard).
This commit adds files to the build output called build_metadata which
contain key/value pairs of metadata associated with the build. The first
use of this metadata are the git hashes associated with bwc checkouts.
These metadata files will be picked up by CI intake jobs and stored
along with last-good-commit, and then passed back in through the
BUILD_METADATA env var on periodic jobs.
At present, we do not feel there is enough of a reason to shade the low
level rest client. It caused problems with commons-logging and IDEs
during the brief time it was used. We did not know exactly how many
users would need this, and decided that leaving shading out until we
gather more information is best. Users can still shade the jar
themselves. For information and feedback, see issue #26366.
Closes #26328
This reverts commit 3a20922046.
This reverts commit 2c271f0f22.
This reverts commit 9d10dbea39.
This reverts commit e816ef89a2.
We currently add the apache license/notice for elasticsearch to any
plugin that uses our ES plugin gradle plugin. However, each plugin
should be able to use their own license. This commit adds licenseFile
and noticeFile properties to the root of projects using BuildPlugin,
which are added to the jar files for those projects.
The build was ignoring suffixes like "beta1" and "rc1" on the version numbers, which was causing the backwards compatibility packaging tests to fail because they expected to be upgrading from 6.0.0 even though they were actually upgrading from 6.0.0-beta1. This adds the suffixes to the information that the build scrapes from Version.java. It then uses those suffixes when it resolves artifacts built from the bwc branch and for testing.
Closes #26017
This commit updates the version for master to 7.0.0-alpha1. It also adds
the 6.1 version constant, and fixes many tests, as well as marking some
as awaits fix.
Closes #25893
Closes #25870
This commit removes all external dependencies from the rest client jar
and shades them in an 'org.elasticsearch.client' package within the jar
using shadowJar gradle plugin. All projects that depended on the
existing jar have been converted to using the 'org.elasticsearch.client'
package prefixes to interact with the rest client.
Closes #25208
Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the ReplicationOperation universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries).
It was brought up that our current client artifacts have generic names like 'rest' that may cause conflicts with other artifacts.
This commit renames:
- rest -> elasticsearch-rest-client
- sniffer -> elasticsearch-rest-client-sniffer
- rest-high-level -> elasticsearch-rest-high-level-client
A couple of small changes are also preparing the high level client for its first release.
Closes #20248
Removes the `assemble` task from the `build` task when we have
removed `assemble` from the project. We removed `assemble` from
projects that aren't published so our releases will be faster. But
That broke CI because CI builds with `gradle precommit build` and,
it turns out, that `build` includes `check` and `assemble`. With
this change CI will only run `check` for projects without an
`assemble`.
Removes the `assemble` task from projects that are not published.
This should speed up `gradle assemble` by skipping projects that
don't need to be built. Which is useful because `gradle assemble`
is how we cut releases.
This commit adds a gradle project, set inside the root build.gradle,
which controls all our bwc tests. This allows for seamless (i.e., no errant
CI failures) backporting of behavior.
This commit adds a new `branchConsistency` task which will run in CI
once a day, instead of on every commit. This allows `verifyVersions` to
not break immediately once a new version is released in maven.
Removes the `distribution:bwc` project in favor of
`distribution:bwc-release-snapshot` and
`distribution:bwc-stable-snapshot`.
`distribution:bwc-release-snapshot` builds a snapshot of the
latest release branch (5.4 now) if needed for backwards
compatibility. `distribution:bwc-stable-snapshot` builds a
snapshot of the latest stable branch (5.x now) if needed for
backwards compatibility.
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
Some packaging tests depend on snapshot versions of packaging
distributions yet the build does not use a repository that includes such
distributions. While we could add such a repository, a better strategy
is to follow our approach for other BWC tests where we depend on a
locally-compiled archive distribution. This commit adds a local
compilation of packaging artifacts and substitutes these anywhere that
we would otherwise depend on a snapshot of these artifacts.
Relates #24861
This will be useful for the high level client to add support for the matrix stats aggregation. We will ship this jar by default, like we do for parent-join-client, which is aligned with distributing core with the modules already included.
Relates to #24796
Now that we generate the versions list from Versions.java we can
drop the list of versions maintained for vagrant testing. One nice
thing that the vagrant testing did was to check if the list of
versions was out of date. This moves that test to the core
project.
This commit changes the rolling upgrade test to create a set of rest
test tasks per wire compat version. The most recent wire compat version
is always tested with the `integTest` task, and all versions can be
tested with `bwcTest`.
This commit expands the logic for version extraction from Version.java
to include a list of all versions for backcompat purposes. The tests
using bwcVersion are converted to use this list, but those tests
(rolling upgrade and backwards-5.0) are still not randomized; that will
happen in another followup.
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field, but as a first step this change only moves the `has_child` and `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.
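As a quick illustration of the moved queries, a minimal `has_child` search might look like this (a sketch; the index name, child type, and field are placeholders):
````
GET /my-index/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": { "text": "elasticsearch" }
      }
    }
  }
}
````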
Relates #20257
We currently have the last minor version of the previous major hardcoded
in tests like rolling upgrade. This change programmatically finds this
during gradle initialization by parsing versions from Version.java.
The current rest backcompat tests, which run against a mixed cluster of
5.x and 6.0 nodes, depend on snapshot builds of 5.x. However, this has
the potential for inconsistency that results in CI failures, and happens
quite often, whenever some backcompat logic is added to 5.x, but the bwc
test on master fails because the 5.x code has not yet been published as
a snapshot.
This change creates a git clone of the 5.x branch,
builds the zip distribution, and ties that into gradle substitutions for
the 5.x version.
The extra plugins that may be attached to the elasticsearch build
contain their own license. In the past, adding the ASL license that
elasticsearch uses was avoided by specially checking for the gradle
project prefix `:x-plugins`. However, since refactoring to the
elasticsearch-extra dir structure, this mechanism was broken. This
change fixes the pom license adding so that it only applies to projects
that fall under the root project (i.e., elasticsearch).
* Build: Remove old maven deploy support
This change removes the old maven deploy that we have in parallel to
maven-publish, and makes maven-publish fully work with publishing to
maven local. Using `gradle publishToMavenLocal` should be used to
publish to .m2.
Note that there is an unfortunate hack that means for
zip artifacts we must first create/publish a dummy pom file, and then
follow that with the real pom file. It would be nice to have the pom
file contain packaging=zip, but maven central then requires sources and
javadocs. But our zips are really just attached artifacts, so we already
set the packaging type to pom for our zip files. This change just works
around a limitation of the underlying maven publishing library which
silently skips attached artifacts when the packaging type is set to pom.
Relates #20164, closes #20375
* Remove unnecessary extra spacing
This change removes the multiple ways that plugins can be added to the
integ test cluster. It also removes the use of the default
configuration, and instead adds a zip configuration to all plugins. This
will enable using project substitutions with plugins, which must be done
with the default configuration.
:client ---------> :client:rest
:client-sniffer -> :client:sniffer
:client-test ----> :client:test
This lines the client up with how we do things like modules and
plugins.
The lucene-test dependency caused issues with IDEs as they would always load the lucene 5 jar although they shouldn't have, which caused jarhell in es core tests.
If we depend directly on randomized runner we don't have this problem. It is luckily still compatible with java 1.7. This requires, though, adding a thin module that includes the base test class which can be shared between client and client-sniffer.
This will let things that don't depend on :test:framework like the
client use it.
Also skip initializing the classes we check because we don't care
about their initialization behavior because we're not executing them.
This makes the naming conventions check pretty close to instant
from a "human eye" perspective.
Create a new subproject called client-sniffer that contains the o.e.client.sniff package. Since it is going to go to a separate jar, due to its additional functionalities and dependency on jackson, it makes sense to have it as a separate project that depends on client. This way we make sure that client doesn't depend on it etc.
Gradle has "rules" for certain task names, and clean is one of these.
When you run clean, it searches for any tasks named cleanX, and tries to
reverse engineer the X task. For eclipse, this means it finds
cleanEclipse, and internally runs it (but this does not show up as a
dependency of clean in tasks list!!). Since we added .settings as an
additional file to delete with cleanEclipse, this gets deleted when
running just "clean". It doesn't delete the other files because those
have their own clean methods, and dependencies are not followed in this
insanity. This change simply makes a separate task for cleaning eclipse
settings.
It came out with improvements around idea integration and language levels. This will make it possible to have the upcoming java client as a new project compiled against java 7 and have idea working on the right language level.
We now have a virtual build-tools project which pulls in all the same
IDE settings we have for other projects, so there is no longer a need
for hacking buildSrc into our IDEs.
In preparation for a unified release process, we need to be able to
generate the pom files independently of trying to actually publish. This
change adds back the maven-publish plugin just for that purpose. The
nexus plugin still exists for now, so that we do not break snapshots,
but that can be removed at a later time once snapshots are happening
through the unified tools. Note I also changed the dir jars are written
into so that all our artifacts are under build/distributions.
This fixes the build for folks that build without git installed locally
and should speed up the general case because we aren't trying to resolve
git information when it isn't really needed.
This allows for a local file based deploy without needing nexus
auth information.
Also signing of packages has been added, either via gradle.properties
or using system properties as a fallback.
The property build.repository allows configuring another endpoint if no
snapshot build is done.
Fix creation of .asc file for tar.gz distribution
Closes #17405
Intellij has a model of "everything is a source dir unless you say
otherwise". This change fixes the intellij configuration to not think
the gradle or eclipse build dirs are source dirs.
The build currently uses the old maven support in gradle. This commit
switches to use the newer maven-publish plugin. This will allow future
changes, for example, easily publishing to artifactory.
An additional part of this change makes publishing of build-tools part
of the normal publishing, instead of requiring a separate upload step
from within buildSrc. That also sets us up for a follow up to enable
precommit checks on the buildSrc code itself.
This groups like projects together which is nice. It creates two weirdly
named projects:
1. buildSrc - it's still just called buildSrc and it doesn't match. I don't
see why we import it into Eclipse anyway. It's groovy and easier to just edit
in vim or whatever.
2. elasticsearch - this is the name of the root project. It's also not
particularly useful to import into eclipse but we've always named it this way
and the name ':' was even more confusing so we just kept the name.
This new task allows setting code, similar to a doLast or doFirst,
except it is specifically geared at running ant (and thus called doAnt).
It adjusts the ant logging while running ant so that the log
level/behavior can be tweaked, and automatically buffers based on gradle
logging level, and dumps the ant output upon failure.
Migrated from ES-Hadoop. Contains several improvements regarding:
* Security
Takes advantage of the pluggable security in ES 2.2 and uses that in order
to grant the necessary permissions to the Hadoop libs. It relies on a
dedicated DomainCombiner to grant permissions only when needed only to the
libraries installed in the plugin folder
Adds security checks for SpecialPermission/scripting and provides
out-of-the-box permissions for the latest Hadoop 1.x (1.2.1) and 2.x (2.7.1)
* Testing
Uses a customized Local FS to perform actual integration testing of the
Hadoop stack (and thus to make sure the proper permissions and ACC blocks
are in place) however without requiring extra permissions for testing.
If needed, a MiniDFS cluster is provided (though it requires extra
permissions to bind ports)
Provides a RestIT test
* Build system
Picks the build system used in ES (still Gradle)
This adds the standalone tests so they will compile (and thus can be
modified with import completion) within IntelliJ. It also explicitly
sets up buildSrc as a module.
Note that this does *not* mean eg evil-tests can be run from intellij.
These are special tests that require special settings (eg disabling
security manager). They need to be run from the command line.
closes #15075
Due to how intellij imports gradle projects, the tasks which setup
resources are not run. This means we must be sure to run gradle idea
from the command line before importing into intellij. This change
adds a simple marker file to indicate we have run from the command line
before importing. It won't help for new projects that add plugin
metadata, but it will at least make sure the initial project is set up
correctly.
Currently we use the "gradle project attachment plugin" to support
building elasticsearch as part of another project. However, this plugin
has a number of issues, a large part of which is requiring consistent
use of the projectsPrefix.
This change removes projectsPrefix, and adds support for a special
extra-plugins directory in the root of elasticsearch. Any projects
checked out within this directory will be automatically added to
elasticsearch.
Gradle ensures task dependencies are executed in the correct order.
However, project dependencies only build what is needed for the
dependency. This means the order of higher level tasks are not
guaranteed. This change adds task ordering between test and integTest
for a project and its dependencies.
If there is a failure in the elasticsearch start script, we currently
completely lose the failure. This is due to how spawning works with ant.
This change avoids the issue by introducing an intermediate script,
built dynamically before running ant exec, which runs elasticsearch and
redirects the output to a log file. This essentially causes us to run
elasticsearch in the foreground and capture the output, but at the same
time keep a running script which ant can pump streams from (which will
always be empty).
Sometimes when running elasticsearch, it is useful to attach a remote
debugger. This change adds a --debug-jvm option (the same name gradle
uses for its tests debug option), which adds java agent config for a
remote debugger. The configuration is set to have java suspend until the
remote debugger is attached.
closes #14772
We recently got a run command with gradle, but it is sometimes useful to
run ES with a specific plugin. This is a start, by making each esplugin
have a run command which installs the plugin and runs elasticsearch in
the foreground.
With gradle, deploying to maven means first generating poms. These are
filled in based on dependencies of the project. Recently, we started
disallowing transitive dependencies. However, this configuration does
not translate to maven poms because maven has no concept of excluding
all transitive dependencies.
This change adds exclusions for each of the transitive deps of each
dependency being added to the maven pom. It does so by creating dummy
configurations for each direct dependency (which does not have
transitive deps excluded), so that we can iterate the transitive deps
when building the pom.
Note, this should be simpler (just modifying maven's pom model), but
gradle tries to hide that from their api, causing us to need to
manipulate the xml directly.
https://discuss.gradle.org/t/modifying-maven-pom-generation-to-add-excludes/12744
If we use JAVA_HOME consistently for tests, we can run tests with a
different version of java than gradle runs with. For example, this
enables running tests with jigsaw, but building with java 8. The only
caveat is intellij does not set JAVA_HOME. This change enforces that
JAVA_HOME is set, but ignores this check for intellij.
This moves the min java version used by elasticsearch to one place, a
constant in BuildPlugin. For me on java 9, this fixed my jar to have the
correct target/source versions.
closes #14702
Transitive dependencies can be confusing and hard to deal with when
conflicts arise between them. This change removes transitive
dependencies from elasticsearch, and forces any dependency conflicts to
be resolved manually, instead of automatically by gradle.
closes #14627
Some dependencies must be specified in a couple places in the build.
e.g. randomized runner is specified both in buildSrc (for the gradle
wrapper plugin), as well as in the test-framework.
This change creates buildSrc/versions.properties which acts similar to
the set of shared version properties we used to have in the maven parent
pom.
run.sh and run.bat were calling out to the old maven build system.
This is no longer in place, so we've created new gradle tasks to
start an elasticsearch node from the current codebase.
fixed #14423
This makes it a groovy project that works in eclipse.
You will have to install a plugin for groovy language support
(I used a snapshot build from https://github.com/groovy/groovy-eclipse/wiki)
Eclipse does not have the ability to differentiate test dependencies
from main dependencies. This causes what looks like a circular
dependency through test-framework. This change sets up an additional
core-tests project for eclipse only, which removes this problem.
This adds a generated-resources dir that the plugin properties are
generated into. This must be outside of the build dir, since intellij
has build as "excluded".
closes #14392