This PR introduces a subproject in test/fixtures that contains a Vagrantfile used for standing up a
KRB5 KDC (Kerberos). The PR also includes helper scripts for provisioning principals, a few
changes to the HDFS Fixture to allow it to interface with the KDC, as well as a new suite of
integration tests for the HDFS Repository plugin.
The HDFS Repository plugin senses if the local environment can support the HDFS Fixture
(Windows is generally a restricted environment). If it can use the regular fixture, it then tests if
Vagrant is installed with a compatible version to determine if the secure test fixtures should be
enabled. If the secure tests are enabled, then we create a Kerberos KDC fixture, tasks for adding
the required principals, and an HDFS fixture configured for security. A new integration test task is
also configured to use the KDC and secure HDFS fixture and to run a testing suite that uses
authentication. At the end of the secure integration test the fixtures are torn down.
Adds a new "icu_collation" field type that exposes lucene's
ICUCollationDocValuesField. ICUCollationDocValuesField is the replacement
for ICUCollationKeyFilter which has been deprecated since Lucene 5.
This changes the way we register pre-configured token filters so that
plugins can declare them and starts to move all of the pre-configured
token filters out of core. It doesn't finish the job because doing
so would make the change unreviewably large. So this PR includes
a shim that keeps the "old" way of registering pre-configured token
filters around.
The Lowercase token filter is special because there is a "special"
interaction between it and the lowercase tokenizer. I'm not sure
exactly what to do about it so for now I'm leaving it alone with
the intent of figuring out what to do with it in a followup.
This also renames these pre-configured token filters from
"pre-built" to "pre-configured" because that seemed like a more
descriptive name.
This is a part of #23658
Changes the scope of the AllocationService dependency injection hack so that it is at least contained to the AllocationService and does not leak into the Discovery world.
Added missing permissions required for authenticating with Kerberos to HDFS. Also implemented
code to support authentication in the form of using a Kerberos keytab file. In order to support
HDFS authentication, users must install a Kerberos keytab file on each node and transfer it to the
configuration directory. When a user specifies a Kerberos principal in the repository settings the
plugin automatically enables security for Hadoop and begins the login process. There will be a
separate PR and commit for the testing infrastructure to support these changes.
This commit cleans up some cases where a list or map was being
constructed, and then an existing collection was copied into the new
collection. The clean is to instead use an appropriate constructor to
directly copy the existing collection in during collection
construction. The advantage of this is that the new collection is sized
appropriately.
Relates #24409
Separates cluster state publishing from applying cluster states:
- ClusterService is split into two classes MasterService and ClusterApplierService. MasterService has the responsibility to calculate cluster state updates for actions that want to change the cluster state (create index, update shard routing table, etc.). ClusterApplierService has the responsibility to apply cluster states that have been successfully published and invokes the cluster state appliers and listeners.
- ClusterApplierService keeps track of the last applied state, but MasterService is stateless and uses the last cluster state that is provided by the discovery module to calculate the next prospective state. The ClusterService class is still kept around, which now just delegates actions to ClusterApplierService and MasterService.
- The discovery implementation is now responsible for managing the last cluster state that is used by the consensus layer and the master service. It also exposes the initial cluster state which is used by the ClusterApplierService. The discovery implementation is also responsible for adding the right cluster-level blocks to the initial state.
- NoneDiscovery has been renamed to TribeDiscovery as it is exclusively used by TribeService. It adds the tribe blocks to the initial state.
- ZenDiscovery is synchronized on state changes to the last cluster state that is used by the consensus layer and the master service, and does not submit cluster state update tasks anymore to make changes to the disco state (except when becoming master).
Control flow for cluster state updates is now as follows:
- State updates are sent to MasterService
- MasterService gets the latest committed cluster state from the discovery implementation and calculates the next cluster state to publish
- MasterService submits the new prospective cluster state to the discovery implementation for publishing
- Discovery implementation publishes cluster states to all nodes and, once the state is committed, asks the ClusterApplierService to apply the newly committed state.
- ClusterApplierService applies state to local node.
The tribe service can take a while to initialize, depending on how many cluster it needs to connect to. This change moves writing the ports file used by tests to before the tribe service is started.
Most of these settings should always be pulled from the repository
settings. A couple were leftover that should be moved to client
settings. The path style access setting should be removed altogether.
This commit adds deprecations for all of these existing settings, as
well as adding new client specific settings for max retries and
throttling.
relates #24143
Start moving built in analysis components into the new analysis-common
module. The goal of this project is:
1. Remove core's dependency on lucene-analyzers-common.jar which should
shrink the dependencies for transport client and high level rest client.
2. Prove that analysis plugins can do all the "built in" things by moving all
"built in" behavior to a plugin.
3. Force tests not to depend on any oddball analyzer behavior. If tests
need anything more than the standard analyzer they can use the mock
analyzer provided by Lucene's test infrastructure.
This commit removes the deprecated cloud.aws.* settings. It also removes
backcompat for specifying `discovery.type: ec2`, and unused aws signer
code which was removed in a previous PR.
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
After splitting integ tests into cluster configuration and the test
runner task, we still have dependencies of the test runner added as deps
of the cluster. This commit adds dependencies directly to the cluster,
so that the runner can have other dependencies independent of what is
needed for the cluster.
The S3 repostiory has many levels of settings it looks at to create a
repository, and these settings were read at repository creation time.
This meant secure settings like access and secret keys had to be
available after node construction. This change makes setting loading for
every except repository level settings eager, so that secure settings
can be stashed, and the keystore can once again be closed after
bootstrapping the node is complete.
This commit removes passing the repository metadata object through to
s3 client creation. It is not needed, and in fact in tests was confusing
because you could create the metadata but have it contain different
settings than were passed in as repository settings.
This commit removes the "legacy" feature of secure settings, which setup
a parallel setting that was a fallback in the insecure
elasticsearch.yml. This was previously used to allow the new secure
setting name to be that of the old setting name, but is now not in use
due to other refactorings. It is much cleaner to just have all secure
settings use new setting names. If in the future we want to reuse the
previous setting name, once support for the insecure settings have been
removed, we can then rename the secure setting. This also adds a test
for the behavior.
This change adds secure settings for access/secret keys and proxy
username/password to ec2 discovery. It adds the new settings with the
prefix `discovery.ec2`, copies other relevant ec2 client settings to the
same prefix, and deprecates all other settings (`cloud.aws.*` and
`cloud.aws.ec2.*`). Note that this is simpler than the client configs
in repository-s3 because discovery is only initialized once for the
entire node, so there is no reason to complicate the configuration with
the ability to have multiple sets of client settings.
relates #22475
Currently, both the Amazon S3 client provides a retry mechanism, and the
S3 blob store also attempts retries for failed read/write requests.
Both retry mechanisms are controlled by the
`repositories.s3.max_retries` setting. However, the S3 blob store retry
mechanism is unnecessary because the Amazon S3 client provided by the
Amazon SDK already handles retries (with exponential backoff) based on
the provided max retry configuration setting (defaults to 3) as long as
the request is retryable. Hence, this commit removes the unneeded retry
logic in the S3 blob store and the S3OutputStream.
Closes#22845
This commit puts all the classes in the repository-s3 plugin into a
single package. In addition to simplifying the plugin, it will make it
easier to test as things that should be package private will not be
difficult to use inside tests alone.
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
With this commit, Azure repositories are now using an Exponential Backoff policy before failing the backup.
It uses Azure SDK default values for this policy:
* `30s` delta backoff base with
* `3s` min
* `90s` max
* `3` retries max
Users can define the number of retries they wish by setting `cloud.azure.storage.xxx.max_retries` where `xxx` is the azure named account.
Closes#22728.
Removed `parse(String index, String type, String id, BytesReference source)` in DocumentMapper.java and replaced all of its use in Test files with `parse(SourceToParse source)`.
`parse(String index, String type, String id, BytesReference source)` was only used in test files and never in the main code so it was removed. All of the test files that used it was then modified to use `parse(SourceToParse source)` method that existing in DocumentMapper.java
After the removal of the joda time hack we used to have, we can cleanup
the codebase handling in security, jarhell and plugins to be more picky
about uniqueness. This was originally in #18959 which was never merged.
closes#18959
Previously, the Azure blob store would depend on a 404 StorageException
coming back from Azure if trying to open an input stream to a
non-existent blob. This works for Azure repositories which access a
primary location path. For those configured to access a secondary
location path, the Azure SDK keeps trying for a long while before
returning a 404 StorageException, causing potential delays in the
snapshot APIs. This commit makes an initial check if the blob exists in
Azure and returns immediately with a NoSuchFileException, instead of
trying to open the input stream to the blob.
Closes#23480
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
This commit sets the version on the repository-hdfs Guava dependency to
version 11.0.2. This change is made to align the version here with the
version that is defined in the POM for Hadoop 2.7.1, the version of
Hadoop that the repository-hdfs plugin is based on. See HADOOP-10101 and
HADOOP-11319 for the ridiculous history of trying to upgrade Guava past
this version in the Hadoop project.
Relates #23420
This commit adds a convenience method for simultaneously asserting
settings deprecations and other warnings and fixes some tests where
setting deprecations and general warnings were present.
The warning header used by Elasticsearch for delivering deprecation
warnings has a specific format (RFC 7234, section 5.5). The format
specifies that the warning header should be of the form
warn-code warn-agent warn-text [warn-date]
Here, the warn-code is a three-digit code which communicates various
meanings. The warn-agent is a string used to identify the source of the
warning (either a host:port combination, or some other identifier). The
warn-text is quoted string which conveys the semantic meaning of the
warning. The warn-date is an optional quoted date that can be in a few
different formats.
This commit corrects the warning header within Elasticsearch to follow
this specification. We use the warn-code 299 which means a
"miscellaneous persistent warning." For the warn-agent, we use the
version of Elasticsearch that produced the warning. The warn-text is
unchanged from what we deliver today, but is wrapped in quotes as
specified (this is important as a problem that exists today is that
multiple warnings can not be split by comma to obtain the individual
warnings as the warnings might themselves contain commas). For the
warn-date, we use the RFC 1123 format.
Relates #23275
Load the geoip database the first time a pipeline gets created that has a geoip processor.
This saves memory (measured ~150MB for the city db) in cases when the plugin is installed, but not used.