While I was working on #18703, I discovered a bad behavior when people don't provide AWS key/secret as part as their `elasticsearch.yml` but rely on SysProps or env. variables...
In [`InternalAwsS3Service#getClient(...)`](d4366f8493/plugins/repository-s3/src/main/java/org/elasticsearch/cloud/aws/InternalAwsS3Service.java (L76-L141)), we have:
```java
Tuple<String, String> clientDescriptor = new Tuple<>(endpoint, account);
AmazonS3Client client = clients.get(clientDescriptor);
```
But if people don't provide credentials, `account` is `null`.
Even if it actually could work, I think that we should use the `AWSCredentialsProvider` we create later on and extract from it the `account` (AWS KEY actually) and then use it as the second value of the tuple.
Closes#19557.
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
This adds a header that looks like `Location: /test/test/1` to the
response for the index/create/update API. The requirement for the header
comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.htmlhttps://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative
URIs are OK. So we use an absolute path which should resolve to the
appropriate location.
Closes#19079
This makes large changes to our rest test infrastructure, allowing us
to write junit tests that test a running cluster via the rest client.
It does this by splitting ESRestTestCase into two classes:
* ESRestTestCase is the superclass of all tests that use the rest client
to interact with a running cluster.
* ESClientYamlSuiteTestCase is the superclass of all tests that use the
rest client to run the yaml tests. These tests are shared across all
official clients, thus the `ClientYamlSuite` part of the name.
In #15950#15080#16084 we added the support of TimeOut for Requests with a default client`setTimeoutIntervalInMs`.
So we can remove this useless todo which was added for only one method.
Closes#18617.
Azure and Google cloud blob containers, as the APIs for both return
a 404 in the case of a missing object, which we already handle through
a NoSuchFileFoundException.
Follow up for #18662 and #18690.
* For consistency, we rename method parameters and use `key` and `secret` instead of `account` and `key`.
* We add some tests to check that settings are correctly applied.
* Tests revealed that some checks are bad like for #18662.
Add test and fix issue for getting the right S3 endpoint
Test when Repository, Repositories or global settings are defined
But ignore testAWSCredentialsWithSystemProviders test
Add tests for AWS Client Configuration
Fix NPE when no region is set
We used to transform region="" to region=null but it's not needed anymore and would actually cause NPE from now.
Follow up for #18662
We add some tests to check that settings are correctly applied.
Tests revealed that some checks were missing.
But we ignore `testAWSCredentialsWithSystemProviders` test for now.
Added BlobContainer tests for HDFS storage
and caught a bug at the same time in which
deleteBlob was not raising an IOException
when the blobName did not exist.
Added BlobContainer tests for Azure storage
and caught a bug at the same time in which
deleteBlob was not raising an IOException
when the blobName did not exist.
Follow up for https://github.com/elastic/elasticsearch/pull/17784#discussion_r64575845
Today we are registering repository settings when `S3RepositoryPlugin` starts:
```java
settingsModule.registerSetting(S3Repository.Repository.KEY_SETTING);
settingsModule.registerSetting(S3Repository.Repository.SECRET_SETTING);
settingsModule.registerSetting(S3Repository.Repository.BUCKET_SETTING);
settingsModule.registerSetting(S3Repository.Repository.ENDPOINT_SETTING);
settingsModule.registerSetting(S3Repository.Repository.PROTOCOL_SETTING);
settingsModule.registerSetting(S3Repository.Repository.REGION_SETTING);
settingsModule.registerSetting(S3Repository.Repository.SERVER_SIDE_ENCRYPTION_SETTING);
settingsModule.registerSetting(S3Repository.Repository.BUFFER_SIZE_SETTING);
settingsModule.registerSetting(S3Repository.Repository.MAX_RETRIES_SETTING);
settingsModule.registerSetting(S3Repository.Repository.CHUNK_SIZE_SETTING);
settingsModule.registerSetting(S3Repository.Repository.COMPRESS_SETTING);
settingsModule.registerSetting(S3Repository.Repository.STORAGE_CLASS_SETTING);
settingsModule.registerSetting(S3Repository.Repository.CANNED_ACL_SETTING);
settingsModule.registerSetting(S3Repository.Repository.BASE_PATH_SETTING);
```
We don't need to register those settings as they are repository level settings and not node level settings.
Closes#18945.
creation in the REST tests, as we no longer need it due
to index creation now waiting for active shard copies
before returning (by default, it waits for the primary of
each shard, which is the same as ensuring yellow health).
Relates #19450
Also introduced a `Processor.Parameters` class that is holder for several services processors rely on,
the IngestPlugin#getProcessors(...) method has been changed to accept `Processor.Parameters` instead
of each service seperately.
This commit renames the Netty 3 transport module from transport-netty to
transport-netty3. This is to make room for a Netty 4 transport module,
transport-netty4.
Relates #19439
Today `node.mode` and `node.local` serve almost the same purpose, they
are a shortcut for `discovery.type` and `transport.type`. If `node.local: true`
or `node.mode: local` is set elasticsearch will start in _local_ mode which means
only nodes within the same JVM are discovered and a non-network based transport
is used. The _local_ mode it only really used in tests or if nodes are embedded.
For both, embedding and tests explicit configuration via `discovery.type` and `transport.type`
should be preferred.
This change removes all the usage of these settings and by-default doesn't
configure a default transport implemenation since netty is now a module. Yet, to make
the user expericence flawless, plugins or modules can set a `http.type.default` and
`transport.type.default`. Plugins set this via `PluginService#additionalSettings()`
which enforces _set-once_ which prevents node startup if set multiple times. This means
that our distributions will just startup with netty transport since it's packaged as a
module unless `transport.type` or `http.transport.type` is explicitly set.
This change also found a bunch of bugs since several NamedWriteables were not registered if a
transport client is used. Now that we don't rely on the `node.mode` leniency which is inherited
instead of using explicit settings, `TransportClient` uses `AssertingLocalTransport` which detects these problems since it serializes all messages.
Closes#16234
Some tests still start http implicitly or miss configuring the transport clients correctly.
This commit fixes all remaining tests and adds a depdenceny to `transport-netty` from
`qa/smoke-test-http` and `modules/reindex` since they need an http server running on the nodes.
This also moves all required permissions for netty into it's module and out of core.
This change adds a createComponents() method to Plugin implementations
which they can use to return already constructed componenents/services.
Eventually this should be just services ("components" don't really do
anything), but for now it allows any object so that preconstructed
instances by plugins can still be bound to guice. Over time we should
add basic services as arguments to this method, but for now I have left
it empty so as to not presume what is a necessary service.
The DiscoveryNodeService exists to register CustomNodeAttributes which
plugins can add. This is not necessary, since plugins can already add
additional attributes, and use the node attributes prefix.
This change removes the DiscoveryNodeService, and converts the only
consumer, the ec2 discovery plugin, to add the ec2 availability zone
in additionalSettings().
Repository plugins currently use a lot of custom classes like
RepositoryName and RepositorySettings in order to use guice to construct
repository implementations. But repositories now only really need their
settings to be constructed. Anything else they need (eg a cloud client)
can be constructed within the plugin, instead of via guice.
This change makes repository plugins use the new pull model. It removes
guice from the construction of Repository objects (no more child
injectors) and also from all repository plugins.
The api for snapshot/restore was split up between two interfaces,
Repository and IndexShardRepository. There was also complex
initialization and injection between the two. However, there is always a
one to one relationship between the two.
This change moves the IndexShardRepository api into Repository, as well
as updates the API so as not to require any services to be injected for
sublcasses.
* master: (192 commits)
[TEST] Fix rare OBOE in AbstractBytesReferenceTestCase
Reindex from remote
Rename writeThrowable to writeException
Start transport client round-robin randomly
Reword Refresh API reference (#19270)
Update fielddata.asciidoc
Fix stored_fields message
Add missing footer notes in mapper size docs
Remote BucketStreams
Add doc values support to the _size field in the mapper-size plugin
Bump version to 5.0.0-alpha5.
Update refresh.asciidoc
Update shrink-index.asciidoc
Change Debian repository for Vagrant debian-8 box
[TEST] fix test to account for internal empyt reference optimization
Upgrade to netty 3.10.6.Final (#19235)
[TEST] fix histogram test when extended bounds overlaps data
Remove redundant modifier
Simplify TcpTransport interface by reducing send code to a single send method (#19223)
Fix style violation in InstallPluginCommand.java
...
This change activates the doc_values on the _size field for indices created after 5.0.0-alpha4.
It also adds a note in the breaking changes that explain the situation and how to get around it.
Closes#18334
Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.
Relates #19231
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
The only reason for LifecycleComponent taking a generic type was so that
it could return that type on its start and stop methods. However, this
chaining has no practical necessity. Instead, start and stop can be
void, and a whole bunch of confusing generics disappear.
Before, a repository would maintain an index file (named 'index') per
repository, that contained the current snapshots in the repository.
This file was not atomically written, so repositories had to depend on
listing the blobs in the repository to determine what the current
snapshots are, and only rely on the index file if the repository does
not support the listBlobs operation. This could cause an incorrect view
of the current snapshots in the repository if any prior snapshot delete
operations failed to delete snapshot metadata files.
This commit introduces the atomic writing of the index file, and because
atomic writes are not guaranteed if the file already exists, we write to
a generational index file (index-N, where N is the current generation).
We also maintain an index-latest file that contains the current
generation, for those repositories that cannot list blobs.
Closes#19002
Relates #18156
BytesReference should be a really simple interface, yet it has a gazillion
ways to achieve the same this. Methods like `#hasArray`, `#toBytesArray`, `#copyBytesArray`
`#toBytesRef` `#bytes` are all really duplicates. This change simplifies the interface
dramatically and makes implementations of it much simpler. All array access has been removed
and is streamlined through a single `#toBytesRef` method. Utility methods to materialize a
compact byte array has been added too for convenience.
Repository-S3 needs a special permission because of problems in AmazonS3Client: when no region is set on a AmazonS3Client instance, the AWS SDK loads all known partitions from a JSON file and uses a Jackson's ObjectMapper for that: this one, in version 2.5.3 with the default binding options, tries to suppress access checks of ctor/field/method and thus requires this special permission. AWS must be fixed to uses Jackson correctly and have the correct modifiers on binded classes.
This must be fixed in aws sdk (see https://github.com/aws/aws-sdk-java/issues/766) but in the meanwhile we have no choice.
closes#18539
Raise IOException on deleteBlob if the blob doesn't exist
This commit raises an IOException on BlobContainer#deleteBlob
if the blob does not exist, in conformance with the BlobContainer
interface contract. Each implementation of BlobContainer now
conforms to this contract (file system, S3, Azure, HDFS). This
commit also contains blob container tests for each of the
repository implementations.
Closes#18530
As discussed at https://github.com/elastic/elasticsearch-cloud-azure/issues/91#issuecomment-229113595, we know that the current `discovery-azure` plugin only works with Azure Classic VMs / Services (which is somehow Legacy now).
The proposal here is to rename `discovery-azure` to `discovery-azure-classic` in case some users are using it.
And deprecate it for 5.0.
Closes#19144.
The factory for ingest processor is generic, but that is only for the
return type of the create mehtod. However, the actual consumer of the
factories only cares about Processor, so generics are not needed.
This change removes the generic type from the factory. It also removes
AbstractProcessorFactory which only existed in order pull the optional
tag from config. This functionality is moved to the caller of the
factories in ConfigurationUtil, and the create method now takes the tag.
This allows the covariant return of the implementation to work with
tests not needing casts.
When GCE region is empty we get back from the API something like:
```
{
"id": "dummy"
}
```
instead of:
```
{
"id": "dummy",
"items":[ ]
}
```
This generates a NPE when we aggregate all the lists into a single one.
Closes#16967.
Previously all rest handlers would take Client in their injected ctor.
However, it was only to hold the client around for runtime. Instead,
this can be done just once in the HttpService which handles rest
requests, and passed along through the handleRequest method. It also
should always be a NodeClient, and other types of Clients (eg a
TransportClient) would not work anyways (and some handlers can be
simplified in follow ups like reindex by taking NodeClient).
This change allows Plugin implementions to implement Closeable when they
have resources that should be released. As a first example of how this
can be used, I switched over ingest plugins, which just had the geoip
processor. The ingest framework had chains of closeable to support this,
which is now removed.
The discovery-plugin has been broken since 2.x because the code was not compliant with the security manager and because plugins have been refactored.
closes#18637, #15630
Since the Settings infrastructure has been improved, a group setting must be registered by the repository-azure plugin to allow settings like "cloud.azure.storage.my_account.account" to be coherent with Azure plugin documentation.
Instead of plugins calling `registerTokenizer` to extend the analyzer
they now instead have to implement `AnalysisPlugin` and override
`getTokenizer`. This lines up extending plugins in with extending
scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry`
immediately as part of its constructor which makes testing anslysis
much simpler.
This also moves the default analysis configuration into `AnalysisModule`
which is how search is setup.
Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`.
Instead it is only responsible for building `AnslysisRegistry`. We still
bind `AnalysisRegistry` but we only do so in `Node`. This is means it
is available at module construction time so we slowly remove the need to
bind it in guice.
Related to #18945 and to this 35d3bdab84 (commitcomment-17914150)
In GCS Repository plugin we defined a `service_account` setting which is defined as `Property.Filtered`.
It's not needed as it's only a path to a file.
Closes#18946
* master: (416 commits)
docs: removed obsolete information, percolator queries are not longer loaded into jvm heap memory.
Upgrade JNA to 4.2.2 and remove optionality
[TEST] Increase timeouts for Rest test client (#19042)
Update migrate_5_0.asciidoc
Add ThreadLeakLingering option to Rest client tests
Add a MultiTermAwareComponent marker interface to analysis factories. #19028
Attempt at fixing IndexStatsIT.testFilterCacheStats.
Fix docs build.
Move templates out of the Search API, into lang-mustache module
revert - Inline reroute with process of node join/master election (#18938)
Build valid slices in SearchSourceBuilderTests
Docs: Convert aggs/misc to CONSOLE
Docs: migration notes for _timestamp and _ttl
Group client projects under :client
[TEST] Add client-test module and make client tests use randomized runner directly
Move upgrade test to upgrade from version 2.3.3
Tasks: Add completed to the mapping
Fail to start if plugin tries broken onModule
Remove duplicated read byte array methods
Rename `fields` to `stored_fields` and add `docvalue_fields`
...
This is the same as what Lucene does for its analysis factories, and we hawe
tests that make sure that the elasticsearch factories are in sync with
Lucene's. This is a first step to move forward on #9978 and #18064.
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
This changes adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
We pretended to be able to ackt like a different version node for so long it's
time to be honest and remove this ability. It's just confusing and where needed
and tested we should build dedicated extension points.
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
Today we have a push model for registering basically anything. All our extension points
are defined on modules which we pass in to plugins. This is harder to maintain and adds
unnecessary dependencies on the modules itself. This change moves towards a pull model
where the plugin offers a getter kind of method to get the extensions. This will also
help in the future if we need to pass dependencies to the extension points which can
easily be defined on the method as arguments if a pull model is used.
Registering a script engine or native scripts still uses Guice today
and is much more complicated than needed. This change moves to a pull
based model where script plugins have to implement a dedicated interface
`ScriptPlugin` and defines simple getter returning instances rather than
classes.
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overriden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
The database files have been doubled in size compared to the previous files being used.
For this reason the database files are now gzip compressed, which required using
`GZIPInputStream` when loading database files.
* master: (51 commits)
Switch QueryBuilders to new MatchPhraseQueryBuilder
Added method to allow creation of new methods on-the-fly.
more cleanups
Remove cluster name from data path
Remove explicit parallel new GC flag
rehash the docvalues in DocValuesSliceQuery using BitMixer.mix instead of the naive Long.hashCode.
switch FunctionRef over to methodhandles
ingest: Move processors from core to ingest-common module.
Fix some typos (#18746)
Fix ut
convert FunctionRef/Def usage to methodhandles.
Add the ability to partition a scroll in multiple slices. API:
use painless types in FunctionRef
Update ingest-node.asciidoc
compute functional interface stuff in Definition
Use method name in bootstrap check might fork test
Make checkstyle happy (add Lookup import, line length)
Don't hide LambdaConversionException and behave like real javac compiled code when a conversion fails. This works anyways, because fallback is allowed to throw any Throwable
Pass through the lookup given by invokedynamic to the LambdaMetaFactory. Without it real lambdas won't work, as their implementations are private to script class
checkstyle have your upper L
...
Folded grok processor into ingest-common module.
The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind in that tests with an empty pipeline.
Removed messy tests, these tests were already covered in the rest tests
Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need write its own plugin
Moved reindex ingest tests to qa module
Closes#18490
This commit refactors the handling of thread pool settings so that the
individual settings can be registered rather than registering the top
level group. With this refactoring, individual plugins must now register
their own settings for custom thread pools that they need, but a
dedicated API is provided for this in the thread pool module. This
commit also renames the prefix on the thread pool settings from
"threadpool" to "thread_pool". This enables a hard break on the settings
so that:
- some of the settings can be given more sensible names (e.g., the max
number of threads in a scaling thread pool is now named "max" instead
of "size")
- change the soft limit on the number of threads in the bulk and
indexing thread pools to a hard limit
- the settings names for custom plugins for thread pools can be
prefixed (e.g., "xpack.watcher.thread_pool.size")
- remove dynamic thread pool settings
Relates #18674
* master: (184 commits)
Add back pending deletes (#18698)
refactor matrix agg documentation from modules to main agg section
Implement ctx.op = "delete" on _update_by_query and _reindex
Close SearchContext if query rewrite failed
Wrap lines at 140 characters (:qa projects)
Remove log file
painless: Add support for the new Java 9 MethodHandles#arrayLength() factory (see https://bugs.openjdk.java.net/browse/JDK-8156915)
More complete exception message in settings tests
Use java from path if JAVA_HOME is not set
Fix uncaught checked exception in AzureTestUtils
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add test rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Adressing review comments
...
* master: (911 commits)
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add test rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Adressing review comments
Only filter intial recovery (post API) when shrinking an index (#18661)
Add tests to check that toQuery() doesn't return null
Removing handling of null lucene query where we catch this at parse time
Handle empty query bodies at parse time and remove EmptyQueryBuilder
Mute failing assertions in IndexWithShadowReplicasIT until fix
Remove allow running as root
Add upgrade-not-supported warning to alpha release notes
remove unrecognized javadoc tag from matrix aggregation module
set ValuesSourceConfig fields as private
Adding MultiValuesSource support classes and documentation to matrix stats agg module
...
This commit adds a UUID for each snapshot, in addition to the already
existing repository and snapshot name. The addition of UUIDs will enable
more robust handling of the deletion of previous snapshots and lingering
files from partially failed delete operations, on top of being able to
uniquely track each snapshot.
Closes#18228
Relates #18156
This commit clarifies the behavior that must be adhered to by any
implementors of the BlobContainer interface. This is done through
expanded Javadocs.
Closes#18157Closes#15580
This adds a low level primitive operations to shrink an existing
index into a new index with a single shard. This primitive expects
all shards of the source index to allocated on a single node. Once the target index is initializing on the shrink node it takes a snapshot of the source index shards and copies all files into the target indices data folder. An [optimization](https://issues.apache.org/jira/browse/LUCENE-7300) coming in Lucene 6.1 will also allow for optional constant time copy if hard-links are supported by the filesystem. All mappings are merged into the new indexes metadata once the snapshots have been taken on the merge node.
To shrink an existing index all shards must be moved to a single node (one instance of each shard) and the index must be read-only:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
"settings" : {
"index.routing.allocation.require._name" : "shrink_node_name",
"index.blocks.write" : true
}
}
```
once all shards are started on the shrink node. the new index can be created via:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
"settings" : {
"index.codec" : "best_compression",
"index.number_of_replicas" : 1
}
}'
```
This API will perform all needed check before the new index is created and selects the shrink node based on the allocation of the source index. This call returns immediately, to monitor shrink progress the recovery API should be used since all copy operations are reflected in the recovery API with byte copy progress etc.
The shrink operation does not modify the source index, if a shrink operation should
be canceled or if the shrink failed, the target index can simply be deleted and
all resources are released.