With this commit, Azure repositories are now using an Exponential Backoff policy before failing the backup.
It uses Azure SDK default values for this policy:
* `30s` delta backoff base with
* `3s` min
* `90s` max
* `3` retries max
Users can define the number of retries they wish by setting `cloud.azure.storage.xxx.max_retries` where `xxx` is the azure named account.
Closes#22728.
Removed `parse(String index, String type, String id, BytesReference source)` in DocumentMapper.java and replaced all of its use in Test files with `parse(SourceToParse source)`.
`parse(String index, String type, String id, BytesReference source)` was only used in test files and never in the main code so it was removed. All of the test files that used it was then modified to use `parse(SourceToParse source)` method that existing in DocumentMapper.java
After the removal of the joda time hack we used to have, we can cleanup
the codebase handling in security, jarhell and plugins to be more picky
about uniqueness. This was originally in #18959 which was never merged.
closes#18959
Previously, the Azure blob store would depend on a 404 StorageException
coming back from Azure if trying to open an input stream to a
non-existent blob. This works for Azure repositories which access a
primary location path. For those configured to access a secondary
location path, the Azure SDK keeps trying for a long while before
returning a 404 StorageException, causing potential delays in the
snapshot APIs. This commit makes an initial check if the blob exists in
Azure and returns immediately with a NoSuchFileException, instead of
trying to open the input stream to the blob.
Closes#23480
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
This commit sets the version on the repository-hdfs Guava dependency to
version 11.0.2. This change is made to align the version here with the
version that is defined in the POM for Hadoop 2.7.1, the version of
Hadoop that the repository-hdfs plugin is based on. See HADOOP-10101 and
HADOOP-11319 for the ridiculous history of trying to upgrade Guava past
this version in the Hadoop project.
Relates #23420
This commit adds a convenience method for simultaneously asserting
settings deprecations and other warnings and fixes some tests where
setting deprecations and general warnings were present.
The warning header used by Elasticsearch for delivering deprecation
warnings has a specific format (RFC 7234, section 5.5). The format
specifies that the warning header should be of the form
warn-code warn-agent warn-text [warn-date]
Here, the warn-code is a three-digit code which communicates various
meanings. The warn-agent is a string used to identify the source of the
warning (either a host:port combination, or some other identifier). The
warn-text is quoted string which conveys the semantic meaning of the
warning. The warn-date is an optional quoted date that can be in a few
different formats.
This commit corrects the warning header within Elasticsearch to follow
this specification. We use the warn-code 299 which means a
"miscellaneous persistent warning." For the warn-agent, we use the
version of Elasticsearch that produced the warning. The warn-text is
unchanged from what we deliver today, but is wrapped in quotes as
specified (this is important as a problem that exists today is that
multiple warnings can not be split by comma to obtain the individual
warnings as the warnings might themselves contain commas). For the
warn-date, we use the RFC 1123 format.
Relates #23275
Load the geoip database the first time a pipeline gets created that has a geoip processor.
This saves memory (measured ~150MB for the city db) in cases when the plugin is installed, but not used.
This is fallout from #23297. That commit wrapped
`InstanceProfileCredentialsProvider` to ensure that the `getCredentials`
and `refresh` methods had privileged access. However, it looks like
there was a test ensuring that `buildCredentials` returned the correct
clazz type. This commit adjusts that test to check that the correct
wrapper is returned.
The test setup for hdfs is a little complicated for windows, needing to
check if the hdfs fixture can be run at all. This was unfortunately not
updated when the integ tests were reorganized into separate runner and
cluster setups.
This commit fixes an issue that was missed in #22534.
`AWSCredentialsProvider.getCredentials()` appears to potentially open a
socket connect. This operation needed to be wrapped in `doPrivileged()`.
This should fix issue #23271.
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
Today we have multiple ways to define settings when a user needs to create a repository:
* in `elasticsearch.yml` file using `repositories.azure` prefix
* when creating the repository itself with `PUT _snaphot/repo`
The plan is to:
* Deprecate `repositories.azure` settings in 5.x (done with #22856)
* Remove in 6.x (this PR)
Related to #22800
This commit adds the elasticsearch LICENSE.txt to all plugins that
released with elasticsearch, as well as a generated NOTICE.txt specific
to the dependencies of each plugin.
We have a bunch of interfaces that have only a single implementation
for 6 years now. These interfaces are pretty useless from a SW development
perspective and only add unnecessary abstractions. They also require
lots of casting in many places where we expect that there is only one
concrete implementation. This change removes the interfaces, makes
all of the classes final and removes the duplicate `foo` `getFoo` accessors
in favor of `getFoo` from these classes.
This is related to #22116. This commit adds calls that require
SocketPermission connect to forbidden APIs.
The following calls are now forbidden:
- java.net.URL#openStream()
- java.net.URLConnection#connect()
- java.net.URLConnection#getInputStream()
- java.net.Socket#connect(java.net.SocketAddress)
- java.net.Socket#connect(java.net.SocketAddress, int)
- java.nio.channels.SocketChannel#open(java.net.SocketAddress)
- java.nio.channels.SocketChannel#connect(java.net.SocketAddress)
Secure settings from the elasticsearch keystore were not yet validated.
This changed improves support in Settings so that secure settings more
seamlessly blend in with normal settings, allowing the existing settings
validation to work. Note that the setting names are still not validated
(yet) when using the elasticsearc-keystore tool.
As part of #22116 we are going to forbid usage of api
java.net.URL#openStream(). However in a number of places across the
we use this method to read files from the local filesystem. This commit
introduces a helper method openFileURLStream(URL url) to read files
from URLs. It does specific validation to only ensure that file:/
urls are read.
Additionlly, this commit removes unneeded method
FileSystemUtil.newBufferedReader(URL, Charset). This method used the
openStream () method which will soon be forbidden. Instead we use the
Files.newBufferedReader(Path, Charset).
This is related to #22116. Core no longer needs `SocketPermission`
`connect`.
This permission is relegated to these modules/plugins:
- transport-netty4 module
- reindex module
- repository-url module
- discovery-azure-classic plugin
- discovery-ec2 plugin
- discovery-gce plugin
- repository-azure plugin
- repository-gcs plugin
- repository-hdfs plugin
- repository-s3 plugin
And for tests:
- mocksocket jar
- rest client
- httpcore-nio jar
- httpasyncclient jar
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
Let's make our life easier when debugging/testing.
Also having a flat dir helps us to compare or "synchronize" more easily with Tika project files.
Closes#22958.
Actually we never supported Visio files but we are failing hard (kill a node) when that kind of file is provided.
See https://github.com/elastic/elasticsearch/pull/22079#issuecomment-277035357
This commits excludes Visio parsing from Tika so it does not fail anymore but returns empty content instead.
As a side effect, it also removes support for POTM files.
Closes#22077.
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
This change removes the ability to set region for s3 repositories.
Endpoint should be used instead if a custom s3 location needs to be
used.
closes#22758
Follow up of #22857 where we deprecate automatic creation of azure containers.
BTW I found that the `AzureSnapshotRestoreServiceIntegTests` does not bring any value because it runs basically a Snapshot/Restore operation on local files which we already test in core.
So instead of trying to fix it to make it pass with this PR, I simply removed it.
This is related to #22116. The repository-hdfs plugin opens socket
connections. As SocketPermission is transitioned out of core, hdfs
will require connect permission. This pull request wraps operations
that require this permission in doPrivileged blocks.
* S3 repository: Add named configurations
This change implements named configurations for s3 repository as
proposed in #22520. The access/secret key secure settings which were
added in #22479 are reverted, and the only secure settings are those
with the new named configs. All other previously used settings for the
connection are deprecated.
closes#22520
This PR adds a new option for `host_type`: `tag:TAGNAME` where `TAGNAME` is the tag field you defined for your ec2 instance.
For example if you defined a tag `my-elasticsearch-host` in ec2 and set it to `myhostname1.mydomain.com`, then
setting `host_type: tag:my-elasticsearch-host` will tell Discovery Ec2 plugin to read the host name from the
`my-elasticsearch-host` tag. In this case, it will be resolved to `myhostname1.mydomain.com`.
Closes#22566.
This changes build files so that building Elasticsearch works with both Gradle 2.13 as well as higher versions of Gradle (tested 2.14 and 3.3), enabling a smooth transition from Gradle 2.13 to 3.x.
In some cases (apparently with outlook files), mime4j library is needed.
We removed it in the past which can cause elasticsearch to crash when you are using ingest-attachment (and probably mapper-attachments as well in 2.x series) with a file which requires this library.
Similar problem as the one reported at #22077.
This commit replaces specialized functional interfaces in various
plugins with generic options. Instead of creating `StorageRunnable`
interfaces in every plugin we can just use `Runnable` or `CheckedRunnable`.
This commit adds a SpecialPermission constant and uses that constant
opposed to introducing new instances everywhere.
Additionally, this commit introduces a single static method to check that
the current code has permission. This avoids all the duplicated access
blocks that exist currently.
* Upgrade to Lucene 6.4.0
`ValueSource`s are now converted to `DoubleValueSource`s using the Lucene adapter made for the migration to the new API in 6.4.0.
There are presently 7 ctor args used in any rest handlers:
* `Settings`: Every handler uses it to initialize a logger and
some other strange things.
* `RestController`: Every handler registers itself with it.
* `ClusterSettings`: Used by `RestClusterGetSettingsAction` to
render the default values for cluster settings.
* `IndexScopedSettings`: Used by `RestGetSettingsAction` to get
the default values for index settings.
* `SettingsFilter`: Used by a few handlers to filter returned
settings so we don't expose stuff like passwords.
* `IndexNameExpressionResolver`: Used by `_cat/indices` to
filter the list of indices.
* `Supplier<DiscoveryNodes>`: Used to fill enrich the response
by handlers that list tasks.
We probably want to reduce these arguments over time but
switching construction away from guice gives us tighter
control over the list of available arguments.
These parameters are passed to plugins using
`ActionPlugin#initRestHandlers` which is expected to build and
return that handlers immediately. This felt simpler than
returning an reference to the ctors given all the different
possible args.
Breaks java plugins by moving rest handlers off of guice.
* S3 repository: Deprecate specifying credentials through env vars and sys props
This is a follow up to #22479, where storing credentials secure way was
added.
This commit fixes an issue with deprecation logging for lenient
booleans. The underlying issue is that adding deprecation logging for
lenient booleans added a static deprecation logger to the Settings
class. However, the Settings class is initialized very early and in CLI
tools can be initialized before logging is initialized. This leads to
status logger error messages. Additionally, the deprecation logging for
a lot of the settings does not provide useful context (for example, in
the token filter factories, the deprecation logging only produces the
name of the setting, but gives no context which token filter factory it
comes from). This commit addresses both of these issues by changing the
call sites to push a deprecation logger through to the lenient boolean
parsing.
Relates #22696
This changes build files so that building Elasticsearch works with both Gradle 2.13 as well as higher versions of Gradle (tested 2.14 and 3.3), enabling a smooth transition from Gradle 2.13 to 3.x.
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
This is related to #22116. Certain plugins (discovery-azure-classic,
discovery-ec2, discovery-gce, repository-azure, repository-gcs, and
repository-s3) open socket connections. As SocketPermissions are
transitioned out of core, these plugins will require connect
permission. This pull request wraps operations that require these
permissions in doPrivileged blocks.
Before, the default chunk size for Azure repositories was
-1 bytes, which meant that if the chunk_size was not set on
the Azure repository, nor as a node setting, then no data
files would get written as part of the snapshot (because
the BlobStoreRepository's PartSliceStream does not know
how to process negative chunk sizes).
This commit fixes the default chunk size for Azure repositories
to be the same as the maximum chunk size. This commit also
adds tests for both the Azure and Google Cloud repositories to
ensure only valid chunk sizes can be set.
Closes#22513
* Settings: Make s3 repository sensitive settings use secure settings
This change converts repository-s3 to use the new secure settings. In
order to support the multiple ways we allow aws creds to be configured,
it also moves the main methods for the keystore wrapper into a
SecureSettings interface, in order to allow settings prefixing to work.
Affix settings are useful to namespace a certain setting. Yet, affix settings
must be specialized for their concrete type which causes lot of code duplication.
This commit allows to reuse an existing setting with and affix setting as soon as
a concrete key is available.
This integrates the mocksocket jar with elasticsearch tests. Mocksocket wraps actions requiring SocketPermissions in doPrivilege blocks. This will eventually allow SocketPermissions to be assigned to the mocksocket jar opposed to the entire elasticsearch codebase.
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
As the translog evolves towards a full operations log as part of the
sequence numbers push, there is a need for the translog to be able to
represent operations for which a sequence number was assigned, but the
operation did not mutate the index. Examples of how this can arise are
operations that fail after the sequence number is assigned, and gaps in
this history that arise when an operation is assigned a sequence number
but the operation never completed (e.g., a node crash). It is important
that these operations appear in the history so that they can be
replicated and replayed during recovery as otherwise the history will be
incomplete and local checkpoints will not be able to advance. This
commit introduces a no-op to the translog to set the stage for these
efforts.
Relates #22291
The `UnicastZenPing` shows it's age and is the result of many small changes. The current state of affairs is confusing and is hard to reason about. This PR cleans it up (while following the same original intentions). Highlights of the changes are:
1) Clear 3 round flow - no interleaving of scheduling.
2) The previous implementation did a best effort attempt to wait for ongoing pings to be sent and completed. The pings were guaranteed to complete because each used the total ping duration as a timeout. This did make it hard to reason about the total ping duration and the flow of the code. All of this is removed now and ping should just complete within the given duration or not be counted (note that it was very handy for testing, but I move the needed sync logic to the test).
3) Because of (2) the pinging scheduling changed a bit, to give a chance for the last round to complete. We now ping at the beginning, 1/3 and 2/3 of the duration.
4) To offset for (3) a bit, incoming ping requests are now added to on going ping collections.
5) UnicastZenPing never establishes full blown connections (but does reuse them if there). Relates to #22120
6) Discovery host providers are only used once per pinging round. Closes#21739
7) Usage of the ability to open a connection without connecting to a node ( #22194 ) and shorter connection timeouts helps with connections piling up. Closes#19370
8) Beefed up testing and sped them up.
9) removed light profile from production code
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.
Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.
The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.
The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
We are currenlty checking that no deprecation warnings are emitted in our query tests. That can be moved to ESTestCase (disabled in ESIntegTestCase) as it allows us to easily catch where our tests use deprecated features and assert on the expected warnings.
With this commit, we introduce a cache to the geoip ingest processor.
The cache is enabled by default and caches the 1000 most recent items.
The cache size is controlled by the setting `ingest.geoip.cache_size`.
Closes#22074
In some cases, it might happen that the `_all` field gets a field type that is
not totally configured, and in particular lacks analyzers. This is due to the
fact that `AllFieldMapper.TypeParser.getDefault` uses `Defaults.FIELD_TYPE` as
a default field type, which does not have any analyzers configured since it
does not know about the default analyzers.
While I was fixing a documentation issue (#22007), I looked at the code and discovered that we actually never read what the user entered as a `readonly` parameter when he creates an azure repository.
So if someone sends:
```
PUT _snapshot/my_backup4
{
"type": "azure",
"settings": {
"account": "my_account2",
"location_mode": "primary_only",
"readonly": true
}
}
```
The repository is not actually defined as `readonly`.
It's caused by the fact we are always overwriting `readonly`setting based on `location_mode`.
If a user sets it to `primary_only`, `readonly` is forced to `false`.
If a user sets it to `primary_then_secondary`, `readonly` is forced to `false`.
If a user sets it to `secondary_only`, `readonly` is forced to `false`.
Note that with this change, a user can force a `secondary_only` repository to `readonly: false` which will lead him to an error later on when we check the repository as per definition in Azure, a secondary repository is not writable.
Another option could have been to detect this mismatch and throw an exception in that case. Note sure it is worth writing more code though.
Closes#22053.
Since the removal of local discovery of #https://github.com/elastic/elasticsearch/pull/20960 we rely on minimum master nodes to be set in our test cluster. The settings is automatically managed by the cluster (by default) but current management doesn't work with concurrent single node async starting. On the other hand, with `MockZenPing` and the `discovery.initial_state_timeout` set to `0s` node starting and joining is very fast making async starting an unneeded complexity. Test that still need async starting could, in theory, still do so themselves via background threads.
Note that this change also removes the usage of `INITIAL_STATE_TIMEOUT_SETTINGS` as the starting of nodes is done concurrently (but building them is sequential)
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.
Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.
When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.