Commit Graph

1859 Commits

Author SHA1 Message Date
Tanguy Leroux f27cb96a64
Use AmazonS3.doesObjectExist() method in S3BlobContainer (#27723)
This pull request changes the S3BlobContainer.blobExists() method implementation 
to make it use the AmazonS3.doesObjectExist() method instead of 
AmazonS3.getObjectMetadata(). The AmazonS3 implementation takes care of 
catching any thrown AmazonS3Exception and compares its response code with 404, 
returning false (object does not exist) or lets the exception be propagated.
2017-12-12 09:30:36 +01:00
Luca Cavanna f4fb4d3bf5
Add support for filtering mappings fields (#27603)
Add support for filtering fields returned as part of mappings in get index, get mappings, get field mappings and field capabilities API.

Plugins can plug in their own function, which receives the index as argument, and return a predicate which controls whether each field is included or not in the returned output.
2017-12-05 20:31:29 +01:00
Jason Tedor 42a4ad35da
Add node name to thread pool executor name
This commit adds the node name to the names of thread pool executors so
that the node name is visible in rejected execution exception messages.

Relates #27663
2017-12-05 07:45:40 -05:00
Henrik Lindström 7a44596446 Catch InvalidPathException in IcuCollationTokenFilterFactory (#27202)
Using custom rules in the icu_collation filter can fail on Windows. If the rules are interpreted 
as a file location, this leads to an InvalidPathException when trying to read the rules from a file.
2017-12-04 10:29:08 +01:00
Adrien Grand 6323bb0d97
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27619)
This new snapshot mostly brings a change to TopFieldCollector which can now
early terminate collection when trackTotalHits is `false`.

As a follow-up, we should replace our usage of
`EarlyTerminatingSortingCollector` with this new option.
2017-12-04 09:40:08 +01:00
James Baiera e16f1271b6
Fix SecurityException when HDFS Repository used against HA Namenodes (#27196)
* Sense HA HDFS settings and remove permission restrictions during regular execution.

This PR adds integration tests for HA-Enabled HDFS deployments, both regular and secured. 
The Mini HDFS fixture has been updated to optionally run in HA-Mode. A new test suite has 
been added for reproducing the effects of a Namenode failing over during regular repository 
usage. Going forward, the HDFS Repository will still be subject to its self imposed permission 
restrictions during normal use, but will no longer restrict them when running against an HA 
enabled HDFS cluster. Instead, the plugin will rely on the provided security policy and not 
further restrict the permissions so that the transparent operation to failover to a different 
Namenode in the client does not raise security exceptions. Additionally, we are now testing the 
secure mode with SASL based wire encryption of data between Elasticsearch and HDFS. This 
includes a missing library (commons codec) in order to support this change.
2017-12-01 14:26:05 -05:00
Jason Tedor d0cd18169e Remove stale awaits fix on azure master nodes test
This awaits fix has been there forever and no one seems to know what to
do with this test. I say let CI churn on it because it passed for me
three out of three times. If there is something wrong with it, we will
know quickly and can then address with the new information that we have.
2017-11-28 22:43:34 -05:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
2017-11-28 14:52:42 +01:00
David Turner 7ac361d86e Update @AwaitsFix URL to point at an issue in the current repo 2017-11-28 09:35:46 +00:00
Tanguy Leroux 50a2459adf
Update Google SDK to version 1.23 (#27381)
This commit updates the google-api-client library to version 1.23.0.

Related to #26636
2017-11-15 15:30:27 +01:00
Tanguy Leroux dd51c231ac
Create new handlers for every new request in GoogleCloudStorageService (#27339)
This commit changes the DefaultHttpRequestInitializer in order to make
it create new HttpIOExceptionHandler and HttpUnsuccessfulResponseHandler
for every new HTTP request instead of reusing the same two handlers for
all requests.

Closes #27092
2017-11-14 11:43:32 +01:00
Jason Tedor d375cef73c
Upgrade AWS SDK Jackson Databind to 2.6.7.1
The AWS SDK has a transitive dependency on Jackson Databind. While the
AWS SDK was recently upgraded, the Jackson Databind dependency was not
pulled along with it to the version that the AWS SDK depends on. This
commit upgrades the dependencies for discovery-ec2 and repository-s3
plugins to match versions on the AWS SDK transitive dependencies.

Relates #27361
2017-11-13 12:05:14 -05:00
Simon Willnauer 2299c70371
Allow affix settings to specify dependencies (#27161)
We use affix settings to group settings / values under a certain namespace.
In some cases like login information for instance a setting is only valid if
one or more other settings are present. For instance `x.test.user` is only valid
if there is an `x.test.passwd` present and vice versa. This change allows to specify
such a dependency to prevent settings updates that leave settings in an inconsistent
state.
2017-11-13 12:06:36 +01:00
Tanguy Leroux f6c2ea0f7d [Test] Fix S3BlobStoreContainerTests.testNumberOfMultiparts() 2017-11-10 15:45:20 +01:00
Tanguy Leroux 9c4d6c629a
Remove S3 output stream (#27280)
Now the blob size information is available before writing anything, 
the repository implementation can know upfront what will be the 
more suitable API to upload the blob to S3.

This commit removes the DefaultS3OutputStream and S3OutputStream 
classes and moves the implementation of the upload logic directly in the 
S3BlobContainer.

related #26993
closes #26969
2017-11-10 12:22:33 +01:00
Guillaume Le Floch ac5fd6a7d9 Update Tika version to 1.15
This commit upgrades the Tika dependency to version 1.15.

Relates #25003
2017-11-09 13:16:44 -05:00
Tanguy Leroux 184dda9eb0
Update to AWS SDK 1.11.223 (#27278) 2017-11-09 13:25:51 +01:00
Jason Tedor 58a28dacbd
Remove colons from task and configuration names
Gradle 5.0 will remove support for colons in configuration and task
names. This commit fixes this for our build by removing all current uses
of colons in configuration and task names.

Relates #27305
2017-11-08 15:22:31 -05:00
David Roberts 749c3ec716
Remove the single argument Environment constructor (#27235)
Only tests should use the single argument Environment constructor.  To
enforce this the single arg Environment constructor has been replaced with
a test framework factory method.

Production code (beyond initial Bootstrap) should always use the same
Environment object that Node.getEnvironment() returns.  This Environment
is also available via dependency injection.
2017-11-04 13:25:09 +00:00
kel 0f21262b36 Do not create directories if repository is readonly (#26909)
For FsBlobStore and HdfsBlobStore, if the repository is read only, the blob store should be aware of the readonly setting and do not create directories if they don't exist.

Closes #21495
2017-11-03 13:10:50 +01:00
Colin Goodheart-Smithe c1b8140c83
Upgrade to Lucene 7.1 (#27225) 2017-11-02 13:25:33 +00:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the name all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
Christoph Büscher 9253ea8aec Fix beidermorse phonetic token filter for unspecified `languageset` (#27112)
Currently, when we create a BeiderMorseFilter with an unspecified `languageset`,
the filter will not guess the language, which should be the default behaviour.
This change fixes this and adds a simple test for the cases with and without
provided `languageset` settings.

Closes #26771
2017-10-27 10:07:36 +02:00
Tanguy Leroux 463e7e6fa3 Revert "Upgrade to Jackson 2.9.2 (#27032)"
This reverts commit 0b9acc5ace.
2017-10-20 08:25:41 +02:00
Tanguy Leroux f78b2e5bc9 Fix ingest-attachment yaml REST test 2017-10-19 22:20:50 +02:00
Simon Willnauer cdd7c1e6c2 Return List instead of an array from settings (#26903)
Today we return a `String[]` that requires copying values for every
access. Yet, we already store the setting as a list so we can also directly
return the unmodifiable list directly. This makes list / array access in settings
a much cheaper operation especially if lists are large.
2017-10-09 09:52:08 +02:00
Md. Abdulla-Al-Sun a40c474e10
Added Bengali Analyzer to Elasticsearch with respect to the lucene update(PR#238) 2017-10-05 13:25:05 +02:00
Simon Willnauer 00dfdf50cf Represent lists as actual lists inside Settings (#26878)
Today we represent each value of a list setting with it's own dedicated key
that ends with the index of the value in the list. Aside of the obvious
weirdness this has several issues especially if lists are massive since it
causes massive runtime penalties when validating settings. Like a list of 100k
words will literally cause a create index call to timeout and in-turn massive
slowdown on all subsequent validations runs.

With this change we use a simple string list to represent the list. This change
also forbids to add a settings that ends with a .0 which was internally used to
detect a list setting.  Once this has been rolled out for an entire major
version all the internal .0 handling can be removed since all settings will be
converted.

Relates to #26723
2017-10-05 09:27:08 +02:00
Martijn van Groningen dca787ed8a
upgrade to Lucene 7.1.0 snapshot version 2017-10-05 09:06:56 +02:00
David Pilato 84a3899550 Simplify Azure blobStore method signatures (#26752)
While working on #26751, I found that we are passing the container name on every single method although we don't need it as it is stored within the blobstore object already.

This commit simplifies a bit that part of the code.

It also removes `repositoryName` from AzureBlobStore which was not used anymore.
Also we move some properties in AzureBlobContainer to `private` members.
2017-10-04 20:17:50 +02:00
Simon Willnauer d1533e2397 Remove Settings#getAsMap() (#26845)
Since `#getAsMap` exposes internal representation we are trying to remove it
step by step. This commit is cleaning up some xcontent writing as well as
usage in tests
2017-10-04 01:21:38 -06:00
Simon Willnauer 7b8d036ab5 Replace group map settings with affix setting (#26819)
We use group settings historically instead of using a prefix setting which is more restrictive and type safe. The majority of the usecases needs to access a key, value map based on the _leave node_ of the setting ie. the setting `index.tag.*` might be used to tag an index with `index.tag.test=42` and `index.tag.staging=12` which then would be turned into a `{"test": 42, "staging": 12}` map. The group settings would always use `Settings#getAsMap` which is loosing type information and uses internal representation of the settings. Using prefix settings allows now to access such a method type-safe and natively.
2017-09-30 14:27:21 +02:00
David Pilato 9ba5e168e4 Don't use create static storage service
Even though you annotate the Test class with `@ThirdParty` the static
code is initialized.

In that case it fails with:

```
==> Test Info: seed=529C3C6977F695FC; jvms=3; suites=6
Suite: org.elasticsearch.repositories.azure.AzureSnapshotRestoreTests
ERROR   0.00s J2 | AzureSnapshotRestoreTests (suite) <<< FAILURES!
   > Throwable #1: java.lang.IllegalStateException: to run integration tests, you need to set -Dtests.thirdparty=true and -Dtests.azure.account=azure-account -Dtests.azure.key=azure-key
   >    at org.elasticsearch.cloud.azure.AzureTestUtils.generateMockSecureSettings(AzureTestUtils.java:37)
   >    at org.elasticsearch.repositories.azure.AzureSnapshotRestoreTests.generateMockSettings(AzureSnapshotRestoreTests.java:81)
   >    at org.elasticsearch.repositories.azure.AzureSnapshotRestoreTests.<clinit>(AzureSnapshotRestoreTests.java:84)
   >    at java.lang.Class.forName0(Native Method)
   >    at java.lang.Class.forName(Class.java:348)
Completed [1/6] on J2 in 2.21s, 0 tests, 1 error <<< FAILURES!
```

Closes #26812.

(cherry picked from commit eb6d714 for master branch)
2017-09-28 16:40:16 +01:00
Hendrik Muhs 0358bb5f34 fix of disabling #26812 2017-09-28 16:14:34 +02:00
Hendrik Muhs bf4d3123bd disable AzureSnapshotRestoreTests, see #26812 2017-09-28 15:42:53 +02:00
David Pilato 1ccb497c0d Use Azure upload method instead of our own implementation (#26751)
* Use Azure upload method instead of our own implementation

We are not following the Azure documentation about uploading blobs to Azure storage. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-java-how-to-use-blob-storage#upload-a-blob-into-a-container

Instead we are using our own implementation which might cause some troubles and rarely some blobs can be not immediately commited just after we close the stream. Using the standard implementation provided by Azure team should allow us to benefit from all the magic Azure SDK team already wrote.

And well... Let's just read the doc!

* Adapt integration tests to secure settings

That was a missing part in #23405.

* Simplify all the integration tests and *extends ESBlobStoreRepositoryIntegTestCase tests

    * removes IT `testForbiddenContainerName()` as it is useless. The plugin does not create anymore the container but expects that the user has created it before registering the repository
   * merges 2 IT classes so all IT tests are ran from one single class
   * We don't remove/create anymore the container between each single test but only for the test suite
2017-09-28 13:15:37 +02:00
olcbean 6952f7b560 Validate top-level keys for create index request (#23755) (#23869)
This commit ensures create index requests do not ignore unknown keys passed to the request.

closes #23755
2017-09-26 09:49:20 -07:00
David Pilato 3f71772cd2 Azure snapshots can not be restored anymore (#26778)
While working on #26751 and doing some manual integration testing I found that this #22858 removed an important line of our code:

`AzureRepository` overrides default `initializeSnapshot` method which creates metadata files and do other stuff.

But with PR #22858, I wrote:

```java
    @Override
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
    }
```

instead of

```java
    @Override
    public void initializeSnapshot(SnapshotId snapshotId, List<IndexId> indices, MetaData clusterMetadata) {
        if (blobStore.doesContainerExist(blobStore.container()) == false) {
            throw new IllegalArgumentException("The bucket [" + blobStore.container() + "] does not exist. Please create it before " +
                " creating an azure snapshot repository backed by it.");
        }
        super.initializeSnapshot(snapshotId, indices, clusterMetadata);
    }
```

As we never call `super.initializeSnapshot(...)` files are not created and we can't restore what we saved.

Closes #26777.
2017-09-25 14:35:27 -04:00
Simon Willnauer aab4655e63 Unify Settings xcontent reading and writing (#26739)
This change adds a fromXContent method to Settings that allows to read
the xcontent that is produced by toXContent. It also replaces the entire settings
loader infrastructure and removes the structured map representation. Future PRs will
also tackle the `getAsMap` that exposes the internal represenation of settings for
better encapsulation.
2017-09-25 13:23:01 +02:00
Jason Tedor 2e63a13c0a Upgrade to Log4j 2.9.1
This commit upgrades the Log4j dependency, picking up a fix for an issue
with handling stack traces on JDK 9.

Relates #26750
2017-09-22 11:57:06 -04:00
Jason Tedor e0db89bc35 Upgrade to Lucene 7.0.0
This commit upgrades to the GA release of Luence 7!

Relates #26744
2017-09-21 19:19:33 -04:00
James Baiera c760eec054 Add permission checks before reading from HDFS stream (#26716)
Add checks for special permissions before reading hdfs stream data. Also adds test from 
readonly repository fix. MiniHDFS will now start with an existing repository with a single snapshot 
contained within. Readonly Repository is created in tests and attempts to list the snapshots 
within this repo.
2017-09-21 11:55:07 -04:00
kel 601be4f83e Add azure storage endpoint suffix #26432 (#26568)
Allow specifying azure storage endpoint suffix for an azure client.
2017-09-20 22:26:19 -07:00
Ryan Ernst bebff47b5b File Discovery: Remove fallback with zen discovery (#26667)
When adding file based discovery, we added a fallback when the discovery
type was set to zen (the default, so everyone got this warning). This
commit removes the fallback for 6.0. Setting file discovery should now
happen explicitly through the hosts_provider setting.

closes #26661
2017-09-19 16:32:34 -07:00
Jason Tedor bdd9953aa4 Fix discovery-file plugin to use custom config path
The discovery-file plugin was not config path aware, so it always picked
up the default config path (from Elasticsearch home) rather than a
custom config path. This commit fixes the discovery-file plugin to
respect a custom config path.

Relates #26662
2017-09-16 11:00:33 -04:00
Claudio Bley 7184cf8b5b Fix kuromoji default stoptags (#26600)
Initialize the default stop-tags in `KuromojiPartOfSpeechFilterFactory` if the
`stoptags` are not given in the config. Also adding a test which checks that 
part-of-speech tokens are removed when using the kuromoji_part_of_speech 
filter.
2017-09-15 12:25:09 +02:00
Christoph Büscher c7c6443b10 [Docs] "The the" is a great band, but ... (#26644)
Removing several occurrences of this typo in the docs and javadocs, seems to be
a common mistake. Corrections turn up once in a while in PRs, better to correct
some of this in one sweep.
2017-09-14 15:08:20 +02:00
David Pilato b6c6effa2a Move all repository-azure classes under one single package (#26624)
As we did for S3, we can collapse all packages within one single `org.elasticsearch.repositories.azure` package name.

Follow up for https://github.com/elastic/elasticsearch/pull/23518#issuecomment-328903585
2017-09-14 11:56:02 +02:00
David Pilato a34db4e09f Support for accessing Azure repositories through a proxy (#23518)
You can define a proxy using the following settings:

```yml
azure.client.default.proxy.host: proxy.host
azure.client.default.proxy.port: 8888
azure.client.default.proxy.type: http
```

Supported values for `proxy.type` are `direct`, `http` or `socks`. Defaults to `direct` (no proxy).

Closes #23506

BTW I changed a test `testGetSelectedClientBackoffPolicyNbRetries` as it was using an old setting name `cloud.azure.storage.azure.max_retries` instead of `azure.client.azure1.max_retries`.
2017-09-13 11:51:55 +02:00
David Pilato b01b1c2a58 Remove azure deprecated settings (#26099)
Follow up for #23405.

We remove azure deprecated settings in 7.0:

* The legacy azure settings which where starting with `cloud.azure.storage.` prefix have been removed.
This includes `account`, `key`, `default` and `timeout`.
You need to use settings which are starting with `azure.client.` prefix instead.

* Global timeout setting `cloud.azure.storage.timeout` has been removed.
You must set it per azure client instead. Like `azure.client.default.timeout: 10s` for example.
2017-09-12 16:51:44 +02:00