This commit removes the method Strings#splitStringToArray and replaces
the call sites with invocations to String#split. There are only two
explanations for the existence of this method. The first is that
String#split is slightly tricky in that it accepts a regular expression
rather than a character to split on. This means that if s is a string,
s.split(".") does not split on the character '.', but rather splits on
the regular expression '.' which splits on every character (of course,
this is easily fixed by invoking s.split("\\.") instead). The second
possible explanation is that (again) String#split accepts a regular
expression. This means that there could be a performance concern
compared to just splitting on a single character. However, it turns out
that String#split has a fast path for the case of splitting on a single
character and microbenchmarks show that String#split has 1.5x--2x the
throughput of Strings#splitStringToArray. There is a slight behavior
difference between Strings#splitStringToArray and String#split: namely,
the former would return an empty array in cases when the input string
was null or empty but String#split will just NPE at the call site on
null and return a one-element array containing the empty string when the
input string is empty. There was only one place relying on this behavior
and the call site has been modified accordingly.
Now that the current uses of magical camelCase support have been
deprecated, we can remove these in master (sans remaining issues like
BulkRequest). This change removes camel case support from ParseField,
query types, analysis, and settings lookup.
see #8988
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
We are using `2.0.0` today but Azure team now recommends:
```xml
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>4.0.0</version>
</dependency>
```
This new version fix the timeout issues we have seen with azure storage although #15080 adds a timeout support.
Azure storage client 2.0.0 was not passing correctly this value when it was calling Azure services.
Note that the timeout is a server side timeout and not client side timeout.
It means that it will raise only a timeout when:
* upload of blob is complete
* if azure service is not able to process the blob (and store it) within a given time range.
In which case it will raise an exception which elasticsearch can deal with:
```
java.io.IOException
at __randomizedtesting.SeedInfo.seed([91BC11AEF16E073F:6886FA5308FCE4D8]:0)
at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:643)
at com.microsoft.azure.storage.blob.BlobOutputStream.writeBlock(BlobOutputStream.java:444)
at com.microsoft.azure.storage.blob.BlobOutputStream.access$000(BlobOutputStream.java:53)
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:388)
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:385)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: Operation could not be completed within the specified time.
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:305)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:175)
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1006)
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:978)
at com.microsoft.azure.storage.blob.BlobOutputStream.writeBlock(BlobOutputStream.java:438)
... 9 more
```
The following code was used to test this against Azure platform:
```java
public void testDumb() throws URISyntaxException, StorageException, IOException, InvalidKeyException {
String connectionString = "MY-AZURE-STRING";
CloudStorageAccount storageAccount = CloudStorageAccount.parse(connectionString);
CloudBlobClient client = storageAccount.createCloudBlobClient();
client.getDefaultRequestOptions().setTimeoutIntervalInMs(1000);
CloudBlobContainer container = client.getContainerReference("dumb");
container.createIfNotExists();
CloudBlockBlob blob = container.getBlockBlobReference("blob");
File sourceFile = File.createTempFile("sourceFile", ".tmp");
try {
int fileSize = 10000000;
byte[] buffer = new byte[fileSize];
Random random = new Random();
random.nextBytes(buffer);
logger.info("Generate local file");
FileOutputStream fos = new FileOutputStream(sourceFile);
fos.write(buffer);
fos.close();
logger.info("End generate local file");
FileInputStream fis = new FileInputStream(sourceFile);
logger.info("Start uploading");
blob.upload(fis, fileSize);
logger.info("End uploading");
}
finally {
if (sourceFile.exists()) {
sourceFile.delete();
}
}
}
```
With 2.0.0, the above code was not raising any exception. With 4.0.0, the exception is now thrown correctly.
The default timeout is 5 minutes. See https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/core/Utility.java#L352-L375Closes#12567.
Release notes from 2.0.0:
* Removed deprecated table AtomPub support.
* Removed deprecated constructors which take service clients in favor of constructors which take credentials.
* Added support for "Add" permissions on Blob SAS.
* Added support for "Create" permissions on Blob and File SAS.
* Added support for IP Restricted SAS and Protocol SAS.
* Added support for Account SAS to all services.
* Added support for Minute and Hour Metrics to FileServiceProperties and added support for File Metrics to CloudAnalyticsClient.
* Removed deprecated startCopyFromBlob() on CloudBlob. Use startCopy() instead.
* Removed deprecated Credentials and StorageKey classes. Please use the appropriate methods on StorageCredentialsAccountAndKey instead.
* Fixed a bug in table where a select on a non-existent field resulted in a null reference exception if the corresponding field in the TableEntity was not nullable.
* Fixed a bug in table where JsonParser was automatically closing the response stream before it was completely drained causing socket exhaustion.
* Fixed a bug in StorageCredentialsAccountAndKey.updateKey(String) which prevented valid keys from being set.
* Added CloudBlobContainer.listBlobs(final String, final boolean) method.
* Fixed a bug in blob where using AccessConditions on block blob uploads larger than 64MB done with the upload* methods or block blob uploads done openOutputStream with would fail if the blob did not already exist.
* Added support for setting a proxy per request. Proxy can be set on an OperationContext instance and will be used when that instance is passed to the request method.
* Added support for SAS to the Azure File service.
* Added support for Append Blob.
* Added support for Access Control Lists (ACL) to File Shares.
* Added support for getting and setting of CORS rules to File service.
* Added support for ShareStats to File Shares.
* Added support for copying an Azure File to another Azure File or a Block Blob asynchronously, and aborting Azure File copy operations asynchronously.
* Added support for copying a Blob to an Azure File asynchronously.
* Added support for setting a maximum quota property on a File Share.
* Removed deprecated AuthenticationScheme and its getter and setter. In the future only SharedKey will be used.
* Removed deprecated getter/setters for all request option properties on the service clients. Please use the default request options getter/setters instead.
* Removed getSubDirectoryReference() for blob directories and file directories. Use getDirectoryReference() instead.
* Removed getEntityClass() in TableQuery. Please use getClazzType() instead.
* Added client-side verification for lease duration and break periods.
* Deprecated the setters in table for timestamp as this property is only modifiable by the service.
* Deprecated startCopyFromBlob() on CloudBlob. Use startCopy() instead.
* Deprecated the Credentials and StorageKey classes. Please use the appropriate methods on StorageCredentialsAccountAndKey instead.
* Deprecated constructors which take service clients in favor of constructors which take credentials.
* Fixed a bug where the DateBackwardCompatibility flag was not applied if set on the CloudTableClient default request options.
* Changed library behavior to retry all exceptions thrown when parsing a response object.
* Changed behavior to stop removing query parameters passed in with the resource URI if that URI contains a SAS token. Some query parameters such as comp, restype, snapshot and api-version will still be removed.
* Added support for logging StringToSign to SharedKey and SAS.
* **Added a connect timeout to prevent hangs when establishing the network connection.**
* **Made performance enhancements to the BlobOutputStream class.**
* Fixed a bug where maximum execution time was ignored for file, queue, and table services.
* **Changed the socket timeout to be set to the service side timeout plus 5 minutes when maximum execution time is not set.**
* **Changed the socket timeout to default to 5 minutes rather than infinite when neither service side timeout or maximum execution time are set.**
* Fixed a bug where MD5 was calculated for commitBlockList even though UseTransactionalMD5 was set to false.
* Fixed a bug where selecting fields that did not exist returned an error rather than an EntityProperty with a null value.
* Fixed a bug where table entities with a single quote in their partition or row key could be inserted but not operated on in any other way.
* Fixed a bug for all listing API's where next() would sometimes throw an exception if hasNext() had not been called even if there were more elements to iterate on.
* Added sequence number to the blob properties. This is populated for page blobs.
* Creating a page blob sets its length property.
* Added support for page blob sequence numbers and sequence number access conditions.
* Fixed a bug in abort copy where the lease access condition was not sent to the service.
* Fixed an issue in startCopyFromBlob where if the URI of the source blob contained certain non-ASCII characters they would not be encoded appropriately. This would result in Authorization failures.
* Fixed a small performance issue in XML serialization.
* Fixed a bug in BlobOutputStream and FileOutputStream where flush added data to a request pool rather than immediately committing it to the Azure service.
* Refactored to remove the blob, queue, and file package dependency on table in the error handling code.
* Added additional client-side logging for REST requests, responses, and errors.
Closes#15976.
Instead of modifying methods each time we need to add a new behavior for settings, we can simply pass `SettingsProperty... properties` instead.
`SettingsProperty` could be defined then:
```
public enum SettingsProperty {
Filtered,
Dynamic,
ClusterScope,
NodeScope,
IndexScope
// HereGoesYours;
}
```
Then in setting code, it become much more flexible.
TODO: Note that we need to validate SettingsProperty which are added to a Setting as some of them might be mutually exclusive.
Now we have a nice Setting infra, we can define in Setting class if a setting should be filtered or not.
So when we register a setting, setting filtering would be automatically done.
Instead of writing:
```java
Setting<String> KEY_SETTING = Setting.simpleString("cloud.aws.access_key", false, Setting.Scope.CLUSTER);
settingsModule.registerSetting(AwsEc2Service.KEY_SETTING, false);
settingsModule.registerSettingsFilterIfMissing(AwsEc2Service.KEY_SETTING.getKey());
```
We could simply write:
```java
Setting<String> KEY_SETTING = Setting.simpleString("cloud.aws.access_key", false, Setting.Scope.CLUSTER, true);
settingsModule.registerSettingsFilterIfMissing(AwsEc2Service.KEY_SETTING.getKey());
```
It also removes `settingsModule.registerSettingsFilterIfMissing` method.
The plan would be to remove as well `settingsModule.registerSettingsFilter` method but it still used with wildcards. For example in Azure Repository plugin:
```java
module.registerSettingsFilter(AzureStorageService.Storage.PREFIX + "*.account");
module.registerSettingsFilter(AzureStorageService.Storage.PREFIX + "*.key");
```
Closes#16598.
This change rewrites the entire settings filtering mechanism to be immutable.
All filters must be registered up-front in the SettingsModule. Filters that are comma-sparated are
not allowed anymore and check on registration.
This commit also adds settings filtering to the default settings recently added to ensure we don't render
filtered settings.
This commit enableds strict settings validation on node startup. All settings
passed to elasticsearch either through system properties, yaml files or any other
way to pass settings must be registered and valid. Settings that are unknown ie. due to
typos or due to deprecation or removal will cause the node to NOT start up. Plugins
have to declare all their settings on the `SettingsModule#registerSetting` and settings for
plugins that are not installed must be removed.
This commit also removes the ability to specify the nodes name via `-Des.name` or just `name` in the
configuration files. The node name must be prefixed with the node prexif like `node.name: Boom`. Left over
usage of `name` will also cause startup to fail.
One test we forgot in #14843 and #13779 is the default client selection.
Most of the time, users won't define explicitly which client they want to use because they are providing only one connection to Azure storage:
```yml
cloud:
azure:
storage:
my_account:
account: your_azure_storage_account
key: your_azure_storage_key
```
Then using the default client like this:
```sh
# This one will use the default account (my_account1)
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
"type": "azure"
}'
```
This commit adds tests to check that the right client is still selected when no client is explicitly set when creating the snapshot.
This commit replaces server side timeout (which is BTW not correctly implemented in azure client 2.0.0 but fixed later #16084) with a client side timeout.
As a consequence, for each request sent to azure, azure client will raise an exception after a given amount of time (timeout).
Closes#12567
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.
This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.
closes#15459
Site plugins used to be used for things like kibana and marvel, but
there is no longer a need since kibana (and marvel as a kibana plugin)
uses node.js. This change removes site plugins, as well as the flag for
jvm plugins. Now all plugins are jvm plugins.
Using a single azure account is now rejected.
This commit fixes this issue and adds a test for it.
This regression was introduced with #13779. Hopefully no elasticsearch version has been released since then.
Needs to be merged in 2.2, 2.x and master branches.
By default, azure does not timeout. This commit adds support for a timeout settings which defaults to 5 minutes.
It's a timeout **per request** not a global timeout for a snapshot request.
It can be defined globally, per account or both. Defaults to `5m`.
```yml
cloud:
azure:
storage:
timeout: 10s
my_account1:
account: your_azure_storage_account1
key: your_azure_storage_key1
default: true
my_account2:
account: your_azure_storage_account2
key: your_azure_storage_key2
timeout: 30s
```
In this example, timeout will be 10s for `my_account1` and 30s for `my_account2`.
Closes#14277.
All those repository settings can also be defined globally in `elasticsearch.yml` file using prefix `repositories.azure.`. For example:
```yml
repositories.azure:
container: backup-container
base_path: backups
chunk_size: 32m
compress": true
```
Closes#13776.
Bug introduced in #13779: we don't filter anymore credentials because we were filtering `cloud.azure.storage.account` and `cloud.azure.storage.key` but now credentials are like `cloud.azure.storage.XXX.account` and `cloud.azure.storage.XXX.key` where `XXX` can be a storage setting id.
Closes#14843.
We actually want to keep the test when using deprecated setting in 3.0.
We will keep this setting deprecated as well so people will be able to update in a smoother way.
Also add the deprecating information to the migration documentation.
Follow up for #13228.
This commit adds support for a secondary storage account:
```yml
cloud:
azure:
storage:
my_account1:
account: your_azure_storage_account1
key: your_azure_storage_key1
default: true
my_account2:
account: your_azure_storage_account2
key: your_azure_storage_key2
```
When creating a repository, you can choose which azure account you want to use for it:
```sh
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
"type": "azure"
}'
curl -XPUT localhost:9200/_snapshot/my_backup2?pretty -d '{
"type": "azure",
"settings": {
"account" : "my_account2",
"location_mode": "secondary_only"
}
}'
```
`location_mode` supports `primary_only` or `secondary_only`. Defaults to `primary_only`. Note that if you set it
to `secondary_only`, it will force `read_only` to true.
`AbstractLegacyBlobContainer` was kept for historical reasons (see #13434).
We can migrate Azure and S3 repositories to use the new methods added in #13434 so we can remove `AbstractLegacyBlobContainer` class.
There are three ways `@Test` was used. Way one:
```java
@Test
public void flubTheBlort() {
```
This way was always replaced with:
```java
public void testFlubTheBlort() {
```
Or, maybe with a better method name if I was feeling generous.
Way two:
```java
@Test(throws=IllegalArgumentException.class)
public void testFoo() {
methodThatThrows();
}
```
This way of using `@Test` is actually pretty OK, but to get the tools to ban
`@Test` entirely it can't be used. Instead:
```java
public void testFoo() {
try {
methodThatThrows();
fail("Expected IllegalArgumentException");
} catch (IllegalArgumentException e ) {
assertThat(e.getMessage(), containsString("something"));
}
}
```
This is longer but tests more than the old ways and is much more precise.
Compare:
```java
@Test(throws=IllegalArgumentException.class)
public void testFoo() {
some();
copy();
and();
pasted();
methodThatThrows();
code(); // <---- This was left here by mistake and is never called
}
```
to:
```java
@Test(throws=IllegalArgumentException.class)
public void testFoo() {
some();
copy();
and();
pasted();
try {
methodThatThrows();
fail("Expected IllegalArgumentException");
} catch (IllegalArgumentException e ) {
assertThat(e.getMessage(), containsString("something"));
}
}
```
The final use of test is:
```java
@Test(timeout=1000)
public void testFoo() {
methodThatWasSlow();
}
```
This is the most insidious use of `@Test` because its tempting but tragically
flawed. Its flaws are:
1. Hard and fast timeouts can look like they are asserting that something is
faster and even do an ok job of it when you compare the timings on the same
machine but as soon as you take them to another machine they start to be
invalid. On a slow VM both the new and old methods fail. On a super-fast
machine the slower and faster ways succeed.
2. Tests often contain slow `assert` calls so the performance of tests isn't
sure to predict the performance of non-test code.
3. These timeouts are rude to debuggers because the test just drops out from
under it after the timeout.
Confusingly, timeouts are useful in tests because it'd be rude for a broken
test to cause CI to abort the whole build after it hits a global timeout. But
those timeouts should be very very long "backstop" timeouts and aren't useful
assertions about speed.
For all its flaws `@Test(timeout=1000)` doesn't have a good replacement __in__
__tests__. Nightly benchmarks like http://benchmarks.elasticsearch.org/ are
useful here because they run on the same machine but they aren't quick to check
and it takes lots of time to figure out the regressions. Sometimes its useful
to compare dueling implementations but that requires keeping both
implementations around. All and all we don't have a satisfactory answer to the
question "what do you replace `@Test(timeout=1000)`" with. So we handle each
occurrence on a case by case basis.
For files with `@Test` this also:
1. Removes excess blank lines. They don't help anything.
2. Removes underscores from method names. Those would fail any code style
checks we ever care to run and don't add to readability. Since I did this manually
I didn't do it consistently.
3. Make sure all test method names start with `test`. Some used to end in `Test` or start
with `verify` or `check` and they were picked up using the annotation. Without the
annotation they always need to start with `test`.
4. Organizes imports using the rules we generate for Eclipse. For the most part
this just removes `*` imports which is a win all on its own. It was "required"
to quickly remove `@Test`.
5. Removes unneeded casts. This is just a setting I have enabled in Eclipse and
forgot to turn off before I did this work. It probably isn't hurting anything.
6. Removes trailing whitespace. Again, another Eclipse setting I forgot to turn
off that doesn't hurt anything. Hopefully.
7. Swaps some tests override superclass tests to make them empty with
`assumeTrue` so that the reasoning for the skips is logged in the test run and
it doesn't "look like" that thing is being tested when it isn't.
8. Adds an oxford comma to an error message.
The total test count doesn't change. I know. I counted.
```bash
git checkout master && mvn clean && mvn install | tee with_test
git no_test_annotation master && mvn clean && mvn install | tee not_test
grep 'Tests summary' with_test > with_test_summary
grep 'Tests summary' not_test > not_test_summary
diff with_test_summary not_test_summary
```
These differ somewhat because some tests are skipped based on the random seed.
The total shouldn't differ. But it does!
```
1c1
< [INFO] Tests summary: 564 suites (1 ignored), 3171 tests, 31 ignored (31 assumptions)
---
> [INFO] Tests summary: 564 suites (1 ignored), 3167 tests, 17 ignored (17 assumptions)
```
These are the core unit tests. So we dig further:
```bash
cat with_test | perl -pe 's/\n// if /^Suite/;s/.*\n// if /IGNOR/;s/.*\n// if /Assumption #/;s/.*\n// if /HEARTBEAT/;s/Completed .+?,//' | grep Suite > with_test_suites
cat not_test | perl -pe 's/\n// if /^Suite/;s/.*\n// if /IGNOR/;s/.*\n// if /Assumption #/;s/.*\n// if /HEARTBEAT/;s/Completed .+?,//' | grep Suite > not_test_suites
diff <(sort with_test_suites) <(sort not_test_suites)
```
The four tests with lower test numbers are all extend `AbstractQueryTestCase`
and all have a method that looks like this:
```java
@Override
public void testToQuery() throws IOException {
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
super.testToQuery();
}
```
It looks like this method was being double counted on master and isn't anymore.
Closes#14028
Until now we had a cloud-azure plugin which is providing 3 distinct features:
* discovery on Azure
* snapshot/restore on Aure
* SMB store
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.