Some tests still start http implicitly or miss configuring the transport clients correctly.
This commit fixes all remaining tests and adds a depdenceny to `transport-netty` from
`qa/smoke-test-http` and `modules/reindex` since they need an http server running on the nodes.
This also moves all required permissions for netty into it's module and out of core.
This change adds a createComponents() method to Plugin implementations
which they can use to return already constructed componenents/services.
Eventually this should be just services ("components" don't really do
anything), but for now it allows any object so that preconstructed
instances by plugins can still be bound to guice. Over time we should
add basic services as arguments to this method, but for now I have left
it empty so as to not presume what is a necessary service.
The DiscoveryNodeService exists to register CustomNodeAttributes which
plugins can add. This is not necessary, since plugins can already add
additional attributes, and use the node attributes prefix.
This change removes the DiscoveryNodeService, and converts the only
consumer, the ec2 discovery plugin, to add the ec2 availability zone
in additionalSettings().
Repository plugins currently use a lot of custom classes like
RepositoryName and RepositorySettings in order to use guice to construct
repository implementations. But repositories now only really need their
settings to be constructed. Anything else they need (eg a cloud client)
can be constructed within the plugin, instead of via guice.
This change makes repository plugins use the new pull model. It removes
guice from the construction of Repository objects (no more child
injectors) and also from all repository plugins.
The api for snapshot/restore was split up between two interfaces,
Repository and IndexShardRepository. There was also complex
initialization and injection between the two. However, there is always a
one to one relationship between the two.
This change moves the IndexShardRepository api into Repository, as well
as updates the API so as not to require any services to be injected for
sublcasses.
* master: (192 commits)
[TEST] Fix rare OBOE in AbstractBytesReferenceTestCase
Reindex from remote
Rename writeThrowable to writeException
Start transport client round-robin randomly
Reword Refresh API reference (#19270)
Update fielddata.asciidoc
Fix stored_fields message
Add missing footer notes in mapper size docs
Remote BucketStreams
Add doc values support to the _size field in the mapper-size plugin
Bump version to 5.0.0-alpha5.
Update refresh.asciidoc
Update shrink-index.asciidoc
Change Debian repository for Vagrant debian-8 box
[TEST] fix test to account for internal empyt reference optimization
Upgrade to netty 3.10.6.Final (#19235)
[TEST] fix histogram test when extended bounds overlaps data
Remove redundant modifier
Simplify TcpTransport interface by reducing send code to a single send method (#19223)
Fix style violation in InstallPluginCommand.java
...
This change activates the doc_values on the _size field for indices created after 5.0.0-alpha4.
It also adds a note in the breaking changes that explain the situation and how to get around it.
Closes#18334
Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.
Relates #19231
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
The only reason for LifecycleComponent taking a generic type was so that
it could return that type on its start and stop methods. However, this
chaining has no practical necessity. Instead, start and stop can be
void, and a whole bunch of confusing generics disappear.
Before, a repository would maintain an index file (named 'index') per
repository, that contained the current snapshots in the repository.
This file was not atomically written, so repositories had to depend on
listing the blobs in the repository to determine what the current
snapshots are, and only rely on the index file if the repository does
not support the listBlobs operation. This could cause an incorrect view
of the current snapshots in the repository if any prior snapshot delete
operations failed to delete snapshot metadata files.
This commit introduces the atomic writing of the index file, and because
atomic writes are not guaranteed if the file already exists, we write to
a generational index file (index-N, where N is the current generation).
We also maintain an index-latest file that contains the current
generation, for those repositories that cannot list blobs.
Closes#19002
Relates #18156
BytesReference should be a really simple interface, yet it has a gazillion
ways to achieve the same this. Methods like `#hasArray`, `#toBytesArray`, `#copyBytesArray`
`#toBytesRef` `#bytes` are all really duplicates. This change simplifies the interface
dramatically and makes implementations of it much simpler. All array access has been removed
and is streamlined through a single `#toBytesRef` method. Utility methods to materialize a
compact byte array has been added too for convenience.
Repository-S3 needs a special permission because of problems in AmazonS3Client: when no region is set on a AmazonS3Client instance, the AWS SDK loads all known partitions from a JSON file and uses a Jackson's ObjectMapper for that: this one, in version 2.5.3 with the default binding options, tries to suppress access checks of ctor/field/method and thus requires this special permission. AWS must be fixed to uses Jackson correctly and have the correct modifiers on binded classes.
This must be fixed in aws sdk (see https://github.com/aws/aws-sdk-java/issues/766) but in the meanwhile we have no choice.
closes#18539
Raise IOException on deleteBlob if the blob doesn't exist
This commit raises an IOException on BlobContainer#deleteBlob
if the blob does not exist, in conformance with the BlobContainer
interface contract. Each implementation of BlobContainer now
conforms to this contract (file system, S3, Azure, HDFS). This
commit also contains blob container tests for each of the
repository implementations.
Closes#18530
As discussed at https://github.com/elastic/elasticsearch-cloud-azure/issues/91#issuecomment-229113595, we know that the current `discovery-azure` plugin only works with Azure Classic VMs / Services (which is somehow Legacy now).
The proposal here is to rename `discovery-azure` to `discovery-azure-classic` in case some users are using it.
And deprecate it for 5.0.
Closes#19144.
The factory for ingest processor is generic, but that is only for the
return type of the create mehtod. However, the actual consumer of the
factories only cares about Processor, so generics are not needed.
This change removes the generic type from the factory. It also removes
AbstractProcessorFactory which only existed in order pull the optional
tag from config. This functionality is moved to the caller of the
factories in ConfigurationUtil, and the create method now takes the tag.
This allows the covariant return of the implementation to work with
tests not needing casts.
When GCE region is empty we get back from the API something like:
```
{
"id": "dummy"
}
```
instead of:
```
{
"id": "dummy",
"items":[ ]
}
```
This generates a NPE when we aggregate all the lists into a single one.
Closes#16967.
Previously all rest handlers would take Client in their injected ctor.
However, it was only to hold the client around for runtime. Instead,
this can be done just once in the HttpService which handles rest
requests, and passed along through the handleRequest method. It also
should always be a NodeClient, and other types of Clients (eg a
TransportClient) would not work anyways (and some handlers can be
simplified in follow ups like reindex by taking NodeClient).
This change allows Plugin implementions to implement Closeable when they
have resources that should be released. As a first example of how this
can be used, I switched over ingest plugins, which just had the geoip
processor. The ingest framework had chains of closeable to support this,
which is now removed.
The discovery-plugin has been broken since 2.x because the code was not compliant with the security manager and because plugins have been refactored.
closes#18637, #15630
Since the Settings infrastructure has been improved, a group setting must be registered by the repository-azure plugin to allow settings like "cloud.azure.storage.my_account.account" to be coherent with Azure plugin documentation.
Instead of plugins calling `registerTokenizer` to extend the analyzer
they now instead have to implement `AnalysisPlugin` and override
`getTokenizer`. This lines up extending plugins in with extending
scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry`
immediately as part of its constructor which makes testing anslysis
much simpler.
This also moves the default analysis configuration into `AnalysisModule`
which is how search is setup.
Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`.
Instead it is only responsible for building `AnslysisRegistry`. We still
bind `AnalysisRegistry` but we only do so in `Node`. This is means it
is available at module construction time so we slowly remove the need to
bind it in guice.
Related to #18945 and to this 35d3bdab84 (commitcomment-17914150)
In GCS Repository plugin we defined a `service_account` setting which is defined as `Property.Filtered`.
It's not needed as it's only a path to a file.
Closes#18946
* master: (416 commits)
docs: removed obsolete information, percolator queries are not longer loaded into jvm heap memory.
Upgrade JNA to 4.2.2 and remove optionality
[TEST] Increase timeouts for Rest test client (#19042)
Update migrate_5_0.asciidoc
Add ThreadLeakLingering option to Rest client tests
Add a MultiTermAwareComponent marker interface to analysis factories. #19028
Attempt at fixing IndexStatsIT.testFilterCacheStats.
Fix docs build.
Move templates out of the Search API, into lang-mustache module
revert - Inline reroute with process of node join/master election (#18938)
Build valid slices in SearchSourceBuilderTests
Docs: Convert aggs/misc to CONSOLE
Docs: migration notes for _timestamp and _ttl
Group client projects under :client
[TEST] Add client-test module and make client tests use randomized runner directly
Move upgrade test to upgrade from version 2.3.3
Tasks: Add completed to the mapping
Fail to start if plugin tries broken onModule
Remove duplicated read byte array methods
Rename `fields` to `stored_fields` and add `docvalue_fields`
...
This is the same as what Lucene does for its analysis factories, and we hawe
tests that make sure that the elasticsearch factories are in sync with
Lucene's. This is a first step to move forward on #9978 and #18064.
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
This changes adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
We pretended to be able to ackt like a different version node for so long it's
time to be honest and remove this ability. It's just confusing and where needed
and tested we should build dedicated extension points.
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
Today we have a push model for registering basically anything. All our extension points
are defined on modules which we pass in to plugins. This is harder to maintain and adds
unnecessary dependencies on the modules itself. This change moves towards a pull model
where the plugin offers a getter kind of method to get the extensions. This will also
help in the future if we need to pass dependencies to the extension points which can
easily be defined on the method as arguments if a pull model is used.
Registering a script engine or native scripts still uses Guice today
and is much more complicated than needed. This change moves to a pull
based model where script plugins have to implement a dedicated interface
`ScriptPlugin` and defines simple getter returning instances rather than
classes.
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overriden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
The database files have been doubled in size compared to the previous files being used.
For this reason the database files are now gzip compressed, which required using
`GZIPInputStream` when loading database files.
* master: (51 commits)
Switch QueryBuilders to new MatchPhraseQueryBuilder
Added method to allow creation of new methods on-the-fly.
more cleanups
Remove cluster name from data path
Remove explicit parallel new GC flag
rehash the docvalues in DocValuesSliceQuery using BitMixer.mix instead of the naive Long.hashCode.
switch FunctionRef over to methodhandles
ingest: Move processors from core to ingest-common module.
Fix some typos (#18746)
Fix ut
convert FunctionRef/Def usage to methodhandles.
Add the ability to partition a scroll in multiple slices. API:
use painless types in FunctionRef
Update ingest-node.asciidoc
compute functional interface stuff in Definition
Use method name in bootstrap check might fork test
Make checkstyle happy (add Lookup import, line length)
Don't hide LambdaConversionException and behave like real javac compiled code when a conversion fails. This works anyways, because fallback is allowed to throw any Throwable
Pass through the lookup given by invokedynamic to the LambdaMetaFactory. Without it real lambdas won't work, as their implementations are private to script class
checkstyle have your upper L
...
Folded grok processor into ingest-common module.
The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind in that tests with an empty pipeline.
Removed messy tests, these tests were already covered in the rest tests
Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need write its own plugin
Moved reindex ingest tests to qa module
Closes#18490
This commit refactors the handling of thread pool settings so that the
individual settings can be registered rather than registering the top
level group. With this refactoring, individual plugins must now register
their own settings for custom thread pools that they need, but a
dedicated API is provided for this in the thread pool module. This
commit also renames the prefix on the thread pool settings from
"threadpool" to "thread_pool". This enables a hard break on the settings
so that:
- some of the settings can be given more sensible names (e.g., the max
number of threads in a scaling thread pool is now named "max" instead
of "size")
- change the soft limit on the number of threads in the bulk and
indexing thread pools to a hard limit
- the settings names for custom plugins for thread pools can be
prefixed (e.g., "xpack.watcher.thread_pool.size")
- remove dynamic thread pool settings
Relates #18674
* master: (184 commits)
Add back pending deletes (#18698)
refactor matrix agg documentation from modules to main agg section
Implement ctx.op = "delete" on _update_by_query and _reindex
Close SearchContext if query rewrite failed
Wrap lines at 140 characters (:qa projects)
Remove log file
painless: Add support for the new Java 9 MethodHandles#arrayLength() factory (see https://bugs.openjdk.java.net/browse/JDK-8156915)
More complete exception message in settings tests
Use java from path if JAVA_HOME is not set
Fix uncaught checked exception in AzureTestUtils
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add test rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Adressing review comments
...
* master: (911 commits)
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add test rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Adressing review comments
Only filter intial recovery (post API) when shrinking an index (#18661)
Add tests to check that toQuery() doesn't return null
Removing handling of null lucene query where we catch this at parse time
Handle empty query bodies at parse time and remove EmptyQueryBuilder
Mute failing assertions in IndexWithShadowReplicasIT until fix
Remove allow running as root
Add upgrade-not-supported warning to alpha release notes
remove unrecognized javadoc tag from matrix aggregation module
set ValuesSourceConfig fields as private
Adding MultiValuesSource support classes and documentation to matrix stats agg module
...
This commit adds a UUID for each snapshot, in addition to the already
existing repository and snapshot name. The addition of UUIDs will enable
more robust handling of the deletion of previous snapshots and lingering
files from partially failed delete operations, on top of being able to
uniquely track each snapshot.
Closes#18228
Relates #18156
This commit clarifies the behavior that must be adhered to by any
implementors of the BlobContainer interface. This is done through
expanded Javadocs.
Closes#18157Closes#15580
This adds a low level primitive operations to shrink an existing
index into a new index with a single shard. This primitive expects
all shards of the source index to allocated on a single node. Once the target index is initializing on the shrink node it takes a snapshot of the source index shards and copies all files into the target indices data folder. An [optimization](https://issues.apache.org/jira/browse/LUCENE-7300) coming in Lucene 6.1 will also allow for optional constant time copy if hard-links are supported by the filesystem. All mappings are merged into the new indexes metadata once the snapshots have been taken on the merge node.
To shrink an existing index all shards must be moved to a single node (one instance of each shard) and the index must be read-only:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
"settings" : {
"index.routing.allocation.require._name" : "shrink_node_name",
"index.blocks.write" : true
}
}
```
once all shards are started on the shrink node. the new index can be created via:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
"settings" : {
"index.codec" : "best_compression",
"index.number_of_replicas" : 1
}
}'
```
This API will perform all needed check before the new index is created and selects the shrink node based on the allocation of the source index. This call returns immediately, to monitor shrink progress the recovery API should be used since all copy operations are reflected in the recovery API with byte copy progress etc.
The shrink operation does not modify the source index, if a shrink operation should
be canceled or if the shrink failed, the target index can simply be deleted and
all resources are released.
* changes `throttle_retries` to `use_throttle_retries`
* removes registering of all individual repository settings when the plugin starts. Not needed
* adds more comment about deprecated method in AWS SDK we need to implement though in a Delegate class within our tests
This PR changes the InternalTestCluster to support dedicated master nodes. The creation of dedicated master nodes can be controlled using a new `supportsMasterNodes` parameter to the ClusterScope annotation. If set to true (the default), dedicated master nodes will randomly be used. If set to false, no master nodes will be created and data nodes will also be allowed to become masters. If active, test runs will either have 1 or 3 masternodes
Failing the build on deprecation warnings was removed in
19b3ec88af. This commit removes the
suppressed deprecation warnings so that their use is surfaced in the
build now.
Relates #18582
* master: (158 commits)
Document the hack
Refactor property placeholder use of env. vars
Force java9 log4j hack in testing
Fix log4j buggy java version detection
Make java9 work again
Don't mkdir directly in deb init script
Fix env. var placeholder test so it's reproducible
Remove ScriptMode class in favor of boolean true/false
[rest api spec] fix doc urls
Netty request/response tracer should wait for send
Filter client/server VM options from jvm.options
[rest api spec] fix url for reindex api docs
Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
Limit retries of failed allocations per index (#18467)
Proxy box method to use valueOf.
Use the build-in valueOf method instead of the custom one.
Fixed tests and added a comment to the box method.
Fix boxing.
Do not decode path when sending error
Fix race condition in snapshot initialization
...
This change makes ES compile with java9 again, build 118.
* There are a handful of changes due to failure to determine types during compile.
* The attachment plugins which use tika needed to have tika upgraded in order to pickup fixes there for java 9.
* azure discovery and s3 repository indirectly depend on jaxb, which is no longer in the default modules. They now add a jaxb dependency externally, and make JarHell allow for this package.
* ESBlobStore tests must move to the test framework if we want to be able to reuse them in the context of plugins.
* To be able to identify more easily what are Integration Tests vs Unit Tests, this commit renames `*AzureTestCase` to `*AzureIntegTestCase`.
* Move some debug level logs to trace level
* Collapse when possible identical catch blocks
* `blobNameFromUri()` does not need anymore to get the container name. We just split the URI after 3 `/` and simply get the remaining part.
* Added a Unit test for that
* As we renamed some existing classes, checkstyle is now complaining about the lines width.
* While we are at it, let's replace all calls to `execute().actionGet()` with `get()`
* Move `readSettingsFromFile()` in a Util class. Note that this class might be useful for other plugins (S3/EC2/Azure-discovery for instance) so may be we should move it to the test framework?
* Replace some part of the code with lambdas
I initially wrongly put this setting under `cloud.aws.s3.` prefix which does not make sense. It should be placed at the same place as `max_retries`.
Also applied @tlrx comments. We should set this even if max_retries is not set (when using default values).
Also added some documentation about this setting.
This commit fixes the inequality symbol used in a test assertion in
RepositoryS3SettingsTests#testInvalidChunkBufferSizeRepositorySettings. The
inequality symbol was previously backwards but fixed in commit
cad0608cdb but fixing the inequality
symbol here was missed in that commit.
Closes#18449
Probably when we updated Azure SDK, we introduced a regression.
Actually, we are not able to remove files anymore.
For example, if you register a new azure repository, the snapshot service tries to create a temp file and then remove it.
Removing does not work and you can see in logs:
```
[2016-05-18 11:03:24,914][WARN ][org.elasticsearch.cloud.azure.blobstore] [azure] can not remove [tests-ilmRPJ8URU-sh18yj38O6g/] in container {elasticsearch-snapshots}: The specified blob does not exist.
```
This fix deals with that. It now list all the files in a flatten mode, remove in the full URL the server and the container name.
As an example, when you are removing a blob which full name is `https://dpi24329.blob.core.windows.net/elasticsearch-snapshots/bar/test` you need to actually call Azure SDK with `bar/test` as the path, `elasticsearch-snapshots` is the container.
To run the test, you need to pass some parameters: `-Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml`
Where `elasticsearch.yml` contains something like:
```
cloud.azure.storage.default.account: account
cloud.azure.storage.default.key: key
```
Related to #16472Closes#18436.
This change does the following:
- Queries that are currently unsupported such as prefix queries on numeric
fields or term queries on geo fields now throw an error rather than returning
a query that does not match anything.
- Fuzzy queries on numeric, date and ip fields are now unsupported: they used
to create range queries, we now expect users to use range queries directly.
Fuzzy, regexp and prefix queries are now only supported on text/keyword
fields (including `_all`).
- The `_uid` and `_id` fields do not support prefix or range queries anymore as
it would prevent us to store them more efficiently in the future, eg. by
using a binary encoding.
Note that it is still possible to ignore these errors by using the `lenient`
option of the `match` or `query_string` queries.
* master: (904 commits)
Removes unused methods in the o/e/common/Strings class.
Add note regarding thread stack size on Windows
painless: restore accidentally removed test
Documented fuzzy_transpositions in match query
Add not-null precondition check in BulkRequest
Build: Make run task you full zip distribution
Build: More pom generation improvements
Add test for wrong array index
Take return type from "after" field.
painless: build descriptor of array and field load/store in code; fix array index to adapt type not DEF
Build: Add developer info to generated pom files
painless: improve exception stacktraces
painless: Rename the dynamic call site factory to DefBootstrap and make the inner class very short (PIC = Polymorphic Inline Cache)
Remove dead code.
Avoid race while retiring executors
Allow only a single extension for a scripting engine
Adding REST tests to ensure key_as_string behavior stays consistent
[test] Set logging to 11 on reindex test
[TEST] increase logger level until we know what is going on
Don't allow `fuzziness` for `multi_match` types cross_fields, phrase and phrase_prefix
...
Previously multiple extensions could be provided, however, this can lead
to confusion with on-disk scripts (ie, "foo.js" and "foo.javascript")
having different content. Only a single extension is now supported.
The only language currently supporting multiple extensions was the
Javascript engine ("js" and "javascript"). It now only supports the
`.js` extension.
Relates to #10598
This removes all the mentions of the sandbox from the script engine
services and permissions model. This means that the following settings
are no longer supported:
```yaml
script.inline: sandbox
script.stored: sandbox
```
Instead, only a `true` or `false` value can be specified.
Since this would otherwise break the default-allow parameter for
languages like expressions, painless, and mustache, all script engines
have been updated to have individual settings, for instance:
```yaml
script.engine.groovy.inline: true
```
Would enable all inline scripts for groovy. (they can still be
overridden on a per-operation basis).
Expressions, Painless, and Mustache all default to `true` for inline,
file, and stored scripts to preserve the old scripting behavior.
Resolves#17114
I am unable to set ec2 discovery tags because this setting was
accidentally omitted from the register settings list in
Ec2DiscoveryPlugin.java. I get this:
java.lang.IllegalArgumentException: unknown setting [discovery.ec2.tag.project]
This removes dead/duplicate code and makes the `_index` field not configurable.
(Configuration used to jus be ignored, now we would throw an exception if any
is provided.)
Most of the current implementations of BaseNodesResponse (plural Nodes) ignore FailedNodeExceptions.
- This adds a helper function to do the grouping to TransportNodesAction
- Requires a non-null array of FailedNodeExceptions within the BaseNodesResponse constructor
- Reads/writes the array to output
- Also adds StreamInput and StreamOutput methods for generically reading and writing arrays
QueryBuilder has generics, but those are never used: all call sites use
`QueryBuilder<?>`. Only `AbstractQueryBuilder` needs generics so that the base
class can contain a default implementation for setters that returns `this`.
This commit introduces a handshake when initiating a light
connection. During this handshake, node information, cluster name, and
version are received from the target node of the connection. This
information can be used to immediately validate that the target node is
a member of the same cluster, and used to set the version on the
stream. This will allow us to extend APIs that are used during initial
cluster recovery without a major version change.
Relates #15971
This commit removes the method Strings#splitStringToArray and replaces
the call sites with invocations to String#split. There are only two
explanations for the existence of this method. The first is that
String#split is slightly tricky in that it accepts a regular expression
rather than a character to split on. This means that if s is a string,
s.split(".") does not split on the character '.', but rather splits on
the regular expression '.' which splits on every character (of course,
this is easily fixed by invoking s.split("\\.") instead). The second
possible explanation is that (again) String#split accepts a regular
expression. This means that there could be a performance concern
compared to just splitting on a single character. However, it turns out
that String#split has a fast path for the case of splitting on a single
character and microbenchmarks show that String#split has 1.5x--2x the
throughput of Strings#splitStringToArray. There is a slight behavior
difference between Strings#splitStringToArray and String#split: namely,
the former would return an empty array in cases when the input string
was null or empty but String#split will just NPE at the call site on
null and return a one-element array containing the empty string when the
input string is empty. There was only one place relying on this behavior
and the call site has been modified accordingly.
For now we support `_gce_` only if discovery is set to `gce` and all information about GCE is provided (project_id and zone).
But in some cases, people would like to only bind to `_gce_` on a single node (without any elasticsearch cluster).
They could access the machine then from other machines running inside the same project.
This commit adds a new GceMetadataService which is started as soon as the plugin is started so GceNameResolver can use it to resolve `_gce`.
Closes#15724.
When working on #18008 I found while reading the code that we don't filter anymore `repositories.s3.access_key` and `repositories.s3.secret_key`.
Also fixed a typo in REST test
Lucene allows to create a ICUTokenizer with a special config argument
enabling the customization of the rule based iterator by providing
custom rules files.
This commit enable this feature. Users could provide a list of RBBI rule
files to ICU tokenizer.
closes#13146
Now that the current uses of magical camelCase support have been
deprecated, we can remove these in master (sans remaining issues like
BulkRequest). This change removes camel case support from ParseField,
query types, analysis, and settings lookup.
see #8988
* `rename` processor, renamed `to` to `target_field`
* `date` processor, renamed `match_field` to `field` and renamed `match_formats` to `formats`
* `geoip` processor, renamed `source_field` to `field` and renamed `fields` to `properties`
* `attachment` processor, renamed `source_field` to `field` and renamed `fields` to `properties`
Closes#17835
Defaults to `true`.
If anyone is having trouble with this option, you could disable it with `cloud.aws.s3.throttle_retries: false` in `elasticsearch.yml` file.
* Moving from JSON.org to Jackson for request marshallers.
* The Java SDK now supports retry throttling to limit the rate of retries during periods of reduced availability. This throttling behavior can be enabled via ClientConfiguration or via the system property "-Dcom.amazonaws.sdk.enableThrottledRetry".
* Fixed String case conversion issues when running with non English locales.
* AWS SDK for Java introduces a new dynamic endpoint system that can compute endpoints for services in new regions.
* Introducing a new AWS region, ap-northeast-2.
* Added a new metric, HttpSocketReadTime, that records socket read latency. You can enable this metric by adding enableHttpSocketReadMetric to the system property com.amazonaws.sdk.enableDefaultMetrics. For more information, see [Enabling Metrics with the AWS SDK for Java](https://java.awsblog.com/post/Tx3C0RV4NRRBKTG/Enabling-Metrics-with-the-AWS-SDK-for-Java).
* New Client Execution timeout feature to set a limit spent across retries, backoffs, ummarshalling, etc. This new timeout can be specified at the client level or per request.
Also included in this release is the ability to specify the existing HTTP Request timeout per request rather than just per client.
* Added support for RequesterPays for all operations.
* Ignore the 'Connection' header when generating S3 responses.
* Allow users to generate an AmazonS3URI from a string without using URL encoding.
* Fixed issue that prevented creating buckets when using a client configured for the s3-external-1 endpoint.
* Amazon S3 bucket lifecycle configuration supports two new features: the removal of expired object delete markers and an action to abort incomplete multipart uploads.
* Allow TransferManagerConfiguration to accept integer values for multipart upload threshold.
* Copy the list of ETags before sorting https://github.com/aws/aws-sdk-java/pull/589.
* Option to disable chunked encoding https://github.com/aws/aws-sdk-java/pull/586.
* Adding retry on InternalErrors in CompleteMultipartUpload operation. https://github.com/aws/aws-sdk-java/issues/538
* Deprecated two APIs : AmazonS3#changeObjectStorageClass and AmazonS3#setObjectRedirectLocation.
* Added support for the aws-exec-read canned ACL. Owner gets FULL_CONTROL. Amazon EC2 gets READ access to GET an Amazon Machine Image (AMI) bundle from Amazon S3.
* Added support for referencing security groups in peered Virtual Private Clouds (VPCs). For more information see the service announcement at https://aws.amazon.com/about-aws/whats-new/2016/03/announcing-support-for-security-group-references-in-a-peered-vpc/ .
* Fixed a bug in AWS SDK for Java - Amazon EC2 module that returns NPE for dry run requests.
* Regenerated client with new implementation of code generator.
* This feature enables support for DNS resolution of public hostnames to private IP addresses when queried over ClassicLink. Additionally, you can now access private hosted zones associated with your VPC from a linked EC2-Classic instance. ClassicLink DNS support makes it easier for EC2-Classic instances to communicate with VPC resources using public DNS hostnames.
* You can now use Network Address Translation (NAT) Gateway, a highly available AWS managed service that makes it easy to connect to the Internet from instances within a private subnet in an AWS Virtual Private Cloud (VPC). Previously, you needed to launch a NAT instance to enable NAT for instances in a private subnet. Amazon VPC NAT Gateway is available in the US East (N. Virginia), US West (Oregon), US West (N. California), EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Asia Pacific (Sydney) regions. To learn more about Amazon VPC NAT, see [New - Managed NAT (Network Address Translation) Gateway for AWS](https://aws.amazon.com/blogs/aws/new-managed-nat-network-address-translation-gateway-for-aws/)
* A default read timeout is now applied when querying data from EC2 metadata service.
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.
Notes about how the change works:
- Numeric mappers have been split into a legacy version that is essentially
the current mapper, and a new version that uses points, eg.
LegacyDateFieldMapper and DateFieldMapper.
- Since new and old fields have the same names, the decision about which one
to use is made based on the index creation version.
- If you try to force using a legacy field on a new index or a field that uses
points on an old index, you will get an exception.
- IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
and sortable).
- The internal MappedFieldType that is stored by the new mappers does not have
any of the points-related properties set. Instead, it keeps setting the index
options when parsing the `index` property of mappings and does
`if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
when parsing documents.
Known issues that won't fix:
- You can't use numeric fields in significant terms aggregations anymore since
this requires document frequencies, which points do not record.
- Term queries on numeric fields will now return constant scores instead of
giving better scores to the rare values.
Known issues that we could work around (in follow-up PRs, this one is too large
already):
- Range queries on `ip` addresses only work if both the lower and upper bounds
are inclusive (exclusive bounds are not exposed in Lucene). We could either
decide to implement it, or drop range support entirely and tell users to
query subnets using the CIDR notation instead.
- Since IP addresses now use a different representation for doc values,
aggregations will fail when running a terms aggregation on an ip field on a
list of indices that contains both pre-5.0 and 5.0 indices.
- The ip range aggregation does not work on the new ip field. We need to either
implement range aggs for SORTED_SET doc values or drop support for ip ranges
and tell users to use filters instead. #17700Closes#16751Closes#17007Closes#11513
When it comes to query parsing, either a field is tokenized and it would go
through analysis with its search_analyzer. Or it is not tokenized and the
raw string should be passed to termQuery(). Since numeric fields are not
tokenized and also declare a search analyzer, values would currently go through
analysis twice...
This commit removes `MappedFieldType.value` and simplifies
`MappedFieldType.valueforSearch`. `valueforSearch` was used to post-process
values that come for stored fields (eg. to convert a long back to a string
representation of a date in the case of a date field) and also values that
are extracted from the source but only in the case of GET calls: it would
not be called when performing source filtering on search requests.
`valueforSearch` is now only called for stored fields, since values that are
extracted from the source should already be formatted as expected.
* upgrades numerics to new Point format
* updates geo api changes
* adds GeoPointDistanceRangeQuery as XGeoPointDistanceRangeQuery
* cuts over to ES GeoHashUtils
CBOR is natively supported in Elasticsearch and allows for byte arrays.
This means, that by using CBOR the user can prevent base64 conversions
for the data being sent back and forth.
This PR adds support to extract data from a byte array in addition to
a string. This also required to add a ByteArrayValueSource class.
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
This PR just adds a new test where we check that we forcing a value in the JSON document actually works as expected:
```json
{
"file": {
"_content": "BASE64"
"_name": "12-240.pdf",
"_language": "en",
"_content_type": "pdf"
}
}
```
Note that we don't support forcing all values. So sending:
```json
{
"file": {
"_content": "BASE64"
"_name": "12-240.pdf",
"_title": "12-240.pdf",
"_keywords": "Div42 Src580 LGE Mechtech",
"_language": "en",
"_content_type": "pdf"
}
}
```
Will have absolutely no effect on fields `title` and `keywords`.
Note that when `_language` is set, it only works if `index.mapping.attachment.detect_language` is set to `true`.
Related to https://discuss.elastic.co/t/mapper-attachments/46615/4
This removes the inconsistent output of IP addresses. The format was parsing-unfriendly and it makes it hard
to reason about API responses, such as to _nodes.
With this change in place, it will never print the hostname as part of the default format, which has the
added benefit that it can be used consistently for URIs, which was not the case when the hostname might
appear at the front with "hostname/ip:port".
`text` fields will have fielddata disabled by default. Fielddata can still be
enabled on an existing index by setting `fielddata=true` in the mappings.
Node roles are now serialized as well, they are not part of the node attributes anymore. DiscoveryNodeService takes care of dividing settings into attributes and roles. DiscoveryNode always requires to pass in attributes and roles separately.
This change moves placeholder replacement to a pkg private class for
settings. It also adds a null check when calling replacement, as
settings objects can still contain null values, because we only prohibit
nulls on file loading. Finally, this cleans up file and stream loading a
bit to not have unnecessary exception wrapping.
We can be better at checking `buffer_size` and `chunk_size` for S3 repositories.
For example, we know that:
* `buffer_size` should be more than `5mb`
* `chunk_size` should be no more than `5tb`
* `buffer_size` should be lower than `chunk_size`
Otherwise, setting `buffer_size` is useless.
For the record:
`chunk_size` is a Snapshot setting whatever the implementation is.
`buffer_size` is an S3 implementation setting.
Let say that you are snapshotting a 500mb file. If you set `chunk_size` to `200mb`, then Snapshot service will call S3 repository to snapshot 3 files with the following sizes:
* `200mb`
* `200mb`
* `100mb`
If you set `buffer_size` to `100mb` (AWS maximum size recommendation), the first file of `200mb` will be uploaded on S3 using the multipart feature in 2 chunks and the workflow is basically the following:
* create the multipart request and get back an `id` from AWS S3 platform
* upload part1: `100mb`
* upload part2: `100mb`
* "commit" the full upload using the `id`.
Closes#17244.
Today we allow to set all kinds of index level settings on the node level which
is error prone and difficult to get right in a consistent manner.
For instance if some analyzers are setup in a yaml config file some nodes might
not have these analyzers and then index creation fails.
Nevertheless, this change allows some selected settings to be specified on a node level
for instance:
* `index.codec` which is used in a hot/cold node architecture and it's value is really per node or per index
* `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses
All other index level setting must be specified on the index level. For existing clusters the index must be closed
and all settings must be updated via the API on each of the indices.
Closes#16799
The build currently uses the old maven support in gradle. This commit
switches to use the newer maven-publish plugin. This will allow future
changes, for example, easily publishing to artifactory.
An additional part of this change makes publishing of build-tools part
of the normal publishing, instead of requiring a separate upload step
from within buildSrc. That also sets us up for a follow up to enable
precomit checks on the buildSrc code itself.
Add infrastructure to run REST tests on a multi-version cluster
This change adds the infrastructure to run the rest tests on a multi-node
cluster that users 2 different minor versions of elasticsearch. It doesn't implement
any dedicated BWC tests but rather leverages the existing REST tests.
Since we don't have a real version to test against, the tests uses the current version
until the first minor / RC is released to ensure the infrastructure works.
Given the amount of problems this change already found I think it's worth having this run with our test suite by default. The structure of this infra will likely change over time but for now it's a step into the right direction. We will likely want to split it up into integTests and integBwcTests etc. so each plugin can have it's own bwc tests but that's left for future refactoring.
Today, certain bootstrap properties are set and read via system
properties. This action-at-distance way of managing these properties is
rather confusing, and completely unnecessary. But another problem exists
with setting these as system properties. Namely, these system properties
are interpreted as Elasticsearch settings, not all of which are
registered. This leads to Elasticsearch failing to startup if any of
these special properties are set. Instead, these properties should be
kept as local as possible, and passed around as method parameters where
needed. This eliminates the action-at-distance way of handling these
properties, and eliminates the need to register these non-setting
properties. This commit does exactly that.
Additionally, today we use the "-D" command line flag to set the
properties, but this is confusing because "-D" is a special flag to the
JVM for setting system properties. This creates confusion because some
"-D" properties should be passed via arguments to the JVM (so via
ES_JAVA_OPTS), and some should be passed as arguments to
Elasticsearch. This commit changes the "-D" flag for Elasticsearch
settings to "-E".
Add REST test for:
* `.doc`
* `.docx`
The later fails with:
```
==> Test Info: seed=DB93397128B876D4; jvm=1; suite=1
Suite: org.elasticsearch.ingest.attachment.IngestAttachmentRestIT
2> REPRODUCE WITH: gradle :plugins:ingest-attachment:integTest -Dtests.seed=DB93397128B876D4 -Dtests.class=org.elasticsearch.ingest.attachment.IngestAttachmentRestIT -Dtests.method="test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file}" -Des.logger.level=WARN -Dtests.security.manager=true -Dtests.locale=bg -Dtests.timezone=Europe/Athens
FAILURE 4.53s | IngestAttachmentRestIT.test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file} <<< FAILURES!
> Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [index] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parse_exception","reason":"Error parsing document in field [field1]"}],"type":"parse_exception","reason":"Error parsing document in field [field1]","caused_by":{"type":"tika_exception","reason":"Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@7f85baa5","caused_by":{"type":"illegal_state_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")","caused_by":{"type":"access_control_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")"}}}},"status":400}]
> at __randomizedtesting.SeedInfo.seed([DB93397128B876D4:53C706AB86441B2C]:0)
> at org.elasticsearch.test.rest.section.DoSection.execute(DoSection.java:107)
> at org.elasticsearch.test.rest.ESRestTestCase.test(ESRestTestCase.java:395)
> at java.lang.Thread.run(Thread.java:745)
```
Related to #16864
Internally the put pipeline API uses this information in node info API to validate if all specified processors in a pipeline exist on all nodes in the cluster.
Closes#16964
Squashed commit of the following:
commit a23f9d2d29220991aa498214530753d7a5a148c6
Merge: eec9c4e 0b0a251
Author: Robert Muir <rmuir@apache.org>
Date: Mon Mar 7 04:12:02 2016 -0500
Merge branch 'master' into lucene6
commit eec9c4e5cd11e9c3e0b426f04894bb2a6dae4f21
Merge: bc67205 675d940
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 13:45:00 2016 -0500
Merge branch 'master' into lucene6
commit bc67205bdfe1526eae277ab7856fc050ecbdb7b2
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 09:56:31 2016 -0500
fix test bug
commit a60723b007ff12d97b1810cef473bd7b553a0327
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:35:35 2016 +0100
Fix SimpleValidateQueryIT to put braces around boosted terms
commit ae3a49d7ba7ced448d2a5262e5d8ec98671a9090
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:27:25 2016 +0100
fix multimatchquery
commit ae23fdb88a8f6d3fb7ba60fd1aaf3fd72d899aa5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:20:49 2016 +0100
Rewrite DecayFunctionScoreIT to be independent of the similarity used
This test relied a lot on the term scoring and compared scores
that are dependent on the similarity. This commit changes the base query
to be a predictable constant score query.
commit 366c2d518c35d31251033f1b6f6a93f6e2ae327d
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 14:06:14 2016 +0100
Fix scoring in tests due to changes to idf calculation.
Lucene 6 uses a different default similarity as well as a different
way to calculate IDF. In contrast to older version lucene 6 uses docCount per field
to calculate the IDF not the # of docs in the index to overcome the sparse field
cases.
commit dac99fd64ac2fa71b8d8d106fe68825e574c49f8
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:21:57 2016 -0500
don't hardcoded expected termquery score
commit 6e9f340ba49ab10eed512df86d52a121aa775b0f
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:04:45 2016 -0500
suppress deprecation warning until migrated to points
commit 3ac8908424b3fdad44a90a4f7bdb3eff7efd077d
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:21:43 2016 -0500
Remove invalid test: all commits have IDs, and its illegal to do this.
commit c12976288124ad1a26467e7e848fb810548e7eab
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:06:14 2016 -0500
don't test with unsupported back compat
commit 18bbfe76128570bc70883bf91ff4c44c82d27817
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:02:18 2016 -0500
remove now invalid lucene 4 backcompat test
commit 7e730e572886f0ef2d3faba712e4256216ff01ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:58:52 2016 -0500
remove now invalid lucene 4 backwards test
commit 244d2ab6868ba5ac9e0bcde3c2833743751a25ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:47:23 2016 -0500
use 6.0 codec
commit 5f64d4a431a6fdaa1234adca23f154c2a1de8284
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:43:08 2016 -0500
compile, javadocs, forbidden-apis, etc
commit 1f273cd62a7fe9ca8f8944acbbfc5cbdd3d81ccb
Merge: cd33921 29e3443
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 10:45:29 2016 +0100
Merge branch 'master' into lucene6
commit cd33921ac742ef9fb351012eff35f3c7dbda7264
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:58:37 2016 -0500
fix hunspell dictionary loading
commit c7fdbd837b01f7defe9cb1c24e2ec65604b0dc96
Merge: 4d4190f d8948ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:41:53 2016 -0500
Merge branch 'master' into lucene6
commit 4d4190fd82601aaafac6b8254ccb3edf218faa34
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:39:14 2016 -0500
remove nocommit
commit 77ca69e288b1a41aa9595c921ed166c272a00ea8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:38:24 2016 -0500
clean up numericutils vs legacynumericutils
commit a466d696fbaad04b647ffbc0857a9439b583d0bf
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:32:43 2016 -0500
upgrade spatial4j
commit 5412c747a8cfe638bacedbc8233163cb75cc3dc5
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:19:28 2016 -0500
move to 6.0.0-snapshot-8eada27
commit b32bfe924626b87e540692375ece09e7c2edb189
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:30:09 2016 +0100
Fix some test compile errors.
commit 6ccde35e9840b03c68d1a2cd47c7923a06edf64a
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:25:51 2016 +0100
Current Lucene version is 6.0.0.
commit f62e1015d931b4cc04c778298a8fa1ba65e97ad9
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:20:48 2016 +0100
Fix compile errors in NGramTokenFilterFactory.
commit 6837c6eabf96075f743649da9b9b52dd39611c58
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:50:59 2016 +0100
Fix the edge ngram tokenizer/filter.
commit ccd7f070de5efcdfbeb34b9555c65c4990bf1ba6
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:42:44 2016 +0100
The missing value is now accessible through a getter.
commit bd3b77f9b28e5b05daa3d49683a9922a6baf2963
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:41:51 2016 +0100
Remove IndexCacheableQuery.
commit 05f3091c347aeae80eeb16349ac51d2b53cf86f7
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:39:43 2016 +0100
Fix compilation of function_score queries.
commit 81cda79a2431ac78f56b0cc5a5765387f662d801
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:35:02 2016 +0100
Fix compile errors in BlendedTermQuery.
commit 70994ce8dd1eca0b995870974a38e20f26f96a7b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 23:33:03 2016 -0500
add bug ID
commit 29d4f1a71f36f646b5a6060bed3db019564a279d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 21:02:32 2016 -0500
easy .store changes
commit 5e1a1e6fd665fa455e88d3a8987362fad5f44bb1
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:47:24 2016 -0500
cleanups mostly around boosting
commit 333a669ec6c305ada5645d13ed1da0e19ec1d053
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:27:56 2016 -0500
more simple fixes
commit bd5cd98a1e089c866b6b4a5e159400b110140ce6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:49:38 2016 -0500
more easy fixes and removal of ancient cruft
commit a68f419ee47da5f9c9ce5b372f01d707e902474c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:35:02 2016 -0500
cutover numerics
commit 4ca5dc1fa47dd5892db00899032133318fff3116
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:34:18 2016 -0500
fix some constants
commit 88710a17817086e477c6c021ec346d0534b7fb88
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:14:25 2016 -0500
Add spatial-extras jar as a core dependency
commit c8cd6726583e5ce3f546ed355d4eca037164a30d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:03:33 2016 -0500
update to lucene 6 jars
Elasticsearch 5.0 doesn't support indices wiht legacy checksums anymore.
The last time we write legacy checksums was in 1.3.0 which was based
on lucene 4.9 already which means that all files have CRC32 checksums.
All indices that Elasticsearch can read today must be written with
lucene version >= 4.8 anyway so we can drop this layer of backwards
compatibility entirely.
Since we are close to upgrading to Lucene 6.0 we should get rid of this
in a more contiained change than the lucene upgrade.
This commit removes the ability to use string fields on indices created on or
after 5.0. Dynamic mappings now generate text fields by default for strings
but there are plans to also add a sub keyword field (in a future PR).
Most of the changes in this commit are just about replacing string with
keyword or text. Some tests have been removed because they existed because of
corner cases of string mappings like setting ignore-above on a text field or
enabling term vectors on a keyword field which are now impossible.
The plan is to remove strings entirely in 6.0.
Add setFactory permission to GceDiscoveryPlugin
This commit adds a missing permission and a simple test that
ensures we discover other nodes via a mock http endpoint.
Closes#16485
The big win here is catching tests that are incorrectly named and will
be skipped by gradle, providing a false sense of security.
The whole thing takes about 10 seconds on my Macbook Air, not counting
compiling the test classes, which seems worth it. Because this runs as
a gradle task with propery UP-TO-DATE handling it can be skipped if the
tests haven't been changed which should save some time.
I chose to keep this in test:framework rather than a new subproject of
buildSrc because ESIntegTestCase and doesn't inroduce any additional
dependencies.
DiscoveryService was a bridge into the discovery universe. This is unneeded and we can just access discovery directly or do things in a different way.
One of those different ways, is not having a dedicated discovery implementation for each our dicovery plugins but rather reuse ZenDiscovery.
UnicastHostProviders are now classified by discovery type, removing unneeded checks on plugins.
Closes#16821
We are using `2.0.0` today but Azure team now recommends:
```xml
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>4.0.0</version>
</dependency>
```
This new version fix the timeout issues we have seen with azure storage although #15080 adds a timeout support.
Azure storage client 2.0.0 was not passing correctly this value when it was calling Azure services.
Note that the timeout is a server side timeout and not client side timeout.
It means that it will raise only a timeout when:
* upload of blob is complete
* if azure service is not able to process the blob (and store it) within a given time range.
In which case it will raise an exception which elasticsearch can deal with:
```
java.io.IOException
at __randomizedtesting.SeedInfo.seed([91BC11AEF16E073F:6886FA5308FCE4D8]:0)
at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:643)
at com.microsoft.azure.storage.blob.BlobOutputStream.writeBlock(BlobOutputStream.java:444)
at com.microsoft.azure.storage.blob.BlobOutputStream.access$000(BlobOutputStream.java:53)
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:388)
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:385)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: Operation could not be completed within the specified time.
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:305)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:175)
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1006)
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:978)
at com.microsoft.azure.storage.blob.BlobOutputStream.writeBlock(BlobOutputStream.java:438)
... 9 more
```
The following code was used to test this against Azure platform:
```java
public void testDumb() throws URISyntaxException, StorageException, IOException, InvalidKeyException {
String connectionString = "MY-AZURE-STRING";
CloudStorageAccount storageAccount = CloudStorageAccount.parse(connectionString);
CloudBlobClient client = storageAccount.createCloudBlobClient();
client.getDefaultRequestOptions().setTimeoutIntervalInMs(1000);
CloudBlobContainer container = client.getContainerReference("dumb");
container.createIfNotExists();
CloudBlockBlob blob = container.getBlockBlobReference("blob");
File sourceFile = File.createTempFile("sourceFile", ".tmp");
try {
int fileSize = 10000000;
byte[] buffer = new byte[fileSize];
Random random = new Random();
random.nextBytes(buffer);
logger.info("Generate local file");
FileOutputStream fos = new FileOutputStream(sourceFile);
fos.write(buffer);
fos.close();
logger.info("End generate local file");
FileInputStream fis = new FileInputStream(sourceFile);
logger.info("Start uploading");
blob.upload(fis, fileSize);
logger.info("End uploading");
}
finally {
if (sourceFile.exists()) {
sourceFile.delete();
}
}
}
```
With 2.0.0, the above code was not raising any exception. With 4.0.0, the exception is now thrown correctly.
The default timeout is 5 minutes. See https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/core/Utility.java#L352-L375Closes#12567.
Release notes from 2.0.0:
* Removed deprecated table AtomPub support.
* Removed deprecated constructors which take service clients in favor of constructors which take credentials.
* Added support for "Add" permissions on Blob SAS.
* Added support for "Create" permissions on Blob and File SAS.
* Added support for IP Restricted SAS and Protocol SAS.
* Added support for Account SAS to all services.
* Added support for Minute and Hour Metrics to FileServiceProperties and added support for File Metrics to CloudAnalyticsClient.
* Removed deprecated startCopyFromBlob() on CloudBlob. Use startCopy() instead.
* Removed deprecated Credentials and StorageKey classes. Please use the appropriate methods on StorageCredentialsAccountAndKey instead.
* Fixed a bug in table where a select on a non-existent field resulted in a null reference exception if the corresponding field in the TableEntity was not nullable.
* Fixed a bug in table where JsonParser was automatically closing the response stream before it was completely drained causing socket exhaustion.
* Fixed a bug in StorageCredentialsAccountAndKey.updateKey(String) which prevented valid keys from being set.
* Added CloudBlobContainer.listBlobs(final String, final boolean) method.
* Fixed a bug in blob where using AccessConditions on block blob uploads larger than 64MB done with the upload* methods or block blob uploads done openOutputStream with would fail if the blob did not already exist.
* Added support for setting a proxy per request. Proxy can be set on an OperationContext instance and will be used when that instance is passed to the request method.
* Added support for SAS to the Azure File service.
* Added support for Append Blob.
* Added support for Access Control Lists (ACL) to File Shares.
* Added support for getting and setting of CORS rules to File service.
* Added support for ShareStats to File Shares.
* Added support for copying an Azure File to another Azure File or a Block Blob asynchronously, and aborting Azure File copy operations asynchronously.
* Added support for copying a Blob to an Azure File asynchronously.
* Added support for setting a maximum quota property on a File Share.
* Removed deprecated AuthenticationScheme and its getter and setter. In the future only SharedKey will be used.
* Removed deprecated getter/setters for all request option properties on the service clients. Please use the default request options getter/setters instead.
* Removed getSubDirectoryReference() for blob directories and file directories. Use getDirectoryReference() instead.
* Removed getEntityClass() in TableQuery. Please use getClazzType() instead.
* Added client-side verification for lease duration and break periods.
* Deprecated the setters in table for timestamp as this property is only modifiable by the service.
* Deprecated startCopyFromBlob() on CloudBlob. Use startCopy() instead.
* Deprecated the Credentials and StorageKey classes. Please use the appropriate methods on StorageCredentialsAccountAndKey instead.
* Deprecated constructors which take service clients in favor of constructors which take credentials.
* Fixed a bug where the DateBackwardCompatibility flag was not applied if set on the CloudTableClient default request options.
* Changed library behavior to retry all exceptions thrown when parsing a response object.
* Changed behavior to stop removing query parameters passed in with the resource URI if that URI contains a SAS token. Some query parameters such as comp, restype, snapshot and api-version will still be removed.
* Added support for logging StringToSign to SharedKey and SAS.
* **Added a connect timeout to prevent hangs when establishing the network connection.**
* **Made performance enhancements to the BlobOutputStream class.**
* Fixed a bug where maximum execution time was ignored for file, queue, and table services.
* **Changed the socket timeout to be set to the service side timeout plus 5 minutes when maximum execution time is not set.**
* **Changed the socket timeout to default to 5 minutes rather than infinite when neither service side timeout or maximum execution time are set.**
* Fixed a bug where MD5 was calculated for commitBlockList even though UseTransactionalMD5 was set to false.
* Fixed a bug where selecting fields that did not exist returned an error rather than an EntityProperty with a null value.
* Fixed a bug where table entities with a single quote in their partition or row key could be inserted but not operated on in any other way.
* Fixed a bug for all listing API's where next() would sometimes throw an exception if hasNext() had not been called even if there were more elements to iterate on.
* Added sequence number to the blob properties. This is populated for page blobs.
* Creating a page blob sets its length property.
* Added support for page blob sequence numbers and sequence number access conditions.
* Fixed a bug in abort copy where the lease access condition was not sent to the service.
* Fixed an issue in startCopyFromBlob where if the URI of the source blob contained certain non-ASCII characters they would not be encoded appropriately. This would result in Authorization failures.
* Fixed a small performance issue in XML serialization.
* Fixed a bug in BlobOutputStream and FileOutputStream where flush added data to a request pool rather than immediately committing it to the Azure service.
* Refactored to remove the blob, queue, and file package dependency on table in the error handling code.
* Added additional client-side logging for REST requests, responses, and errors.
Closes#15976.
Instead of modifying methods each time we need to add a new behavior for settings, we can simply pass `SettingsProperty... properties` instead.
`SettingsProperty` could be defined then:
```
public enum SettingsProperty {
Filtered,
Dynamic,
ClusterScope,
NodeScope,
IndexScope
// HereGoesYours;
}
```
Then in setting code, it become much more flexible.
TODO: Note that we need to validate SettingsProperty which are added to a Setting as some of them might be mutually exclusive.
Removes all our logger wrappers except the wrapper for log4j1.2. If you
depend on Elasticsearch's jar in your application you'll need to declare
log4j 1.2 and/or some bridge to your favorite logger.
We did this to simplify our builds and code. No more commons-logging like
log implementation sniffing. No more optional dependency hacks in gradle.
We might one day want to use j.u.l instead of log4j. If we do want that
we can recover its wrapper by studying this commit. We didn't go directly
to j.u.l in this commit because that is a bigger change. Our logging
configuration is based on log4j1.2 and people are used to it. So it'd
be a much more fraught breaking change to do that conversion.
Now we have a nice Setting infra, we can define in Setting class if a setting should be filtered or not.
So when we register a setting, setting filtering would be automatically done.
Instead of writing:
```java
Setting<String> KEY_SETTING = Setting.simpleString("cloud.aws.access_key", false, Setting.Scope.CLUSTER);
settingsModule.registerSetting(AwsEc2Service.KEY_SETTING, false);
settingsModule.registerSettingsFilterIfMissing(AwsEc2Service.KEY_SETTING.getKey());
```
We could simply write:
```java
Setting<String> KEY_SETTING = Setting.simpleString("cloud.aws.access_key", false, Setting.Scope.CLUSTER, true);
settingsModule.registerSettingsFilterIfMissing(AwsEc2Service.KEY_SETTING.getKey());
```
It also removes `settingsModule.registerSettingsFilterIfMissing` method.
The plan would be to remove as well `settingsModule.registerSettingsFilter` method but it still used with wildcards. For example in Azure Repository plugin:
```java
module.registerSettingsFilter(AzureStorageService.Storage.PREFIX + "*.account");
module.registerSettingsFilter(AzureStorageService.Storage.PREFIX + "*.key");
```
Closes#16598.
This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.
Fields can be selected in the processor as well.
Close#16303
Groovy uses reflection to invoke closures. These reflective calls are optimized by the JVM after "sun.reflect.inflationThreshold" number of invocations.
After inflation, access to sun.reflect.MethodAccessorImpl is required from the security manager.
Closes#16536
IndexShard currently holds an arbitraritly used `getQueryShardContext` that comes
out of a ThreadLocal. It's usage is undefined and arbitraty since there is also
such a method with different semantics on `IndexService` This commit removes the threadLocal on
IndexShard as well as on the context itself. It's types are now a member and the QueryShardContext
lifecycle is managed byt SearchContext which passes the types on from the SearchRequest.
* Minor clean up of Writer constants.
* Removed synthetic attribute from the generated constructor and method.
* Added a safeguard for maximum script length.
Closes#16457
This processor is useful when all elements of a json array need to be processed in the same way.
This avoids that a processor needs to be defined for each element in an array.
Also it is very likely that it is unknown how many elements are inside an json array.
This change rewrites the entire settings filtering mechanism to be immutable.
All filters must be registered up-front in the SettingsModule. Filters that are comma-sparated are
not allowed anymore and check on registration.
This commit also adds settings filtering to the default settings recently added to ensure we don't render
filtered settings.
Also removes ensureCanWrite since this already passes the
TRUNCATE_EXISTING flag when opening.
Adds a REST test that fails without this fix due to the classloader
isolation.
Additionally, move SmbDirectoryWrapper into an elasticsearch package instead of a lucene one, so this would be found at compile time instead of runtime.
Forward-port of #16383
Renamed `ingest-with-mustache` to `smoke-test-ingest-with-all-dependencies`
Also renamed `ingest-disabled` to `smoke-test-ingest-disabled` so that the name is more inline with other qa smoke test modules.
This commit enableds strict settings validation on node startup. All settings
passed to elasticsearch either through system properties, yaml files or any other
way to pass settings must be registered and valid. Settings that are unknown ie. due to
typos or due to deprecation or removal will cause the node to NOT start up. Plugins
have to declare all their settings on the `SettingsModule#registerSetting` and settings for
plugins that are not installed must be removed.
This commit also removes the ability to specify the nodes name via `-Des.name` or just `name` in the
configuration files. The node name must be prefixed with the node prexif like `node.name: Boom`. Left over
usage of `name` will also cause startup to fail.
Cli tools currently catch all exceptions, and only print the exception
message, except when a special system property is set. Even with this
flag set, certain exceptions, like IOException, are captured and their
stack trace is always lost.
This change adds a UserError class, which can be used a cli tools to
specify a message to the user, as well as an exit status. All other
exceptions are propagated out of main, so java will exit with non-zero
and print the stack trace.
This commit removes and forbids the use of lowercase ells ('l') in long
literals because they are often hard to distinguish from the digit
representing one ('1').
Closes#16329
When there is an exception thrown during pipeline creation within
Rest calls (in put pipeline, and simulate) We now return a structured
error response to the user with details around which processor's
configuration is the cause of the issue, or which configuration property
is misconfigured, etc.
In the early days Elasticsearch used to use the index name as the index identity. Around 1.0.0 we introduced a unique index uuid which is stored in the index setting. Since then we used that uuid in a few places but it is by far not the main identifier when working with indices, partially because it's not always readily available in all places.
This PR start to make a move in the direction of using uuids instead of name by making sure that the uuid is available on the Index class (currently just a wrapper around the name) and as such also available via ShardRouting and ShardId.
Note that this is by no means an attempt to do the right thing with the uuid in all places. In almost all places it falls back to the name based comparison that was done before. It is meant as a first step towards slowly improving the situation.
Closes#16217
This commit method renames the ScriptEngineService interface methods
types, extensions, and sandboxed to getTypes, getExtensions, and
isSandboxed, respectively.
This commit converts the script mode settings to the new settings
infrastructure. This is a major refactoring of the handling of script
mode settings. This refactoring is necessary because these settings are
determined at runtime based on the registered script engines and the
registered script contexts.
Parsing is currently very lenient, which has the bad side-effect that if you
have a typo and pass eg. `store: fasle` this will actually be interpreted as
`store: true`. Since mappings can't be changed after the fact, it is quite bad
if it happens on an index that already contains data.
Note that this does not cover all settings that accept a boolean, but since the
PR was quite hard to build and already covers some main settirgs like `store`
or `doc_values` this would already be a good incremental improvement.
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
New Features:
Exceptions (throw, try/catch)
Infinite Loop Checking
Iterators added to the API
Fixes:
Improved loop path analysis
Fixed set/add issue with shortcuts in lists
Fixed minor promotion issue with incorrect type
Fixed score issue related to score extending Number
Documentation:
Added JavaDocs for some files
One test we forgot in #14843 and #13779 is the default client selection.
Most of the time, users won't define explicitly which client they want to use because they are providing only one connection to Azure storage:
```yml
cloud:
azure:
storage:
my_account:
account: your_azure_storage_account
key: your_azure_storage_key
```
Then using the default client like this:
```sh
# This one will use the default account (my_account1)
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
"type": "azure"
}'
```
This commit adds tests to check that the right client is still selected when no client is explicitly set when creating the snapshot.
This commit replaces server side timeout (which is BTW not correctly implemented in azure client 2.0.0 but fixed later #16084) with a client side timeout.
As a consequence, for each request sent to azure, azure client will raise an exception after a given amount of time (timeout).
Closes#12567
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.
This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.
closes#15459
Removes all Xlint skips in lang-plan-a. Warnings were just some extra
casts which were removed, some raw types which we just needed to <?>, and
a couple of unchecked casts in data that we know to be Map<String, Object>
because that structure is super-ultra-common in scripts.
This would be useful in order to only perform some validations in the case of
a mapping update and in cases when a mapping is restored eg. after a restart,
such as discussed in #15989.
This replaces the current `applyDefault` parameter which can be derived from
the mapping merge reason: the default mapping should be applied only in case of
a mapping update, if the mapping does not exist yet and if this is not the
default mapping.
Site plugins used to be used for things like kibana and marvel, but
there is no longer a need since kibana (and marvel as a kibana plugin)
uses node.js. This change removes site plugins, as well as the flag for
jvm plugins. Now all plugins are jvm plugins.
the environment is now available through NodeModule#getNode#getEnvironment and can be retrieved during onModule(NodeModule), no need for this indirection anymore using the BiFunction
* Folded IngestModule into NodeModule
* Renamed IngestBootstrapper to IngestService
* Let NodeService construct IngestService and removed the Guice annotations
* Let IngestService implement Closable
Using a single azure account is now rejected.
This commit fixes this issue and adds a test for it.
This regression was introduced with #13779. Hopefully no elasticsearch version has been released since then.
Needs to be merged in 2.2, 2.x and master branches.
ContextAndHeaders has a massive impact on the core infrastructure since it has to
be manually passed on to all relevant places across threads/network calls etc. For the same reason
it's also very error prone and easily forgotten on potentially relevant APIs.
The new ThreadContext is associated with a ThreadPool (node or transport client) and ensures that
headers and context registered on a current thread are inherited to new threads spawned, send across
the network to be deserialized on the receiver end as well as restored on the response handling thread
once the response is received.
1. Uses forbidden patterns to prevent things from referencing
java.io.Serializable or from mentioning serialVersionUID.
2. Uses -Xlint:-serial so we don't have to hear from javac that we aren't
declaring serialVersionUID on any classes that we make that happen to extend
Serializable.
3. Remove Serializable and serialVersionUID declarations.
I didn't use forbidden apis because it doesn't look like it has a way to ban
explicitly implementing Serializable. If you try to ban Serializable with
forbidden apis you end up banning all Exceptions and all Strings.
Closes#15847
This commit removes and now forbids all uses of
java.util.concurrent.ThreadLocalRandom across the codebase. The
underlying issue with ThreadLocalRandom is that it can not be
seeded. This means that if ThreadLocalRandom is used in production code,
then tests that cover any code path containing ThreadLocalRandom will be
prevented from being reproducible by use of ThreadLocalRandom. Instead,
using org.elasticsearch.common.random.Randomness#get will give
reproducible sources of random when running under tests and otherwise
still give an instance of ThreadLocalRandom when running as production
code.
CRUD and simulate apis work now fine, every node has the pipelines in memory, but node.ingest disables ingestion, meaning that any index or bulk request with a pipeline id is going to fail
Right now we define the same sort of methods as taking String arrays and
string varargs. We should standardize on one and varargs is easier to
call so lets use varargs!
We will keep this abstractions as it's convenient, otherwise IngestDocument would depend on ScriptService directly, and would explicitly rely on mustache which is not even part of core. better to have the interface in core, and the impl as part of the ingest plugin, which relies on mustache, shipped with core by default.
* Added percolator field mapper that extracts the query terms and indexes these terms with the percolator query.
* At percolate time these extracted terms are used to query percolator queries that are like to be evaluated. This can significantly cut down the time it takes to percolate. Whereas before all percolator queries were evaluated if they matches with the document being percolated.
* Changes made to percolator queries are no longer immediately visible, a refresh needs to happen before the changes are visible.
* By default the percolate api only returns upto 10 matches instead of returning all matching percolator queries.
* Made percolate more modular, so that it is easier to add unit tests.
* Added unit tests for the percolator.
Closes#12664Closes#13646
Adds task manager class and enables all activities to register with the task manager. Currently, the immutable Transport*Activity class represents activity itself shared across all requests. This PR adds and an additional structure Task that keeps track of currently running requests and can be used to communicate with these requests using TransportTaskAction.
Related to #15117
By default, azure does not timeout. This commit adds support for a timeout settings which defaults to 5 minutes.
It's a timeout **per request** not a global timeout for a snapshot request.
It can be defined globally, per account or both. Defaults to `5m`.
```yml
cloud:
azure:
storage:
timeout: 10s
my_account1:
account: your_azure_storage_account1
key: your_azure_storage_key1
default: true
my_account2:
account: your_azure_storage_account2
key: your_azure_storage_key2
timeout: 30s
```
In this example, timeout will be 10s for `my_account1` and 30s for `my_account2`.
Closes#14277.
All those repository settings can also be defined globally in `elasticsearch.yml` file using prefix `repositories.azure.`. For example:
```yml
repositories.azure:
container: backup-container
base_path: backups
chunk_size: 32m
compress": true
```
Closes#13776.
An index template for the '.ingest' index is required because:
* We don't want arbitrary fields in pipeline documents, because that can turn into upgrade problems if we add more properties to the pipeline dsl.
* We know what are the usages are of the '.ingest' index, so we can optimize for that and prevent that this index is used for different purposes.
Closes#15001
This removes the backward compatibility layer with pre-2.0 indices, notably
the extraction of _id, _routing or _timestamp from the source document when a
path is defined.
This changes a couple of things:
Mappings are truly immutable. Before, each field mapper stored a
MappedFieldTypeReference that was shared across fields that have the same name
across types. This means that a mapping update could have the side-effect of
changing the field type in other types when updateAllTypes is true. This works
differently now: after a mapping update, a new copy of the mappings is created
in such a way that fields across different types have the same MappedFieldType.
See the new Mapper.updateFieldType API which replaces MappedFieldTypeReference.
DocumentMapper is now immutable and MapperService.merge has been refactored in
such a way that if an exception is thrown while eg. lookup structures are being
updated, then the whole mapping update will be aborted. As a consequence,
FieldTypeLookup's checkCompatibility has been folded into copyAndAddAll.
Synchronization was simplified: given that mappings are truly immutable, we
don't need the read/write lock so that no documents can be parsed while a
mapping update is being processed. Document parsing is not performed under a
lock anymore, and mapping merging uses a simple synchronized block.
This adds the required changes/checks so that the build can run on
FreeBSD.
There are a few things that differ between FreeBSD and Linux:
- CPU probes return -1 for CPU usage
- `hot_threads` cannot be supported on FreeBSD
From OpenJDK's `os_bsd.cpp`:
```c++
bool os::is_thread_cpu_time_supported() {
#ifdef __APPLE__
return true;
#else
return false;
#endif
}
```
So this API now returns (for each FreeBSD node):
```
curl -s localhost:9200/_nodes/hot_threads
::: {Devil Hunter Gabriel}{q8OJnKCcQS6EB9fygU4R4g}{127.0.0.1}{127.0.0.1:9300}
hot_threads is not supported on FreeBSD
```
- multicast fails in native `join` method - known bug:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193246
Which causes:
```
1> Caused by: java.net.SocketException: Invalid argument
1> at java.net.PlainDatagramSocketImpl.join(Native Method)
1> at java.net.AbstractPlainDatagramSocketImpl.join(AbstractPlainDatagramSocketImpl.java:179)
1> at java.net.MulticastSocket.joinGroup(MulticastSocket.java:323)
1> at org.elasticsearch.plugin.discovery.multicast.MulticastChannel$Plain.buildMulticastSocket(MulticastChannel.java:309)
```
So these tests are skipped on FreeBSD.
Resolves#15562
both processors and pipelines now have the ability to define
a separate list of processors to be executed if the original line
of execution throws an Exception.
processors without an on_failure parameter defined will throw an
exception and exit the pipeline immediately. processors with on_failure
defined will catch the exception and allow for further processors to
run. Exceptions within the on_failure block will be treated the same as
the top-level.
The append processor allows to append one or more values to an existing list; add a new list with the provided values if the field doesn't exist yet, or convert an existing scalar into a list and add the provided values to the newly created list.
This required adapting of IngestDocument#appendFieldValue behaviour, also added support for templating to it.
Closes#14324
DocumentMapperParser has both parse and parseCompressed methods. Except that the
parse methods are ONLY used from the unit tests. This commit removes the parse
method and moves all tests to parseCompressed so that they test more
realistically how mappings are managed.
Then I renamed parseCompressed to parse given that this is the only alternative
anyway.
This change adds a Fixture class for use by gradle. A Fixture is an
external process that integration tests will use. It can be added as a
dependsOn for integTest, and will automatically be shutdown upon success
or failure, as well as relevant information dumped on failure. There is
also an example fixture in this change.
* Pipeline store can now only start when there is no .ingest index or all primary shards of .ingest have been started
* IngestPlugin adds`node.ingest` setting to `true`. This is used to figure out to what nodes to send the refresh request too. This setting isn't yet configurable. This will be done in a follow up issue.
* Removed the background pipeline updater and added added logic to deal with specific scenarious to reload all pipelines.
* Ingest services are no longer be managed by Guice. Only the bootstrapper gets managed by guice and that contructs
all the services/components ingest will need.
Added ingest wide template infrastructure to IngestDocument
Added a TemplateService interface that the ingest framework uses
Added a TemplateService implementation that the ingest plugin provides that delegates to the ES' script service
Cut SetProcessor over to use the template infrastructure for the `field` and `value` settings.
Removed the MetaDataProcessor
Removed dependency on mustache library
Added qa ingest mustache rest test so that the ingest and mustache integration can be tested.
We have the Text API, which is essentially a wrapper around a String and a
BytesReference and then we have 3 implementations depending on whether the
String view should be cached, the BytesReference view should be cached, or both
should be cached.
This commit merges everything into a single Text that is essentially the old
StringAndBytesText impl.
Long term we should look into whether this API has any performance benefit or
if we could just use plain strings. This would greatly simplify all our other
APIs that currently use Text.
This fixes the `lenient` parameter to be `missingClasses`. I will remove this boolean and we can handle them via the normal whitelist.
It also adds a check for sheisty classes (jar hell with the jdk).
This is inspired by the lucene "sheisty" classes check, but it has false positives. This check is more evil, it validates every class file against the extension classloader as a resource, to see if it exists there. If so: jar hell.
This jar hell is a problem for several reasons:
1. causes insanely-hard-to-debug problems (like bugs in forbidden-apis)
2. hides problems (like internal api access)
3. the code you think is executing, is not really executing
4. security permissions are not what you think they are
5. brings in unnecessary dependencies
6. its jar hell
The more difficult problems are stuff like jython, where these classes are simply 'uberjared' directly in, so you cant just fix them by removing a bogus dependency. And there is a legit reason for them to do that, they want to support java 1.4.
When creating a metadata mapper for a new type, we reuse an existing
configuration from an existing type (if any) in order to avoid introducing
conflicts. However this field type that is provided is considered as both an
initial configuration and the default configuration. So at serialization time,
we might only serialize the difference between the current configuration and
this default configuration, which might be different to what is actually
considered the default configuration.
This does not cause bugs today because metadata mappers usually override the
toXContent method and compare the current field type with Defaults.FIELD_TYPE
instead of defaultFieldType() but I would still like to do this change to
avoid future bugs.
The `path` option allowed to index/store a field `a.b.c` under just `c` when
set to `just_name`. This "feature" has been removed in 2.0 in favor of `copy_to`
so we can remove the back compat in 3.x.
Today mappings are mutable because of two APIs:
- Mapper.merge, which expects changes to be performed in-place
- IncludeInAll, which allows to change whether values should be put in the
`_all` field in place.
This commit changes both APIs to return a modified copy instead of modifying in
place so that mappings can be immutable. For now, only the type-level object is
immutable, but in the future we can imagine making them immutable at the
index-level so that mapping updates could be completely atomic at the index
level.
Close#9365
This change adds back the http.type setting. It also cleans up all the
transport related guice code to be consolidated within the
NetworkModule (as transport and http related stuff is what and how ES
exposes over the network). The setter methods previously used by some
plugins to override eg the TransportService or HttpServerTransport are
removed, and those plugins should now register a custom implementation
of the class with a name and set that using the appropriate config
setting. Note that I think ActionModule should also be moved into here,
to sit along side the rest actions, but I left that for a followup.
closes#14148
Migrated from ES-Hadoop. Contains several improvements regarding:
* Security
Takes advantage of the pluggable security in ES 2.2 and uses that in order
to grant the necessary permissions to the Hadoop libs. It relies on a
dedicated DomainCombiner to grant permissions only when needed only to the
libraries installed in the plugin folder
Add security checks for SpecialPermission/scripting and provides out of
the box permissions for the latest Hadoop 1.x (1.2.1) and 2.x (2.7.1)
* Testing
Uses a customized Local FS to perform actual integration testing of the
Hadoop stack (and thus to make sure the proper permissions and ACC blocks
are in place) however without requiring extra permissions for testing.
If needed, a MiniDFS cluster is provided (though it requires extra
permissions to bind ports)
Provides a RestIT test
* Build system
Picks the build system used in ES (still Gradle)
This commit removes and now forbids all uses of
Collections#shuffle(List) and Random#<init>() across the codebase. The
rationale for removing and forbidding these methods is to increase test
reproducibility. As these methods use non-reproducible seeds, production
code and tests that rely on these methods contribute to
non-reproducbility of tests.
Instead of Collections#shuffle(List) the method
Collections#shuffle(List, Random) can be used. All that is required then
is a reproducible source of randomness. Consequently, the utility class
Randomness has been added to assist in creating reproducible sources of
randomness.
Instead of Random#<init>(), Random#<init>(long) with a reproducible seed
or the aforementioned Randomess class can be used.
Closes#15287
IndexResponse, DeleteResponse and UpdateResponse share some logic. This can be unified to a single DocWriteResponse base class. On top, some replication actions are now not about write operations anymore. This commit renames ActionWriteResponse to ReplicationResponse
Last some toXContent is moved from the Rest layer to the actual response classes, for more code re-sharing.
Closes#15334
Unify metadata map and source, add also support for _ingest prefix. Depending on the prefix, either _source, nothing or _ingest, we will figure out which map to use for values retrieval, but also modifications.
If one is using the ingest plugin and providing a pipeline id with the request, the chance that the source is going to be modified is 99%. We shouldn't worry about keeping track of whether something changed. That seemed useful at first so we can save the resources for setting back the source (map to bytes) when not needed. Also, we are trying to unify metadata fields and source in the same map and that is going to complicate how we keep track of changes that happen in the source only. Best solution is to remove the flag.
IngestDocument now holds an additional map of transient metadata. The only field that gets added automatically is `timestamp`, which contains the timestamp of ingestion in ISO8601 format. In the future it will be possible to eventually add or modify these fields, which will not get indexed, but they will be available via templates to all of the processors.
Transient metadata will be visualized by the simulate api, although they will never get indexed. Moved WriteableIngestDocument to the simulate package as it's only used by simulate and it's now modelled for that specific usecase.
Also taken the chance to remove one IngestDocument constructor used only for testing (accepting only a subset of es metadata fields). While doing that introduced some more randomizations to some existing processor tests.
Closes#15036
throw exception if a copy_to is within a multi field
Copy to within multi field is ignored from 2.0 on, see #10802.
Instead of just ignoring it, we should throw an exception if this
is found in the mapping when a mapping is added. For already
existing indices we should at least log a warning.
We remove the copy_to in any case.
related to #14946
This commit adds the infrastructure to make settings that are updateable
resetable and changes the application of updates to be transactional. This means
setting updates are either applied or not. If the application failes all values are rejected.
This initial commit converts all dynamic cluster settings to make use of the new infrastructure.
All cluster level dynamic settings are not resettable to their defaults or to the node level settings.
The infrastructure also allows to list default values and descriptions which is not fully implemented yet.
Values can be reset using a list of key or simple regular expressions. This has only been implemented on the java
layer yet. For instance to reset all recovery settings to their defaults a user can just specify `indices.recovery.*`.
This commit also adds strict settings validation, if a setting is unknown or if a setting can not be applied the entire
settings update request will fail.
When using S3 or EC2, it was possible to use a proxy to access EC2 or S3 API but username and password were not possible to be set.
This commit adds support for this. Also, to make all that consistent, proxy settings for both plugins have been renamed:
* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host`
* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host`
* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host`
* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port`
* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port`
* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port`
New settings are `proxy.username` and `proxy.password`.
```yml
cloud:
aws:
protocol: https
proxy:
host: proxy1.company.com
port: 8083
username: myself
password: theBestPasswordEver!
```
You can also set different proxies for `ec2` and `s3`:
```yml
cloud:
aws:
s3:
proxy:
host: proxy1.company.com
port: 8083
username: myself1
password: theBestPasswordEver1!
ec2:
proxy:
host: proxy2.company.com
port: 8083
username: myself2
password: theBestPasswordEver2!
```
Note that `password` is filtered with `SettingsFilter`.
We also fix a potential issue in S3 repository. We were supposed to accept key/secret either set under `cloud.aws` or `cloud.aws.s3` but the actual code never implemented that.
It was:
```java
account = settings.get("cloud.aws.access_key");
key = settings.get("cloud.aws.secret_key");
```
We replaced that by:
```java
String account = settings.get(CLOUD_S3.KEY, settings.get(CLOUD_AWS.KEY));
String key = settings.get(CLOUD_S3.SECRET, settings.get(CLOUD_AWS.SECRET));
```
Also, we extract all settings for S3 in `AwsS3Service` as it's already the case for `AwsEc2Service` class.
Closes#15268.
Since 2.2 we run all scripts with minimal privileges, similar to applets in your browser.
The problem is, they have unrestricted access to other things they can muck with (ES, JDK, whatever).
So they can still easily do tons of bad things
This PR restricts what classes scripts can load via the classloader mechanism, to make life more difficult.
The "standard" list was populated from the old list used for the groovy sandbox: though
a few more were needed for tests to pass (java.lang.String, java.util.Iterator, nothing scary there).
Additionally, each scripting engine typically needs permissions to some runtime stuff.
That is the downside of this "good old classloader" approach, but I like the transparency and simplicity,
and I don't want to waste my time with any feature provided by the engine itself for this, I don't trust them.
This is not perfect and the engines are not perfect but you gotta start somewhere. For expert users that
need to tweak the permissions, we already support that via the standard java security configuration files, the
specification is simple, supports wildcards, etc (though we do not use them ourselves).
Azure team released new versions of their Java SDK.
According to https://github.com/Azure/azure-sdk-for-java/wiki/Azure-SDK-for-Java-Features, it comes with 2 versions.
We should at least update to `0.9.0` of V1 but also consider moving to the new APIs (V2).
This commit first updates to latest API V1.
```xml
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-svc-mgmt-compute</artifactId>
<version>0.9.0</version>
</dependency>
```
Closes#15209
Failures to merge a mapping can either come as a MergeMappingException if they
come from Mapper.merge or as an IllegalArgumentException if they come from
FieldTypeLookup.checkCompatibility. I think we should settle on one: this pull
request replaces all usage of MergeMappingException with
IllegalArgumentException.
The PipelineTests tried to test if the configured map/list in set processor wasn't modified while documents were ingested. Creating a pipeline programmatically created more noise than the test needed. The new tests in IngestDocumentTests have the same goal, but is much smaller and clearer by directly testing against IngestDocument.
This commit removes and now forbids all uses of the type-unsafe empty
Collections fields Collections#EMPTY_LIST, Collections#EMPTY_MAP, and
Collections#EMPTY_SET. The type-safe methods Collections#emptyList,
Collections#emptyMap, and Collections#emptySet should be used instead.
This change attempts to simplify the gradle tasks for precommit. One
major part of that is using a "less groovy style", as well as being more
consistent about how tasks are created and where they are configured. It
also allows the things creating the tasks to set up inter task
dependencies, instead of assuming them (ie decoupling from tasks
eleswhere in the build).
Rename processor now checks whether the field to rename exists and throws exception if it doesn't. It also checks that the new field to rename to doesn't exist yet, and throws exception otherwise. Also we make sure that the rename operation is atomic, otherwise things may break between the remove and the set and we'd leave the document in an inconsistent state.
Note that the requirement for the new field name to not exist simplifies the usecase for e.g. { "rename" : { "list.1": "list.2"} } as such a rename wouldn't be accepted if list is actually a list given that either list.2 already exists or the index is out of bounds for the existing list. If one really wants to replace an existing field, that field needs to be removed first through remove processor and then rename can be used.
1) It no longer extends from Closeable.
2) Removed the config directory setter. Implementation that relied on it, now get the location to the config dir via their constructors.
Validation is not done as part of the distance setter method and tested in GeoDistanceQueryBuilderTests. Fixed GeoDistanceTests to adapt to the new validation.
Closes#15135
When reading, through #getFieldValue and #hasField, and a list is encountered, the next element in the path is treated as the index of the item that the path points to (e.g. `list.0.key`). If the index is not a number or out of bounds, an exception gets thrown.
Added #appendFieldValue method that has the same behaviour as setFieldValue, but when a list is the last element in the path, instead of replacing the whole list it will simply add a new element to the existing list. This method is currently unused, we have to decide whether the set processor or a new processor should use it.
A few other changes made:
- Renamed hasFieldValue to hasField, as this method is not really about values but only keys. It will return true if a key is there but its value is null, while it returns false only when a field is not there at all.
- Changed null semantic in getFieldValue. null gets returned only when it was an actual value in the source, an exception is thrown otherwise when trying to access a non existing field, so that null != field not present.
- Made remove stricter about non existing fields. Throws error when trying to remove a non existing field. This is more consistent with the other methods in IngestDocument which are strict about fields that are not present.
Relates to #14324
Do not to load fields from _source when using the `fields` option.
Non stored (non existing) fields are ignored by the fields visitor when using the `fields` option.
Fixes#10783
Support * wildcard to retrieve stored fields when using the `fields` option.
Supported pattern styles are "xxx*", "*xxx", "*xxx*" and "xxx*yyy".
Its enough to test the content type for what we are testing.
Currently tests are flaky if charset is detected as e.g. windows-1252 vs iso-8859-1 and so on.
In fact, they fail on windows 100% of the time.
We are not trying to test charset detection heuristics (which might be different even due to newlines in tests or other things).
If we want to do test that, we should test it separately.
When importing dangling indices on a single node that is data and master eligable the async dangling index
call can still be in-flight when the cluster is checked for green / yellow. Adding a dedicated master node
and a data only node that does the importing fixes this issus just like we do in OldIndexBackwardsCompatibilityIT
Client in PipelineStore gets provided via a guice provider
Processor and Factory throw Exception instead of IOException
Removed PipelineExecutionService.Listener with ActionListener
* Moved PipelineReference to a top level class and named it PipelineDefinition
* Pulled some logic from the crud transport classes to the PipelineStore
* Use IOUtils#close(...) where appropriate
Move ParsedSimulateRequest to SimulatePipelineRequest and remove Parser class in favor of static parse methods.
Simplified execute methods in SimulateExecutionService.
This moves the registration of field mappers from the index level to the node
level and also ensures that mappers coming from plugins are treated no
differently from core mappers.
* Forbid System.setProperties & co in forbidden APIs.
* Ban property write access at runtime with security manager.
Plugins that need to modify system properties will need to request permission in their plugin-security.policy
This makes AvgTests use a mock plugin engine. I also removed the
textScriptExplicit* methods for the base class since they only make sense for
a groovy script, not a mock script.
Bug introduced in #13779: we don't filter anymore credentials because we were filtering `cloud.azure.storage.account` and `cloud.azure.storage.key` but now credentials are like `cloud.azure.storage.XXX.account` and `cloud.azure.storage.XXX.key` where `XXX` can be a storage setting id.
Closes#14843.
Adds a counter to each write operation on a shard. This sequence numbers is indexed into lucene using doc values, for now (we will probably require indexing to support range searchers in the future).
On top of this, primary term semantics are enforced and shards will refuse write operation coming from an older primary.
Other notes:
- The add SequenceServiceNumber is just a skeleton and will be replaced with much heavier one, once we have all the building blocks (i.e., checkpoints).
- I completely ignored recovery - for this we will need checkpoints as well.
- A new based class is introduced for all single doc write operations. This is handy to unify common logic (like toXContent).
- For now, we don't use seq# as versioning. We could in the future.
Relates to #10708Closes#14651
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
At the time of geo_shape query conception, CONTAINS was not yet a supported spatial operation in Lucene. Since it is now available this commit adds ShapeRelation.CONTAINS to GeoShapeQuery. Randomized testing is included and documentation is updated.
We actually want to keep the test when using deprecated setting in 3.0.
We will keep this setting deprecated as well so people will be able to update in a smoother way.
Also add the deprecating information to the migration documentation.
Follow up for #13228.
This commit adds support for a secondary storage account:
```yml
cloud:
azure:
storage:
my_account1:
account: your_azure_storage_account1
key: your_azure_storage_key1
default: true
my_account2:
account: your_azure_storage_account2
key: your_azure_storage_key2
```
When creating a repository, you can choose which azure account you want to use for it:
```sh
curl -XPUT localhost:9200/_snapshot/my_backup1?pretty -d '{
"type": "azure"
}'
curl -XPUT localhost:9200/_snapshot/my_backup2?pretty -d '{
"type": "azure",
"settings": {
"account" : "my_account2",
"location_mode": "secondary_only"
}
}'
```
`location_mode` supports `primary_only` or `secondary_only`. Defaults to `primary_only`. Note that if you set it
to `secondary_only`, it will force `read_only` to true.
DateHistogramTests had some dependency on groovy scripts and
were moved to the lang-groovy module. This PR moves it back
and replaces use of groovy scripts by a mock script engine.
Removing three test cases that were testing doing some date
manipulation using script, since these are more groovy script
tests than testing the DateHistogram aggregation.
Only MutateProcessor implemented equals / hashcode hence we would only use that one in our tests, since they relied on them. Better to not rely on equals/hashcode, drop them and mock processor/pipeline in our tests that need them. That also allow to make MutateProcessor constructor package private as the other processors.
Removed equals and hashcode whenever they wouldn't be reliable because of exception comparison. at the end of the day we use them for testing and we can simplify our tests without requiring equals and hashcode in prod code, which also would require more tests if maintained.
Add equals/hashcode test for Data/TransportData and randomize existing serialization tests
This fixes an issue where if the field for the aggregation was unmapped the extended bounds would get dropped and the resulting buckets would not cover the extended bounds requested.
Closes#14735
This fixes an issue where if the field for the aggregation was unmapped the extended bounds would get dropped and the resulting buckets would not cover the extended bounds requested.
Closes#14735
This commit removes all noreleases and cuts over to Lucene 5.4 GeoPointField type. Included are randomized testing updates to unit and integration test suites for ensuring full backward compatability with existing geo_point indexes.
Remove code duplications from ConfigurationUtils
Make sure that the mutate processor doesn't use Tuple as that would require to depend on core.
Also make sure that the MutateProcessor tests don't end up testing the factory as well.
Make processor getters package private as they are only needed in tests.
Add new tests to MutateProcessorFactoryTests
Transitive dependencies can be confusing and hard to deal with when
conflicts arise between them. This change removes transitive
dependencies from elasticsearch, and forces any dependency conflicts to
be resolved manually, instead of automatically by gradle.
closes#14627
`AbstractLegacyBlobContainer` was kept for historical reasons (see #13434).
We can migrate Azure and S3 repositories to use the new methods added in #13434 so we can remove `AbstractLegacyBlobContainer` class.
Some dependencies must be specified in a couple places in the build.
e.g. randomized runner is specified both in buildSrc (for the gradle
wrapper plugin), as well as in the test-framework.
This change creates buildSrc/versions.properties which acts similar to
the set of shared version properties we used to have in the maven parent
pom.
geoip: add a `fields` option to control what fields are added by geoip processor
geoip: instead of adding all fields, only `country_code`, `city_name`, `location`, `continent_name` and `region_name` fields are added.
fix forbiddenapis
update
clean up and add rest test
update mutate factory to use configuration utilities
compile gsub pattern
cleanup, update parseBooleans, null tests
The documentation says we support EPUB, but the parser is not enabled.
This parser does not require any external dependencies, so I think its ok?
Separately, test-framework drags in an ancient commons-codec (via httpclient), which gradle
"upgrades", but IDEs can't handle this case and just hit jar hell. So just wire that to 1.9,
this allows running tests in the IDE for this plugin.
The plugin name currently defaults to the gradle project name. But the
gradle project name for standalone repo (like an external plugin would
be) defaults to the directory name of the repo. This is trappy, since it
depends on how the repo was checked out.
This change enforces the plugin name is always set.
closes#14603
Random code shouldn't be listening on sockets elsewhere.
Today its the wild west, but we only need to grant access to what the user configured.
This means e.g. multicast plugin has to declare its intentions in its security.policy
Closes#14549