Commit Graph

5556 Commits

Author SHA1 Message Date
Simon Willnauer 5c8164a561 Clean up BytesReference (#19196)
BytesReference should be a really simple interface, yet it has a gazillion
ways to achieve the same thing. Methods like `#hasArray`, `#toBytesArray`, `#copyBytesArray`,
`#toBytesRef`, and `#bytes` are all really duplicates. This change simplifies the interface
dramatically and makes implementations of it much simpler. All array access has been removed
and is streamlined through a single `#toBytesRef` method. Utility methods to materialize a
compact byte array have been added too for convenience.
2016-07-01 16:09:31 +02:00
Nik Everett 27e320d5ce Migrate sum, min, and max aggs to NamedWriteable 2016-07-01 09:23:26 -04:00
Nik Everett 91b66e3cf4 Migrate stats and extended stats to NamedWriteable
Migrates the `stats` and `extended_stats` aggregations and pipeline
aggregations from the special purpose aggregations streams to
`NamedWriteable`. These are the first pipeline aggregations so this
adds the infrastructure to support both streams and `NamedWriteable`s
for pipeline aggregations.
2016-07-01 09:13:15 -04:00
javanna 598c36128e Revert "Raised IOException on deleteBlob (#18815)"
This reverts commit d24cc65cad as it seems to be causing test failures.
2016-07-01 11:00:32 +02:00
gfyoung d24cc65cad Raised IOException on deleteBlob (#18815)
Raise IOException on deleteBlob if the blob doesn't exist

This commit raises an IOException on BlobContainer#deleteBlob
if the blob does not exist, in conformance with the BlobContainer
interface contract.  Each implementation of BlobContainer now
conforms to this contract (file system, S3, Azure, HDFS).  This 
commit also contains blob container tests for each of the 
repository implementations.

Closes #18530
2016-06-30 23:00:10 -04:00
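The contract described in this commit can be sketched for a file-system-backed container as follows. This is a minimal illustration, assuming a hypothetical `FsBlobs` helper; it is not the Elasticsearch `BlobContainer` implementation. It relies on `java.nio.file.Files.delete`, which throws `NoSuchFileException` (a subclass of `IOException`) when the target is missing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not the Elasticsearch BlobContainer API.
final class FsBlobs {
    private final Path root;

    FsBlobs(Path root) {
        this.root = root;
    }

    // Deletes the named blob and raises an IOException if it does not exist.
    void deleteBlob(String blobName) throws IOException {
        // Files.delete throws NoSuchFileException (an IOException) for a missing
        // file, whereas Files.deleteIfExists would quietly return false.
        Files.delete(root.resolve(blobName));
    }
}
```

Choosing `Files.delete` over `Files.deleteIfExists` is what makes a missing blob visible to the caller in this sketch.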
Ryan Ernst 8275ab497b Merge pull request #19170 from rjernst/rest_handler_client
Changed rest handler interface to take NodeClient
2016-06-30 11:00:09 -07:00
Nik Everett f5a269b029 Start migration away from aggregation streams
We'll migrate to NamedWriteable so we can share code with the rest
of the system. So we can work on this in multiple pull requests without
breaking Elasticsearch in between the commits this change supports
*both* old style `InternalAggregations.stream` serialization and
`NamedWriteable` style serialization. As such it creates about a
half dozen `// NORELEASE` comments that will have to be removed
once the migration is complete.

This also introduces a boolean `transportClient` flag to `SearchModule`
which is used to skip inappropriate registrations for the
transport client while still registering the things it needs. In
this case that means that the `InternalAggregation` subclasses are
registered with the `NamedWriteableRegistry` but the `AggregationBuilder`
subclasses are not.

Finally, this moves aggregation registration from guice configuration
time to `SearchModule` construction time. This will make it simpler to
work with in the future as we further clean up Elasticsearch's
extension points.
2016-06-30 12:57:34 -04:00
Boaz Leskes 09ca6d6ed2 Add a BridgePartition to be used by testAckedIndexing (#19172)
We have long worked to capture different partitioning scenarios in our testing infra. This PR adds a new variant, inspired by the Jepsen blogs, which was overlooked so far - namely a partition where one node can still see and be seen by all other nodes. It also updates the resiliency page to better reflect all the work that was done in this area.
2016-06-30 17:58:12 +02:00
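The connectivity rule of such a partition can be modelled in a few lines. This is a sketch only: `BridgePartitionSketch` and its method names are hypothetical and do not reflect the actual `BridgePartition` class in the test infrastructure.

```java
import java.util.Set;

// Hypothetical model of a "bridge" partition, for illustration only.
final class BridgePartitionSketch {
    private final String bridgeNode;
    private final Set<String> sideOne;
    private final Set<String> sideTwo;

    BridgePartitionSketch(String bridgeNode, Set<String> sideOne, Set<String> sideTwo) {
        this.bridgeNode = bridgeNode;
        this.sideOne = sideOne;
        this.sideTwo = sideTwo;
    }

    // Returns true if nodes a and b can exchange messages under this partition.
    boolean canCommunicate(String a, String b) {
        if (a.equals(bridgeNode) || b.equals(bridgeNode)) {
            return true; // the bridge sees, and is seen by, every node
        }
        // All other nodes can only talk to nodes on their own side.
        return (sideOne.contains(a) && sideOne.contains(b))
            || (sideTwo.contains(a) && sideTwo.contains(b));
    }
}
```

With sides `{n1, n2}` and `{n3, n4}` and bridge `n5`, `n1` and `n3` cannot communicate directly, but both can still reach `n5`.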
Ryan Ernst 04a4bcdca0 Add comment explaining bytes reference edge case 2016-06-30 08:47:55 -07:00
Ryan Ernst e079c83020 Fix test edge case for bytes reference 2016-06-30 08:45:54 -07:00
Ryan Ernst c762e7aa15 Merge branch 'master' into rest_handler_client 2016-06-30 08:16:25 -07:00
Ryan Ernst 0732004ae8 Merge pull request #19177 from rjernst/ingest_factory_generic
Remove generics from ingest Processor.Factory
2016-06-30 08:08:26 -07:00
Christoph Büscher afb5e6332b Make sure TimeIntervalRounding is monotonic for increasing dates (#19020)
Currently there are cases when using TimeIntervalRounding#round() and date1 <
date2 that round(date2) < round(date1). These errors can happen when using a
non-fixed time zone and the values to be rounded are slightly after a time zone
offset change (e.g. DST transition).

Here is an example for the "CET" time zone with a 45 minute rounding interval.
The dates to be rounded are on the left (with utc time stamp), the rounded
values on the right. The error case is marked:

2011-10-30T01:40:00.000+02:00 1319931600000 | 2011-10-30T01:30:00.000+02:00 1319931000000
2011-10-30T02:02:30.000+02:00 1319932950000 | 2011-10-30T01:30:00.000+02:00 1319931000000
2011-10-30T02:25:00.000+02:00 1319934300000 | 2011-10-30T02:15:00.000+02:00 1319933700000
2011-10-30T02:47:30.000+02:00 1319935650000 | 2011-10-30T02:15:00.000+02:00 1319933700000
2011-10-30T02:10:00.000+01:00 1319937000000 | 2011-10-30T01:30:00.000+02:00 1319931000000 *
2011-10-30T02:32:30.000+01:00 1319938350000 | 2011-10-30T02:15:00.000+01:00 1319937300000
2011-10-30T02:55:00.000+01:00 1319939700000 | 2011-10-30T02:15:00.000+01:00 1319937300000
2011-10-30T03:17:30.000+01:00 1319941050000 | 2011-10-30T03:00:00.000+01:00 1319940000000

We should correct this by detecting that we are crossing a transition when
rounding, and in that case pick the largest valid rounded value before the
transition.

This change adds this correction logic to the rounding function and adds this
invariant to the randomized TimeIntervalRounding tests. Also adding the example
test case from above (with corrected behaviour) for illustrative purposes.
2016-06-30 17:05:54 +02:00
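The marked row in the table can be reproduced with a simplified rounding routine. This is a sketch under stated assumptions: `NaiveIntervalRounding` is a hypothetical class built on `java.time`, not the actual `TimeIntervalRounding` code, and it shows only the uncorrected behaviour, not the fix added by this commit.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.util.List;

// Hypothetical illustration of rounding in local time without transition handling.
final class NaiveIntervalRounding {
    // Rounds utcMillis down to a multiple of intervalMillis measured in local time.
    static long round(long utcMillis, long intervalMillis, ZoneId zone) {
        ZoneRules rules = zone.getRules();
        ZoneOffset offset = rules.getOffset(Instant.ofEpochMilli(utcMillis));
        // Shift to local wall-clock milliseconds and floor to the interval.
        long localMillis = utcMillis + offset.getTotalSeconds() * 1000L;
        long roundedLocal = Math.floorDiv(localMillis, intervalMillis) * intervalMillis;
        // Convert the rounded wall-clock value back to UTC. Prefer the offset of
        // the input if it is valid for the rounded local time, else the first valid one.
        LocalDateTime local = LocalDateTime.ofEpochSecond(
            Math.floorDiv(roundedLocal, 1000L),
            (int) Math.floorMod(roundedLocal, 1000L) * 1_000_000,
            ZoneOffset.UTC);
        List<ZoneOffset> valid = rules.getValidOffsets(local);
        ZoneOffset back = (valid.isEmpty() || valid.contains(offset)) ? offset : valid.get(0);
        return roundedLocal - back.getTotalSeconds() * 1000L;
    }
}
```

With `ZoneId.of("CET")` and an interval of 2700000 ms (45 minutes), the input 1319935650000 rounds to 1319933700000, while the later input 1319937000000 rounds to the smaller value 1319931000000, which is the violation of monotonicity described above.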
Simon Willnauer 40ec639c89 Factor out abstract TCPTransport* classes to reduce the netty footprint (#19096)
Today we have a ton of logic inside the NettyTransport* codebase. The footprint
of the code that has a direct netty dependency is large and alternative implementations
are pretty hard today since they need to know all about our protocol etc.
This change moves most of the code into TCPTransport* baseclasses and moves all
the protocol send code together. The base classes now contain the majority of the logic
while NettyTransport* classes remain to implement the glue code, configuration and optimization.
2016-06-30 13:41:53 +02:00
Ryan Ernst e4f265eb3a Ingest: Remove generics from Processor.Factory
The factory for ingest processor is generic, but that is only for the
return type of the create method. However, the actual consumer of the
factories only cares about Processor, so generics are not needed.

This change removes the generic type from the factory. It also removes
AbstractProcessorFactory which only existed in order to pull the optional
tag from config. This functionality is moved to the caller of the
factories in ConfigurationUtil, and the create method now takes the tag.
This allows the covariant return of the implementation to work with
tests not needing casts.
2016-06-30 02:33:54 -07:00
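The covariant return mentioned in this commit can be illustrated with simplified types. This is a sketch only: `Processor`, `ProcessorFactory`, and `UppercaseProcessor` below are hypothetical stand-ins, not the actual ingest classes or their signatures.

```java
import java.util.Map;

// Hypothetical, simplified types for illustration; not the actual ingest classes.
interface Processor {
    String getTag();
}

// Non-generic factory: callers only ever need a Processor.
interface ProcessorFactory {
    Processor create(String tag, Map<String, Object> config);
}

final class UppercaseProcessor implements Processor {
    private final String tag;
    private final String field;

    UppercaseProcessor(String tag, String field) {
        this.tag = tag;
        this.field = field;
    }

    @Override
    public String getTag() {
        return tag;
    }

    String getField() {
        return field;
    }

    static final class Factory implements ProcessorFactory {
        // Covariant return: the override narrows Processor to UppercaseProcessor,
        // so tests that hold the concrete factory need no cast.
        @Override
        public UppercaseProcessor create(String tag, Map<String, Object> config) {
            return new UppercaseProcessor(tag, (String) config.get("field"));
        }
    }
}
```

Code that holds a `ProcessorFactory` sees a plain `Processor`, while a test that constructs `UppercaseProcessor.Factory` directly can call `getField()` on the result without casting.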
Martijn van Groningen 299c6fcc63 test: use the reader from the searcher (newSearcher(...) method may change the reader) instead of the reader we create in the test
Closes #19151
2016-06-30 11:10:38 +02:00
Ryan Ernst c77dc4a82c Merge pull request #19136 from rjernst/script_service_deps
Scripts: Remove ClusterState from compile api
2016-06-29 22:34:40 -07:00
Ryan Ernst 865b951b7d Internal: Changed rest handler interface to take NodeClient
Previously all rest handlers would take Client in their injected ctor.
However, it was only to hold the client around for runtime. Instead,
this can be done just once in the HttpService which handles rest
requests, and passed along through the handleRequest method. It also
should always be a NodeClient, and other types of Clients (eg a
TransportClient) would not work anyways (and some handlers can be
simplified in follow ups like reindex by taking NodeClient).
2016-06-29 18:02:18 -07:00
Ryan Ernst 7c50de182e Remove test for closing ingest processors, this is now handled at the
plugin level
2016-06-29 16:23:16 -07:00
Ryan Ernst 172ced3e2d Fix test bug in plugin cli progress tests 2016-06-29 15:56:36 -07:00
Nik Everett 8db43c0107 Move RestHandler registration to ActionModule and ActionPlugin
`RestHandler`s are highly tied to actions so registering them in the
same place makes sense.

Removes the need for plugins to check if they are in transport client
mode before registering a RestHandler - `getRestHandlers` isn't called
at all in transport client mode.

This caused guice to throw a massive fit about the circular dependency
between NodeClient and the allocation deciders. I broke the circular
dependency by registering the actions map with the node client after
instantiation.
2016-06-29 18:31:44 -04:00
Ryan Ernst 4dcb2b8024 Merge pull request #19137 from rjernst/closeable_plugins
Make plugins closeable
2016-06-29 13:54:20 -07:00
Ryan Ernst b3daf7d683 Remove unnecessary variant of detailedMessage 2016-06-29 11:25:23 -07:00
Ryan Ernst 8b533b7ca9 Internal: Deprecate ExceptionsHelper.detailedMessage
This is a trappy "helper" and only hurts.
See #19069
2016-06-29 11:09:35 -07:00
Jason Tedor fc38e503e0 Clearer error when handling fractional time values
In 2f638b5a23, support for fractional time
values was removed. While this change is documented, the error message
presented does not give an indication that fractional inputs are not
supported. This commit fixes this by detecting when the input is a time
value that would successfully parse as a double but will not parse as a
long and presenting a clear error message that fractional time values
are not supported.

Relates #19158
2016-06-29 13:36:11 -04:00
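The detection described in this commit (the number parses as a double but not as a long) can be sketched as follows. `TimeValueParsing` and `parseCount` are hypothetical names used for illustration, and the error messages are invented; this is not the actual `TimeValue` code.

```java
// Hypothetical helper for illustration; not the actual TimeValue parser.
final class TimeValueParsing {
    // Parses the whole-number count in front of a unit suffix, e.g. "500ms" with "ms".
    static long parseCount(String input, String suffix) {
        if (!input.endsWith(suffix)) {
            throw new IllegalArgumentException("[" + input + "] does not end with [" + suffix + "]");
        }
        String number = input.substring(0, input.length() - suffix.length()).trim();
        try {
            return Long.parseLong(number);
        } catch (NumberFormatException e) {
            try {
                // If this succeeds, the input is numeric but not a whole number.
                Double.parseDouble(number);
            } catch (NumberFormatException ignored) {
                throw new IllegalArgumentException("failed to parse [" + input + "]", e);
            }
            throw new IllegalArgumentException(
                "failed to parse [" + input + "], fractional time values are not supported", e);
        }
    }
}
```

In this sketch, `parseCount("0.5s", "s")` fails with a message that names the real cause, while `parseCount("abcs", "s")` falls through to the generic error.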
Christoph Büscher 0d81dee013 Fix key_as_string for date histogram and epoch_millis/epoch_second format
When doing a `date_histogram` aggregation with `"format":"epoch_millis"` or
`"format" : "epoch_second"` and using a time zone other than UTC, the
`key_as_string` output in the response does not reflect the UTC timestamp that is
used as the key. This happens because when applying the `time_zone` in
DocValueFormat.DateTime to an epoch-based formatter, this adds the time zone
offset to the value being formatted. Instead we should adjust the added display
offset to get back the UTC instant in EpochTimePrinter.

Closes #19038
2016-06-29 19:18:12 +02:00
Alexander Reelsen 56fa751928 Plugins: Add status bar on download (#18695)
As some plugins are becoming big now, it is hard for the user to know whether the plugin
is being downloaded or nothing is happening.

This commit adds a progress bar during download, which can be disabled by using the `-q`
parameter.

In addition this updates to jimfs 1.1, which allows us to test the batch mode, as adding
security policies is now supported due to having jimfs:// protocol support in URL stream
handlers.
2016-06-29 16:44:12 +02:00
Britta Weber 6d5666553c [TEST] mute test because it fails about 1/100 runs 2016-06-29 15:53:57 +02:00
Simon Willnauer 819fe40d61 Extract AbstractBytesReferenceTestCase (#19141)
We have a ton of tests for PagedBytesReference but not really many for the other
implementations of BytesReference. This change factors out a basic AbstractBytesReferenceTestCase
that simplifies testing other implementations. It also caught a couple of bugs here and there like
a missing mask when reading bytes as ints in PagedBytesReference.
2016-06-29 14:45:54 +02:00
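The general class of bug mentioned here, a missing mask when assembling an int from bytes, can be shown in isolation. This is a sketch with a hypothetical `ByteReads` class; it does not reproduce the actual `PagedBytesReference` code.

```java
// Hypothetical illustration of masked versus unmasked byte-to-int assembly.
final class ByteReads {
    // Reads a big-endian int; each byte is masked so it is treated as unsigned.
    static int readInt(byte[] b, int i) {
        return ((b[i] & 0xFF) << 24)
            | ((b[i + 1] & 0xFF) << 16)
            | ((b[i + 2] & 0xFF) << 8)
            | (b[i + 3] & 0xFF);
    }

    // Buggy variant: without the mask, a negative byte is sign-extended to an int
    // and its high bits overwrite the contribution of the other bytes.
    static int readIntUnmasked(byte[] b, int i) {
        return (b[i] << 24) | (b[i + 1] << 16) | (b[i + 2] << 8) | b[i + 3];
    }
}
```

For the bytes `{0x00, 0x00, 0x01, (byte) 0x80}` the masked version returns 384, while the unmasked version returns -128 because the last byte is sign-extended.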
Simon Willnauer 872cdffc27 Factor out ChannelBuffer from BytesReference (#19129)
The ChannelBuffer interface today leaks into the BytesReference abstraction
which causes a hard dependency on Netty across the board. This change moves
this dependency and all BytesReference -> ChannelBuffer conversion into
NettyUtils and removes the abstraction leak on BytesReference.
This change also removes unused methods on the BytesReference interface
and simplifies access to internal pages.
2016-06-29 10:45:05 +02:00
Ryan Ernst 6590e77c1a Plugins: Make plugins closeable
This change allows Plugin implementations to implement Closeable when they
have resources that should be released. As a first example of how this
can be used, I switched over ingest plugins, which just had the geoip
processor. The ingest framework had chains of closeable to support this,
which is now removed.
2016-06-28 16:16:26 -07:00
Ryan Ernst ecf6101798 Scripts: Remove ClusterState from compile api
Stored scripts are pulled from the cluster state, and the current api
requires passing the ClusterState on each call to compile. However, this
means every user of the ScriptService needs to depend on the
ClusterService. Instead, this change makes the ScriptService a
ClusterStateListener. It also simplifies tests a lot, as they no longer
need to create fake cluster states (except when testing stored scripts).
2016-06-28 13:20:00 -07:00
Simon Willnauer 9b9e17abf7 Cleanup Compressor interface (#19125)
Today we have several deprecated methods, leaking netty interfaces, support for
multiple compressors on the compressor interface. The netty interface can simply
be replaced by BytesReference which we already have an implementation for, all the
others are not used and are removed in this commit.
2016-06-28 17:51:33 +02:00
Yannick Welsch 0515791846 Fix logger usages 2016-06-28 16:51:06 +02:00
Boaz Leskes 2512594d9e Testing infra - stabilize data folder usage and clean up (#19111)
The plan for persistent node ids ( #17811 ) is to tie the node identity to a file stored in its data folders. As such it becomes important that nodes in our testing infra have better affinity with their data folders and that their data folders are not cleaned underneath them. The first is important because we fix the random seed used for node id generation (for reproducibility) and allowing the same node to use two different data folders causes two separate nodes to have the same id, which prevents the cluster from forming. The second is important, for example, where a full cluster restart / single node restart needs to maintain node identity and wiping the data folders at the wrong moment prevents this.

Concretely this commit does the following:
1) Remove previous attempts to have data folder per role using a prefix. This wasn't effective as it was using the data paths settings which are only used for part of the runs. An attempt to completely separate the paths via the home dir failed due to assumptions made by index custom path about node data folder ordinal uniqueness (see #19076)
2) Change full cluster restarts to start up nodes in the same order they were first created in, only randomly swapping nodes with the same roles.
3) Change test cluster reset methods to first shutdown the unneeded nodes and then re-start the shared nodes that were shut down, so they'll reclaim their data folders.
4) Improve data folder wiping logic and make sure it wipes only folders of "offline" nodes.
5) Add some very basic tests
2016-06-28 16:38:56 +02:00
Jim Ferenczi 6d069078d3 Fixed tests that assumed that broken settings can be updated 2016-06-28 16:14:57 +02:00
Jim Ferenczi ef0e3db0de Validates new dynamic settings from the current state
Thanks to https://github.com/elastic/elasticsearch/pull/19088 the settings are now validated against dynamic updaters on the master.
Though only the new settings are applied to the IndexService created for the validation.
Because of this we cannot check the transition from one value to another in a dynamic updater.
This change creates the IndexService from the current settings and validates that the new dynamic settings
can replace the current settings.
This change also removes the validation of dynamic settings when an index is opened.
The validation should have occurred when the settings have been updated.
2016-06-28 15:35:04 +02:00
Nik Everett fa4844c3f4 Pull actions from plugins
Instead of implementing onModule(ActionModule) to register actions,
this has plugins implement ActionPlugin to declare actions. This is
yet another step in cleaning up the plugin infrastructure.

While I was in there I switched AutoCreateIndex and DestructiveOperations
to be eagerly constructed which makes them easier to use when
de-guice-ing the code base.
2016-06-28 08:36:24 -04:00
Jason Tedor 2f638b5a23 Keep input time unit when parsing TimeValues
This commit modifies TimeValue parsing to keep the input time unit. This
enables round-trip parsing from instances of String to instances of
TimeValue and vice-versa. With this, this commit removes support for the
unit "w" representing weeks, and also removes support for fractional
values of units (e.g., 0.5s).

Relates #19102
2016-06-27 18:41:18 -04:00
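Keeping the input unit is what makes the string round trip possible. The idea can be sketched as follows, assuming a hypothetical `SimpleTimeValue` class that supports only a subset of units; it is not the actual `TimeValue` implementation.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, reduced model of a time value that remembers its input unit.
final class SimpleTimeValue {
    private final long duration;
    private final TimeUnit unit;

    private SimpleTimeValue(long duration, TimeUnit unit) {
        this.duration = duration;
        this.unit = unit;
    }

    static SimpleTimeValue parse(String s) {
        // "ms" must be checked before "s" because both end in 's'.
        if (s.endsWith("ms")) return of(s, 2, TimeUnit.MILLISECONDS);
        if (s.endsWith("s")) return of(s, 1, TimeUnit.SECONDS);
        if (s.endsWith("m")) return of(s, 1, TimeUnit.MINUTES);
        if (s.endsWith("h")) return of(s, 1, TimeUnit.HOURS);
        if (s.endsWith("d")) return of(s, 1, TimeUnit.DAYS);
        // Weeks ("w") are deliberately not accepted in this sketch.
        throw new IllegalArgumentException("unknown time unit in [" + s + "]");
    }

    private static SimpleTimeValue of(String s, int suffixLength, TimeUnit unit) {
        // Long.parseLong rejects fractional input such as "0.5".
        long value = Long.parseLong(s.substring(0, s.length() - suffixLength));
        return new SimpleTimeValue(value, unit);
    }

    long millis() {
        return unit.toMillis(duration);
    }

    @Override
    public String toString() {
        switch (unit) {
            case MILLISECONDS: return duration + "ms";
            case SECONDS: return duration + "s";
            case MINUTES: return duration + "m";
            case HOURS: return duration + "h";
            default: return duration + "d";
        }
    }
}
```

Because the unit is stored rather than normalized to milliseconds, `SimpleTimeValue.parse("90s").toString()` yields `90s` again instead of a converted form.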
Ryan Ernst 3f2946ce6d Fix line length in new indices module tests. 2016-06-27 11:33:22 -07:00
Ryan Ernst 33ccc5aead Merge branch 'master' into mapper_plugin_api 2016-06-27 11:19:59 -07:00
Ryan Ernst f17fcce3ed Add duplicate mapper detection and tests 2016-06-27 11:17:58 -07:00
Jim Ferenczi eb1e231a63 Revert "Rename `fields` to `stored_fields` and add `docvalue_fields`"
This reverts commit 2f46f53dc8.
2016-06-27 17:20:32 +02:00
Simon Willnauer 4fb1c4fe5a Validate settings against dynamic updaters on the master (#19088)
Today all settings are only validated against their validators
that are available when settings are registered. Yet, some settings updaters
have validators that are dynamic, i.e. their validation depends on other variables
that are only available at runtime. We do not run those validators when settings
are updated, causing index updates to fail on the data nodes instead of on the master.

Relates to #19046
2016-06-27 17:18:26 +02:00
Colin Goodheart-Smithe 108ba23073 Pass resolved extended bounds to unmapped histogram aggregator
Previous to this change the unresolved extended bounds were passed into the histogram aggregator, which meant extendedbounds.min and extendedbounds.max were passed through as null. This had two effects on the histogram aggregator:

1. If the histogram aggregator was unmapped across all shards, the reduce phase would not add buckets for the extended bounds and the response would contain zero buckets
2. If the histogram aggregator was not unmapped in some shards, the reduce phase might sometimes choose to reduce based on the unmapped shard response and therefore the extended bounds would be ignored.

This change resolves the extended bounds in the unmapped case and solves the above two issues.

Closes #19009
2016-06-27 14:07:37 +01:00
Boaz Leskes cb0824e957 Make shard store fetch less dependent on the current cluster state, both on master and non data nodes (#19044)
#18938 has changed the timing in which we send out to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting from the node's join was published, #18938 made it be sent concurrently with the publishing process. This revealed a couple of points where the shard store fetching is dependent on the current state of affairs of the cluster state, both on the master and the data nodes. The problems discovered were already present without #18938 but required a failure or extreme situation to make them happen. This PR tries to remove as many of these dependencies as possible, making shard store fetching simpler and paving the way to re-introduce #18938, which was reverted.

These are the notable changes:
1) Allow TransportNodesAction (of which shard store fetching is derived) callers to supply concrete disco nodes, so it won't need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet made available through ClusterService. Note that long term we can expect the rest layer to resolve node ids to concrete nodes, making this mode the only one needed.
2) The data node relied on the cluster state to have the relevant index meta data so it can find data when custom paths are used. We now fall back to read the meta data from disk if needed.
3) The data node was relying on its own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is of course something it can not know until it got (and processed) the new cluster state from the master. This flag in the response is now removed. This is not a problem because we used that flag to protect against double assigning of a shard to the same node, but we are already protected from it by the allocation deciders.
4) I removed the redundant filterNodeIds method in TransportNodesAction - if people want to filter they can override resolveRequest.
2016-06-27 15:05:06 +02:00
Martijn van Groningen d3cd58eb2f Merges PR #18957
This commit fixes several NPEs caused by implicitly performing a get request for a document that exists with its _source disabled and then trying to access the source. Instead of causing an NPE, the following queries will throw an exception with a "source disabled" message (similar behavior as if the document does not exist):
- GeoShape query for pre-indexed shape (throws IllegalArgumentException)
- Percolate query for an existing document (throws IllegalArgumentException)

A Terms query with a lookup will ignore the document if the source does not exist (same as if the document does not exist).

GET and HEAD requests for the document _source will return a 404 if the source is disabled (even if the document exists).
2016-06-27 09:37:28 +02:00
Martijn van Groningen ba90508b91 fix checkstyle issue 2016-06-27 09:00:13 +02:00
Nik Everett 71b95fb63c Switch analysis from push to pull
Instead of plugins calling `registerTokenizer` to extend the analyzer
they now instead have to implement `AnalysisPlugin` and override
`getTokenizer`. This brings extending analysis in line with extending
scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry`
immediately as part of its constructor which makes testing analysis
much simpler.

This also moves the default analysis configuration into `AnalysisModule`
which is how search is setup.

Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`.
Instead it is only responsible for building `AnalysisRegistry`. We still
bind `AnalysisRegistry` but we only do so in `Node`. This means it
is available at module construction time so we can slowly remove the need to
bind it in guice.
2016-06-26 07:15:42 -04:00
Jason Tedor c79e27180e Require timeout units when parsing query body
Today when parsing the timeout field in a query body, if time units are
supplied the parser throws a NumberFormatException. Additionally, the
parsing allows the timeout field to not specify units (it assumes
milliseconds). This commit fixes this behavior by not only allowing time
units to be specified but requires time units to be specified. This is
consistent with the documented behavior and the behavior in 2.x.

Relates #19077
2016-06-25 16:18:25 -04:00