OpenSearch

Commit Graph

Author	SHA1	Message	Date
Ryan Ernst	e7818f75e1	Fix checkstyle for TestProcessor	2016-07-05 22:33:08 -07:00
Ryan Ernst	2fc41adeb5	Merge branch 'master' into ingest_plugin_api	2016-07-05 20:53:03 -07:00
Nik Everett	b3c015e2bb	Reindex from remote This adds a remote option to reindex that looks like ``` curl -POST 'localhost:9200/_reindex?pretty' -d'{ "source": { "remote": { "host": "http://otherhost:9200" }, "index": "target", "query": { "match": { "foo": "bar" } } }, "dest": { "index": "target" } }' ``` This reindex has all of the features of local reindex: * Using queries to filter what is copied * Retry on rejection * Throttle/rethottle The big advantage of this version is that it goes over the HTTP API which can be made backwards compatible. Some things are different: The query field is sent directly to the other node rather than parsed on the coordinating node. This should allow it to support constructs that are invalid on the coordinating node but are valid on the target node. Mostly, that means old syntax.	2016-07-05 16:13:17 -04:00
Jason Tedor	96f283c195	Rename writeThrowable to writeException This commit renames writeThrowable to writeException. The situation here stems from the fact that the StreamOutput method for serializing Exceptions needs to accept Throwables too as Throwables can be the cause of serialized Exceptions. Yet, we do not serialize Throwables in the Error sub-hierarchy in a way that they can be deserialized into their initial type. This leads to an asymmetry in the StreamOutput method for serializing Exceptions and the StreamInput method for writing Excpetions. Namely, the former will accept Throwables but the latter will only return Exceptions. A goal with the stream methods has always been symmetry in the method names so that serialization/deserialization routines appear symmetrical in code. It is this asymmetry on the input/output types for Exceptions on StreamOutput/StreamInput that clashes with the desired symmetry of naming. Despite this, we should favor symmetry in the naming of the methods. This commit renames StreamOutput#writeThrowable to StreamOutput#writeException which leaves us with Exception StreamInput#readException and void StreamOutput#writeException(Throwable).	2016-07-05 14:37:01 -04:00
Boaz Leskes	6861d3571e	Persistent Node Ids (#19140 ) Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API results in creating new fields due to node ID being used as keys. The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and it works for probably most cases, it is not perfect. For example, if after a node restart, the node is not able to bind to the same port (because it's not yet freed by the OS), it will cause the node to still change identity. Also in environments where the host IP can change due to a host restart, identity will not be the same. Due to those limitation, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the nodes data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM). I It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join. Last, and most importantly, this change requires that all nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking. Other less important notes: - DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead. - I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic. - I removed the `version` paramater from `MetaDataStateFormat#write` , it wasn't really used and was just in the way :) - TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.	2016-07-04 21:09:25 +02:00
Tanguy Leroux	0e7faf1005	Enable Checkstyle RedundantModifier	2016-07-04 15:22:12 +02:00
Jason Tedor	3343ceeae4	Do not catch throwable Today throughout the codebase, catch throwable is used with reckless abandon. This is dangerous because the throwable could be a fatal virtual machine error resulting from an internal error in the JVM, or an out of memory error or a stack overflow error that leaves the virtual machine in an unstable and unpredictable state. This commit removes catch throwable from the codebase and removes the temptation to use it by modifying listener APIs to receive instances of Exception instead of the top-level Throwable. Relates #19231	2016-07-04 08:41:06 -04:00
Ryan Ernst	5a66c08ae9	Merge branch 'master' into ingest_plugin_api	2016-07-01 16:27:52 -07:00
Ryan Ernst	822c995367	Internal: Remove generics from LifecycleComponent The only reason for LifecycleComponent taking a generic type was so that it could return that type on its start and stop methods. However, this chaining has no practical necessity. Instead, start and stop can be void, and a whole bunch of confusing generics disappear.	2016-07-01 16:17:42 -07:00
Ryan Ernst	e5caadc4f3	Merge branch 'master' into ingest_plugin_api	2016-07-01 12:35:26 -07:00
Nik Everett	f30a70c51f	Fix comment I forgot a word....	2016-07-01 14:48:08 -04:00
Nik Everett	ff42d7cfc6	Add embedded stash key support to rest tests This allowes embedding stash keys in string like `t${key}est`. This allows simple string concatenation like acitons. The test for this is in `ObjectPathTests` because `Stash` doesn't seem to have a test on its own and it is simple enough to test embedded stashes this way. And this is a way I expect them to be used eventually.	2016-07-01 14:11:11 -04:00
Ryan Ernst	65c9b0b588	Merge branch 'master' into ingest_plugin_api	2016-07-01 09:26:17 -07:00
Tanguy Leroux	8c40b2b54e	Fix order of modifiers	2016-07-01 16:57:14 +02:00
Simon Willnauer	5c8164a561	Clean up BytesReference (#19196 ) BytesReference should be a really simple interface, yet it has a gazillion ways to achieve the same this. Methods like `#hasArray`, `#toBytesArray`, `#copyBytesArray` `#toBytesRef` `#bytes` are all really duplicates. This change simplifies the interface dramatically and makes implementations of it much simpler. All array access has been removed and is streamlined through a single `#toBytesRef` method. Utility methods to materialize a compact byte array has been added too for convenience.	2016-07-01 16:09:31 +02:00
javanna	dd781d410a	fix line length problems in all classes under o.e.test.rest package	2016-07-01 11:13:10 +02:00
javanna	0b5a549305	[TEST] remove special treatment for stashed $body in REST tests, instead always evaluate the stash through ObjectPath When we introduced docs testing we added a special case for $body in Stash, so that the last stashed body could be evaluated, and expressions like "$body.took" could be extracted out of it. We can instead do that for any object in the stash, by simply wrapping the internal map in an ObjectPath instance. We can then drop the special stashResponse method and go back to using the ordinary stashValue too. The downside of this change is that it adds a feature that may not be supported by other REST test runners, namely the evaluation of compouned paths from the stash. If we have "object" stashed as an object, it is now possible to extract directly each subobject of it as well e.g. "object.subobject.field1". None of the current REST tests rely on this, but our docs snippets tests do.	2016-07-01 11:13:10 +02:00
javanna	43b82ce244	[TEST] remove feature yaml from REST tests The only runner that supported it was the java runner, we can use json format instead given that the default one with cat apis is text	2016-07-01 11:13:10 +02:00
javanna	60bafa5d78	[TEST] parse yaml responses too through ObjectPath rather than only json responses No need to match against yaml responses via regexes in REST tests, yaml responses can be properly parsed via ObjectPath instead. Few REST tests need to be updated accordingly.	2016-07-01 11:13:10 +02:00
javanna	34f5c50a7f	[TEST] eagerly parse response body at ObjectPath initialization and read content type from response headers We are going to parse the body anyways whenever it's in json format as it is going to be stashed. It is not useful to lazily parse it anymore. Also this allows us to not rely on automatic detection of the xcontent type based on the content of the response, but rather read the content type from the response headers.	2016-07-01 11:13:10 +02:00
javanna	d5df738538	[TEST] ObjectPath to support parsing yaml or json that have an array as root object ObjectPath used a Map up until now for the internal representation of its navigable object. That works in most of the cases, but there could also be an array as root object, in which case a List needs to be used instead of a Map. This commit changes the internal representation of the object to Object which can either be a List or a Map. The change is minimal as ObjectPath already had the checks in place to verify the type of the object in the current position and navigate through it. Note: The new test added to ObjectPathTest uses yaml format explicitly as auto-detection of json format works only for a json object that starts with '{', not if the root object is actually an array and starts with '['.	2016-07-01 11:13:10 +02:00
javanna	bbaa23bdfd	[TEST] extend ObjectPathTests to support also yaml format	2016-07-01 11:13:10 +02:00
javanna	44dc801e90	[TEST] make JsonPath independent of data format, rename to ObjectPath The internal representation of the object that JsonPath gives access to is a map. That is independent of the initial input format, which is json but could also be yaml etc. This commit renames JsonPath to ObjectPath and adds a static method to create an ObjectPath from an XContent	2016-07-01 11:13:10 +02:00
javanna	76199ce497	[TEST] rename REST tests Stash methods to distinguish between retrieving a value and replacing values within a map Stash#unstashMap -> replaceStashedValues Stash#unstashValue -> getValue	2016-07-01 11:13:10 +02:00
javanna	62462f5d9b	[TEST] replace ResponseBodyAssertion with existing MatchAssertion We introduced a special response_body assertion to test our docs snippets. The match assertion does the same job though and can be reused and adapted where needed. ResponseBodyAssertion contains provides much better and accurate errors though, which can be now utilized in MatchAssertion so that many more REST tests can benefit from readable error messages. Each response body gets always stashed and can be retrieved for later evaluations already. Instead of providing the response body as strings that get parsed to json objects separately, then converted to maps as ResponseBodyAssertion did, we parse everything once, the json is part of the yaml test, which is supported. The only downside is that json comments cannot be used, rather yaml comments should be used (// C style vs # ). There were only two docs tests that were using comments in ingest-node.asciidoc where I went ahead and remove the comments which didn't seem that useful anyways.	2016-07-01 11:13:10 +02:00
javanna	598c36128e	Revert "Raised IOException on deleteBlob (#18815 )" This reverts commit `d24cc65cad` as it seems to be causing test failures.	2016-07-01 11:00:32 +02:00
gfyoung	d24cc65cad	Raised IOException on deleteBlob (#18815 ) Raise IOException on deleteBlob if the blob doesn't exist This commit raises an IOException on BlobContainer#deleteBlob if the blob does not exist, in conformance with the BlobContainer interface contract. Each implementation of BlobContainer now conforms to this contract (file system, S3, Azure, HDFS). This commit also contains blob container tests for each of the repository implementations. Closes #18530	2016-06-30 23:00:10 -04:00
Nik Everett	f5a269b029	Start migration away from aggregation streams We'll migrate to NamedWriteable so we can share code with the rest of the system. So we can work on this in multiple pull requests without breaking Elasticsearch in between the commits this change supports both old style `InternalAggregations.stream` serialization and `NamedWriteable` style serialization. As such it creates about a half dozen `// NORELEASE` comments that will have to be removed once the migration is complete. This also introduces a boolean `transportClient` flag to `SearchModule` which is used to skip inappropriate registrations for for the transport client while still registering the things it needs. In this case that means that the `InternalAggregation` subclasses are registered with the `NamedWriteableRegistry` but the `AggregationBuilder` subclasses are not. Finally, this moves aggregation registration from guice configuration time to `SearchModule` construction time. This will make it simpler to work with in the future as we further clean up Elasticsearch's extension points.	2016-06-30 12:57:34 -04:00
Boaz Leskes	09ca6d6ed2	Add a BridgePartition to be used by testAckedIndexing (#19172 ) We have long worked to capture different partitioning scenarios in our testing infra. This PR adds a new variant, inspired by the Jepsen blogs, which was forgotten far - namely a partition where one node can still see and be seen by all other nodes. It also updates the resiliency page to better reflect all the work that was done in this area.	2016-06-30 17:58:12 +02:00
jaymode	983a64c833	Add support for `teardown` section in REST tests This commits adds support for a `teardown` section that can be defined in REST tests to clean up any items that may have been created by the test and are not cleaned up by deletion of indices and templates.	2016-06-30 11:33:29 -04:00
Ryan Ernst	0732004ae8	Merge pull request #19177 from rjernst/ingest_factory_generic Remove generics from ingest Processor.Factory	2016-06-30 08:08:26 -07:00
Simon Willnauer	40ec639c89	Factor out abstract TCPTransport* classes to reduce the netty footprint (#19096 ) Today we have a ton of logic inside the NettyTransport* codebase. The footprint of the code that has a direct netty dependency is large and alternative implementations are pretty hard today since they need to know all about our proticol etc. This change moves most of the code into TCPTransport* baseclasses and moves all the protocol send code together. The base classes now contain the majority of the logic while NettyTransport* classes remain to implement the glue code, configuration and optimization.	2016-06-30 13:41:53 +02:00
Ryan Ernst	e4f265eb3a	Ingest: Remove generics from Processor.Factory The factory for ingest processor is generic, but that is only for the return type of the create mehtod. However, the actual consumer of the factories only cares about Processor, so generics are not needed. This change removes the generic type from the factory. It also removes AbstractProcessorFactory which only existed in order pull the optional tag from config. This functionality is moved to the caller of the factories in ConfigurationUtil, and the create method now takes the tag. This allows the covariant return of the implementation to work with tests not needing casts.	2016-06-30 02:33:54 -07:00
Ryan Ernst	08b3b6264e	Tests pass, started removing generics from processor factory	2016-06-30 01:49:22 -07:00
Ryan Ernst	f1376262fe	Merge branch 'master' into ingest_plugin_api	2016-06-29 14:16:16 -07:00
Simon Willnauer	872cdffc27	Factor out ChannelBuffer from BytesReference (#19129 ) The ChannelBuffer interface today leaks into the BytesReference abstraction which causes a hard dependency on Netty across the board. This chance moves this dependency and all BytesReference -> ChannelBuffer conversion into NettyUtlis and removes the abstraction leak on BytesReference. This change also removes unused methods on the BytesReference interface and simplifies access to internal pages.	2016-06-29 10:45:05 +02:00
Ryan Ernst	258c3e86ab	Added IngestPlugin api, cutover common and geoip, changed ingest factory api to take ProcessorsRegistry	2016-06-28 10:52:07 -07:00
Yannick Welsch	3cc2251e33	Fix number of arguments provided to logger calls	2016-06-28 17:38:56 +02:00
Yannick Welsch	98276111e1	Re-enable logger usage checks It was inadvertently disabled after applying code review comments. This commit reenables the logger usage checker and makes it less naggy when encountering logging usages of the form logger.info(someStringBuilder). Previously it would fail with the error message "First argument must be a string constant so that we can statically ensure proper place holder usage". Now it will only fail in case any arguments are provided as well, for example logger.info(someStringBuilder, 42).	2016-06-28 16:48:05 +02:00
Boaz Leskes	2512594d9e	Testing infra - stablize data folder usage and clean up (#19111 ) The plan for persistent node ids ( #17811 ) is to tie the node identity to a file stored in it's data folders. As such it becomes important that nodes in our testing infra have better affinity with their data folders and that their data folders are not cleaned underneath them. The first is important because we fix the random seed used for node id generation (for reproducibility) and allowing the same node to use two different data folders causes two separate nodes to have the same id, which prevents the cluster from forming. The second is important, for example, where a full cluster restart / single node restart need to maintain node identity and wiping the data folders at the wrong moment prevents this. Concretely this commit does the following: 1) Remove previous attempts to have data folder per role using a prefix. This wasn't effective as it was using the data paths settings which are only used for part of the runs. An attempt to completely separate the paths via the home dir failed due to assumptions made by index custom path about node data folder ordinal uniqueness (see #19076) 2) Change full cluster restarts to start up nodes in the same order their were first created in, only randomly swapping nodes with the same roles. 3) Change test cluster reset methods to first shutdown the unneeded nodes and then re-start the shared nodes that were shut down, so they'll reclaim their data folders. 4) Improve data folder wiping logic and make sure it wipes only folders of "offline" nodes. 5) Add some very basic tests	2016-06-28 16:38:56 +02:00
Jason Tedor	2f638b5a23	Keep input time unit when parsing TimeValues This commit modifies TimeValue parsing to keep the input time unit. This enables round-trip parsing from instances of String to instances of TimeValue and vice-versa. With this, this commit removes support for the unit "w" representing weeks, and also removes support for fractional values of units (e.g., 0.5s). Relates #19102	2016-06-27 18:41:18 -04:00
Nik Everett	79fa778e33	Fix percolator tests They need their plugin or they'll break!	2016-06-27 15:34:36 -04:00
Ryan Ernst	33ccc5aead	Merge branch 'master' into mapper_plugin_api	2016-06-27 11:19:59 -07:00
Boaz Leskes	cb0824e957	Make shard store fetch less dependent on the current cluster state, both on master and non data nodes (#19044 ) #18938 has changed the timing in which we send out to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting of the node's join was published, #18938 made it be sent concurrently to the publishing processes. This revealed a couple of points where the shard store fetching is dependent of the current state of affairs of the cluster state, both on the master and the data nodes. The problem discovered were already present without #18938 but required a failure/extreme situations to make them happen.This PR tries to remove as much as possible of these dependencies making shard store fetching simpler and make the way to re-introduce #18938 which was reverted. These are the notable changes: 1) Allow TransportNodesAction (of which shard store fetching is derived) callers to supply concrete disco nodes, so it won't need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet made available through ClusterService. Note that long term we can expect the rest layer to resolve node ids to concrete nodes, making this mode the only one needed. 2) The data node relied on the cluster state to have the relevant index meta data so it can find data when custom paths are used. We now fall back to read the meta data from disk if needed. 3) The data node was relying on it's own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is of course something it can not know until it got (and processed) the new cluster state from the master. This flag in the response is now removed. This is not a problem because we used that flag to protect against double assigning of a shard to the same node, but we are already protected from it by the allocation deciders. 4) I removed the redundant filterNodeIds method in TransportNodesAction - if people want to filter they can override resolveRequest.	2016-06-27 15:05:06 +02:00
Nik Everett	71b95fb63c	Switch analysis from push to pull Instead of plugins calling `registerTokenizer` to extend the analyzer they now instead have to implement `AnalysisPlugin` and override `getTokenizer`. This lines up extending plugins in with extending scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry` immediately as part of its constructor which makes testing anslysis much simpler. This also moves the default analysis configuration into `AnalysisModule` which is how search is setup. Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`. Instead it is only responsible for building `AnslysisRegistry`. We still bind `AnalysisRegistry` but we only do so in `Node`. This is means it is available at module construction time so we slowly remove the need to bind it in guice.	2016-06-26 07:15:42 -04:00
Ryan Ernst	6995bde710	Merge branch 'master' into mapper_plugin_api	2016-06-24 11:15:06 -07:00
Yannick Welsch	a5908a5da5	[TEST] Increase timeouts for Rest test client (#19042 ) Some Rest / Doc tests were running into the default socket timeout of 10 seconds.	2016-06-23 14:05:56 +02:00
Adrien Grand	7ba5bceebe	Add a MultiTermAwareComponent marker interface to analysis factories. #19028 This is the same as what Lucene does for its analysis factories, and we hawe tests that make sure that the elasticsearch factories are in sync with Lucene's. This is a first step to move forward on #9978 and #18064.	2016-06-23 10:19:24 +02:00
Tanguy Leroux	04da1bda0d	Move templates out of the Search API, into lang-mustache module This commit moves template support out of the Search API to its own dedicated Search Template API in the lang-mustache module. It provides a new SearchTemplateAction that can be used to render templates before it gets delegated to the usual Search API. The current REST endpoint are identical, but the Render Search Template endpoint now uses the same Search Template API with a new "simulate" option. When this option is enabled, the Search Template API only renders template and returns immediatly, without executing the search. Closes #17906	2016-06-23 09:30:53 +02:00
Nik Everett	0bf447c697	Group client projects under :client :client ---------> :client:rest :client-sniffer -> :client:sniffer :client-test ----> :client:test This lines the client up with how we do things like modules and plugins.	2016-06-22 14:26:41 -04:00

1 2 3 4 5 ...

461 Commits