OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	5bc3b7f741	Enable node roles to be pluggable (#43175) This commit introduces the possibility for a plugin to introduce additional node roles.	2019-06-13 15:15:48 -04:00
Alexander Reelsen	e7868e92bd	Restore date aggregation performance in UTC case (#38221) (#38700) The benchmarks showed a sharp decrease in aggregation performance for the UTC case. This commit uses the same calculation as joda time, which requires no conversion into any java time object. Also, the check for a fixed offset has been put into the ctor to reduce the need for runtime calculations. The same goes for the amount of the used unit in milliseconds. Closes #37826	2019-02-11 16:30:48 +03:00
Alexander Reelsen	9f026bb8ad	Reduce object creation in Rounding class (#38061) This reduces object creation in the rounding class (used by aggs) by properly creating the objects only once. Furthermore, a few unneeded ZonedDateTime objects were created in order to create other objects out of them. This was changed as well. Running the benchmarks shows much faster performance for all of the java time based Rounding classes.	2019-01-31 14:18:28 +01:00
Alexander Reelsen	b94acb608b	Speed up converting of temporal accessor to zoned date time (#37915) The existing implementation was slow due to exceptions being thrown if an accessor did not have a time zone. This implementation queries for having a timezone, local time and local date and also checks for an instant, which avoids throwing an exception and thus speeds up the conversion. This removes the existing method and creates a new one named DateFormatters.from(TemporalAccessor accessor) to resemble the naming of the java time ones. Before this change an epoch millis parser using the toZonedDateTime method took approximately 50x longer. Relates #37826	2019-01-31 08:55:40 +01:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
Alexander Reelsen	9f3da013d8	Date/Time parsing: Use java time API instead of exception handling (#37222) * Add benchmark * Use java time API instead of exception handling. When several formatters are used, the existing way of parsing those is to throw an exception, catch it, and try the next one. This is considerably slower than the approach taken in joda time, so that indexing performance is reduced when a date format like `x||y` is used and y is the date format being used. This commit now uses the java API to parse the date by appending the date time formatters to each other and does not rely on exception handling. * fix benchmark * fix tests by changing formatter, also expose printer * restore optional printing logic to fix tests * fix tests * incorporate review comments	2019-01-11 09:25:05 +01:00
Nik Everett	e28509fbfe	Core: Less settings to AbstractComponent (#35140) Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to stop passing around `Settings` in a ton of places. While this change touches many files, it touches them all in fairly small, mechanical ways, doing a few things per file: 1. Drop the `super(settings);` line on everything that extends `AbstractComponent`. 2. Drop the `settings` argument to the ctor if it is no longer used. 3. If the file doesn't use `logger` then drop `extends AbstractComponent` from it. 4. Clean up all compilation failures caused by the `settings` removal and drop any now unused `settings` instances and method arguments. I've intentionally not removed the `settings` argument from a few files: 1. TransportAction 2. AbstractLifecycleComponent 3. BaseRestHandler These files don't need `settings` either, but this change is large enough as is. Relates to #34488	2018-10-31 21:23:20 -04:00
Yannick Welsch	49cbcaff4f	Allow excluding folder names when scanning for dangling indices (#34349) ES is scanning for dangling indices on every cluster state update. For this, it lists the subfolders of the indices directory to determine which extra index directories exist on the node where there's no corresponding index in the cluster state. These are potential targets for dangling index import. On certain machine types, and with large number of indices, this subfolder listing can be horribly slow. This means that every cluster state update will be slowed down by potentially hundreds of milliseconds. One of the reasons for this poor performance is that Files.isDirectory() is a relatively expensive call on some OS and JDK versions. There is no need though to do all these isDirectory calls for folders which we know we are going to discard anyhow in the next step of the dangling indices logic. This commit allows adding an exclusion predicate to the availableIndexFolders methods which can dramatically speed up this method when scanning for dangling indices.	2018-10-08 15:35:50 +02:00
Daniel Mitterdorfer	f174f72fee	Circuit-break based on real memory usage With this commit we introduce a new circuit-breaking strategy to the parent circuit breaker. Contrary to the current implementation which only accounts for memory reserved via child circuit breakers, the new strategy measures real heap memory usage at the time of reservation. This allows us to be much more aggressive with the circuit breaker limit so we bump it to 95% by default. The new strategy is turned on by default and can be controlled with the new cluster setting `indices.breaker.total.use_real_memory`. Note that we turn it off for all integration tests with an internal test cluster because it leads to spurious test failures which are of no value (we cannot fully control heap memory usage in tests). All REST tests, however, will make use of the real memory circuit breaker. Relates #31767	2018-07-13 10:08:28 +02:00
Yannick Welsch	c8712e9531	Limit AllocationService dependency injection hack (#24479) Changes the scope of the AllocationService dependency injection hack so that it is at least contained to the AllocationService and does not leak into the Discovery world.	2017-05-05 08:39:18 +02:00
Daniel Mitterdorfer	087a931cb2	Use 'pipe' instead of 'comma' to separate benchmark params With this commit we separate benchmark parameters with pipe symbols instead of commas as JMH has a special formatting logic for comma-separated strings which messes up the JSON output of microbenchmarks.	2016-10-10 14:56:44 +02:00
Simon Willnauer	194a6b1df0	Remove LocalTransport in favor of MockTcpTransport (#20695) This change proposes the removal of all non-tcp transport implementations. The mock transport can be used by default to run tests instead of local transport that has roughly the same performance compared to TCP or at least not noticeably slower. This is a master only change, deprecation notice in 5.x will be committed as a separate change.	2016-10-07 11:27:47 +02:00
Ali Beyad	ac1b13dde7	Changes the API of GatewayAllocator#applyStartedShards and (#20642) Changes the API of GatewayAllocator#applyStartedShards and GatewayAllocator#applyFailedShards to take both a RoutingAllocation and a list of shards to apply. This allows better mock allocators to be created as being done in #20637. Closes #20642	2016-09-23 09:31:46 -04:00
Ali Beyad	029fc909b5	Removes FailedRerouteAllocation and StartedRerouteAllocation Removes the FailedRerouteAllocation class and StartedRerouteAllocation class, as they were just wrappers for RerouteAllocation that stored started and failed shards, but these started and failed shards can be passed in directly to the methods that needed them, removing the need for this wrapper class and extra level of indirection. Closes #20626	2016-09-23 09:02:36 -04:00
Boaz Leskes	2ee9ab25d9	Remove `RoutingAllocation.Result` (#20538) Currently all the reroute-like methods of `AllocationService` return a result object of type `RoutingAllocation.Result`. The result object contains the new `RoutingTable` and `MetaData` plus an indication whether those were changed. The caller is then responsible for updating a cluster state with these. This means that things can easily go wrong and one can take one of these but not the other, causing inconsistencies. We already have a utility method on the `ClusterState` builder that does this but no one forces you to use it. Also 99% of the callers do the same thing: i.e., check if the result was changed and if so update the very same cluster state that was passed to `AllocationService`. This PR folds this pattern into `AllocationService` and changes almost all its methods to return a new cluster state (potentially the original one). This saves some 500 lines of code. The one exception here is the reroute API which executes allocation commands and potentially returns an explanation as well (next to the routing table and metadata). That API now returns a `CommandsResult` object which encapsulates a cluster state and the explanation.	2016-09-19 13:54:35 +02:00
Ryan Ernst	1ff348ed7f	Plugins: Make custom allocation deciders use pull based extensions This change converts AllocationDecider registration from push based on ClusterModule to implementing with a new ClusterPlugin interface. AllocationDecider instances are allowed to use only Settings and ClusterSettings.	2016-08-17 15:55:31 -07:00
Yannick Welsch	27a760f9c1	Add routing changes API to RoutingAllocation (#19992) Adds a class that records changes made to RoutingAllocation, so that at the end of the allocation round other values can be more easily derived based on these changes. Most notably, it: - replaces the explicit boolean flag that is passed around everywhere to denote changes to the routing table. The boolean flag is automatically updated now when changes actually occur, preventing issues where it got out of sync with actual changes to the routing table. - records actual changes made to RoutingNodes so that primary term and in-sync allocation ids, which are part of index metadata, can be efficiently updated just by looking at the shards that were actually changed.	2016-08-17 10:46:59 +02:00
Boaz Leskes	609a199bd4	Upon being elected as master, prefer joins' node info to existing cluster state (#19743) When we introduced [persistent node ids](https://github.com/elastic/elasticsearch/pull/19140) we were concerned that people may copy data folders from one node to another, resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different node with the same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationale there was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying). Sadly there were two problems with this: 1) One minor and easy to fix - we didn't allow for the case where the existing node can have the same network address as the incoming one, but have a different ephemeral id (after node restart). This confused the logic in `AllocationService` in these rare cases. The cluster is good enough to detect this and recover later on, but it's not clean. 2) The assumption that Node Fault Detection will clean up is wrong when the node just won an election (it wasn't master before) and needs to process the incoming joins in order to commit the cluster state and assume its mastership. In those cases, the Node Fault Detection isn't active. This PR fixes these two and prefers incoming nodes to existing nodes when finishing an election. On top of that, on request by @ywelsch, `AllocationService` synchronization between the nodes of the cluster and its routing table is now explicit rather than something we do all the time. The same goes for promotion of replicas to primaries.	2016-08-05 08:58:03 +02:00
Boaz Leskes	6861d3571e	Persistent Node Ids (#19140) Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API results in creating new fields due to node ID being used as keys. The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and it works for probably most cases, it is not perfect. For example, if after a node restart, the node is not able to bind to the same port (because it's not yet freed by the OS), it will cause the node to still change identity. Also in environments where the host IP can change due to a host restart, identity will not be the same. Due to those limitations, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the node's data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM). It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies or clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join. Last, and most importantly, this change requires that all nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking. Other less important notes: - DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead. - I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic. - I removed the `version` parameter from `MetaDataStateFormat#write`, it wasn't really used and was just in the way :) - TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.	2016-07-04 21:09:25 +02:00
Simon Willnauer	bdb6dcea3a	Cleanup ClusterService dependencies and detached from Guice (#18941) This change removes some unnecessary dependencies from ClusterService and cleans up ClusterName creation. ClusterService is now not created by guice anymore.	2016-06-17 17:07:19 +02:00
Daniel Mitterdorfer	d56e4bc7b1	Remove obsolete benchmarks / comments	2016-06-15 16:54:54 +02:00
Daniel Mitterdorfer	2c467fd9c2	Add microbenchmarking infrastructure (#18891) With this commit we add a benchmarks project that contains the necessary build infrastructure and an example benchmark. It is added as a separate project to avoid interfering with the regular build too much (especially sanity checks) and to keep the microbenchmarks isolated. Microbenchmarks are generated with `gradle :benchmarks:jmhJar` and can be run with `gradle :benchmarks:jmh`. We intentionally do not use the [jmh-gradle-plugin](https://github.com/melix/jmh-gradle-plugin) as it causes all sorts of problems (dependencies are not properly excluded, not all JMH parameters can be set) and it adds another abstraction layer that is not needed. Closes #18242	2016-06-15 16:48:02 +02:00

22 Commits