Commit Graph

29 Commits

Author SHA1 Message Date
Nik Everett bd4b9dd10e
Speed up time interval arounding around dst (backport #56371) (#56396)
When an index spans a daylight savings time transition we can't use our
optimization that rewrites the requested time zone to a fixed time zone
and instead we used to fall back to a java.util.time based rounding
implementation. In #55559 we optimized "time unit" rounding. This
optimizes "time interval" rounding.

The java.util.time based implementation is about 1650% slower than the
rounding implementation for a fixed time zone. This replaces it with a
similar optimization that is only about 30% slower than the fixed time
zone. The java.util.time implementation allocates a ton of short lived
objects but the optimized implementation doesn't. So it *might* end up
being faster than the microbenchmarks imply.
2020-05-08 13:39:27 -04:00
Nik Everett e35919d3b8
Optimize date_histograms across daylight savings time (backport of #55559) (#56334)
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.

Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)

I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.

This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.

This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.

Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
2020-05-07 09:10:51 -04:00
Tanguy Leroux 4d36917e52
Merge feature/searchable-snapshots branch into 7.x (#54803) (#54825)
This is a backport of #54803 for 7.x.

This pull request cherry picks the squashed commit from #54803 with the additional commits:

    6f50c92 which adjusts master code to 7.x
    a114549 to mute a failing ILM test (#54818)
    48cbca1 and 50186b2 that cleans up and fixes the previous test
    aae12bb that adds a missing feature flag (#54861)
    6f330e3 that adds missing serialization bits (#54864)
    bf72c02 that adjust the version in YAML tests
    a51955f that adds some plumbing for the transport client used in integration tests

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-07 13:28:53 +02:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Rory Hunter d863c510da
Autoformat :qa:os and :benchmarks (#52816)
Add `:qa:os` and `:benchmarks` to the list of automatically formatted
projects, and apply some manual fix-ups to polish it up.

In particular, I noticed that `Files.write(...)` when passed a list will
automaticaly apply a UTF-8 encoding and write a newline after each line,
making it easier to use than FileUtils.append. It's even available from
1.8.

Also, in the Allocators class, a number of methods declared thrown exceptions that IntelliJ reported were never thrown, and as far as I could see this is true, so I removed the exceptions.
2020-02-28 14:48:04 +00:00
Jason Tedor 5bc3b7f741
Enable node roles to be pluggable (#43175)
This commit introduces the possibility for a plugin to introduce
additional node roles.
2019-06-13 15:15:48 -04:00
Alexander Reelsen e7868e92bd
Restore date aggregation performance in UTC case (#38221) (#38700)
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.

This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.

Closes #37826
2019-02-11 16:30:48 +03:00
Alexander Reelsen 9f026bb8ad
Reduce object creation in Rounding class (#38061)
This reduces objects creations in the rounding class (used by aggs) by properly
creating the objects only once. Furthermore a few unneeded ZonedDateTime objects
were created in order to create other objects out of them. This was
changed as well.

Running the benchmarks shows a much faster performance for all of the
java time based Rounding classes.
2019-01-31 14:18:28 +01:00
Alexander Reelsen b94acb608b
Speed up converting of temporal accessor to zoned date time (#37915)
The existing implementation was slow due to exceptions being thrown if
an accessor did not have a time zone. This implementation queries for
having a timezone, local time and local date and also checks for an
instant preventing to throw an exception and thus speeding up the conversion.

This removes the existing method and create a new one named
DateFormatters.from(TemporalAccessor accessor) to resemble the naming of
the java time ones.

Before this change an epoch millis parser using the toZonedDateTime
method took approximately 50x longer.

Relates #37826
2019-01-31 08:55:40 +01:00
Alexander Reelsen daa2ec8a60
Switch mapping/aggregations over to java time (#36363)
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.

The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.

Relates #27330
2019-01-23 10:40:05 +01:00
Alexander Reelsen 9f3da013d8
Date/Time parsing: Use java time API instead of exception handling (#37222)
* Add benchmark

* Use java time API instead of exception handling

when several formatters are used, the existing way of parsing those is
to throw an exception catch it, and try the next one. This is is
considerably slower than the approach taken in joda time, so that
indexing is reduced when a date format like `x||y` is used and y is the
date format being used.

This commit now uses the java API to parse the date by appending the
date time formatters to each other and does not rely on exception
handling.

* fix benchmark

* fix tests by changing formatter, also expose printer

* restore optional printing logic to fix tests

* fix tests

* incorporate review comments
2019-01-11 09:25:05 +01:00
Nik Everett e28509fbfe
Core: Less settings to AbstractComponent (#35140)
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.

I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler

These files don't *need* `settings` either, but this change is large
enough as is.

Relates to #34488
2018-10-31 21:23:20 -04:00
Yannick Welsch 49cbcaff4f
Allow excluding folder names when scanning for dangling indices (#34349)
ES is scanning for dangling indices on every cluster state update. For this, it lists the subfolders of
the indices directory to determine which extra index directories exist on the node where there's no
corresponding index in the cluster state. These are potential targets for dangling index import. On
certain machine types, and with large number of indices, this subfolder listing can be horribly slow.
This means that every cluster state update will be slowed down by potentially hundreds of
milliseconds. One of the reasons for this poor performance is that Files.isDirectory() is a relatively
expensive call on some OS and JDK versions. There is no need though to do all these isDirectory
calls for folders which we know we are going to discard anyhow in the next step of the dangling
indices logic. This commit allows adding an exclusion predicate to the availableIndexFolders
methods which can dramatically speed up this method when scanning for dangling indices.
2018-10-08 15:35:50 +02:00
Nik Everett 22459576d7
Logging: Make node name consistent in logger (#31588)
First, some background: we have 15 different methods to get a logger in
Elasticsearch but they can be broken down into three broad categories
based on what information is provided when building the logger.

Just a class like:
```
private static final Logger logger = ESLoggerFactory.getLogger(ActionModule.class);
```
or:
```
protected final Logger logger = Loggers.getLogger(getClass());
```

The class and settings:
```
this.logger = Loggers.getLogger(getClass(), settings);
```

Or more information like:
```
Loggers.getLogger("index.store.deletes", settings, shardId)
```

The goal of the "class and settings" variant is to attach the node name
to the logger. Because we don't always have the settings available, we
often use the "just a class" variant and get loggers without node names
attached. There isn't any real consistency here. Some loggers get the
node name because it is convenient and some do not.

This change makes the node name available to all loggers all the time.
Almost. There are some caveats are testing that I'll get to. But in
*production* code the node name is node available to all loggers. This
means we can stop using the "class and settings" variants to fetch
loggers which was the real goal here, but a pleasant side effect is that
the ndoe name is now consitent on every log line and optional by editing
the logging pattern. This is all powered by setting the node name
statically on a logging formatter very early in initialization.

Now to tests: tests can't set the node name statically because
subclasses of `ESIntegTestCase` run many nodes in the same jvm, even in
the same class loader. Also, lots of tests don't run with a real node so
they don't *have* a node name at all. To support multiple nodes in the
same JVM tests suss out the node name from the thread name which works
surprisingly well and easy to test in a nice way. For those threads
that are not part of an `ESIntegTestCase` node we stick whatever useful
information we can get form the thread name in the place of the node
name. This allows us to keep the logger format consistent.
2018-07-31 10:54:24 -04:00
Daniel Mitterdorfer f174f72fee
Circuit-break based on real memory usage
With this commit we introduce a new circuit-breaking strategy to the parent
circuit breaker. Contrary to the current implementation which only accounts for
memory reserved via child circuit breakers, the new strategy measures real heap
memory usage at the time of reservation. This allows us to be much more
aggressive with the circuit breaker limit so we bump it to 95% by default. The
new strategy is turned on by default and can be controlled  with the new cluster
setting `indices.breaker.total.userealmemory`.

Note that we turn it off for all integration tests with an internal test cluster
because it leads to spurious test failures which are of no value (we cannot
fully control heap memory usage in tests). All REST tests, however, will make
use of the real memory circuit breaker.

Relates #31767
2018-07-13 10:08:28 +02:00
Yannick Welsch c8712e9531 Limit AllocationService dependency injection hack (#24479)
Changes the scope of the AllocationService dependency injection hack so that it is at least contained to the AllocationService and does not leak into the Discovery world.
2017-05-05 08:39:18 +02:00
Daniel Mitterdorfer 087a931cb2 Use 'pipe' instead of of 'comma' to separate benchmark params
With this commit we separate benchmark parameters with pipe symbols
instead of commas as JMH has a special formatting logic for comma-separated
string which messes up the JSON output of microbenchmarks.
2016-10-10 14:56:44 +02:00
Simon Willnauer 194a6b1df0 Remove LocalTransport in favor of MockTcpTransport (#20695)
This change proposes the removal of all non-tcp transport implementations. The
mock transport can be used by default to run tests instead of local transport that has
roughly the same performance compared to TCP or at least not noticeably slower.

This is a master only change, deprecation notice in 5.x will be committed as a
separate change.
2016-10-07 11:27:47 +02:00
Ali Beyad ac1b13dde7 Changes the API of GatewayAllocator#applyStartedShards and (#20642)
Changes the API of GatewayAllocator#applyStartedShards and 
GatewayAllocator#applyFailedShards to take both a RoutingAllocation
and a list of shards to apply. This allows better mock allocators
to be created as being done in #20637.

Closes #20642
2016-09-23 09:31:46 -04:00
Ali Beyad 029fc909b5 Removes FailedRerouteAllocation and StartedRerouteAllocation
Removes the FailedRerouteAllocation class and StartedRerouteAllocation
class, as they were just wrappers for RerouteAllocation that stored
started and failed shards, but these started and failed shards can
be passed in directly to the methods that needed them, removing the
need for this wrapper class and extra level of indirection.

Closes #20626
2016-09-23 09:02:36 -04:00
Daniel Mitterdorfer 9d8961aeb9 Provide log4j2 logging config for microbenchmarks 2016-09-19 14:28:16 +02:00
Boaz Leskes 2ee9ab25d9 Remove `RoutingAllocation.Result` (#20538)
Currently all the reroute-like methods of `AllocationService` return a result object of type `RoutingAllocation.Result`. The result object contains the new `RoutingTable` and `MetaData` plus an indication whether those were changed. The caller is then responsible of updating a cluster state with these. These means that things can easily go wrong and one can take one of these but not the other causing inconsistencies. We already have a utility method on the `ClusterState` builder that does but no one forces you to do so. Also 99% of the callers do the same thing: i.e., check if the result was changed and if so update the very same cluster state that was passed to `AllocationService`.  This PR folds this pattern into `AllocationService` and changes almost all it's methods to return a new cluster state (potentially the original one).  This saves some 500 lines of code.

The one exception here is the reroute API which executes allocation commands and potentially returns an explanation as well (next to the routing table and metadata). That API now returns a `CommandsResult` object which encapsulate a cluster state and the explanation.
2016-09-19 13:54:35 +02:00
Ryan Ernst 1ff348ed7f Plugins: Make custom allocation deciders use pull based extensions
This change converts AllocationDecider registration from push based on
ClusterModule to implementing with a new ClusterPlugin interface.
AllocationDecider instances are allowed to use only Settings and
ClusterSettings.
2016-08-17 15:55:31 -07:00
Yannick Welsch 27a760f9c1 Add routing changes API to RoutingAllocation (#19992)
Adds a class that records changes made to RoutingAllocation, so that at the end of the allocation round other values can be more easily derived based on these changes. Most notably, it:

- replaces the explicit boolean flag that is passed around everywhere to denote changes to the routing table. The boolean flag is automatically updated now when changes actually occur, preventing issues where it got out of sync with actual changes to the routing table.
- records actual changes made to RoutingNodes so that primary term and in-sync allocation ids, which are part of index metadata, can be efficiently updated just by looking at the shards that were actually changed.
2016-08-17 10:46:59 +02:00
Boaz Leskes 609a199bd4 Upon being elected as master, prefer joins' node info to existing cluster state (#19743)
When we introduces [persistent node ids](https://github.com/elastic/elasticsearch/pull/19140) we were concerned that people may copy data folders from one to another resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different with same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationeel there was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying).

Sadly there were two problems with this:
1) One minor and easy to fix - we didn't allow for the case where the existing node can have the same network address as the incoming one, but have a different ephemeral id (after node restart). This confused the logic in `AllocationService`, in this rare cases. The cluster is good enough to detect this and recover later on, but it's not clean.
2) The assumption that Node Fault Detection will clean up is *wrong* when the node just won an election (it wasn't master before) and needs to process the incoming joins in order to commit the cluster state and assume it's mastership. In those cases, the Node Fault Detection isn't active. 

This PR fixes these two and prefers incoming nodes to existing node when finishing an election. 
On top of the, on request by @ywelsch , `AllocationService` synchronization between the nodes of the cluster and it's routing table is now explicit rather than something we do all the time. The same goes for promotion of replicas to primaries.
2016-08-05 08:58:03 +02:00
Boaz Leskes 6861d3571e Persistent Node Ids (#19140)
Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API results in creating new fields due to node ID being used as keys.

The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and it works for probably most cases, it is not perfect. For example, if after a node restart, the node is not able to bind to the same port (because it's not yet freed by the OS), it will cause the node to still change identity. Also in environments where the host IP can change due to a host restart, identity will not be the same. 

Due to those limitation, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the nodes data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM). I

It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join.

Last, and most importantly, this change requires that *all* nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking.

Other less important notes:
- DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead.
- I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic.
- I removed the `version` paramater from `MetaDataStateFormat#write` , it wasn't really used and was just in the way :)
- TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.
2016-07-04 21:09:25 +02:00
Simon Willnauer bdb6dcea3a Cleanup ClusterService dependencies and detached from Guice (#18941)
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
2016-06-17 17:07:19 +02:00
Daniel Mitterdorfer d56e4bc7b1 Remove obsolete benchmarks / comments 2016-06-15 16:54:54 +02:00
Daniel Mitterdorfer 2c467fd9c2 Add microbenchmarking infrastructure (#18891)
With this commit we add a benchmarks project that contains the necessary build
infrastructure and an example benchmark. It is added as a separate project to avoid
interfering with the regular build too much (especially sanity checks) and to keep
the microbenchmarks isolated.

Microbenchmarks are generated with `gradle :benchmarks:jmhJar` and can be run with
` gradle :benchmarks:jmh`.

We intentionally do not use the
[jmh-gradle-plugin](https://github.com/melix/jmh-gradle-plugin) as it causes all
sorts of problems (dependencies are not properly excluded, not all JMH parameters
can be set) and it adds another abstraction layer that is not needed.

Closes #18242
2016-06-15 16:48:02 +02:00