OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nhat Nguyen	5bc8a859c6	Remove ban tasks with the current thread context (#55404 ) If we start unbanning when the last child task completed and that child task executed with a specific user, then unban requests are denied because internal requests can't run with a user. We need to remove bans with the current thread context.	2020-04-17 13:49:19 -04:00
Andrei Dan	727ff2fe82	[7.x] Handle v2 template with index.hidden setting (#55015 ) (#55400 ) This validates that if the winner v2 template is a global one, it doesn't specify the index.hidden setting. (cherry picked from commit 19a97f76aac73e0455053097e5391165a9357427) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-04-17 16:48:28 +01:00
Martijn van Groningen	c584c14a32	Enable feature enabled flags for java integration tests (#55373 ) (#55398 ) Enabled data streams and itv2 feature enabled system properties in server module's integ test task. PR #54726 added java integration tests for data streams, so this is why these system properties need to be enabled when running release build.	2020-04-17 16:46:49 +02:00
James Rodewig	13426ce92f	[DOCS] Fix typo in search scroll comments (#55096 ) Co-authored-by: ScriptShi <xjtushilei@foxmail.com>	2020-04-17 09:37:32 -04:00
Armin Braun	60b8a5daba	Exclude Snapshot Shard Status Update Requests from Circuit Breaker (#55376 ) (#55383 ) Hotfix to not run into stuck snapshots because of master circuit breaking these requests. Given that these requests are very small and much of the memory associated with them is already allocated when the circuit breaker kicks in, the risk of this change introducing a higher chance of master running out of memory should be very small. Closes #54714	2020-04-17 13:49:36 +02:00
Tanguy Leroux	e77256c50f	Mute more testSupportedFieldTypes tests (#55374 ) Relates #55360	2020-04-17 13:15:08 +02:00
Christoph Büscher	389b6492a4	Fix test failure in FunctionScoreQueryBuilderTests.testCacheability (#55343 ) We rewrite more query builders to MatchNoneQueryBuilders now, which are always cacheable. We should make sure the test expects this when the rewritten query is a MatchNoneQueryBuilder. Closes #55331	2020-04-17 11:25:06 +02:00
Martijn van Groningen	417d5f2009	Make data streams in APIs resolvable. (#55337 ) Backport from: #54726 The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an API for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used. In this PR, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs. Whether an API resolves all backing indices of a data stream or only the latest index of a data stream (write index) depends on IndexNameExpressionResolver.Context.isResolveToWriteIndex(). If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index API); otherwise a data stream resolves to all of its backing indices (for example: search API). Relates to #53100	2020-04-17 08:33:37 +02:00
Ryan Ernst	62246aa9c9	Add generic Set support to streams (#54769 ) (#55123 ) This commit adds support for reading and writing sets as generic values in stream input and output. closes #54708	2020-04-16 14:29:38 -07:00
Mark Tozzi	22c55180c1	[7.x] Backport ValuesSourceRegistry and related work (#54922 ) * Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove generics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR.
* Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. 
* VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. 
* More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport its fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>	2020-04-16 16:54:46 -04:00
Julie Tibshirani	d7cded8d7a	Fix updating include_in_parent/include_in_root of nested field. (#55326 ) The main changes are: 1. Throw an error when updating `include_in_parent` or `include_in_root` attribute of nested field dynamically by the PUT mapping API. 2. Add a test for the change. Closes #53792 Co-authored-by: bellengao <gbl_long@163.com>	2020-04-16 11:17:12 -07:00
Christoph Büscher	4d849f0948	Fix creating filtered alias using now in a date_nanos range query failed (#54785 ) (#55329 ) Modify the value of nowInMillis in queryShardContext to current timestamp, because the value will be used later when validating the filtered alias which uses now in a date_nanos range query.	2020-04-16 19:47:53 +02:00
Christos Soulios	5856344691	[7.x] Add supported-type tests to avg aggregation (#55283 )	2020-04-16 20:02:31 +03:00
Lee Hinman	8b7bdae6cb	Ensure error handler is called during SLM retention callback failure (#55252 ) (#55321 ) When retrieving the snapshots for a set of repos or deleting a single snapshot, it's possible for the body of the `ActionListener`'s `onResponse` method to throw an Exception. In this case, the `errHandler` passed in may not be executed, resulting in the `running` boolean not being reset back to false. This commit uses `ActionListener.wrap(...)` instead of creating a new ActionListener, which ensures that if the `onResponse` fails in any way, the `onFailure` handler is still called. Resolves #55217	2020-04-16 10:50:15 -06:00
Julie Tibshirani	9c2865b28d	Rewrite wrapper queries to match_none if possible. (#55320 ) Queries like script_score wrap a query and modify its score. If the inner query rewrites to match_none, then the entire query can rewrite to match_none. This lets us detect that certain shards can be skipped during the 'can match' phase. This was a simple change that seemed like it would help in some cases. But it will likely not have a huge impact, since in many use cases where the 'can match' phase is helpful, the search is not sorted by score.	2020-04-16 09:43:11 -07:00
David Turner	7941f4a47e	Add RepositoriesService to createComponents() args (#54814 ) Today we pass the `RepositoriesService` to the searchable snapshots plugin during the initialization of the `RepositoryModule`, forcing the plugin to be a `RepositoryPlugin` even though it does not implement any repositories. After discussion we decided it best for now to pass this in via `Plugin#createComponents` instead, pending some future work in which plugins can depend on services more dynamically.	2020-04-16 16:27:36 +01:00
Dan Hermann	e89c5d6850	[7.x] Allow transactional metadata update with index creation (#55308 )	2020-04-16 09:28:07 -05:00
David Turner	8a565c4fa6	Voting config exclusions should work with absent nodes (#55291 ) Today the voting config exclusions API accepts node filters and resolves them to a collection of node IDs against the current cluster membership. This is problematic since we may want to exclude nodes that are not currently members of the cluster. For instance: - if attempting to remove a flaky node from the cluster you cannot reliably exclude it from the voting configuration since it may not reliably be a member of the cluster - if `cluster.auto_shrink_voting_configuration: false` then naively shrinking the cluster will remove some nodes but will leave their node IDs in the voting configuration. The only way to clean up the voting configuration is to grow the cluster back to its original size (potentially replacing some of the voting configuration) and then use the exclusions API. This commit adds an alternative API that accepts node names and node IDs but not node filters in general, and deprecates the current node-filters-based API. Relates #47990. Backport of #50836 to 7.x. Co-authored-by: zacharymorn <zacharymorn@gmail.com>	2020-04-16 12:28:50 +01:00
Christos Soulios	2a56a3a1f3	Add tests to MedianAbsoluteDeviationAggregator (#54884 ) (#55282 )	2020-04-16 13:46:09 +03:00
Christos Soulios	6a0eebf1d7	Add supported type tests to min aggregation (#54021 ) (#55284 )	2020-04-16 13:45:56 +03:00
Ignacio Vera	8afc8e6f89	enable BWC tests after lucene upgrade (#55247 ) (#55254 )	2020-04-16 07:10:15 +02:00
Jay Modi	2d9e3c7794	Start resource watcher service early (#55275 ) The ResourceWatcherService enables watching of files for modifications and deletions. During startup various consumers register the files that should be watched by this service. There is behavior that might be unexpected in that the service may not start polling until later in the startup process due to the use of lifecycle states to control when the service actually starts the jobs to monitor resources. This change removes this unexpected behavior so that upon construction the service has already registered its tasks to poll resources for changes. In making this modification, the service no longer extends AbstractLifecycleComponent and instead implements the Closeable interface so that the polling jobs can be terminated when the service is no longer required. Relates #54867 Backport of #54993	2020-04-15 20:45:39 -06:00
Julie Tibshirani	f75e3b4551	Remove an unused method on LeafDocLookup.	2020-04-15 16:58:49 -07:00
Nik Everett	a2ff2331bf	Add explicit tests for auto_date_histogram (backport of #54819 ) (#55262 ) `auto_date_histogram`'s reduction behavior is fairly complex and we have some fairly complex testing logic for it but it is super difficult to look at that testing logic and say "ah, that is what it does in this case". This adds some explicit (non-randomized) tests of the reduction logic that should be easier to read.	2020-04-15 19:09:46 -04:00
William Brafford	2ba3be9db6	Remove deprecated third-party methods from tests (#55255 ) (#55269 ) I've noticed that a lot of our tests are using deprecated static methods from the Hamcrest matchers. While this is not a big deal in any objective sense, it seems like a small good thing to reduce compilation warnings and be ready for a new release of the matcher library if we need to upgrade. I've also switched a few other methods in tests that have drop-in replacements.	2020-04-15 17:54:47 -04:00
Stéphane Campinas	6ef1c64760	SearchService#canMatch takes into consideration the alias filter (#55120 ) Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-04-15 22:33:40 +02:00
Ryan Ernst	29b70733ae	Use task avoidance with forbidden apis (#55034 ) Currently forbidden apis accounts for 800+ tasks in the build. These tasks are aggressively created by the plugin. In forbidden apis 3.0, we will get task avoidance (https://github.com/policeman-tools/forbidden-apis/pull/162), but we need to ourselves use the same task avoidance mechanisms to not trigger these task creations. This commit does that for our forbidden apis usages, in preparation for upgrading to 3.0 when it is released.	2020-04-15 13:27:53 -07:00
Ignacio Vera	a677b63daa	Upgrade to lucene 8.5.1 release (#55229 ) (#55235 ) Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.	2020-04-15 17:35:42 +02:00
Armin Braun	2f91e2aab7	Fix Race in Snapshot Abort (#54873 ) (#55233 ) We can be a little more efficient when aborting a snapshot. Since we know the new repository data after finalizing the aborted snapshot we can pass it down to the snapshot completion listeners. This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.	2020-04-15 15:42:15 +02:00
Armin Braun	d8b43c6283	Make Snapshot Deletes Less Racy (#54765 ) (#55226 ) Snapshot deletes should first check the cluster state for an in-progress snapshot and try to abort it before checking the repository contents. This allows for atomically checking and aborting a snapshot in the same cluster state update, removing all possible races where a snapshot that is in-progress could not be found if it finishes between checking the repository contents and the cluster state. Also removes confusing races, where checking the cluster state off of the cluster state thread finds an in-progress snapshot that is then not found in the cluster state update to abort it. Finally, the logic to use the repository generation of the in-progress snapshot + 1 was error prone because it would always fail the delete when the repository had a pending generation different from its safe generation when a snapshot started (leading to the snapshot finalizing at a higher generation). These issues (particularly that last point) can easily be reproduced by running `SLMSnapshotBlockingIntegTests` in a loop with current `master` (see #54766). The snapshot resiliency test for concurrent snapshot creation and deletion was made to more aggressively start the delete operation so that the above races would become visible. Previously, the fact that deletes would never coincide with initializing snapshots resulted in a number of the above races not reproducing. This PR is the most consistent I could get snapshot deletes without changes to the state machine. The fact that aborted deletes will not put the delete operation in the cluster state before waiting for the snapshot to abort still allows for some possible (though practically very unlikely) races. These will be fixed by a state-machine change in upcoming work in #54705 (which will have a much simpler and clearer diff after this change). Closes #54766	2020-04-15 14:47:16 +02:00
Armin Braun	e164c9aaee	Remove Redundant Cluster State during Snapshot INIT + Master Failover (#54420 ) (#55208 ) * Remove Redundant Cluster State during Snapshot INIT + Master Failover (#54420) Similar to #54395 we know that a snapshot in INIT state has not written anything to the repository yet. If we see one from a master failover, there is no point in moving it to ABORTED before removing it from the cluster state in a subsequent CS update. Instead, we can simply remove its job from the CS the first time we see it on master failover and be done with it.	2020-04-15 12:27:52 +02:00
Armin Braun	48048646e7	Move Snapshot Status Related Method to Appropriate Places (#54558 ) (#55209 ) * Move Snapshot Status Related Method to Appropriate Places Lots of things living in `SnapshotsService` for no reason other than that `SnapshotsService` provides the `RepositoriesService`. Cleaning this up to directly use `RepositoriesService` in the relevant transport actions and by that shortening the already very complex `SnapshotsService`.	2020-04-15 10:25:52 +02:00
Mark Vieira	ce85063653	[7.x] Re-add origin url information to publish POM files (#55173 )	2020-04-14 13:24:15 -07:00
Armin Braun	f7467a7fe8	Fix Cluster Stabilization in SnapshotResiliencyTests (#55159 ) (#55168 ) Just like in `AbstractCoordinatorTestCase` we can't just assume the cluster is stable once all the cluster states align since stray follower/leader check tasks could still hit us after a disconnect, causing future test operations to fail. => fixed by running all tasks in the possible time span of running into these checks before validating that cluster states align on all nodes to prevent this like we do in the coordinator tests. Closes #55103	2020-04-14 19:22:26 +02:00
Yannick Welsch	a610513ec7	Provide repository-level stats for searchable snapshots (#55051 ) Provides basic repository-level stats that will allow us to get some insight into how many requests are actually being made by the underlying SDK. Currently only tracks GET and LIST calls for S3 repositories. Most of the code is unfortunately boilerplate to add a new endpoint that will help us better understand some of the low-level dynamics of searchable snapshots.	2020-04-14 14:34:08 +02:00
Alan Woodward	16ebbff3b6	Mute CancellableTasksIT (#55152 ) Test failures are tracked in #55106	2020-04-14 12:55:20 +01:00
Yannick Welsch	73c522320a	Fix DanglingIndicesIT wait conditions (#55105 ) Closes #55105	2020-04-14 13:54:33 +02:00
William Brafford	52bebec51f	NodeInfo response should use a collection rather than fields (#54460 ) (#55132 ) This is a first cut at giving NodeInfo the ability to carry a flexible list of heterogeneous info responses. The trick is to be able to serialize and deserialize an arbitrary list of blocks of information. It is convenient to be able to deserialize into usable Java objects so that we can aggregate nodes stats for the cluster stats endpoint. In order to provide a little bit of clarity about which objects can and can't be used as info blocks, I've introduced a new interface called "ReportingService." I have removed the hard-coded getters (e.g., getOs()) in favor of a flexible method that can return heterogeneous kinds of info blocks (e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the need to cast in the client code.	2020-04-13 17:18:39 -04:00
Nhat Nguyen	96bb1164f0	Support hierarchical task cancellation (#54757 ) With this change, when a task is canceled, the task manager will cancel not only its direct child tasks but also all its descendant tasks. Closes #50990	2020-04-13 12:35:21 -04:00
Igor Motov	51c6f69e02	[7.x] Add support for filters to T-Test aggregation (#54980 ) (#55066 ) Adds support for filters to T-Test aggregation. The filters can be used to select populations based on some criteria and use values from the same or different fields. Closes #53692	2020-04-13 12:28:58 -04:00
Ioannis Kakavas	7a8a66d9ae	[7.x] Fix ReloadSecureSettings API to consume password (#54771 ) (#55059 ) The secure_settings_password was never taken into consideration in the ReloadSecureSettings API. This commit fixes that and adds necessary REST layer testing. Doing so, it also: - Allows TestClusters to have a password protected keystore so that it can be set for tests. - Adds a parameter to the run task so that Elasticsearch can be run with a password protected keystore from source.	2020-04-13 09:50:55 +03:00
Yang Wang	862799956c	Deprecate local parameter for get field mapping request (#55014 ) (#55099 ) The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0. This PR deprecates the parameter from rest layer. It will be removed in next major version.	2020-04-12 13:48:47 +10:00
Nik Everett	c00811f3a3	Make some agg tests easier to read (#54954 ) (#55079 ) We added a fancy method to provide random realistic test data to the reduction tests in #54910. This uses that to remove some of the more esoteric machinations in the agg tests. This will marginally increase the coverage of the serialization tests and, more importantly, remove some mysterious value generation code that only really made sense for random reduction tests but was used all over the place. It doesn't, on the other hand, make the tests shorter. Just hopefully more clear. I only cleaned up a few tests this way. If we like this it'd probably be worth grabbing others.	2020-04-10 14:15:30 -04:00
Nik Everett	b99a50bcb9	value_count Aggregation optimization (backport of #54854 ) (#55076 ) We found some problems during the test. Data: 200 million docs, 1 shard, 0 replicas
hits | avg | sum | value_count |
----------- | ------- | ------- | ----------- |
20,000 | .038s | .033s | .063s |
200,000 | .127s | .125s | .334s |
2,000,000 | .789s | .729s | 3.176s |
20,000,000 | 4.200s | 3.239s | 22.787s |
200,000,000 | 21.000s | 22.000s | 154.917s |
The performance of `avg`, `sum` and others is very close when performing statistics, but the performance of `value_count` has always been poor, not even within the same order of magnitude. Based on some common-sense knowledge, we think that `value_count` and `sum` are similar operations, and the time consumed should be the same. Therefore, we have discussed the agg of `value_count`. The principle of counting in es is to traverse the field of each document. If the field is an ordinary value, the count value is increased by 1. If it is an array type, the count value is increased by n. However, the problem lies in traversing each document and taking out the field, which changes from disk to an object in the Java language. We summarize its current problems with Elasticsearch as:
- Number cast to string overhead, and GC problems caused by a large number of strings
- After the number type is converted to string, sorting and other unnecessary operations are performed
Here is the proof of type conversion overhead.
```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
    int size = stringSize(i);
    if (COMPACT_STRINGS) {
        byte[] buf = new byte[size];
        getChars(i, size, buf);
        return new String(buf, LATIN1);
    } else {
        byte[] buf = new byte[size * 2];
        StringUTF16.getChars(i, size, buf);
        return new String(buf, UTF16);
    }
}
```
test type | average | min | max | sum
------------ | ------- | ---- | ----------- | -------
double->long | 32.2ns | 28ns | 0.024ms | 3.22s
long->double | 31.9ns | 28ns | 0.036ms | 3.19s
long->String | 163.8ns | 93ns | 1921 ms | 16.3s
The `long->String` conversion is particularly serious. Our optimization code is actually very simple. It is to manage different types separately, instead of uniformly converting to string unified processing. We added type identification in ValueCountAggregator, and made special treatment for number and geopoint types to cancel their type conversion. Because the string type is reduced and the string constant is reduced, the improvement effect is very obvious.
hits | avg | sum | value_count | value_count | value_count | value_count | value_count | value_count |
| | | double | double | keyword | keyword | geo_point | geo_point |
| | | before | after | before | after | before | after |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
20,000 | .038s | .033s | .063s | .026s | .030s | .030s | .038s | .015s |
200,000 | .127s | .125s | .334s | .078s | .116s | .099s | .278s | .031s |
2,000,000 | .789s | .729s | 3.176s | .439s | .348s | .386s | 3.365s | .178s |
20,000,000 | 4.200s | 3.239s | 22.787s | 2.700s | 2.500s | 2.600s | 25.192s | 1.278s |
200,000,000 | 21.000s | 22.000s | 154.917s | 18.990s | 19.000s | 20.000s | 168.971s | 9.093s |
- The results are more in line with common sense. `value_count` is about the same as `avg`, `sum`, etc., or even lower than these. Previously, `value_count` was much larger than `avg` and `sum`, and it was not even within the same order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, the performance is improved by about 8 to 9 times; when calculating the `geo_point` type, the performance is improved by 18 to 20 times.	2020-04-10 13:16:39 -04:00
Tim Brooks	98fba92022	Fail sniff process if no connections opened (#54934 ) Currently the remote cluster sniff connection process can succeed even if no connections are opened. This commit fixes this by failing the connection process if no connections are successfully opened.	2020-04-10 10:06:45 -06:00
Jim Ferenczi	d14ed34577	Explicitly test rewrite of date histogram's time zones on date_nanos (#54402 ) This commit adds an explicit test of time zone rewrite on date nanos field. Today this is working but we need tests to ensure that we don't break it unintentionally.	2020-04-10 17:37:59 +02:00
Igor Motov	da976d247f	Improve robustness of Query Result serializations (#54692 ) (#55028 ) Makes query result serialization more robust by propagating possible IOExceptions that can occur during shard level result serialization to the caller instead of throwing AssertionError that is not intercepted. Fixes #54665	2020-04-10 10:29:01 -04:00
Jason Tedor	9eeae59a83	Clarify available processors (#54907 ) The use of available processors, the terminology, and the settings around it have evolved over time. This commit cleans up some places in the code and in the docs to adjust to the current terminology.	2020-04-10 08:48:27 -04:00
Przemko Robakowski	35c195b224	Prevent putting V2 index template when overlapping with existing template (#54933 ) (#55042 ) * Prevent putting V2 index template when overlapping with existing template This change prevents putting V2 index template when it would overlap with existing V2 template of the same priority Relates to #53101	2020-04-10 10:31:37 +02:00
Nik Everett	62d6bc31bf	Reduce memory for big aggs run against many shards (#54758 ) (#55024 ) This changes the behavior of aggregations when search is performed against enough shards to enable "batch reduce" mode. In this case we always store aggregations in serialized form rather than as a traditional Java reference. This should shrink the memory usage of large aggregations at the cost of slightly slowing down aggregations where the coordinating node is also a data node. Because we're only doing this when there are many shards this is likely to be fairly rare. As a side effect this lets us add logs for the memory usage of the aggs buffer:
```
[2020-04-03T17:03:57,052][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1320->448] max [1320]
[2020-04-03T17:03:57,089][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,102][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,103][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,105][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs final reduction [888] max [1328]
```
These are useful, but you need to keep some things in mind before trusting them:
1. The buffers are oversized ala Lucene's ArrayUtils. This means that we are using more space than we need, but probably not much more.
2. Before they are merged the aggregations are inflated into their traditional Java objects which probably take up a lot more space than the serialized form. That is, after all, the reason why we store them in serialized form in the first place.
And, just because I can, here is another example of the log:
```
[2020-04-03T17:06:18,731][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,750][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,809][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,827][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,829][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs final reduction [98352] max [147528]
```
I got that last one by building a ten shard index with a million docs in it and running a `sum` in three layers of `terms` aggregations, all on `long` fields, and with a `batched_reduce_size` of `3`.	2020-04-09 14:58:42 -04:00
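
Note on 8b7bdae6cb: that entry says that wrapping the callbacks ensures the failure handler still runs when the body of `onResponse` throws, so the `running` flag gets reset. The following is a minimal sketch of that idea only, not the Elasticsearch code. `Listener`, `Listener.wrap`, and `RetentionDemo` are hypothetical names invented for illustration, and the real `ActionListener.wrap(...)` signature may differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for a response/failure callback (not the real Elasticsearch interface). */
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);

    /** Builds a listener that routes any exception thrown by the response handler to the failure handler. */
    static <T> Listener<T> wrap(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                try {
                    responseHandler.accept(response);
                } catch (Exception e) {
                    // The response handler failed: fall through to the failure path.
                    onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                failureHandler.accept(e);
            }
        };
    }
}

/** Hypothetical demo: a "running" flag that must be reset even when response handling throws. */
final class RetentionDemo {
    /** Runs one simulated pass and reports whether the running flag was reset to false. */
    static boolean runOnce() {
        AtomicBoolean running = new AtomicBoolean(true);
        Listener<String> listener = Listener.wrap(
            snapshots -> { throw new IllegalStateException("failure while handling " + snapshots); },
            e -> running.set(false)); // error handler resets the flag
        listener.onResponse("snapshot-1");
        return running.get() == false;
    }
}
```

With a hand-written listener whose `onResponse` throws, nothing would call the error handler and the flag would stay `true`; in this sketch the `try`/`catch` inside `wrap` is what closes that gap.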


4550 Commits