OpenSearch

Commit Graph

Author	SHA1	Message	Date
Igor Motov	8a669dc9b7	EQL: Add cascading search cancellation (#54843 ) EQL search cancellation now propagates cancellation to underlying search operations. Relates to #49638	2020-04-14 08:06:02 -04:00
Alan Woodward	16ebbff3b6	Mute CancellableTasksIT (#55152 ) Test failures are tracked in #55106	2020-04-14 12:55:20 +01:00
Yannick Welsch	73c522320a	Fix DanglingIndicesIT wait conditions (#55105 ) Closes #55105	2020-04-14 13:54:33 +02:00
David Turner	87e8367ece	Fix testCreateAndRestoreSearchableSnapshot (#55147 ) Fixes a couple of related failures in SearchableSnapshotsIntegTests. Firstly, we were not correctly accounting for the case where the cache was so small that some/all files were read directly; fixed this by only asserting that the cache is definitely used if the corresponding node has a cache that's large enough to hold the whole index. Secondly, we were not permitting shards to be completely empty, which might be the case (rarely) if there were not many documents indexed and the distribution of IDs was a bit unlucky; fixed this by asserting that we get stats for at least one file for the whole index, rather than for each shard separately. Closes #55126	2020-04-14 11:54:46 +01:00
Ioannis Kakavas	70cc1d57fb	Mute failing test (#54734 )	2020-04-14 10:18:33 +01:00
Mark Vieira	cb58725164	Mute InferenceIngestIT.testPipelineIngest	2020-04-14 09:27:56 +01:00
debadair	e8fa539bea	[DOCS] Removed obsolete warning about no way to securely store passwords (#55133 ) (#55140 ) * [DOCS] Removed obsolete warning about no way to securely store passwords. * Update x-pack/docs/en/watcher/actions/email.asciidoc Co-Authored-By: James Rodewig <james.rodewig@elastic.co>	2020-04-13 21:38:32 -07:00
William Brafford	52bebec51f	NodeInfo response should use a collection rather than fields (#54460 ) (#55132 ) This is a first cut at giving NodeInfo the ability to carry a flexible list of heterogeneous info responses. The trick is to be able to serialize and deserialize an arbitrary list of blocks of information. It is convenient to be able to deserialize into usable Java objects so that we can aggregate nodes stats for the cluster stats endpoint. In order to provide a little bit of clarity about which objects can and can't be used as info blocks, I've introduced a new interface called "ReportingService." I have removed the hard-coded getters (e.g., getOs()) in favor of a flexible method that can return heterogeneous kinds of info blocks (e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the need to cast in the client code.	2020-04-13 17:18:39 -04:00
Ryan Ernst	ae14d1661e	Replace license check isAuthAllowed with isSecurityEnabled (#54547 ) (#55082 ) The isAuthAllowed() method for license checking is used by code that wants to ensure security is both enabled and available. The enabled state is dynamic and provided by isSecurityEnabled(). But since security is available with all license types, a check on the license level is not necessary. Thus, this change replaces isAuthAllowed() with calling isSecurityEnabled().	2020-04-13 12:26:39 -07:00
Benjamin Trent	d32f6fed1d	[ML] inference only persist if there are stats (#54752 ) (#55121 ) We needlessly send documents to be persisted. If there are no stats added, then we should not attempt to persist them. Also, this PR fixes the race condition that caused issue: https://github.com/elastic/elasticsearch/issues/54786	2020-04-13 14:03:05 -04:00
lcawl	fcd96db006	[DOCS] Edits create data frame analytics job API (#54751 )	2020-04-13 10:43:52 -07:00
Nhat Nguyen	96bb1164f0	Support hierarchical task cancellation (#54757 ) With this change, when a task is canceled, the task manager will cancel not only its direct child tasks but also all its descendant tasks. Closes #50990	2020-04-13 12:35:21 -04:00
Igor Motov	51c6f69e02	[7.x] Add support for filters to T-Test aggregation (#54980 ) (#55066 ) Adds support for filters to T-Test aggregation. The filters can be used to select populations based on some criteria and use values from the same or different fields. Closes #53692	2020-04-13 12:28:58 -04:00
Jake Landis	a2fafa6af4	[7.x] Lazy test cluster module and plugins (#54852 ) (#55087 ) This change converts the module and plugin parameters for testClusters to be lazy. Meaning that the values are not resolved until they are actually used. This removes the requirement to use project.afterEvaluate to be able to resolve the bundle artifact. Note - this does not completely remove the need for afterEvaluate since it is still needed for the custom resource extension.	2020-04-13 10:53:35 -05:00
James Rodewig	57d6493e29	[DOCS] EQL: Document `string` function (#55086 )	2020-04-13 11:23:45 -04:00
Peter Dyson	f0b6cf4c11	[DOCS] Note where ILM policies are stored and backup caveats (#54859 )	2020-04-13 09:11:16 -06:00
Igor Motov	6861295706	Further improve InternalTTestTests (#55081 ) A small follow-up to #54910. Now that we can generate a consistent set of internal aggs to reduce, we no longer need to keep agg parameters as class variables. Related to #54910	2020-04-13 10:26:23 -04:00
Vishal Patel	16921ebbd8	[DOCS] Collapse nested objects in Explore API docs (#55067 ) Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-04-13 09:27:03 -04:00
Benjamin Trent	c5c7ee9d73	[7.x] [ML] Start gathering and storing inference stats (#53429 ) (#54738 ) * [ML] Start gathering and storing inference stats (#53429) This PR enables stats on inference to be gathered and stored in the `.ml-stats-` indices. Each node + model_id will have its own running stats document and these will later be summed together when returning _stats to the user. `.ml-stats-` is ILM managed (when possible). So, at any point the underlying index could change. This means that a stats document that is read in and then later updated will actually be a new doc in a new index. This complicates matters as this means that having a running knowledge of seq_no and primary_term is complicated and almost impossible. This is because we don't know the latest index name. We should also strive for throughput, as this code sits in the middle of an ingest pipeline (or even a query).	2020-04-13 08:15:46 -04:00
Ioannis Kakavas	7a8a66d9ae	[7.x] Fix ReloadSecureSettings API to consume password (#54771 ) (#55059 ) The secure_settings_password was never taken into consideration in the ReloadSecureSettings API. This commit fixes that and adds necessary REST layer testing. Doing so, it also: - Allows TestClusters to have a password protected keystore so that it can be set for tests. - Adds a parameter to the run task so that Elasticsearch can be run with a password protected keystore from source.	2020-04-13 09:50:55 +03:00
Yang Wang	862799956c	Deprecate local parameter for get field mapping request (#55014 ) (#55099 ) The usage of the local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0. This PR deprecates the parameter in the REST layer. It will be removed in the next major version.	2020-04-12 13:48:47 +10:00
Andrei Dan	c0406f78b7	ILM add cluster update timeout on step retry (#54878 ) (#55022 ) This commit adds a timeout when moving ILM back on to a failed step. If the master is struggling to process the cluster update requests, they will expire (as we'll send them again anyway on the next ILM loop run). * ILM more descriptive source messages for cluster updates * Use the configured ILM step master timeout setting (cherry picked from commit ff6c5ed16616eadfcddd9c95317d370f0d126583) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-04-11 10:13:31 +01:00
Andrei Dan	b8df265b42	[7.x] ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909 ) (#55018 ) * ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909) This changes the priority of the cluster state update that stops ILM altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to temporarily stop ILM if a cluster is overwhelmed, but a `NORMAL` priority can see the "stop ILM update" not make it up the tasks queue. On the same note, we're keeping the `start ILM` cluster update priority to `NORMAL` on purpose such that we only start `ILM` if the cluster can handle it. (cherry picked from commit d67df3a7cd2a8619c2c9efac4dde3ba83271f2fa) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-04-11 10:12:35 +01:00
James Rodewig	2655dfa2fe	[DOCS] EQL: Reword field support for EQL functions (#55074 ) Changes boilerplate sentence of "If using a field as the argument, this parameter only supports..." to "...this parameter supports only...". The latter is a bit more clear and readable.	2020-04-10 15:33:29 -04:00
Albert Zaharovits	f22004a262	Preserve parent task id for data frame analytics (#55046 ) This change makes sure that all internal client requests spawned by the data frame analytics persistent task executor, and that use the end user security credentials, have the parent task id assigned. The objective here is to permit auditing (as well as tracking for debugging purposes) of all the requests that persistent tasks execute on the end user's behalf. Because the data frame analytics task already implements graceful shutdown of child tasks, this change does not interfere with it by opting out of the persistent task cancellation of child tasks. Relates #54943 #52314	2020-04-10 22:27:21 +03:00
Jason Tedor	d1137ebdaa	Passthrough special characters in thread pool docs (#55080 ) Some of these characters are special to Asciidoctor and they ruin the rendering on this page. Instead, we use a macro to passthrough these characters without Asciidoctor applying any substitutions to them. This commit then addresses some rendering issues in the thread pool docs. Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-04-10 15:11:19 -04:00
Mark Vieira	5d4ddf9146	Fixes for IntelliJ IDEA 2020.1 support (#55077 )	2020-04-10 11:57:48 -07:00
Nik Everett	c00811f3a3	Make some agg tests easier to read (#54954 ) (#55079 ) We added a fancy method to provide random realistic test data to the reduction tests in #54910. This uses that to remove some of the more esoteric machinations in the agg tests. This will marginally increase the coverage of the serialization tests and, more importantly, remove some mysterious value generation code that only really made sense for random reduction tests but was used all over the place. It doesn't, on the other hand, make the tests shorter. Just hopefully more clear. I only cleaned up a few tests this way. If we like this it'd probably be worth grabbing others.	2020-04-10 14:15:30 -04:00
Nik Everett	b99a50bcb9	value_count Aggregation optimization (backport of #54854 ) (#55076 ) We found some problems during the test. Data: 200 million docs, 1 shard, 0 replicas.

| hits | avg | sum | value_count |
| ----------- | ------- | ------- | ----------- |
| 20,000 | .038s | .033s | .063s |
| 200,000 | .127s | .125s | .334s |
| 2,000,000 | .789s | .729s | 3.176s |
| 20,000,000 | 4.200s | 3.239s | 22.787s |
| 200,000,000 | 21.000s | 22.000s | 154.917s |

The performance of `avg`, `sum` and other aggregations is very close when performing statistics, but the performance of `value_count` has always been poor, not even in the same order of magnitude. Based on some common-sense knowledge, we think that `value_count` and `sum` are similar operations, and the time consumed should be the same. Therefore, we have discussed the agg of `value_count`. The principle of counting in es is to traverse the field of each document. If the field is an ordinary value, the count value is increased by 1. If it is an array type, the count value is increased by n. However, the problem lies in traversing each document and taking out the field, which changes from disk to an object in the Java language. We summarize its current problems with Elasticsearch as:

- Number cast to string overhead, and GC problems caused by a large number of strings
- After the number type is converted to string, sorting and other unnecessary operations are performed

Here is the proof of type conversion overhead.

```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
    int size = stringSize(i);
    if (COMPACT_STRINGS) {
        byte[] buf = new byte[size];
        getChars(i, size, buf);
        return new String(buf, LATIN1);
    } else {
        byte[] buf = new byte[size * 2];
        StringUTF16.getChars(i, size, buf);
        return new String(buf, UTF16);
    }
}
```

| test type | average | min | max | sum |
| ------------ | ------- | ---- | ----------- | ------- |
| double->long | 32.2ns | 28ns | 0.024ms | 3.22s |
| long->double | 31.9ns | 28ns | 0.036ms | 3.19s |
| long->String | 163.8ns | 93ns | 1921 ms | 16.3s |

The long->String conversion is particularly serious. Our optimization code is actually very simple. It is to manage different types separately, instead of uniformly converting to string unified processing. We added type identification in ValueCountAggregator, and made special treatment for number and geopoint types to cancel their type conversion. Because the string type is reduced and the string constant is reduced, the improvement effect is very obvious.

| hits | avg | sum | value_count double (before) | value_count double (after) | value_count keyword (before) | value_count keyword (after) | value_count geo_point (before) | value_count geo_point (after) |
| ----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| 20,000 | .038s | .033s | .063s | .026s | .030s | .030s | .038s | .015s |
| 200,000 | .127s | .125s | .334s | .078s | .116s | .099s | .278s | .031s |
| 2,000,000 | .789s | .729s | 3.176s | .439s | .348s | .386s | 3.365s | .178s |
| 20,000,000 | 4.200s | 3.239s | 22.787s | 2.700s | 2.500s | 2.600s | 25.192s | 1.278s |
| 200,000,000 | 21.000s | 22.000s | 154.917s | 18.990s | 19.000s | 20.000s | 168.971s | 9.093s |

- The results are more in line with common sense. `value_count` is about the same as `avg`, `sum`, etc., or even lower than these. Previously, `value_count` was much larger than `avg` and `sum`, and it was not even in the same order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, the performance is improved by about 8 to 9 times; when calculating the `geo_point` type, the performance is improved by 18 to 20 times.	2020-04-10 13:16:39 -04:00
Mark Vieira	38590c83f0	Update opensuse 15.1 os identifier	2020-04-10 10:04:53 -07:00
Luca Cavanna	93c39ad4e7	Async search: create internal index only before storing initial response (#54619 ) We currently create the .async-search index if necessary before performing any action (index, update or delete). Truth is that this is needed only before storing the initial response. The other operations are either update or delete, which will anyways not find the document to update/delete even if the index gets created when missing. This also caused `testCancellation` failures as we were trying to delete the document twice from the .async-search index, once from `TransportDeleteAsyncSearchAction` and once as a consequence of the search task being completed. The latter may be called after the test is completed, but before the cluster is shut down and causing problems to the after test checks, for instance if it happens after all the indices have been cleaned up. It is totally fine to try to delete a response that is no longer found, but not quite so if such call will also trigger an index creation. With this commit we remove all the calls to createIndexIfNecessary from the update/delete operation, and we leave one call only from storeInitialResponse which is where the index is expected to be created. Closes #54180	2020-04-10 18:24:05 +02:00
Tim Brooks	98fba92022	Fail sniff process if no connections opened (#54934 ) Currently the remote cluster sniff connection process can succeed even if no connections are opened. This commit fixes this by failing the connection process if no connections are successfully opened.	2020-04-10 10:06:45 -06:00
Ross Wolf	96a903b17f	EQL: Add string function (#54470 ) * EQL: Add string() function * EQL: Reorder queryfolder_tests * EQL: Add test queries * EQL: Fix InternalEqlScriptUtils.string and test case * EQL: Fix testStringFunctionWithText error message * EQL: Flatten ToStringFunctionPipe.equals * EQL: Reorder painless whitelist * EQL: Address feedback and remove string(null) handling * EQL: Move string(pid) test over * EQL: Rename source -> value	2020-04-10 09:48:29 -06:00
Jim Ferenczi	d14ed34577	Explicitly test rewrite of date histogram's time zones on date_nanos (#54402 ) This commit adds an explicit test of time zone rewrite on date nanos field. Today this is working but we need tests to ensure that we don't break it unintentionally.	2020-04-10 17:37:59 +02:00
Igor Motov	da976d247f	Improve robustness of Query Result serializations (#54692 ) (#55028 ) Makes query result serialization more robust by propagating possible IOExceptions that can occur during shard level result serialization to the caller instead of throwing AssertionError that is not intercepted. Fixes #54665	2020-04-10 10:29:01 -04:00
Przemysław Witek	17101d86d9	[7.x] Do not execute ML CRUD actions when upgrade mode is enabled (#54437 ) (#55049 )	2020-04-10 16:07:11 +02:00
James Rodewig	c440754784	[DOCS] EQL: Document `wildcard` function (#54086 )	2020-04-10 09:18:29 -04:00
oneoneonepig	356cc94889	[DOCS] Fix double quote typo in 7.0 breaking changes (#55040 )	2020-04-10 09:11:51 -04:00
Dimitrios Liappis	b062535e27	Mute testSearchableSnapshotAction in TimeSeriesLifecycleActions tests (#55055 ) Backport of #55052 Details in #55050	2020-04-10 16:03:09 +03:00
Jason Tedor	a370668fcc	Clean up even more instances of "metaData" We recently cleaned up the use of the word "metadata" across the codebase. Even more additional uses have trickled in, likely from in-progress work. This commit cleans up these last few additional instances. Relates #54519	2020-04-10 08:52:37 -04:00
Jason Tedor	9eeae59a83	Clarify available processors (#54907 ) The use of available processors, the terminology, and the settings around it have evolved over time. This commit cleans up some places in the code and in the docs to adjust to the current terminology.	2020-04-10 08:48:27 -04:00
James Rodewig	51326432be	[DOCS] Add query reference docs template (#52292 )	2020-04-10 08:47:54 -04:00
James Rodewig	d5a609a2e5	[DOCS] Add token filter reference docs template (#52290 ) Creates a reusable template for token filter reference documentation. Contributors can make a copy of this template and customize it when documenting new token filters.	2020-04-10 08:45:10 -04:00
Przemko Robakowski	35c195b224	Prevent putting V2 index template when overlapping with existing template (#54933 ) (#55042 ) * Prevent putting V2 index template when overlapping with existing template This change prevents putting V2 index template when it would overlap with existing V2 template of the same priority Relates to #53101	2020-04-10 10:31:37 +02:00
Costin Leau	a7e4f79e8f	EQL: Deprecate lenient sequence declaration (#55032 ) Deprecate alternative sequence parameter declaration (with then by) Disallow lack of time units inside maxspan Fix #55023 Relate #54680 (cherry picked from commit 201adafba9def1de4bf843760defb9def3394f63)	2020-04-10 10:30:07 +03:00
Marios Trivyzas	bf0cadb602	SQL: Implement DATETIME_PARSE function for parsing strings (#54960 ) (#55035 ) Implement the DATETIME_PARSE(<datetime_str>, <pattern_str>) function, which parses a datetime string according to the specified pattern into a datetime object. The patterns allowed are those of java.time.format.DateTimeFormatter. Relates to #53714 (cherry picked from commit 3febcd8f3cdf9fdda4faf01f23a5f139f38b57e0)	2020-04-10 01:16:29 +02:00
Vishal Patel	51cb0c5c7b	[DOCS] Collapse nested objects in cluster reroute docs (#54851 )	2020-04-09 15:29:22 -04:00
Mark Vieira	12f056b833	Update IDE integration to reflect Java 14 requirement (#54990 )	2020-04-09 12:27:57 -07:00
Nik Everett	62d6bc31bf	Reduce memory for big aggs run against many shards (#54758 ) (#55024 ) This changes the behavior of aggregations when search is performed against enough shards to enable "batch reduce" mode. In this case we always store aggregations in serialized form rather than as a traditional Java reference. This should shrink the memory usage of large aggregations at the cost of slightly slowing down aggregations where the coordinating node is also a data node. Because we're only doing this when there are many shards this is likely to be fairly rare. As a side effect this lets us add logs for the memory usage of the aggs buffer:

```
[2020-04-03T17:03:57,052][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1320->448] max [1320]
[2020-04-03T17:03:57,089][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,102][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,103][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,105][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs final reduction [888] max [1328]
```

These are useful, but you need to keep some things in mind before trusting them:

1. The buffers are oversized ala Lucene's ArrayUtils. This means that we are using more space than we need, but probably not much more.
2. Before they are merged the aggregations are inflated into their traditional Java objects which probably take up a lot more space than the serialized form. That is, after all, the reason why we store them in serialized form in the first place.

And, just because I can, here is another example of the log:

```
[2020-04-03T17:06:18,731][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,750][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,809][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,827][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,829][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs final reduction [98352] max [147528]
```

I got that last one by building a ten shard index with a million docs in it and running a `sum` in three layers of `terms` aggregations, all on `long` fields, and with a `batched_reduce_size` of `3`.	2020-04-09 14:58:42 -04:00
Julie Tibshirani	850ea7c0be	Correct the name of the docvalues_fields object parser.	2020-04-09 11:36:28 -07:00
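
Commit 52bebec51f above describes replacing hard-coded getters such as getOs() with a class-keyed lookup such as getInfo(OsInfo.class), so that client code needs no cast. The sketch below illustrates that general pattern (a typesafe heterogeneous container) in plain Java. It is not the Elasticsearch implementation: `NodeInfoSketch`, `InfoBlock`, `addInfo`, and the fields on `OsInfo` and `JvmInfo` are hypothetical names chosen for illustration, and serialization is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeInfoSketch {
    /** Hypothetical marker for info blocks; a stand-in, not the real interface. */
    interface InfoBlock {}

    static final class OsInfo implements InfoBlock {
        final String name;
        OsInfo(String name) { this.name = name; }
    }

    static final class JvmInfo implements InfoBlock {
        final String version;
        JvmInfo(String version) { this.version = version; }
    }

    // Each block is stored under its own class, so the key carries the type.
    private final Map<Class<? extends InfoBlock>, InfoBlock> blocks = new HashMap<>();

    <T extends InfoBlock> void addInfo(Class<T> type, T info) {
        blocks.put(type, info);
    }

    /** Returns the block of the requested type, or null if none was added. */
    <T extends InfoBlock> T getInfo(Class<T> type) {
        // Class.cast performs the checked cast once, here, instead of in every caller.
        return type.cast(blocks.get(type));
    }

    public static void main(String[] args) {
        NodeInfoSketch node = new NodeInfoSketch();
        node.addInfo(OsInfo.class, new OsInfo("Linux"));

        OsInfo os = node.getInfo(OsInfo.class);          // no cast at the call site
        System.out.println(os.name);
        System.out.println(node.getInfo(JvmInfo.class)); // block was never added
    }
}
```

Keying the map by `Class<T>` is what lets the compiler infer the return type at each call site; the trade-off is that a missing block surfaces as `null` at run time rather than as a compile error.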

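Commit 96bb1164f0 above states that cancelling a task now cancels all of its descendants, not only its direct children. A minimal sketch of that idea follows, assuming a simple in-memory task tree. The `Task` class, `spawn`, and `cancel` here are hypothetical and say nothing about how the real task manager tracks or notifies tasks across nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskTreeSketch {
    /** Hypothetical in-memory task node; not the Elasticsearch Task class. */
    static final class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        boolean cancelled;

        Task(String name) { this.name = name; }

        /** Creates a child task and registers it under this task. */
        Task spawn(String childName) {
            Task child = new Task(childName);
            children.add(child);
            return child;
        }

        /** Cancels this task and, recursively, every descendant. */
        void cancel() {
            if (cancelled) {
                return; // already cancelled, so the subtree was handled earlier
            }
            cancelled = true;
            for (Task child : children) {
                child.cancel();
            }
        }
    }

    public static void main(String[] args) {
        Task search = new Task("search");
        Task shardPhase = search.spawn("shard-phase");
        Task fetch = shardPhase.spawn("fetch");   // grandchild of "search"
        Task unrelated = new Task("unrelated");

        search.cancel();

        System.out.println(shardPhase.cancelled); // direct child
        System.out.println(fetch.cancelled);      // descendant two levels down
        System.out.println(unrelated.cancelled);  // outside the tree
    }
}
```

The recursion is what distinguishes this from cancelling direct children only: without the call to `child.cancel()` inside the loop, `fetch` would keep running after `search` was cancelled.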
51117 Commits All Branches Search