Commit Graph

47557 Commits

Martijn van Groningen 4ac25b23f6
Add support for a more compact enrich values format (#45033)
If the source and target are the same in `enrich_values`, then
a plain string array can be specified instead.

For example instead of this:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "enrich_values": [
                    {
                        "source": "first_name",
                        "target": "first_name"
                    },
                    {
                        "source": "last_name",
                        "target": "last_name"
                    },
                    {
                        "source": "address",
                        "target": "address"
                    },
                    {
                        "source": "city",
                        "target": "city"
                    },
                    {
                        "source": "state",
                        "target": "state"
                    },
                    {
                        "source": "zip",
                        "target": "zip"
                    }
                ]
            }
        }
    ]
}
```
This more compact format can be specified:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "targets": [
                   "first_name",
                   "last_name",
                   "address",
                   "city",
                   "state",
                   "zip"
                ]
            }
        }
    ]
}
```

And the `enrich_values` key has been renamed to `set_from`.
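
For illustration, a hypothetical sketch of the renamed key, assuming the source/target object syntax is otherwise unchanged (the pipeline and policy names reuse the placeholders above, and the target field name is made up):

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "set_from": [
                    {
                        "source": "first_name",
                        "target": "given_name"
                    }
                ]
            }
        }
    ]
}
```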

Relates to #32789
2019-08-09 12:40:58 +02:00
Alpar Torok 634a070430 Restrict which tasks can use testclusters (#45198)
* Restrict which tasks can use testclusters

This PR fixes a problem in the interaction between test-clusters and
the build cache.
Before this change, any task could have used a cluster without tracking
it as an input.
With this change a new interface is introduced to track the tasks that
can use clusters, and the cluster is now considered an input for all of
them.
2019-08-09 13:38:01 +03:00
Yannick Welsch 5ddeb488a6 Allow _update on write alias (#45318)
Previously, using the document update API on an alias with a write index
did not work; this change allows it.
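
For example, with a write alias `my-alias` pointing at a write index, an update such as the following should now be routed to that index (alias, document id, and field are illustrative):

```
POST /my-alias/_update/1
{
    "doc": {
        "status": "active"
    }
}
```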

Follow-up to #31520
2019-08-09 11:44:24 +02:00
Martijn van Groningen 5e1c0d598c
Added HLRC support for enrich put policy API. (#45183)
This PR also adds HLRC docs.

Relates to #32789
2019-08-09 09:30:56 +02:00
Martijn van Groningen f1ee29f22e
Added a custom api to perform the msearch more efficiently for enrich processor (#43965)
Currently the msearch api is used to execute buffered search requests;
however the msearch api doesn't deal with search requests in an intelligent way.
It basically executes each search separately in a concurrent manner.

This api reuses the msearch request and response classes and executes
the searches as one request on the node holding the enrich index shard.
Things like engine.searcher and query shard context are only created once,
and there are fewer layers involved than when executing a regular msearch
request. This results in a significant speedup.

Without this change, in a single node cluster, enriching documents
with a bulk size of 5000 items, the ingest time in each bulk response
varied from 174ms to 822ms. With this change the ingest time in each
bulk response varied from 54ms to 109ms.

I think we should add a change like this based on this improvement in ingest time.

However I do wonder if, instead of doing this change, we should improve
the msearch api to execute more efficiently. That would be more complicated
than this change, because in this change the custom api can only search
enrich index shards, and these are special because they always have a single
primary shard. If the msearch api is to be improved then that should work for
any search request to any indices. Making the same optimization for
indices with more than one primary shard requires much more work.

The current change is isolated to the enrich plugin and its LOC / complexity
is small, so this is good enough for now.
2019-08-09 09:11:04 +02:00
Hendrik Muhs 7d0aff0ed5 [ML-DataFrame] fix test failure in checkpoint retrieval (#45297)
Gracefully handle the case where the index response returns null; increase and assert the timeout

closes #45238
2019-08-09 09:04:53 +02:00
Armin Braun a501d68f23
Upgrade to Netty 4.1.38 (#45132) (#45364)
* A number of buffer-handling fixes landed in Netty 4.1.37 and 4.1.38, so we should stay up to date
2019-08-09 03:38:14 +02:00
Tal Levy 2a99eaa7c2 Revert "removes the CellIdSource abstraction from geo-grid aggs (#45307) (#45353)"
This reverts commit 7b0a8040de.
2019-08-08 17:40:03 -07:00
Ryan Ernst 1794718e8e Make git revision loading lazy (#45358)
This commit makes the gitRevision property a lazily loaded value by
returning an Object implementing toString(). The Dockerfile template is
also changed to use Groovy templates instead of the mavenfilter hack, so
the conversion to String will not happen until runtime.
2019-08-08 17:08:07 -07:00
Armin Braun 12ed6dc999
Only retain reasonable history for peer recoveries (#45208) (#45355)
Today if a shard is not fully allocated we maintain a retention lease for a
lost peer for up to 12 hours, retaining all operations that occur in that time
period so that we can recover this replica using an operations-based recovery
if it returns. However it is not always reasonable to perform an
operations-based recovery on such a replica: if the replica is a very long way
behind the rest of the replication group then it can be much quicker to perform
a file-based recovery instead.

This commit introduces a notion of "reasonable" recoveries. If an
operations-based recovery would involve copying only a small number of
operations, but the index is large, then an operations-based recovery is
reasonable; on the other hand if there are many operations to copy across and
the index itself is relatively small then it makes more sense to perform a
file-based recovery. We measure the size of the index by computing its number
of documents (including deleted documents) in all segments belonging to the
current safe commit, and compare this to the number of operations a lease is
retaining below the local checkpoint of the safe commit. We consider an
operations-based recovery to be reasonable iff it would involve replaying at
most 10% of the documents in the index.

The mechanism for this feature is to expire peer-recovery retention leases
early if they are retaining so much history that an operations-based recovery
using that lease would be unreasonable.

Relates #41536
2019-08-09 01:56:32 +02:00
Tal Levy 7b0a8040de
removes the CellIdSource abstraction from geo-grid aggs (#45307) (#45353)
CellIdSource is a helper ValuesSource that encodes GeoPoint
into a long-encoded representation of the grid bucket the point
is associated with. This complicates things as usage evolves to
support shapes that are associated with more than one bucket ordinal.
2019-08-08 16:33:16 -07:00
Hendrik Muhs 68f9102550 [ML-DataFrame] audit changes in the source index (#45282)
Add audit messages when the set of source indices changes and, as a special case, when it becomes empty
2019-08-08 23:31:55 +02:00
Andrei Stefan 740d58fd46
SQL: Uniquely named inner_hits sections for each nested field condition (#45341)
* Name each inner_hits section of nested queries differently, and extract and combine the multiple values it generates into a single list.
This also introduces a limitation (its origin is with Elasticsearch
itself, though) on the sorting capabilities when the sorting is based on
the filtered nested fields: only one of the conditions applied to nested
documents will be used in the nested sorting.

(cherry picked from commit cfc5cf68f6e83b07bb9006986d0903d6be418ec6)
2019-08-09 00:22:49 +03:00
Tim Brooks af908efa41
Disable netty direct buffer pooling by default (#44837)
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, where direct byte buffers slowly build up without
being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
2019-08-08 15:10:31 -06:00
Armin Braun b19de55095
Add missing wait to testAutomaticReleaseOfIndexBlock (#45342) (#45351)
Today the test waits for one of the shards to be blocked, but this does not
mean that the block has been applied on all nodes, so a subsequent indexing
operation may still go through.

Fixes #45338
2019-08-08 22:39:22 +02:00
Henning Andersen d139896b66
Reindex share retry between hit sources (#44203) (#45348)
The client and remote hit sources each had their own retry mechanism,
which did the same thing. To support resiliency we would have to expand
on the retry mechanisms, so as a preparation for that the retry
mechanism is now shared such that each subclass is only responsible for
sending requests and converting responses/failures to a common format.

Part of #42612
2019-08-08 22:01:29 +02:00
Mark Vieira 214cbb28df
Fix for build runtime classpath instability (#45347)
(cherry picked from commit dee4ee2f0d4190ab54d0a4f0aa251d8c03e9db6d)
2019-08-08 12:41:17 -07:00
Christoph Büscher a552b33276 Fix occasional SuggestSearchIT failure (#45330)
Refreshes happening during indexing can result in different segment counts
and slightly skewed term statistics, which in turn have the potential to
change suggestion output slightly. In order to prevent this, disable
refresh for the affected tests.

Closes #43261
2019-08-08 21:06:32 +02:00
Mark Vieira 0ae103c40f
Avoid unnecessary eager creation of Gradle tasks (#45098) (#45310) 2019-08-08 10:50:09 -07:00
Jack Conradson b716b840d3 Remove loop counter from Reserved in Painless AST. (#45298)
This change adds a compiler pass to give each node the chance to store
settings necessary for analysis and writing. This removes the need to pass
this in a somewhat convoluted way through an additional class called
Reserved, and also removes the need to have the Walker set values for
settings on Reserved. This is the next step in decoupling the Painless
grammar from the Painless AST.
2019-08-08 09:34:51 -07:00
David Roberts 14545f8958
[ML-DataFrame] Combine task_state and indexer_state in _stats (#45324)
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:

- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state

Backport of #45276
2019-08-08 16:24:26 +01:00
Martijn van Groningen bb429d3b5c
required changes after merge 2019-08-08 17:04:18 +02:00
Dimitris Athanasiou e53bb050db Mute testAutomaticReleaseOfIndexBlock
Relates #45338
2019-08-08 17:56:41 +03:00
Martijn van Groningen 708f856940
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-08 16:52:45 +02:00
Andrey Ershov 07c656fba9 Mute testCustomDataPaths on Windows
See #45333

(cherry picked from commit 671e1ad1068aee4b593ad0c8ab13ff60b4f125b8)
2019-08-08 16:26:56 +02:00
Mayya Sharipova f0f2294695 Add filters in examples of vector functions (#45327) 2019-08-08 09:44:59 -04:00
Zachary Tong 86d6597890 Use newIndexSearcher() instead of newSearcher() (#45248)
`newSearcher()` from Lucene can randomly choose index readers which
are not compatible with our tests, like ParallelCompositeReader.
The `newIndexSearcher()` method on AggregatorTestCase is a wrapper
similar to `newSearcher()` but compatible with our tests.
2019-08-08 09:34:38 -04:00
István Zoltán Szabó 4e32470827 [DOCS] Reformats cluster reroute API. (#45328) 2019-08-08 15:27:54 +02:00
István Zoltán Szabó 4d96c83854 [DOCS] Reformats cluster pending tasks API (#45280)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-08-08 14:49:32 +02:00
James Rodewig e72bda1703 [DOCS] Reformats cat nodes API (#45285) 2019-08-08 08:38:02 -04:00
James Rodewig b806e6edde [DOCS] Reformats cat pending tasks API (#45287) 2019-08-08 08:32:04 -04:00
István Zoltán Szabó 276e9c6697 [DOCS] Adds supported time units ref to the ML and DF API params. (#45322) 2019-08-08 14:26:19 +02:00
Martijn van Groningen e066133016
Change the ingest simulate api to not include dropped documents (#44161)
If documents are dropped by the `drop` processor then
these documents are returned as a `null` value in the response.

=== Example

Create pipeline:

```
PUT _ingest/pipeline/droppipeline
{
    "processors": [
        {
            "set": {
                "field": "bla",
                "value": "val"
            }
        },
        {
            "drop": {}
        }
    ]
}
```

Simulate request:

```
POST _ingest/pipeline/droppipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "message": "text"
            }
        }
    ]
}
```

Response:

```
{
    "docs": [
        null
    ]
}
```

Response if verbose is enabled:

```
{
    "docs": [
        {
            "processor_results": [
                {
                    "doc": {
                        "_index": "_index",
                        "_type": "_doc",
                        "_id": "_id",
                        "_source": {
                            "message": "text",
                            "bla": "val"
                        },
                        "_ingest": {
                            "timestamp": "2019-07-10T11:07:10.758315Z"
                        }
                    }
                },
                null
            ]
        }
    ]
}
```

Closes #36150

* Abort pipeline simulation in verbose mode when document has been dropped
by drop processor
2019-08-08 13:04:33 +02:00
Ioannis Kakavas 99ddb8b3d8 Allow empty token endpoint for implicit flow (#45038)
When using the implicit flow in OpenID Connect, the
op.token_endpoint_url should not be mandatory as there is no need
to contact the token endpoint of the OP.
2019-08-08 12:50:53 +03:00
David Turner ddcc38cf1c
More read-only-allow-delete docs (#45320)
Adds to the `index.blocks.read_only_allow_delete` docs the information that
this block may be added or removed automatically, and rewords the
breaking-changes docs to mention the blocks explicitly and to recommend using a
different block.
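
For reference, the `read_only_allow_delete` block can also be cleared manually with an index settings update (index name illustrative):

```
PUT /my-index/_settings
{
    "index.blocks.read_only_allow_delete": null
}
```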

Relates #42559
2019-08-08 09:58:23 +01:00
Martijn van Groningen e3fd1e6c7d
Add support for overwrite parameter in the enrich processor. (#45029)
Similar to how it is supported in the set processor:
https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html
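
A hypothetical sketch of an enrich processor using such an option; the parameter name and default shown here follow the set processor's `override` option and may differ from the actual enrich implementation, and the `targets` list reuses the compact format from an earlier commit in this log:

```
PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "targets": [
                    "first_name"
                ],
                "override": false
            }
        }
    ]
}
```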

Relates to #32789
2019-08-08 10:33:19 +02:00
István Zoltán Szabó 9f62c04637 [DOCS] Reformats cluster health and cluster state APIs (#45206)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-08-08 10:25:05 +02:00
Martijn van Groningen fb959d188c
Backport: Add description to force-merge tasks (#41365) (#45191)
* Add description to force-merge tasks (#41365)

This is static information that is part of the force merge request.

Relates to #15975
2019-08-08 08:15:09 +02:00
Mark Vieira fca458f1c8
Use system properties for build cache configuration (#45295) 2019-08-07 13:14:54 -07:00
Gordon Brown e3599fded7
Add warning about versions to Deprecation API docs (#36624)
Add a note that the Deprecation API may not be up to date with all
breaking changes until the last minor version in a major version series.
2019-08-07 14:11:05 -06:00
Michael Basnight 89861d0884 Add ingest processor existence helper method (#45156)
This commit adds a helper method to the ingest service allowing it to
inspect a pipeline by id and verify the existence of a processor in the
pipeline. This work exposed a potential bug in that some processors
contain inner processors that are passed in at instantiation. These
processors needed a common way to expose their inner processors, so the
WrappingProcessor was created in order to expose the inner processor.
2019-08-07 11:19:04 -05:00
Mark Vieira 341ab48ec0
Improve SCM info in build scans (#45264) 2019-08-07 09:06:11 -07:00
Benjamin Trent 5db9982f71
[7.x] [ML][Data Frame] Add update transform api endpoint (#45154) (#45279)
* [ML][Data Frame] Add update transform api endpoint (#45154)

This adds the ability to `_update` stored data frame transforms. All mutable fields are applied when the next checkpoint starts, the exception being `description`.
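
As an illustration, an update request could look like the following; the endpoint path, transform id, and body field are assumptions based on the 7.x data frame API naming rather than taken from this message:

```
POST /_data_frame/transforms/ecommerce_transform/_update
{
    "description": "Transform for ecommerce orders, grouped by customer"
}
```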

This PR contains all that is necessary for this addition:
* HLRC
* Docs
* Server side
2019-08-07 10:37:35 -05:00
Benjamin Trent 3a71b91dca
[ML][Data Frame] add support for geo_bounds aggregation (#44441) (#45281)
This adds support for `geo_bounds` aggregation inside the `pivot.aggregations` configuration. 

The two points returned from the `geo_bounds` aggregation are transformed into a `geo_shape` whose type depends on how the two points relate:

* `point` if the two points are identical
* `linestring` if the two points share either a latitude or longitude 
* `polygon` if the two points are completely different

The automatically deduced mapping for the resulting field is a `geo_shape`.
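
For illustration, a minimal sketch of a `pivot` section using `geo_bounds`; the group_by, aggregation, and field names are hypothetical:

```
"pivot": {
    "group_by": {
        "customer_id": {
            "terms": {
                "field": "customer_id"
            }
        }
    },
    "aggregations": {
        "location_bounds": {
            "geo_bounds": {
                "field": "geoip.location"
            }
        }
    }
}
```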
2019-08-07 10:37:09 -05:00
István Zoltán Szabó 95d3a8e8ad [DOCS] Reformats cluster stats API and expands common params (#45270)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-08-07 16:49:58 +02:00
Lee Hinman c7ec0b8431 Include in-progress snapshot for a policy with get SLM policy… (#45245)
This commit adds the "in_progress" key to the SLM get policy API,
returning a policy that looks like:

```json
{
  "daily-snapshots" : {
    "version" : 1,
    "modified_date" : "2019-08-05T18:41:48.778Z",
    "modified_date_millis" : 1565030508778,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "0 30 1 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      },
      "retention" : {
        "expire_after" : "10m"
      }
    },
    "last_success" : {
      "snapshot_name" : "production-snap-2019.08.05-oxctmnobqye3luim4uejhg",
      "time_string" : "2019-08-05T18:42:23.257Z",
      "time" : 1565030543257
    },
    "next_execution" : "2019-08-06T01:30:00.000Z",
    "next_execution_millis" : 1565055000000,
    "in_progress" : {
      "name" : "production-snap-2019.08.05-oxctmnobqye3luim4uejhg",
      "uuid" : "t8Idqt6JQxiZrzp0Vt7z6g",
      "state" : "STARTED",
      "start_time" : "2019-08-05T18:42:22.998Z",
      "start_time_millis" : 1565030542998
    }
  }
}
```

These fields are only visible while the snapshot is being taken (or has
failed), since they are read from the cluster state rather than from the
repository itself.
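
For reference, output like the above would come from the get SLM policy API, e.g. (assuming the policy id shown):

```
GET /_slm/policy/daily-snapshots
```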
2019-08-07 08:29:49 -06:00
István Zoltán Szabó 9384774b4c [DOCS] Adds supported time units ref to the frequency and delay params. (#45283) 2019-08-07 16:19:59 +02:00
Alpar Torok 0ea00e4861 Change how we pick bwc versions to check out (#45189)
Prior to this PR we always checked out the latest bwc branches and had
an external mechanism to store the bwc versions used for every CI run so
we could both reproduce those builds and run additional tests using the
same combination.

This adds complexities in setting up and maintaining CI and makes it
difficult to set up multi-jobs.

This change replaces that mechanism with a time-based approach
that looks at the commit date of the current revision and picks the
newest commit on the bwc branch that is still older than that date.
It also makes sure there are no merge commits in this interval.

This new behavior is meant to be enabled in CI only, for everything
except PR checks, which will still use the last available bwc revision.
2019-08-07 16:44:38 +03:00
James Rodewig 46fc989ca2 [DOCS] Reformats cat nodeattrs API (#45255) 2019-08-07 09:31:37 -04:00
James Rodewig 5ade756275 [DOCS] Reformats cat indices API (#45239) 2019-08-07 09:08:35 -04:00