OpenSearch


Author	SHA1	Message	Date
Adrien Grand	886db84ad2	Expose Lucene's FeatureField. (#30618 ) Lucene has a new `FeatureField` which gives the ability to record numeric features as term frequencies. Its main benefit is that it allows boosting queries with the values of these features and efficiently skipping non-competitive documents at the same time using block-max WAND and indexed impacts.	2018-05-23 08:55:21 +02:00
Nhat Nguyen	1918a30237	Upgrade to Lucene-7.4.0-snapshot-cc2ee23050 (#30778 ) The new snapshot includes LUCENE-8324, which fixes a missing checkpoint after a fully deleted segment is dropped on flush. This snapshot should resolve the failing tests in the CorruptedFileIT suite. Closes #30741 Closes #30577	2018-05-22 13:11:48 -04:00
Tim Brooks	31251c9a6d	Make http pipelining support mandatory (#30695 ) This is related to #29500 and #28898. This commit removes the ability to disable http pipelining. After this commit, any elasticsearch node will support pipelined requests from a client. Additionally, it extracts some of the http pipelining work to the server module. This extracted work is used to implement pipelining for the nio plugin.	2018-05-22 09:29:31 -06:00
Itamar Syn-Hershko	5f172b6795	[Feature] Adding a char_group tokenizer (#24186 ) === Char Group Tokenizer The `char_group` tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple custom tokenization is desired, and the overhead of using the <<analysis-pattern-tokenizer, `pattern` tokenizer>> is not acceptable. === Configuration The `char_group` tokenizer accepts one parameter: `tokenize_on_chars`:: A string containing a list of characters to tokenize the string on. Whenever a character from this list is encountered, a new token is started. Also supports escaped values like `\\n` and `\\f`, and in addition `\\s` to represent whitespace, `\\d` to represent digits and `\\w` to represent letters. Defaults to an empty list. === Example output ```The 2 QUICK Brown-Foxes jumped over the lazy dog's bone for $2``` When the configuration `\\s-:<>` is used for `tokenize_on_chars`, the above sentence would produce the following terms: ```[ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog's, bone, for, $2 ]```	2018-05-22 16:26:31 +02:00
Ryan Ernst	34180f2285	Scripting: Remove getDate methods from ScriptDocValues (#30690 ) The getDate() and getDates() methods existed prior to 5.x on long fields in scripting. In 5.x, a new Date type for ScriptDocValues was added. The getDate() and getDates() methods were left on long fields and added to date fields to ease the transition. This commit removes those methods for 7.0.	2018-05-18 21:26:26 -07:00
Nhat Nguyen	67d8fc222d	Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726 ) This snapshot resolves issues related to ShrinkIndexIT.	2018-05-18 18:21:39 -04:00
Zachary Tong	d120fb222c	[TEST] Adjust version skips for movavg/movfn tests Since the MovFn PR was backported to 6.x, we can adjust the version skip numbers in master to correctly match 6.3.99 instead of 6.4.0	2018-05-17 18:07:52 +00:00
Christoph Büscher	b6340658f4	Deprecate `nGram` and `edgeNGram` names for ngram filters (#30209 ) The camel case name `nGram` should be removed in favour of `ngram` and similar for `edgeNGram` and `edge_ngram`. Before removal, we need to deprecate the camel case names first. This change adds deprecation warnings for indices with versions 6.4.0 and higher and logs deprecation warnings.	2018-05-17 12:52:22 +02:00
Shashwat Anand	f0da3da6b0	Reindex: Fixed typo in assertion failure message (#30619 ) Fix a typo in an assertion failure message.	2018-05-16 16:26:23 -04:00
Ke Li	d2b9a765cf	Remove version argument in RangeFieldType (#30411 ) The argument `indexVersionCreated` is not needed any more and can be removed.	2018-05-16 17:42:44 +02:00
Zachary Tong	df853c49c0	Add a MovingFunction pipeline aggregation, deprecate MovingAvg agg (#29594 ) This pipeline aggregation gives the user the ability to script functions that "move" across a window of data, instead of single data points. It is the scripted version of MovingAvg pipeline agg. Through custom script contexts, we expose a number of convenience methods: - MovingFunctions.max() - MovingFunctions.min() - MovingFunctions.sum() - MovingFunctions.unweightedAvg() - MovingFunctions.linearWeightedAvg() - MovingFunctions.ewma() - MovingFunctions.holt() - MovingFunctions.holtWinters() - MovingFunctions.stdDev() The user can also define any arbitrary logic via their own scripting, or combine with the above methods.	2018-05-16 10:57:00 -04:00
Tim Brooks	99b9ab58e2	Add nio http server transport (#29587 ) This commit is related to #28898. It adds an nio driven http server transport. Currently it only supports basic http features. CORS, pipelining, and read timeouts will need to be added in future PRs.	2018-05-15 16:37:14 -06:00
Julie Tibshirani	4f9dd37169	Add support for search templates to the high-level REST client. (#30473 )	2018-05-15 13:07:58 -07:00
Jason Tedor	4a4e3d70d5	Default to one shard (#30539 ) This commit changes the default out-of-the-box configuration for the number of shards from five to one. We think this will help address a common problem of oversharding. For users with time-based indices that need a different default, this can be managed with index templates. For users with non-time-based indices that find they need to re-shard, now that the split API is in place they no longer need to resort only to reindexing. Since this has the impact of changing the default number of shards used in REST tests, we want to ensure that we still have coverage for issues that could arise from multiple shards. As such, we randomize (rarely) the default number of shards in REST tests to two. This is managed via a global index template. However, some tests check the templates that are in the cluster state during the test. Since this template is randomly there, we need a way for tests to skip adding the template used to set the number of shards to two. For this we add the default_shards feature skip. To avoid having to write our docs in a complicated way because sometimes they might be behind one shard, and sometimes they might be behind two shards, we apply the default_shards feature skip to all docs tests. That is, these tests will always run with the default number of shards (one).	2018-05-14 12:22:35 -04:00
Christoph Büscher	cc93131318	Forbid expensive query parts in ranking evaluation (#30151 ) Currently the ranking evaluation API accepts the full query syntax for the queries specified in the evaluation set and executes them via multi search. This potentially runs costly aggregations and suggestions too. This change adds checks that forbid using aggregations, suggesters, highlighters and the explain and profile options in the queries that are run as part of the ranking evaluation since they are irrelevant in the context of this API.	2018-05-14 17:36:26 +02:00
Alpar Torok	9a5555963b	Add missing dependencies on testClasses (#30527 )	2018-05-14 16:06:56 +03:00
Martijn van Groningen	7b95470897	Moved tokenizers to analysis common module (#30538 ) The following tokenizers were moved: classic, edge_ngram, letter, lowercase, ngram, path_hierarchy, pattern, thai, uax_url_email and whitespace. Left keyword tokenizer factory in server module, because normalizers directly depend on it. This should be addressed in a follow-up change. Relates to #23658	2018-05-14 07:55:01 +02:00
Daniel Mitterdorfer	09cf530f4b	Derive max composite buffers from max content len With this commit we determine the maximum number of buffers that Netty keeps while accumulating one HTTP request based on the maximum content length (default 1500 bytes, overridable with the system property `es.net.mtu`). Previously, we kept the default value of 1024, which is too small for bulk requests and leads to unnecessary copies of byte buffers internally. Relates #29448	2018-05-11 10:01:09 +02:00
Nhat Nguyen	519768b5d3	Upgrade to Lucene-7.4-snapshot-6705632810 (#30519 ) This snapshot is to include LUCENE-8298 which allows DocValues updates to reset a value. This is needed for the Lucene rollback work.	2018-05-10 12:31:45 -04:00
Nik Everett	51fa8739ea	Reindex: Fold "with all deps" project into reindex (#30154 ) This folds the `:qa:smoke-test-reindex-with-all-modules` project into `:modules:reindex` by declaring the reindex's integration testing cluster requires the `parent-join` and `lang-painless` plugins and then moving all of the integration tests that depended on parent-join and painless into reindex. It saves us one cluster start up during the build at the cost of a little of the reindex module's "purity". Since the reindex module does have unit tests that test scripting without painless I'm fairly ok with that.	2018-05-10 08:02:23 -04:00
Nik Everett	b4502dbf74	LLClient: Add setJsonEntity (#30447 ) Adds `Request#setJsonEntity(String)` which short circuits the process of sending a json string which is super common.	2018-05-09 18:33:03 -04:00
Yu	2228e6e663	BulkProcessor to retry based on status code (#29329 ) Previously `BulkProcessor` retry logic was based on the exception type of the failed response (`EsRejectedExecutionException`). This commit changes it to be based on the returned status code. This allows us to reproduce the same retry behaviour when the `BulkProcessor` is used from the high-level REST client, which was previously not the case as we cannot rebuild the same exception type when parsing back the response. This change has no effect on the transport client. Closes #28885	2018-05-09 14:27:58 +02:00
Nik Everett	ef4ecb1f1e	Reindex: Use request flavored methods (#30317 ) Use the new request flavored methods for the low level rest client introduced in #29623 in reindex.	2018-05-07 17:14:38 -04:00
Jim Ferenczi	dbd857341f	Upgrade to 7.4.0-snapshot-1ed95c097b (#30357 ) Upgrade to lucene-7.4.0-snapshot-1ed95c097b This version contains: * An Analyzer for Korean * An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries. * A new API to retrieve matches (offsets and positions) of a query for a single document. * Support for soft deletes in the index writer. * A fixed shingle filter that handles index time synonyms. * Support for emoji sequences in ICUTokenizer (with an upgrade to icu 61.1)	2018-05-04 11:44:22 +02:00
Ryan Ernst	fb0aa562a5	Network: Remove http.enabled setting (#29601 ) This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase. closes #12792	2018-05-02 11:42:05 -07:00
Adrien Grand	368ddc408f	Remove MapperService#types(). (#29617 ) This isn't necessary with a single type per index.	2018-05-02 11:35:12 +02:00
Adrien Grand	231a63fdf8	Remove useless version checks in REST tests. (#30165 ) Many tests are added with a version check so that they do not run against a version that doesn't have the feature yet. Master is 7.0, so all tests that do not run against 6.0+ can be removed and the version check can be removed on all tests that always run on 6.0+.	2018-05-02 11:34:15 +02:00
Nik Everett	0be443c5bb	REST Client: Add Request object flavored methods (#29623 ) Adds two new methods to `RestClient` that take a `Request` object. These methods will allow us to add more per-request customizable options without creating more and more and more overloads of the `performRequest` and `performRequestAsync` methods. These new methods look like: ``` Response performRequest(Request request) ``` and ``` void performRequestAsync(Request request, ResponseListener responseListener) ``` This change doesn't add any actual features but enables adding things like per request timeouts and per request node selectors. This change does rework the `HighLevelRestClient` and its tests to use these new `Request` objects and it does update the docs.	2018-05-01 14:31:23 -04:00
Nik Everett	d12e644206	Build: Log a warning if disabling reindex-from-old (#30304 ) We disable the reindex-from-old tests if we're running on Windows or in a directory that contains a space. This adds a warning to the logs when we do that so that you can tell that it happened. This will be nice to have when looking at CI and will be a hint to anyone developing locally.	2018-05-01 11:23:18 -04:00
David Turner	d2ca16b4c7	Suppress reindex-from-old tests if there are spaces in the path	2018-05-01 14:32:13 +01:00
Nik Everett	9c8e015552	Build: Mostly silence warning about html4 javadoc (#30220 ) This mostly silences `javadoc`'s warning about defaulting to generating html4 files by enabling generation of html5 files for the projects where that works. It didn't work in a half dozen projects, about half of which I've fixed in this PR, entirely by replacing `<tt>thing</tt>` with `{@code thing}`. There are a few remaining projects that contain javadoc with invalid html5. I'll fix those projects in a followup.	2018-04-28 09:50:54 -04:00
Nik Everett	8401eac425	Test: Switch painless test to 1 shard We think that #28600 is caused by warnings not being collected during one of the fan out phases of search but we're not 100% sure how this is happening. This commit drops the number of shards used for the test to 1 so there isn't a fan out phase. If this makes the issue go away we'll have more information.	2018-04-27 15:01:42 -04:00
Nik Everett	912fbb2211	Reindex: Fold "from old" tests into reindex module (#30142 ) This folds the `:qa:reindex-from-old` project into the `:modules:reindex` project. This should speed up the build marginally by removing a single cluster start up at the cost of having to wait for old versions of Elasticsearch to start up when checking reindex's integration tests. Those don't take that long so this feels worth it.	2018-04-27 14:04:37 -04:00
Tanguy Leroux	b15631ee54	[Test] Fix RenameProcessorTests.testRenameExistingFieldNullValue() (#29655 ) This test fails when the new field name already exists in the ingest document.	2018-04-26 17:26:37 +02:00
Christoph Büscher	d0f6657d90	Add tests for ranking evaluation with aliases (#29452 ) The ranking evaluation requests so far were not tested against aliases but they should run regardless of whether the targeted index is a real index or an alias. This change adds cases for this to the integration and rest tests.	2018-04-19 17:00:52 +02:00
Christoph Büscher	24763d881e	Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429 ) The camel case name `htmlStrip` should be removed in favour of `html_strip`, but we need to deprecate it first. This change adds deprecation warnings for indices with version starting with 6.3.0 and logs deprecation warnings in these cases.	2018-04-19 16:48:17 +02:00
Christoph Büscher	7c56cc2624	Make ranking evaluation details accessible for client Allow high level java rest client to access details of the metric calculation by making them accessible across packages. Also renaming the inner `Breakdown` classes of the evaluation metrics to `Detail` to better communicate their use.	2018-04-19 14:39:41 +02:00
Jason Tedor	c12c2a6cc9	Rename the bulk thread pool to write thread pool (#29593 ) This commit renames the bulk thread pool to the write thread pool. This is to better reflect the fact that the underlying thread pool is used to execute any document write request (single-document index/delete/update requests, and bulk requests). With this change, we add support for fallback settings thread_pool.bulk.* which will be supported until 7.0.0. We also add a system property so that the display name of the thread pool remains as "bulk" if needed to avoid breaking users.	2018-04-19 08:18:58 -04:00
Christoph Büscher	fa1052017c	[Test] Minor changes to rank_eval tests (#29577 ) Removing an enum in favour of local constants to simplify tests and removing a few deprecated method calls and warnings.	2018-04-19 13:50:18 +02:00
Martijn van Groningen	8afa7c174f	Added painless execute api. (#29164 ) Added an api that allows executing an arbitrary script and returning the result. ``` POST /_scripts/painless/_execute { "script": { "source": "params.var1 / params.var2", "params": { "var1": 1, "var2": 1 } } } ``` Relates to #27875	2018-04-19 09:33:34 +02:00
Jack Conradson	da9a6899ff	Painless: modify grammar to allow more statement delimiters (#29566 ) This allows the grammar to determine when and what delimiters statements will use by splitting up the statements into regular statements and delimited statements, those that do not require a delimiter versus those that do. This allows consumers of the statements to determine what delimiters the statements will use so that in certain cases semicolons are not necessary like when there's a closing right bracket. This change removes the need for semicolon insertion in the lexer, simplifying the existing lexer quite a bit. It also ensures that there isn't a need to track semicolons being inserted into places that aren't necessary such as array initializers.	2018-04-18 10:32:42 -07:00
Adrien Grand	ebd6b5b7ba	Deprecate filtering on `_type`. (#29468 ) As indices are only allowed to have one type now, and types are going away in the future, we should deprecate filtering by `_type`. Relates #15613	2018-04-13 09:07:51 +02:00
Jim Ferenczi	fb81e2cacf	Fix template _msearch with extra tokens This change removes the check for extra tokens when parsing a source generated by a templated _msearch request. This was added unintentionally in #29428 but the intent of this modification was to validate simple _search request only.	2018-04-11 18:04:10 +02:00
Jim Ferenczi	1b6d5e531b	Fail _search request with trailing tokens (#29428 ) This change validates that the `_search` request does not have trailing tokens after the main object and fails the request with a parsing exception otherwise. Closes #28995	2018-04-11 13:10:22 +02:00
Adrien Grand	4918924fae	Remove legacy mapping code. (#29224 ) Some features have been deprecated since `6.0` like the `_parent` field or the ability to have multiple types per index. This allows removing quite a lot of code, which in turn will hopefully make it easier to proceed with the removal of types.	2018-04-11 09:41:37 +02:00
Adrien Grand	a091d950a7	Deprecate slicing on `_uid`. (#29353 ) Deprecate slicing on `_uid`. `_id` should be used instead on 6.x.	2018-04-10 14:28:30 +02:00
Martijn van Groningen	182cf11f37	Fixed bug when non percolator docs end up in the search hits. In the case that a document with a percolator field is matched when using the `percolate` query then the fetch phase can fail due to the fact that the percolator can't resolve any query from that document. Closes #29429	2018-04-10 13:33:31 +02:00
Martijn van Groningen	2346f7fa89	removed unused import	2018-04-10 07:44:51 +02:00
Martijn van Groningen	f4395c0c94	Fixed an msm accounting error that can occur while analyzing a percolator query. In the case of a disjunction query with both range- and term-based clauses and msm specified, the query analyzer also needs to reduce the msm if a range-based clause for the same field is encountered. This did not happen. Instead of fixing this bug, the logic has been simplified to just set a percolator query's msm to 1 if a disjunction contains range clauses and msm on the disjunction has been specified. The logic would otherwise get too complex, and the performance gain isn't that much for this kind of percolator query. In case a percolator query has clauses with duplicate terms or ranges, then for disjunction clauses with a minimum should match, the query extraction of the clause with the lowest msm should be used, and for conjunction queries with duplicate terms/ranges in their query extractions, the msm should be ignored. If this is not done, then percolator queries that should match never match. Example percolator query: value1 OR value2 OR value2 OR value3 OR value3 OR value3 OR value4 OR value5 (msm set to 3). In the above example query the extracted msm would be 3. Example document1: value1 value2 value3. With the msm and extracted terms this would match, which is the expected behaviour. Example document2: value3. This document should match too (value3 appears in 3 clauses), but with msm set to 3 and the fact that only distinct values are indexed in the extracted terms field, this document would not match. Also added another random duel test. Closes #29393	2018-04-10 07:25:12 +02:00
Adrien Grand	0f00277851	Simplify analysis of `bool` queries. (#29430 ) This change tries to simplify the extraction logic of boolean queries by concentrating the logic into two methods: one that merges results for conjunctions, and another one for disjunctions. Other concerns, like the impact of prohibited clauses or how an `UnsupportedQueryException` should be treated are applied on top of those two methods. This is mostly a code reorganization, it doesn't change the result of query extraction except in the case that a query both has required clauses and a minimum number of `SHOULD` clauses that is greater than 1, which we now rewrite into a pure conjunction. For instance `(+A B C)~1` is rewritten into `(+A +(B C))` prior to extraction.	2018-04-09 16:34:45 +02:00
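
Commit 5f172b6795 describes the `char_group` tokenizer only in prose. The Java sketch below reproduces the commit's example output under simplifying assumptions: the class and method names are hypothetical, the `\\s` escape is modeled as a boolean flag backed by `Character.isWhitespace`, and the `\\d`/`\\w` escapes are not handled. It is not the Elasticsearch tokenizer code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of char_group-style tokenization; not the Elasticsearch implementation.
public final class CharGroupTokenizerSketch {

    /**
     * Starts a new token whenever a character from tokenizeOnChars (or, if splitOnWhitespace
     * is set, any whitespace character, standing in for the \s escape) is encountered.
     * Empty tokens produced by consecutive delimiters are dropped.
     */
    public static List<String> tokenize(String text, String tokenizeOnChars, boolean splitOnWhitespace) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isDelimiter = tokenizeOnChars.indexOf(c) >= 0
                    || (splitOnWhitespace && Character.isWhitespace(c));
            if (isDelimiter) {
                if (current.length() > 0) { // close the token in progress
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) { // flush the trailing token
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mirrors the commit's `\\s-:<>` configuration: whitespace plus the characters - : < >
        System.out.println(tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone for $2", "-:<>", true));
        // prints [The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog's, bone, for, $2]
    }
}
```

Splitting on a fixed character set is a single linear scan with no regular expression, which is the overhead the commit contrasts with the `pattern` tokenizer.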
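
Commit df853c49c0 lists convenience methods such as `MovingFunctions.unweightedAvg()`, `linearWeightedAvg()` and `ewma()` without defining them. The Java sketch below uses common textbook definitions under hypothetical names; the actual Elasticsearch `MovingFunctions` signatures, weighting details and NaN handling may differ, so treat it as an illustration of the technique only.

```java
import java.util.Arrays;

// Hypothetical sketch of three moving-window functions; textbook definitions,
// not necessarily identical to Elasticsearch's MovingFunctions.
public final class MovingFunctionsSketch {

    /** Plain arithmetic mean of the window; NaN for an empty window. */
    public static double unweightedAvg(double[] window) {
        return Arrays.stream(window).average().orElse(Double.NaN);
    }

    /** Linearly weighted mean: the i-th value (0-based, oldest first) gets weight i + 1. */
    public static double linearWeightedAvg(double[] window) {
        double weightedSum = 0;
        double totalWeight = 0;
        for (int i = 0; i < window.length; i++) {
            weightedSum += window[i] * (i + 1);
            totalWeight += i + 1;
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    /** Exponentially weighted moving average; the first value seeds the average. */
    public static double ewma(double[] window, double alpha) {
        if (window.length == 0) {
            return Double.NaN;
        }
        double avg = window[0];
        for (int i = 1; i < window.length; i++) {
            avg = alpha * window[i] + (1 - alpha) * avg;
        }
        return avg;
    }

    public static void main(String[] args) {
        double[] window = {1, 2, 3, 4};
        System.out.println(unweightedAvg(window));     // 2.5
        System.out.println(linearWeightedAvg(window)); // 3.0
        System.out.println(ewma(window, 0.5));         // 3.125
    }
}
```

In the EWMA, a larger `alpha` weights the most recent values more heavily; with `alpha = 1` it simply returns the last value in the window.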
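
Commit f4395c0c94 explains, using its document2, why counting only distinct extracted terms breaks minimum_should_match (msm) when a disjunction repeats a term. The hypothetical Java class below reproduces just that arithmetic on the commit's own example query; it does not model how the percolator actually extracts or indexes terms.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the msm pitfall described in commit f4395c0c94.
public final class PercolatorMsmSketch {

    /** Clauses of the disjunction satisfied by the document, counting duplicate clauses separately. */
    public static long matchingClauses(List<String> clauses, Set<String> docTerms) {
        return clauses.stream().filter(docTerms::contains).count();
    }

    /** Distinct clause terms present in the document, i.e. what an index of distinct extracted terms can see. */
    public static long matchingDistinctTerms(List<String> clauses, Set<String> docTerms) {
        return clauses.stream().distinct().filter(docTerms::contains).count();
    }

    public static void main(String[] args) {
        List<String> clauses = List.of("value1", "value2", "value2", "value3",
                "value3", "value3", "value4", "value5");
        int msm = 3;
        Set<String> document2 = Set.of("value3");
        // The real query matches: value3 alone satisfies three clauses.
        System.out.println(matchingClauses(clauses, document2) >= msm);       // true
        // Distinct extracted terms see only one match, so msm = 3 would wrongly reject it.
        System.out.println(matchingDistinctTerms(clauses, document2) >= msm); // false
    }
}
```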
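
Commit 0f00277851 states that `(+A B C)~1` is rewritten into `(+A +(B C))` before extraction. The hypothetical class below checks that equivalence by brute force over all eight true/false combinations of A, B and C, reading `~1` as Lucene's minimum number of SHOULD clauses that must match. It is a truth-table check, not the query analyzer's code.

```java
// Hypothetical truth-table check of the bool query rewrite from commit 0f00277851.
public final class BoolRewriteSketch {

    /** (+A B C)~n: A is required and at least n of the SHOULD clauses B, C must match. */
    public static boolean original(boolean a, boolean b, boolean c, int minShouldMatch) {
        int shouldMatches = (b ? 1 : 0) + (c ? 1 : 0);
        return a && shouldMatches >= minShouldMatch;
    }

    /** (+A +(B C)): a pure conjunction of A and the disjunction (B OR C). */
    public static boolean rewritten(boolean a, boolean b, boolean c) {
        return a && (b || c);
    }

    /** Returns true if both forms agree for every assignment of A, B and C. */
    public static boolean equivalentForAllInputs() {
        for (int mask = 0; mask < 8; mask++) {
            boolean a = (mask & 1) != 0;
            boolean b = (mask & 2) != 0;
            boolean c = (mask & 4) != 0;
            if (original(a, b, c, 1) != rewritten(a, b, c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalentForAllInputs()); // true
    }
}
```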


4648 Commits