OpenSearch

Commit Graph

Author	SHA1	Message	Date
Alan Woodward	a646f85a99	Ensure TokenFilters only produce single tokens when parsing synonyms (#34331 ) A number of tokenfilters can produce multiple tokens at the same position. This is a problem when using token chains to parse synonym files, as the SynonymMap requires that there are no stacked tokens in its input. This commit ensures that when used to parse synonyms, these tokenfilters either produce a single version of their input token, or that they throw an error when mappings are generated. In indexes created in elasticsearch 6.x deprecation warnings are emitted in place of the error. * asciifolding and cjk_bigram produce only the folded or bigrammed token * decompounders, synonyms and keyword_repeat are skipped * n-grams, word-delimiter-filter, multiplexer, fingerprint and phonetic throw errors Fixes #34298	2018-11-29 10:35:38 +00:00
Nik Everett	a45626deb5	Analysis: Wrap at 140 columns (#34494 ) Applies our standard column width to all analysis plugins.	2018-10-17 16:17:25 -04:00
Jim Ferenczi	7ad71f906a	Upgrade to a Lucene 8 snapshot (#33310 ) The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly. Some comments about the change: * Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309 Closes #32899	2018-09-06 14:42:06 +02:00
Russ Cam	e2b665c2e6	Consistent encoder names (#29492 ) This commit updates encoder names to be consistent within documentation and align with snake casing convention.	2018-07-24 09:21:43 +10:00
Tanguy Leroux	bf58660482	Remove all unused imports and fix CRLF (#31207 ) The X-Pack opening and the recent other refactorings left a lot of unused imports in the codebase. This commit removes them all.	2018-06-11 15:12:12 +02:00
Jim Ferenczi	b82017cbfe	Fix daitch_mokotoff phonetic filter to use the dedicated Lucene filter (#28225 ) This commit changes the phonetic filter factory to use a DaitchMokotoffSoundexFilter instead of a PhoneticFilter with a daitch_mokotoff encoder when daitch_mokotoff is selected. The latter does not hanlde branching when computing the soundex and fails to encode multiple variations when possible. Closes #28211	2018-01-15 19:35:54 +01:00
Christoph Büscher	9253ea8aec	Fix beidermorse phonetic token filter for unspecified `languageset` (#27112 ) Currently, when we create a BeiderMorseFilter with an unspecified `languageset`, the filter will not guess the language, which should be the default behaviour. This change fixes this and adds a simple test for the cases with and without provided `languageset` settings. Closes #26771	2017-10-27 10:07:36 +02:00
Simon Willnauer	cdd7c1e6c2	Return List instead of an array from settings (#26903 ) Today we return a `String[]` that requires copying values for every access. Yet, we already store the setting as a list so we can also directly return the unmodifiable list directly. This makes list / array access in settings a much cheaper operation especially if lists are large.	2017-10-09 09:52:08 +02:00
Simon Willnauer	aab4655e63	Unify Settings xcontent reading and writing (#26739 ) This change adds a fromXContent method to Settings that allows to read the xcontent that is produced by toXContent. It also replaces the entire settings loader infrastructure and removes the structured map representation. Future PRs will also tackle the `getAsMap` that exposes the internal represenation of settings for better encapsulation.	2017-09-25 13:23:01 +02:00
Adrien Grand	eb782492be	Remove support for lenient booleans. Closes #22298	2017-08-28 09:56:01 +02:00
Ryan Ernst	2a65bed243	Tests: Change rest test extension from .yaml to .yml (#24659 ) This commit renames all rest test files to use the .yml extension instead of .yaml. This way the extension used within all of elasticsearch for yaml is consistent.	2017-05-16 17:24:35 -07:00
Nik Everett	bb06d8ec4f	Allow plugins to build pre-configured token filters (#24223 ) This changes the way we register pre-configured token filters so that plugins can declare them and starts to move all of the pre-configured token filters out of core. It doesn't finish the job because doing so would make the change unreviewably large. So this PR includes a shim that keeps the "old" way of registering pre-configured token filters around. The Lowercase token filter is special because there is a "special" interaction between it and the lowercase tokenizer. I'm not sure exactly what to do about it so for now I'm leaving it alone with the intent of figuring out what to do with it in a followup. This also renames these pre-configured token filters from "pre-built" to "pre-configured" because that seemed like a more descriptive name. This is a part of #23658	2017-05-09 14:50:49 -04:00
Koen De Groote	0fef5acd01	Cleanup collections construction This commit cleans up some cases where a list or map was being constructed, and then an existing collection was copied into the new collection. The clean is to instead use an appropriate constructor to directly copy the existing collection in during collection construction. The advantage of this is that the new collection is sized appropriately. Relates #24409	2017-04-30 21:26:51 -04:00
Ryan Ernst	212f24aa27	Tests: Clean up rest test file handling (#21392 ) This change simplifies how the rest test runner finds test files and removes all leniency. Previously multiple prefixes and suffixes would be tried, and tests could exist inside or outside of the classpath, although outside of the classpath never quite worked. Now only classpath tests are supported, and only one resource prefix is supported, `/rest-api-spec/tests`. closes #20240	2017-04-18 15:07:08 -07:00
Luca Cavanna	cc65a94fd4	[TEST] improve yaml test sections parsing (#23407 ) Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to. Closes #21980 * [TEST] fix indentation in matrix_stats yaml tests * [TEST] fix indentation in painless yaml test * [TEST] fix indentation in analysis yaml tests * [TEST] fix indentation in generated docs yaml tests * [TEST] fix indentation in multi_cluster_search yaml tests	2017-03-02 12:43:20 +01:00
Jason Tedor	9781b88a38	Fix deprecation logging for lenient booleans This commit fixes an issue with deprecation logging for lenient booleans. The underlying issue is that adding deprecation logging for lenient booleans added a static deprecation logger to the Settings class. However, the Settings class is initialized very early and in CLI tools can be initialized before logging is initialized. This leads to status logger error messages. Additionally, the deprecation logging for a lot of the settings does not provide useful context (for example, in the token filter factories, the deprecation logging only produces the name of the setting, but gives no context which token filter factory it comes from). This commit addresses both of these issues by changing the call sites to push a deprecation logger through to the lenient boolean parsing. Relates #22696	2017-01-19 12:30:33 -05:00
Daniel Mitterdorfer	aece89d6a1	Make boolean conversion strict (#22200 ) This PR removes all leniency in the conversion of Strings to booleans: "true" is converted to the boolean value `true`, "false" is converted to the boolean value `false`. Everything else raises an error.	2017-01-19 07:59:18 +01:00
Nik Everett	f5f2149ff2	Remove much ceremony from parsing client yaml test suites (#22311 ) * Remove a checked exception, replacing it with `ParsingException`. * Remove all Parser classes for the yaml sections, replacing them with static methods. * Remove `ClientYamlTestFragmentParser`. Isn't used any more. * Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods. I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.	2016-12-22 11:00:34 -05:00
Ryan Ernst	7a2c984bcc	Test: Remove multi process support from rest test runner (#21391 ) At one point in the past when moving out the rest tests from core to their own subproject, we had multiple test classes which evenly split up the tests to run. However, we simplified this and went back to a single test runner to have better reproduceability in tests. This change removes the remnants of that multiplexing support.	2016-11-07 15:07:34 -08:00
Jun Ohtani	a66c76eb44	Merge pull request #20704 from johtani/remove_request_params_in_analyze_api Removing request parameters in _analyze API	2016-10-27 17:43:18 +09:00
Tanguy Leroux	44ac5d057a	Remove empty javadoc (#20871 ) This commit removes as many as empty javadocs comments my regexp has found	2016-10-12 10:27:09 +02:00
Jun Ohtani	370f0b885e	Removing request parameters in _analyze API Remove request params in _analyze API without index param Change rest-api-test using JSON Change docs using JSON Closes #20246	2016-10-07 16:23:24 +09:00
Simon Willnauer	fe1803c957	Remove AnalysisService and reduce it to a simple name to analyzer mapping (#20627 ) Today we hold on to all possible tokenizers, tokenfilters etc. when we create an index service on a node. This was mainly done to allow the `_analyze` API to directly access all these primitive. We fixed this in #19827 and can now get rid of the AnalysisService entirely and replace it with a simple map like class. This ensures we don't create a gazillion long living objects that are entirely useless since they are never used in most of the indices. Also those objects might consume a considerable amount of memory since they might load stopwords or synonyms etc. Closes #19828	2016-09-23 08:53:50 +02:00
Chris Earle	1cf694b63e	Use StringBuilder in favor of StringBuffer This removes all instances of StringBuffer that are removeable. Uncontended synchronization in Java is pretty cheap, but it's unnecessary.	2016-08-25 16:20:03 -04:00
Nik Everett	9270e8b22b	Rename client yaml test infrastructure This makes it obvious that these tests are for running the client yaml suites. Now that there are other ways of running tests using the REST client against a running cluster we can't go on calling the shared client yaml tests "REST tests". They are rest tests, but they aren't the rest tests.	2016-07-26 13:53:44 -04:00
Nik Everett	a95d4f4ee7	Add Location header and improve REST testing This adds a header that looks like `Location: /test/test/1` to the response for the index/create/update API. The requirement for the header comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html https://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative URIs are OK. So we use an absolute path which should resolve to the appropriate location. Closes #19079 This makes large changes to our rest test infrastructure, allowing us to write junit tests that test a running cluster via the rest client. It does this by splitting ESRestTestCase into two classes: * ESRestTestCase is the superclass of all tests that use the rest client to interact with a running cluster. * ESClientYamlSuiteTestCase is the superclass of all tests that use the rest client to run the yaml tests. These tests are shared across all official clients, thus the `ClientYamlSuite` part of the name.	2016-07-25 17:02:40 -04:00
Ali Beyad	19d0dbcd17	Removes waiting for yellow cluster health upon index (#19460 ) creation in the REST tests, as we no longer need it due to index creation now waiting for active shard copies before returning (by default, it waits for the primary of each shard, which is the same as ensuring yellow health). Relates #19450	2016-07-15 17:18:34 -04:00
Tanguy Leroux	8c40b2b54e	Fix order of modifiers	2016-07-01 16:57:14 +02:00
Nik Everett	71b95fb63c	Switch analysis from push to pull Instead of plugins calling `registerTokenizer` to extend the analyzer they now instead have to implement `AnalysisPlugin` and override `getTokenizer`. This lines up extending plugins in with extending scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry` immediately as part of its constructor which makes testing anslysis much simpler. This also moves the default analysis configuration into `AnalysisModule` which is how search is setup. Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`. Instead it is only responsible for building `AnslysisRegistry`. We still bind `AnalysisRegistry` but we only do so in `Node`. This is means it is available at module construction time so we slowly remove the need to bind it in guice.	2016-06-26 07:15:42 -04:00
Adrien Grand	7ba5bceebe	Add a MultiTermAwareComponent marker interface to analysis factories. #19028 This is the same as what Lucene does for its analysis factories, and we hawe tests that make sure that the elasticsearch factories are in sync with Lucene's. This is a first step to move forward on #9978 and #18064.	2016-06-23 10:19:24 +02:00
Ryan Ernst	a4503c2aed	Plugins: Remove name() and description() from api In 2.0 we added plugin descriptors which require defining a name and description for the plugin. However, we still have name() and description() which must be overriden from the Plugin class. This still exists for classpath plugins. But classpath plugins are mainly for tests, and even then, referring to classpath plugins with their class is a better idea. This change removes name() and description(), replacing the name for classpath plugins with the full class name.	2016-06-15 17:12:22 -07:00
Adrien Grand	42526ac28e	Remove Settings.settingsBuilder. We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly the same thing, so we should keep only one. I kept `Settings.builder` since it has my preference but also it is the one that we use in examples of the Java API.	2016-04-08 18:10:02 +02:00
Simon Willnauer	e91a141233	Prevent index level setting from being configured on a node level Today we allow to set all kinds of index level settings on the node level which is error prone and difficult to get right in a consistent manner. For instance if some analyzers are setup in a yaml config file some nodes might not have these analyzers and then index creation fails. Nevertheless, this change allows some selected settings to be specified on a node level for instance: * `index.codec` which is used in a hot/cold node architecture and it's value is really per node or per index * `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses All other index level setting must be specified on the index level. For existing clusters the index must be closed and all settings must be updated via the API on each of the indices. Closes #16799	2016-03-17 14:42:18 +01:00
Simon Willnauer	7a53a396e4	Remove Unneded @Inject annotations	2016-03-09 12:10:47 +01:00
Adrien Grand	eef19be072	Deprecate string in favor of text/keyword. #16877 This commit removes the ability to use string fields on indices created on or after 5.0. Dynamic mappings now generate text fields by default for strings but there are plans to also add a sub keyword field (in a future PR). Most of the changes in this commit are just about replacing string with keyword or text. Some tests have been removed because they existed because of corner cases of string mappings like setting ignore-above on a text field or enabling term vectors on a keyword field which are now impossible. The plan is to remove strings entirely in 6.0.	2016-03-03 10:20:56 +01:00
Simon Willnauer	e02d2e004e	Rewrite SettingsFilter to be immutable This change rewrites the entire settings filtering mechanism to be immutable. All filters must be registered up-front in the SettingsModule. Filters that are comma-sparated are not allowed anymore and check on registration. This commit also adds settings filtering to the default settings recently added to ensure we don't render filtered settings.	2016-02-03 20:05:55 +01:00
Boaz Leskes	2a137b5548	Make index uuid available in Index, ShardRouting & ShardId In the early days Elasticsearch used to use the index name as the index identity. Around 1.0.0 we introduced a unique index uuid which is stored in the index setting. Since then we used that uuid in a few places but it is by far not the main identifier when working with indices, partially because it's not always readily available in all places. This PR start to make a move in the direction of using uuids instead of name by making sure that the uuid is available on the Index class (currently just a wrapper around the name) and as such also available via ShardRouting and ShardId. Note that this is by no means an attempt to do the right thing with the uuid in all places. In almost all places it falls back to the name based comparison that was done before. It is meant as a first step towards slowly improving the situation. Closes #16217	2016-01-28 08:40:10 +01:00
Daniel Mitterdorfer	e9bb3d31a3	Convert "path.*" and "pidfile" to new settings infra	2016-01-22 15:14:13 +01:00
Simon Willnauer	fbfa9f4925	Merge branch 'master' into new_index_settings	2016-01-19 10:13:48 +01:00
Simon Willnauer	8e0390b09e	Register index.version_created for several analysis plugin tests	2016-01-19 09:35:32 +01:00
Ryan Ernst	ef4f0a8699	Test: Make rest test framework accept http directly for the test cluster The rest test framework, because it used to be tightly integrated with ESIntegTestCase, currently expects the addresses for the test cluster to be passed using the transport protocol port. However, it only uses this to then find the http address. This change makes ESRestTestCase extend from ESTestCase instead of ESIntegTestCase, and changes the sysprop used to tests.rest.cluster, which now takes the http address. closes #15459	2016-01-18 16:44:14 -08:00
Nik Everett	0786c506dc	Remove a few more Xlint skips	2016-01-06 23:28:13 -05:00
Ryan Ernst	4ea19995cf	Remove wildcard imports	2015-12-18 12:43:47 -08:00
Simon Willnauer	9f6598b18d	Fix compile errors	2015-11-26 13:41:00 +01:00
David Pilato	52bf365013	Add support for `daitch_mokotoff` [Daitch Mokotoff](https://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex) support has been added in Lucene 5. So we can now support it as well.	2015-11-18 15:41:39 +01:00
Boaz Leskes	ac0da91bf7	Extend usage of IndexSetting class I decided to leave external listeners (used by plugins) alone, for now. Closes #14731	2015-11-13 14:30:23 +01:00
Simon Willnauer	aa38d053d7	Simplify Analysis registration and configuration This change moves all the analysis component registration to the node level and removes the significant API overhead to register tokenfilter, tokenizer, charfilter and analyzer. All registration is done without guice interaction such that real factories via functional interfaces are passed instead of class objects that are instantiated at runtime. This change also hides the internal analyzer caching that was done previously in the IndicesAnalysisService entirely and decouples all analysis registration and creation from dependency injection.	2015-10-30 11:40:18 +01:00
Simon Willnauer	8a9dd871d3	Make IndexSettings also own the IndexMetaData and separate node settings	2015-10-23 10:53:39 +02:00
Simon Willnauer	66d5d0c4f2	Replace IndexSettings annotation with a full-fledged class The @IndexSettings annoationat has been used to differentiate between node-level and index level settings. It was also decoupled from realtime-updates such that the settings object that a class got injected when it was created was static and not subject to change when an update was applied. This change removes the annoation and replaces it with a full-fledged class that adds type-safety and encapsulates additional functionality as well as checks on the settings.	2015-10-22 20:43:41 +02:00
Nik Everett	2cc97a0d3e	Remove and ban @Test There are three ways `@Test` was used. Way one: ```java @Test public void flubTheBlort() { ``` This way was always replaced with: ```java public void testFlubTheBlort() { ``` Or, maybe with a better method name if I was feeling generous. Way two: ```java @Test(throws=IllegalArgumentException.class) public void testFoo() { methodThatThrows(); } ``` This way of using `@Test` is actually pretty OK, but to get the tools to ban `@Test` entirely it can't be used. Instead: ```java public void testFoo() { try { methodThatThrows(); fail("Expected IllegalArgumentException"); } catch (IllegalArgumentException e ) { assertThat(e.getMessage(), containsString("something")); } } ``` This is longer but tests more than the old ways and is much more precise. Compare: ```java @Test(throws=IllegalArgumentException.class) public void testFoo() { some(); copy(); and(); pasted(); methodThatThrows(); code(); // <---- This was left here by mistake and is never called } ``` to: ```java @Test(throws=IllegalArgumentException.class) public void testFoo() { some(); copy(); and(); pasted(); try { methodThatThrows(); fail("Expected IllegalArgumentException"); } catch (IllegalArgumentException e ) { assertThat(e.getMessage(), containsString("something")); } } ``` The final use of test is: ```java @Test(timeout=1000) public void testFoo() { methodThatWasSlow(); } ``` This is the most insidious use of `@Test` because its tempting but tragically flawed. Its flaws are: 1. Hard and fast timeouts can look like they are asserting that something is faster and even do an ok job of it when you compare the timings on the same machine but as soon as you take them to another machine they start to be invalid. On a slow VM both the new and old methods fail. On a super-fast machine the slower and faster ways succeed. 2. Tests often contain slow `assert` calls so the performance of tests isn't sure to predict the performance of non-test code. 3. These timeouts are rude to debuggers because the test just drops out from under it after the timeout. Confusingly, timeouts are useful in tests because it'd be rude for a broken test to cause CI to abort the whole build after it hits a global timeout. But those timeouts should be very very long "backstop" timeouts and aren't useful assertions about speed. For all its flaws `@Test(timeout=1000)` doesn't have a good replacement __in__ __tests__. Nightly benchmarks like http://benchmarks.elasticsearch.org/ are useful here because they run on the same machine but they aren't quick to check and it takes lots of time to figure out the regressions. Sometimes its useful to compare dueling implementations but that requires keeping both implementations around. All and all we don't have a satisfactory answer to the question "what do you replace `@Test(timeout=1000)`" with. So we handle each occurrence on a case by case basis. For files with `@Test` this also: 1. Removes excess blank lines. They don't help anything. 2. Removes underscores from method names. Those would fail any code style checks we ever care to run and don't add to readability. Since I did this manually I didn't do it consistently. 3. Make sure all test method names start with `test`. Some used to end in `Test` or start with `verify` or `check` and they were picked up using the annotation. Without the annotation they always need to start with `test`. 4. Organizes imports using the rules we generate for Eclipse. For the most part this just removes `` imports which is a win all on its own. It was "required" to quickly remove `@Test`. 5. Removes unneeded casts. This is just a setting I have enabled in Eclipse and forgot to turn off before I did this work. It probably isn't hurting anything. 6. Removes trailing whitespace. Again, another Eclipse setting I forgot to turn off that doesn't hurt anything. Hopefully. 7. Swaps some tests override superclass tests to make them empty with `assumeTrue` so that the reasoning for the skips is logged in the test run and it doesn't "look like" that thing is being tested when it isn't. 8. Adds an oxford comma to an error message. The total test count doesn't change. I know. I counted. ```bash git checkout master && mvn clean && mvn install \| tee with_test git no_test_annotation master && mvn clean && mvn install \| tee not_test grep 'Tests summary' with_test > with_test_summary grep 'Tests summary' not_test > not_test_summary diff with_test_summary not_test_summary ``` These differ somewhat because some tests are skipped based on the random seed. The total shouldn't differ. But it does! ``` 1c1 < [INFO] Tests summary: 564 suites (1 ignored), 3171 tests, 31 ignored (31 assumptions) --- > [INFO] Tests summary: 564 suites (1 ignored), 3167 tests, 17 ignored (17 assumptions) ``` These are the core unit tests. So we dig further: ```bash cat with_test \| perl -pe 's/\n// if /^Suite/;s/.\n// if /IGNOR/;s/.\n// if /Assumption #/;s/.\n// if /HEARTBEAT/;s/Completed .+?,//' \| grep Suite > with_test_suites cat not_test \| perl -pe 's/\n// if /^Suite/;s/.\n// if /IGNOR/;s/.\n// if /Assumption #/;s/.*\n// if /HEARTBEAT/;s/Completed .+?,//' \| grep Suite > not_test_suites diff <(sort with_test_suites) <(sort not_test_suites) ``` The four tests with lower test numbers are all extend `AbstractQueryTestCase` and all have a method that looks like this: ```java @Override public void testToQuery() throws IOException { assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0); super.testToQuery(); } ``` It looks like this method was being double counted on master and isn't anymore. Closes #14028	2015-10-20 17:37:36 -04:00

1 2

64 Commits