OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jun Ohtani	c75c8b6e9d	Expose discard_compound_token option to kuromoji_tokenizer (#57421 ) This commit exposes the new Lucene option `discard_compound_token` to the Elasticsearch Kuromoji plugin.	2020-06-05 15:41:01 +02:00
Jason Tedor	5fcda57b37	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 17:24:38 -04:00
Jim Ferenczi	fe2a7523ec	Add support for inlined user dictionary in the Kuromoji plugin (#45489 ) This change adds a new option called user_dictionary_rules to Kuromoji's tokenizer. It can be used to set additional tokenization rules to the Japanese tokenizer directly in the settings (instead of using a file). This commit also adds a check that no rules are duplicated since this is not allowed in the UserDictionary. Closes #25343	2019-08-21 16:28:30 +02:00
Alan Woodward	4b99255fed	Add name() method to TokenizerFactory (#43909 ) This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory, and removes the need to pass around tokenizer names when building custom analyzers. As this means that TokenizerFactory is no longer a functional interface, the commit also adds a factory method to TokenizerFactory to make construction simpler.	2019-07-04 11:28:55 +01:00
Julie Tibshirani	c2e9d13ebd	Default include_type_name to false in the yml test harness. (#38058 ) This PR removes the temporary change we made to the yml test harness in #37285 to automatically set `include_type_name` to `true` in index creation requests if it's not already specified. This is possible now that the vast majority of index creation requests were updated to be typeless in #37611. A few additional tests also needed updating here. Additionally, this PR updates the test harness to set `include_type_name` to `false` in index creation requests when communicating with 6.x nodes. This mirrors the logic added in #37611 to allow for typeless document write requests in test set-up code. With this update in place, we can remove many references to `include_type_name: false` from the yml tests.	2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe	21e392e95e	Removes typed calls from YAML REST tests (#37611 ) This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour. The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.	2019-01-30 16:32:58 +00:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equals to `eq`) or a lower bound of the total (in which case it is equals to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
Jim Ferenczi	e37a0ef844	Upgrade to lucene-8.0.0-snapshot-67cdd21996 (#35816 )	2018-11-22 15:42:59 +01:00
Nik Everett	a45626deb5	Analysis: Wrap at 140 columns (#34494 ) Applies our standard column width to all analysis plugins.	2018-10-17 16:17:25 -04:00
Armin Braun	ed3b44fb4c	Handle TokenizerFactory TODOs (#32063 ) * Don't replace Replace TokenizerFactory with Supplier, this approach was rejected in #32063 * Remove unused parameter from constructor	2018-07-17 14:14:02 +02:00
Jim Ferenczi	891d3bd9c3	Expose the Lucene Korean analyzer module in a plugin (#30397 ) This change adds a new plugin called `analysis-nori` that exposes Korean text analysis in es using the new Lucene Korean analyzer module named (`nori`). The plugin adds: * a Korean analyzer: `nori` * a Korean tokenizer: `nori_tokenizer` * a part of speech stop filter: `nori_part_of_speech` * a filter that can replace Hanja characters with their Hangul transcription: `nori_readingform`	2018-05-04 20:46:13 +02:00
Simon Willnauer	aab4655e63	Unify Settings xcontent reading and writing (#26739 ) This change adds a fromXContent method to Settings that allows to read the xcontent that is produced by toXContent. It also replaces the entire settings loader infrastructure and removes the structured map representation. Future PRs will also tackle the `getAsMap` that exposes the internal represenation of settings for better encapsulation.	2017-09-25 13:23:01 +02:00
Claudio Bley	7184cf8b5b	Fix kuromoji default stoptags (#26600 ) Initialize the default stop-tags in `KuromojiPartOfSpeechFilterFactory` if the `stoptags` are not given in the config. Also adding a test which checks that part-of-speech tokens are removed when using the kuromoji_part_of_speech filter.	2017-09-15 12:25:09 +02:00
Adrien Grand	eb782492be	Remove support for lenient booleans. Closes #22298	2017-08-28 09:56:01 +02:00
desmorto	292dd8f992	(refactor) some opportunities to use diamond operator (#25585 ) * (refactor) some opportunities to use diamond operator * Update ExceptionRetryIT.java update typo	2017-08-15 16:36:42 -06:00
Ryan Ernst	2a65bed243	Tests: Change rest test extension from .yaml to .yml (#24659 ) This commit renames all rest test files to use the .yml extension instead of .yaml. This way the extension used within all of elasticsearch for yaml is consistent.	2017-05-16 17:24:35 -07:00
Nik Everett	bb06d8ec4f	Allow plugins to build pre-configured token filters (#24223 ) This changes the way we register pre-configured token filters so that plugins can declare them and starts to move all of the pre-configured token filters out of core. It doesn't finish the job because doing so would make the change unreviewably large. So this PR includes a shim that keeps the "old" way of registering pre-configured token filters around. The Lowercase token filter is special because there is a "special" interaction between it and the lowercase tokenizer. I'm not sure exactly what to do about it so for now I'm leaving it alone with the intent of figuring out what to do with it in a followup. This also renames these pre-configured token filters from "pre-built" to "pre-configured" because that seemed like a more descriptive name. This is a part of #23658	2017-05-09 14:50:49 -04:00
Ryan Ernst	212f24aa27	Tests: Clean up rest test file handling (#21392 ) This change simplifies how the rest test runner finds test files and removes all leniency. Previously multiple prefixes and suffixes would be tried, and tests could exist inside or outside of the classpath, although outside of the classpath never quite worked. Now only classpath tests are supported, and only one resource prefix is supported, `/rest-api-spec/tests`. closes #20240	2017-04-18 15:07:08 -07:00
Jason Tedor	9781b88a38	Fix deprecation logging for lenient booleans This commit fixes an issue with deprecation logging for lenient booleans. The underlying issue is that adding deprecation logging for lenient booleans added a static deprecation logger to the Settings class. However, the Settings class is initialized very early and in CLI tools can be initialized before logging is initialized. This leads to status logger error messages. Additionally, the deprecation logging for a lot of the settings does not provide useful context (for example, in the token filter factories, the deprecation logging only produces the name of the setting, but gives no context which token filter factory it comes from). This commit addresses both of these issues by changing the call sites to push a deprecation logger through to the lenient boolean parsing. Relates #22696	2017-01-19 12:30:33 -05:00
Daniel Mitterdorfer	aece89d6a1	Make boolean conversion strict (#22200 ) This PR removes all leniency in the conversion of Strings to booleans: "true" is converted to the boolean value `true`, "false" is converted to the boolean value `false`. Everything else raises an error.	2017-01-19 07:59:18 +01:00
Nik Everett	f5f2149ff2	Remove much ceremony from parsing client yaml test suites (#22311 ) * Remove a checked exception, replacing it with `ParsingException`. * Remove all Parser classes for the yaml sections, replacing them with static methods. * Remove `ClientYamlTestFragmentParser`. Isn't used any more. * Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods. I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.	2016-12-22 11:00:34 -05:00
Ryan Ernst	7a2c984bcc	Test: Remove multi process support from rest test runner (#21391 ) At one point in the past when moving out the rest tests from core to their own subproject, we had multiple test classes which evenly split up the tests to run. However, we simplified this and went back to a single test runner to have better reproduceability in tests. This change removes the remnants of that multiplexing support.	2016-11-07 15:07:34 -08:00
Jun Ohtani	a66c76eb44	Merge pull request #20704 from johtani/remove_request_params_in_analyze_api Removing request parameters in _analyze API	2016-10-27 17:43:18 +09:00
Tanguy Leroux	44ac5d057a	Remove empty javadoc (#20871 ) This commit removes as many as empty javadocs comments my regexp has found	2016-10-12 10:27:09 +02:00
Jun Ohtani	370f0b885e	Removing request parameters in _analyze API Remove request params in _analyze API without index param Change rest-api-test using JSON Change docs using JSON Closes #20246	2016-10-07 16:23:24 +09:00
Simon Willnauer	fe1803c957	Remove AnalysisService and reduce it to a simple name to analyzer mapping (#20627 ) Today we hold on to all possible tokenizers, tokenfilters etc. when we create an index service on a node. This was mainly done to allow the `_analyze` API to directly access all these primitive. We fixed this in #19827 and can now get rid of the AnalysisService entirely and replace it with a simple map like class. This ensures we don't create a gazillion long living objects that are entirely useless since they are never used in most of the indices. Also those objects might consume a considerable amount of memory since they might load stopwords or synonyms etc. Closes #19828	2016-09-23 08:53:50 +02:00
Mike McCandless	0ccfe69789	Upgrade to Lucene 6.2.0	2016-08-24 17:26:28 -04:00
Nik Everett	9270e8b22b	Rename client yaml test infrastructure This makes it obvious that these tests are for running the client yaml suites. Now that there are other ways of running tests using the REST client against a running cluster we can't go on calling the shared client yaml tests "REST tests". They are rest tests, but they aren't the rest tests.	2016-07-26 13:53:44 -04:00
Nik Everett	a95d4f4ee7	Add Location header and improve REST testing This adds a header that looks like `Location: /test/test/1` to the response for the index/create/update API. The requirement for the header comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html https://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative URIs are OK. So we use an absolute path which should resolve to the appropriate location. Closes #19079 This makes large changes to our rest test infrastructure, allowing us to write junit tests that test a running cluster via the rest client. It does this by splitting ESRestTestCase into two classes: * ESRestTestCase is the superclass of all tests that use the rest client to interact with a running cluster. * ESClientYamlSuiteTestCase is the superclass of all tests that use the rest client to run the yaml tests. These tests are shared across all official clients, thus the `ClientYamlSuite` part of the name.	2016-07-25 17:02:40 -04:00
Ali Beyad	19d0dbcd17	Removes waiting for yellow cluster health upon index (#19460 ) creation in the REST tests, as we no longer need it due to index creation now waiting for active shard copies before returning (by default, it waits for the primary of each shard, which is the same as ensuring yellow health). Relates #19450	2016-07-15 17:18:34 -04:00
Nik Everett	71b95fb63c	Switch analysis from push to pull Instead of plugins calling `registerTokenizer` to extend the analyzer they now instead have to implement `AnalysisPlugin` and override `getTokenizer`. This lines up extending plugins in with extending scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry` immediately as part of its constructor which makes testing anslysis much simpler. This also moves the default analysis configuration into `AnalysisModule` which is how search is setup. Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`. Instead it is only responsible for building `AnslysisRegistry`. We still bind `AnalysisRegistry` but we only do so in `Node`. This is means it is available at module construction time so we slowly remove the need to bind it in guice.	2016-06-26 07:15:42 -04:00
Adrien Grand	7ba5bceebe	Add a MultiTermAwareComponent marker interface to analysis factories. #19028 This is the same as what Lucene does for its analysis factories, and we hawe tests that make sure that the elasticsearch factories are in sync with Lucene's. This is a first step to move forward on #9978 and #18064.	2016-06-23 10:19:24 +02:00
Ryan Ernst	a4503c2aed	Plugins: Remove name() and description() from api In 2.0 we added plugin descriptors which require defining a name and description for the plugin. However, we still have name() and description() which must be overriden from the Plugin class. This still exists for classpath plugins. But classpath plugins are mainly for tests, and even then, referring to classpath plugins with their class is a better idea. This change removes name() and description(), replacing the name for classpath plugins with the full class name.	2016-06-15 17:12:22 -07:00
Jun Ohtani	9eb242a5fe	Analyze API : Rename filters/token_filters/char_filter to filter/token_filter/char_filter Closes #15189	2016-04-21 18:05:11 +09:00
Adrien Grand	42526ac28e	Remove Settings.settingsBuilder. We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly the same thing, so we should keep only one. I kept `Settings.builder` since it has my preference but also it is the one that we use in examples of the Java API.	2016-04-08 18:10:02 +02:00
Simon Willnauer	1988b8b387	[TEST] Reuse EsTestCase#createAnalysisService in KuromojiAnalysisTests	2016-03-22 13:45:20 +01:00
Jun Ohtani	a9a0f262af	Analysis Kuromoji: Add nbest option and NumberFilter Add nbest_cost and nbest_examples parameter to KuromojiTokenizerFactory Add KuromojiNumberFilterFactory	2016-03-22 20:09:56 +09:00
Simon Willnauer	e91a141233	Prevent index level setting from being configured on a node level Today we allow to set all kinds of index level settings on the node level which is error prone and difficult to get right in a consistent manner. For instance if some analyzers are setup in a yaml config file some nodes might not have these analyzers and then index creation fails. Nevertheless, this change allows some selected settings to be specified on a node level for instance: * `index.codec` which is used in a hot/cold node architecture and it's value is really per node or per index * `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses All other index level setting must be specified on the index level. For existing clusters the index must be closed and all settings must be updated via the API on each of the indices. Closes #16799	2016-03-17 14:42:18 +01:00
Simon Willnauer	7a53a396e4	Remove Unneded @Inject annotations	2016-03-09 12:10:47 +01:00
Adrien Grand	eef19be072	Deprecate string in favor of text/keyword. #16877 This commit removes the ability to use string fields on indices created on or after 5.0. Dynamic mappings now generate text fields by default for strings but there are plans to also add a sub keyword field (in a future PR). Most of the changes in this commit are just about replacing string with keyword or text. Some tests have been removed because they existed because of corner cases of string mappings like setting ignore-above on a text field or enabling term vectors on a keyword field which are now impossible. The plan is to remove strings entirely in 6.0.	2016-03-03 10:20:56 +01:00
Simon Willnauer	e02d2e004e	Rewrite SettingsFilter to be immutable This change rewrites the entire settings filtering mechanism to be immutable. All filters must be registered up-front in the SettingsModule. Filters that are comma-sparated are not allowed anymore and check on registration. This commit also adds settings filtering to the default settings recently added to ensure we don't render filtered settings.	2016-02-03 20:05:55 +01:00
Boaz Leskes	2a137b5548	Make index uuid available in Index, ShardRouting & ShardId In the early days Elasticsearch used to use the index name as the index identity. Around 1.0.0 we introduced a unique index uuid which is stored in the index setting. Since then we used that uuid in a few places but it is by far not the main identifier when working with indices, partially because it's not always readily available in all places. This PR start to make a move in the direction of using uuids instead of name by making sure that the uuid is available on the Index class (currently just a wrapper around the name) and as such also available via ShardRouting and ShardId. Note that this is by no means an attempt to do the right thing with the uuid in all places. In almost all places it falls back to the name based comparison that was done before. It is meant as a first step towards slowly improving the situation. Closes #16217	2016-01-28 08:40:10 +01:00
Daniel Mitterdorfer	e9bb3d31a3	Convert "path.*" and "pidfile" to new settings infra	2016-01-22 15:14:13 +01:00
Simon Willnauer	fbfa9f4925	Merge branch 'master' into new_index_settings	2016-01-19 10:13:48 +01:00
Simon Willnauer	20fe8ac854	Register index.version.created in Kuromoji unittest since it's a node level setting here	2016-01-19 09:14:41 +01:00
Ryan Ernst	ef4f0a8699	Test: Make rest test framework accept http directly for the test cluster The rest test framework, because it used to be tightly integrated with ESIntegTestCase, currently expects the addresses for the test cluster to be passed using the transport protocol port. However, it only uses this to then find the http address. This change makes ESRestTestCase extend from ESTestCase instead of ESIntegTestCase, and changes the sysprop used to tests.rest.cluster, which now takes the http address. closes #15459	2016-01-18 16:44:14 -08:00
Ryan Ernst	4ea19995cf	Remove wildcard imports	2015-12-18 12:43:47 -08:00
Simon Willnauer	9f6598b18d	Fix compile errors	2015-11-26 13:41:00 +01:00
Boaz Leskes	ac0da91bf7	Extend usage of IndexSetting class I decided to leave external listeners (used by plugins) alone, for now. Closes #14731	2015-11-13 14:30:23 +01:00
Simon Willnauer	aa38d053d7	Simplify Analysis registration and configuration This change moves all the analysis component registration to the node level and removes the significant API overhead to register tokenfilter, tokenizer, charfilter and analyzer. All registration is done without guice interaction such that real factories via functional interfaces are passed instead of class objects that are instantiated at runtime. This change also hides the internal analyzer caching that was done previously in the IndicesAnalysisService entirely and decouples all analysis registration and creation from dependency injection.	2015-10-30 11:40:18 +01:00

1 2

70 Commits