OpenSearch

Commit Graph

Author	SHA1	Message	Date
James Rodewig	7ef906fde8	[DOCS] Add tutorials section to analysis topic (#50809 ) Adds a 'Configure text analysis' page to house tutorial content for the analysis topic. Also relocates the following pages as children of this new page: * 'Test an analyzer' * 'Configuring built-in analyzers' * 'Create a custom analyzer' I plan to add a tutorial for specifying index-time and search-time analyzers to this section as part of a future PR.	2020-01-16 13:12:06 -05:00
Nik Everett	01293ebad5	Fix docs typos (#50365 ) (#50464 ) Fixes a few typos in the docs. Co-authored-by: Xiang Dai <764524258@qq.com>	2019-12-23 12:38:17 -05:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
Wilder Pereira	8c73e215b2	[DOCS] Remove unneeded spaces from custom analyzer snippet (#47332 )	2019-10-15 15:53:16 -04:00
James Rodewig	b59ecde041	[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353 ) (#46502 )	2019-09-09 13:38:14 -04:00
James Rodewig	bb7bff5e30	[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295 ) (#46418 )	2019-09-06 09:22:08 -04:00
James Rodewig	3e62cf9d74	[DOCS] Correct custom analyzer callouts (#46030 )	2019-08-29 10:08:18 -04:00
Guilherme Ferreira	48a17d5768	[Docs] Correct default stop list constant (#41342 )	2019-04-23 19:13:51 +02:00
Guilherme Ferreira	414debd740	[Docs] Correct spelling the "_none_" stopwords element (#41191 )	2019-04-15 14:12:26 +02:00
Mayya Sharipova	0e1b1959fe	Correct rebuilt persian analyzer (#38724 ) (#38744 ) Make substitution of \u200C with a space explicit The problem with this symbol `\u200C` in a test string, that SHOULD be substituted with space in the rebuilt Persian analyzer, but it is not. Correcting this line `"mappings": [ "\\u200C=> "] <1>` to `"mappings": [ "\\u200C=>\\u0020"] <1>` in solves the problem. This change explicitly says to substitute ZWNJ with a space. Closes #38188	2019-02-11 14:17:18 -05:00
Christoph Büscher	34f2d2ec91	Remove remaining occurrences of "include_type_name=true" in docs (#37646 )	2019-01-22 15:13:52 +01:00
Christoph Büscher	25aac4f77f	Remove `include_type_name` in asciidoc where possible (#37568 ) The "include_type_name" parameter was temporarily introduced in #37285 to facilitate moving the default parameter setting to "false" in many places in the documentation code snippets. Most of the places can simply be reverted without causing errors. In this change I looked for asciidoc files that contained the "include_type_name=true" addition when creating new indices but didn't look likely they made use of the "_doc" type for mappings. This is mostly the case e.g. in the analysis docs where index creating often only contains settings. I manually corrected the use of types in some places where the docs still used an explicit type name and not the dummy "_doc" type.	2019-01-18 09:34:11 +01:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Josh Soref	edb48321ba	[DOCS] Various spelling corrections (#37046 )	2019-01-07 14:44:12 +01:00
Alan Woodward	f6a43b5939	Add a prebuilt ICU Analyzer (#34958 ) The ICU plugin provides the building blocks of an analysis chain, but doesn't actually have a prebuilt analyzer. It would be better for users if there was a simple analyzer that they could use out of the box, and also something we can point to from the CJK Analyzer docs as a superior alternative. Relates to #34285	2018-11-21 09:00:48 +00:00
Nikolay Vasiliev	16956a1a05	[DOCS] Clarify 'type' parameter meaning for custom analyzer (#34012 ) This pull request improves the docs on the meaning of type parameter on the custom analyzer doc page. Closes #33456	2018-09-25 15:32:27 +02:00
Jim Ferenczi	7ad71f906a	Upgrade to a Lucene 8 snapshot (#33310 ) The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly. Some comments about the change: * Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309 Closes #32899	2018-09-06 14:42:06 +02:00
Jim Ferenczi	bdb79d021a	Fix docs failure on language analyzers (#30722 ) This commit fixes docs failure on language analyzers when compared to the built in analyzers. The `elision` filters used by the rebuilt language analyzers should be case insensitive to match the definition of the prebuilt analyzers. Closes #30557	2018-05-22 09:58:12 +02:00
Nik Everett	9881bfaea5	Docs: Document how to rebuild analyzers (#30498 ) Adds documentation for how to rebuild all the built in analyzers and tests for that documentation using the mechanism added in #29535. Closes #29499	2018-05-14 18:40:54 -04:00
Nik Everett	f9dc86836d	Docs: Test examples that recreate lang analyzers (#29535 ) We have a pile of documentation describing how to rebuild the built in language analyzers and, previously, our documentation testing framework made sure that the examples successfully built an analyzer but they didn't assert that the analyzer built by the documentation matches the built in analyzer. Unsurprisingly, some of the examples aren't quite right. This adds a mechanism that tests that the analyzers built by the docs match the built in analyzers. The mechanism is fairly simple and brutal but it seems to be working: build a hundred random unicode sequences and send them through the `_analyze` API with the rebuilt analyzer and then again through the built in analyzer. Then make sure both APIs return the same results. Each of these calls to `_analyze` takes about 20ms on my laptop which seems fine.	2018-05-09 09:23:10 -04:00
deepybee	48c8098e15	Fixed several typos in analyzers section (#28247 )	2018-01-18 08:51:53 +00:00
Adrien Grand	1b660821a2	Allow `_doc` as a type. (#27816 ) Allowing `_doc` as a type will enable users to make the transition to 7.0 smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`. This also moves most of the documentation to `_doc` as a type name. Closes #27750 Closes #27751	2017-12-14 17:47:53 +01:00
Md. Abdulla-Al-Sun	a40c474e10	Added Bengali Analyzer to Elasticsearch with respect to the lucene update(PR#238)	2017-10-05 13:25:05 +02:00
Tahmim Ahmed Shibli	34662c9e6d	[Docs] Fix name of character filter in example. (#26724 )	2017-09-20 17:08:43 +02:00
Nik Everett	514187be8e	Fix language in some docs The pattern-analyzer docs contained a snippet that was an expanded regex that was marked as `[source,js]`. This changes it to `[source,regex]`. The htmlstrip-charfilter and pattern-replace-charfilter docs had examples that were actually a list of tokens but marked `[source,js]`. This marks them as `[source,text]` so they don't count as unconverted CONSOLE snippets. The pattern-replace-charfilter also had a doc whose test was skipped because of funny interaction with the test framework. This fixes the test. Three more down, eighty-two to go. Relates to #18160	2017-04-01 14:45:44 -04:00
Nik Everett	9baa48a928	CONSOLEify lang-analyzer docs CONSOLEifies the lang-analyzer docs and replaces the (invalid) empty `keyword_marker` setups that were on the page with one that contains the word "example" translated into the appropriate language. Relates to #18160	2017-04-01 14:21:58 -04:00
markwalkom	ced99dde50	Update stop-analyzer.asciidoc (#23195 ) Clarified where the stopwords file needs to live	2017-02-16 13:36:15 +01:00
Francesc Gil	dec6fc2d40	Repeated language analyzers (#22240 ) * Repeated language analyzers The `catalan` analyzer was repeated on the supported list :) * Reordered the languages to have alphabetic order * Added space for format * Reordered the languages and removed repeated	2016-12-21 17:32:02 +01:00
Clinton Gormley	22f1acde94	Docs: Pattern analyzer does not support a max_token_length parameter Closes #20713	2016-10-08 12:27:33 +02:00
Clinton Gormley	2f6d0119f1	Added warning messages about the dangers of pathological regexes to: * pattern-replace charfilter * pattern-capture and pattern-replace token filters * pattern tokenizer * pattern analyzer Relates to #20038	2016-09-09 09:53:07 +02:00
Jim Ferenczi	4682fc34ae	Add the ability to disable the retrieval of the stored fields entirely This change adds a special field named _none_ that allows to disable the retrieval of the stored fields in a search request or in a TopHitsAggregation. To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this: ```` POST _search { "stored_fields": "_none_" } ````	2016-08-24 16:40:08 +02:00
Nik Everett	7aeea764ba	Remove wait_for_status=yellow from the docs It is no longer required after `687e2e12b3`.	2016-07-15 16:02:07 -04:00
Nik Everett	a0585269be	[docs] s/lags/Flags/ Copy and paste lost an `F`.	2016-06-09 13:08:53 -04:00
Clinton Gormley	5da9e5dcbc	Docs: Improved tokenizer docs (#18356 ) * Docs: Improved tokenizer docs Added descriptions and runnable examples * Addressed Nik's comments * Added TESTRESPONSEs for all tokenizer examples * Added TESTRESPONSEs for all analyzer examples too * Added docs, examples, and TESTRESPONSES for character filters * Skipping two tests: One interprets "$1" as a stack variable - same problem exists with the REST tests The other because the "took" value is always different * Fixed tests with "took" * Fixed failing tests and removed preserve_original from fingerprint analyzer	2016-05-19 19:42:23 +02:00
Zachary Tong	5ee5cc25cc	Move AsciiFolding earlier in FingerprintAnalyzer filter chain Rearranges the FingerprintAnalyzer so that AsciiFolding comes earlier in the chain (after lowercasing, before stop removal, for maximum deduping power) Closes #18266	2016-05-12 09:34:15 -04:00
Clinton Gormley	97a41ee973	First pass at improving analyzer docs (#18269 ) * Docs: First pass at improving analyzer docs I've rewritten the intro to analyzers plus the docs for all analyzers to provide working examples. I've also removed: * analyzer aliases (see #18244) * analyzer versions (see #18267) * snowball analyzer (see #8690) Next steps will be tokenizers, token filters, char filters * Fixed two typos	2016-05-11 14:17:56 +02:00
Clinton Gormley	3f594089c2	Renamed all AUTOSENSE snippets to CONSOLE (#18210 )	2016-05-09 15:42:23 +02:00
Nik Everett	3912761572	[docs] Add wait_until_yellow to fix build failure The snippet in the docs creates and index and uses it with the _analyze api. The trouble is that if the index hasn't been created fully the _analyze API will fail. This adds a GET _cluster/health?wait_for_status=yellow which fixes the issue. While this does make the docs more cluttered, it also makes the snippets actually runnable. Closes #18165	2016-05-05 16:02:00 -04:00
Nik Everett	4b1c116461	Generate and run tests from the docs Adds infrastructure so `gradle :docs:check` will extract tests from snippets in the documentation and execute the tests. This is included in `gradle check` so it should happen on CI and during a normal build. By default each `// AUTOSENSE` snippet creates a unique REST test. These tests are executed in a random order and the cluster is wiped between each one. If multiple snippets chain together into a test you can annotate all snippets after the first with `// TEST[continued]` to have the generated tests for both snippets joined. Snippets marked as `// TESTRESPONSE` are checked against the response of the last action. See docs/README.asciidoc for lots more. Closes #12583. That issue is about catching bugs in the docs during build. This catches some bugs in the docs during build which is a good start.	2016-05-05 13:58:03 -04:00
Zachary Tong	80288ad60c	Add `fingerprint` token filter and `fingerprint` analyzer Adds a `fingerprint` token filter which uses Lucene's FingerprintFilter, and a `fingerprint` analyzer that combines the Fingerprint filter with lowercasing, stop word removal and asciifolding. Closes #13325	2016-04-20 16:10:56 -04:00
Adrien Grand	b42f66c8ac	Document 5.0 mapping changes.	2016-03-22 16:22:58 +01:00
Adrien Grand	f8e802c028	Merge pull request #15794 from damienalexandre/french-doc [Doc] Fix french analyzer elision token filter doc	2016-01-06 18:39:26 +01:00
Damien Alexandre	23a64f8214	Fix french analyzer elision token filter doc Fix #15774	2016-01-06 18:26:03 +01:00
Clinton Gormley	98028419a5	Merge pull request #14610 from yokotaso/patch-1 Update snowball document page.	2015-11-17 14:17:30 +01:00
Robert Muir	0d3e3f81fc	Lithuanian analysis	2015-09-01 08:52:10 -04:00
xuzha	fb2be6d6a1	The name "position_offset_gap" is confusing because Lucene has three similar sounding things: * Analyzer#getPositionIncrementGap * Analyzer#getOffsetGap * IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS and * FieldType#storeTermVectorOffsets Rename position_offset_gap to position_increment_gap closes #13056	2015-08-26 14:56:35 -07:00
Nik Everett	4b9664beeb	Mapping: Default position_offset_gap to 100 This is much more fiddly than you'd expect it to be because of the way position_offset_gap is applied in StringFieldMapper. Instead of setting the default to 100 it's simpler to make sure that all the analyzers default to 100 and that StringFieldMapper doesn't override the default unless the user specifies something different. Unless the index was created before 2.1, in which case the old default of 0 has to take. Also position_offset_gaps less than 0 aren't allowed at all. New tests test that: 1. the new default doesn't match phrases across values with reasonably low slop (5) 2. the new default does match phrases across values with reasonably high slop (50) 3. you can override the value and phrases work as you'd expect 4. if you leave the value undefined in the mapping and define it on a custom analyzer the value from the custom analyzer shines through Closes #7268	2015-08-25 14:21:50 -04:00
Britta Weber	eeeb29f900	spell correct and add single quotes	2015-05-26 11:41:19 +02:00
Britta Weber	37782c1745	analyzers: custom analyzers names and aliases must not start with _ closes #9596	2015-05-26 11:38:15 +02:00
Clinton Gormley	3a69b65e88	Docs: Fixed the backslash escaping on the pattern analyzer docs Closes #11099	2015-05-15 18:40:16 +02:00


73 Commits