Commit Graph

96 Commits

Author SHA1 Message Date
Robert Muir b9a09c2b06 Analysis: Add additional Analyzers, Tokenizers, and TokenFilters from Lucene
Add `irish` analyzer
Add `sorani` analyzer (Kurdish)

Add `classic` tokenizer: specific to English text; tries to recognize hostnames, company names, acronyms, etc.
Add `thai` tokenizer: segments Thai text into words.

Add `classic` tokenfilter: cleans up acronyms and possessives from the classic tokenizer
Add `apostrophe` tokenfilter: removes all text after an apostrophe, including the apostrophe itself
Add `german_normalization` tokenfilter: umlaut/sharp S normalization
Add `hindi_normalization` tokenfilter: accounts for Hindi spelling differences
Add `indic_normalization` tokenfilter: accounts for different Unicode representations in Indian languages
Add `sorani_normalization` tokenfilter: normalizes Kurdish text
Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text
Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization`
Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk`

Add access to the default Thai stopword set "_thai_"

Fix some bugs and broken links in documentation.

Closes #5935
2014-07-03 05:47:49 -04:00
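
The components added in the commit above are regular analysis building blocks, so they can be wired into index settings like any other analyzer or token filter. A minimal sketch, assuming an Elasticsearch 1.3.0+ node on localhost; the index, analyzer, and filter names are illustrative:

    # Hypothetical index using the new `sorani` analyzer and `scandinavian_folding` filter.
    curl -XPUT 'localhost:9200/i18n-test' -d '{
      "settings": {
        "analysis": {
          "filter": {
            "nordic_folding": { "type": "scandinavian_folding" }
          },
          "analyzer": {
            "kurdish": { "type": "sorani" },
            "nordic": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "lowercase", "nordic_folding" ]
            }
          }
        }
      }
    }'
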
Clinton Gormley cf059378d1 Docs: Updated stop token filter docs 2014-06-21 18:42:38 +02:00
Clinton Gormley 69350dc426 Update stemmer-override-tokenfilter.asciidoc 2014-06-18 11:34:20 +02:00
Clinton Gormley f546662e8f Docs: Hunspell tidied
Tidied some formatting
2014-06-11 21:49:02 +02:00
Clinton Gormley 04dacaaf27 Docs: Use the "stemmer" token filter for the english analyzer, to be consistent 2014-06-11 13:47:07 +02:00
Clinton Gormley 8a94b71b75 Docs: Corrected the use of keyword_marker on the lang analyzers 2014-06-11 13:43:02 +02:00
Clinton Gormley 673ef3db3f The StemmerTokenFilter had a number of issues:
* `english` returned the slow snowball English stemmer
* `porter2` returned the snowball Porter stemmer (v1)
* `portuguese` was used twice, preventing the second version from working

Changes:

* `english` now returns the fast PorterStemmer (for indices created from v1.3.0 onwards)
* `porter2` now returns the snowball English stemmer (for indices created from v1.3.0 onwards)
* `light_english` now returns the `kstem` stemmer (`kstem` still works)
* `portuguese_rslp` returns the PortugueseStemmer
* `dutch_kp` is a synonym for `kp`

Tests and docs updated

Fixes #6345
Fixes #6213
Fixes #6330
2014-06-11 12:30:16 +02:00
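
Since these mappings are selected through the `stemmer` token filter's `language` setting, the change above is visible purely in index settings. A minimal sketch, assuming an index created on 1.3.0+; index, analyzer, and filter names are illustrative:

    # Hypothetical analyzer using the `light_english` stemmer, which now maps to kstem.
    curl -XPUT 'localhost:9200/stem-test' -d '{
      "settings": {
        "analysis": {
          "filter": {
            "my_stemmer": { "type": "stemmer", "language": "light_english" }
          },
          "analyzer": {
            "english_light": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "lowercase", "my_stemmer" ]
            }
          }
        }
      }
    }'
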
Clinton Gormley e323e577e8 Docs: Fixed bad ref on cjk_width/bigram pages 2014-06-09 23:36:58 +02:00
Clinton Gormley 5e40868f44 Docs: Fixed a bad ref on lang analyzers page 2014-06-09 23:03:12 +02:00
Clinton Gormley 5c5c1da06c Docs: Fixed some errors on the language analyzers page 2014-06-09 22:51:28 +02:00
Clinton Gormley 585b0ef730 Docs: Added custom-analyzer equivalents of all the language analyzers 2014-06-09 22:41:25 +02:00
Clinton Gormley bc402d5f87 Docs: Documented the cjk_width and cjk_bigram token filters 2014-06-09 22:40:58 +02:00
Simon Willnauer 9d5507047f Update Documentation Feature Flags [1.2.0] 2014-05-22 15:06:42 +02:00
Simon Willnauer f79b28375d Add missing coming tag
Relates to #6188
Relates to #5539
2014-05-18 10:54:17 +02:00
Richard Boulton fdb5eb6555 Update keyword-tokenizer.asciidoc 2014-05-07 15:04:07 +02:00
Matthieu Bacconnier 7fd5f18539 Update asciifolding-tokenfilter.asciidoc
Typo
2014-05-06 16:30:09 +02:00
Ali Bozorgkhan f1af845795 [DOCS] Fixed a typo
Close #5963
2014-05-06 10:28:13 +02:00
Robert Muir 8e0a479316 Upgrade to Lucene 4.8
Closes #5932
2014-04-28 06:45:50 -04:00
Clinton Gormley c1e03bf860 Update keyword-repeat-tokenfilter.asciidoc 2014-04-24 16:44:02 +02:00
Kevin Wang 374b633a4b add uppercase token filter
closes #5539
2014-03-26 15:07:43 +07:00
bleskes 5d832374dd Update Documentation Feature Flags [1.1.0] 2014-03-25 17:51:30 +01:00
Clinton Gormley 4c34615686 [DOCS] Fixed some bad UTF8 2014-03-19 12:46:06 +01:00
Simon Willnauer 9160516b28 Expose `filler_token` via ShingleTokenFilterFactory
Lucene 4.7 supports a setter for the `filler_token` that is
inserted if there are gaps in the token stream. This change exposes
this setting.

Closes #4307
2014-02-26 22:21:10 +01:00
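
For reference, the exposed setting is configured on a `shingle` token filter in index settings. A minimal sketch; the names and the chosen filler value are illustrative:

    # Hypothetical shingle filter that fills gaps in the token stream with "-".
    curl -XPUT 'localhost:9200/shingle-test' -d '{
      "settings": {
        "analysis": {
          "filter": {
            "my_shingles": {
              "type": "shingle",
              "min_shingle_size": 2,
              "max_shingle_size": 2,
              "filler_token": "-"
            }
          },
          "analyzer": {
            "shingled": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "lowercase", "my_shingles" ]
            }
          }
        }
      }
    }'
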
Nik Everett 5c3f4ceafb Add preserve original token option to ASCIIFolding
Closes #4931
2014-02-14 19:37:00 +01:00
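
The option is a boolean setting on the `asciifolding` token filter. A minimal sketch; index, analyzer, and filter names are illustrative:

    # Hypothetical folding filter that emits both the folded token and the original.
    curl -XPUT 'localhost:9200/folding-test' -d '{
      "settings": {
        "analysis": {
          "filter": {
            "folding_keep_original": {
              "type": "asciifolding",
              "preserve_original": true
            }
          },
          "analyzer": {
            "folded": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "lowercase", "folding_keep_original" ]
            }
          }
        }
      }
    }'
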
Alexander Reelsen c6155c5142 release [1.0.0.RC1] 2014-01-15 17:02:22 +00:00
Benjamin Vetter ba8e012be9 Referring to stop analyzer for stopword docs #329 2014-01-14 11:53:30 +01:00
Benjamin Vetter 22a96e6a18 Added stopwords: _none_ to the docs #329 2014-01-14 11:53:29 +01:00
Simon Willnauer 7f63ddf94e Default stopwords list should be `_none_` for all but language-specific analyzers
The `standard_html_strip` and `pattern` analyzers support stopwords, which default to the
`english` stopword set. These analyzers should not use stopwords by default, since they
are language-neutral.
Closes #4699
2014-01-13 14:44:10 +01:00
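
After this change, stopword filtering on those analyzers has to be requested explicitly. A minimal sketch of opting back in on a `pattern` analyzer; index and analyzer names are illustrative:

    # Hypothetical pattern analyzer that explicitly re-enables the English stopword set.
    curl -XPUT 'localhost:9200/pattern-test' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "comma_fields": {
              "type": "pattern",
              "pattern": ",",
              "stopwords": "_english_"
            }
          }
        }
      }
    }'
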
Yousef 302c762d5e Wrong link to Token Filter 2013-12-03 10:39:13 +01:00
Lee Hinman 9939e81d88 [DOCS] Fix porter stem filter name in other stemming docs 2013-11-28 22:14:47 -07:00
Lee Hinman fb4e903e35 [DOCS] Fix name of porter stemming token filter 2013-11-28 22:01:19 -07:00
Simon Willnauer 77bc5d5ecf release [1.0.0.Beta1] 2013-11-06 15:32:43 +01:00
Simon Willnauer 9654631186 Change 'standard' analyzer to use an empty stopword list by default.
The 'default' / 'standard' analyzer can be a trappy default since it filters English stopwords
by default. Yet a default should not be dedicated to a particular language, since Elasticsearch
is used in many different scenarios where a standard analysis chain specialized for English
full text might be rather counterproductive.

This commit changes the 'standard' analyzer to use an empty stopword list for indices
created from version 1.0.0.Beta1 onwards, while maintaining backwards compatibility
for older indices.

Closes #3775
2013-11-05 21:07:21 +01:00
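
Users who relied on the old behaviour can still configure a `standard`-based analyzer with an explicit stopword list. A minimal sketch; index and analyzer names are illustrative:

    # Hypothetical standard-based analyzer that keeps English stopword filtering.
    curl -XPUT 'localhost:9200/standard-test' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "standard_with_stops": {
              "type": "standard",
              "stopwords": "_english_"
            }
          }
        }
      }
    }'
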
Boaz Leskes a9fdcadf01 [DOCS] Added documentation for the keep word token filter 2013-11-04 18:38:44 +01:00
Clinton Gormley 4206cc988e [DOCS] Typo on shingle tokenfilter 2013-10-31 20:18:00 +01:00
Ben McCann cc4bc7d57d Fix nonsensical sentence in standard analyzer documentation so that it is more understandable 2013-10-25 00:18:32 +02:00
Alexander Reelsen 4d19239ec4 Add support for Lucene SuggestStopFilter
The suggest stop filter is an improved version of the stop filter: it only takes
stopwords into account if the last character of the query is whitespace. This lets
you keep stopword filtering while still allowing suggestions for "a".

Example: index a document with the content "a word". With the suggest stop filter
used on the query side, suggesting for "a" returns results from the completion
suggester, but suggesting for "a " (with a trailing space) returns nothing, because
"a" is then identified as a stopword.

The implementation allows setting the `remove_trailing` parameter on a custom stop
filter, which makes it use the suggest stop filter instead of the standard stop
filter.
2013-10-15 16:12:02 +02:00
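
A minimal sketch of a query-side analyzer using the `remove_trailing` parameter described above; index, analyzer, and filter names are illustrative:

    # Hypothetical stop filter that keeps a trailing stopword, enabling the suggest stop behaviour.
    curl -XPUT 'localhost:9200/suggest-test' -d '{
      "settings": {
        "analysis": {
          "filter": {
            "suggest_stop": {
              "type": "stop",
              "stopwords": "_english_",
              "remove_trailing": false
            }
          },
          "analyzer": {
            "suggest_search": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "lowercase", "suggest_stop" ]
            }
          }
        }
      }
    }'
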
Britta Weber c3ab79a10e [DOCS] Add doc for delimited payload token filter 2013-10-14 13:41:35 +02:00
Clinton Gormley d062409309 [DOCS] Removed enable_position_increments in stop filter 2013-10-05 17:06:13 +02:00
Clinton Gormley ea05f4538c [DOCS] Updated ICU-Plugin docs from the repo README 2013-10-05 16:31:52 +02:00
Lee Hinman ba40aa374e Uniquify anchor links to fix asciidoc/docbook generation 2013-09-30 15:32:00 -06:00
Lee Hinman 0442b737be Add more anchor links to documentation
Related to #3679
2013-09-30 13:13:16 -06:00
Adrien Grand 90524d7ad2 Fix formatting of the documentation.
Remaining '@'s have been replaced with '`'s.
2013-09-18 12:35:44 +02:00
Clinton Gormley 393c28bee4 [DOCS] Removed outdated new/deprecated version notices 2013-09-03 21:28:31 +02:00
Boaz Leskes e807c99f27 Fixed a typo in the config of the light Finnish stemmer (the old last_finish is still supported for backward compatibility)
Closes #3594
2013-08-29 10:15:40 +02:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00