OpenSearch

Commit Graph

Author	SHA1	Message	Date
Clinton Gormley	e4baa56f4b	Docs: Language analyzers Clarified the use of stem_exclusion and the keyword_marker token filter Closes #6613	2014-07-07 10:06:18 +02:00
Clinton Gormley	54790eea10	Update lang-analyzer.asciidoc Clarified the use of the `stem_exclusion` token filter. Closes #6613	2014-07-04 17:50:43 +02:00
Robert Muir	b9a09c2b06	Analysis: Add additional Analyzers, Tokenizers, and TokenFilters from Lucene Add `irish` analyzer Add `sorani` analyzer (Kurdish) Add `classic` tokenizer: specific to english text and tries to recognize hostnames, companies, acronyms, etc. Add `thai` tokenizer: segments thai text into words. Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself Add `german_normalization` tokenfilter: umlaut/sharp S normalization Add `hindi_normalization` tokenfilter: accounts for hindi spelling differences Add `indic_normalization` tokenfilter: accounts for different unicode representations in Indian languages Add `sorani_normalization` tokenfilter: normalizes kurdish text Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization` Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk` Add support access to default Thai stopword set "_thai_" Fix some bugs and broken links in documentation. Closes #5935	2014-07-03 05:47:49 -04:00
Clinton Gormley	04dacaaf27	Docs: Use the "stemmer" token filter for the english analyzer, to be consistent	2014-06-11 13:47:07 +02:00
Clinton Gormley	8a94b71b75	Docs: Corrected the use of keyword_marker on the lang analyzers	2014-06-11 13:43:02 +02:00
Clinton Gormley	5e40868f44	Docs: Fixed a bad ref on lang analyzers page	2014-06-09 23:03:12 +02:00
Clinton Gormley	5c5c1da06c	Docs: Fixed some errors on the language analyzers page	2014-06-09 22:51:28 +02:00
Clinton Gormley	585b0ef730	Docs: Added custom-analyzer equivalents of all the language analyzers	2014-06-09 22:41:25 +02:00
Alexander Reelsen	c6155c5142	release [1.0.0.RC1]	2014-01-15 17:02:22 +00:00
Benjamin Vetter	ba8e012be9	Referring to stop analyzer for stopword docs #329	2014-01-14 11:53:30 +01:00
Benjamin Vetter	22a96e6a18	Added stopwords: _none_ to the docs #329	2014-01-14 11:53:29 +01:00
Simon Willnauer	7f63ddf94e	Default stopwords list should be `_none_` for all but language-specific analyzers `standard_html_strip` and `pattern` analyzer support stopwords which are set to the default `english` stopwords by default. Those analyzers should not use stopwords by default since they are language neutral Closes #4699	2014-01-13 14:44:10 +01:00
Simon Willnauer	77bc5d5ecf	release [1.0.0.Beta1]	2013-11-06 15:32:43 +01:00
Simon Willnauer	9654631186	Change 'standard' analyzer to use empty stopword list by default. The 'default' / 'standard' analyzer can be a trappy default since it filters english stopwords by default. Yet a default should not be dedicated to a certain language since elasticsearch is used in many different scenarios where a standard analysis chain with specialization to english full-text might be rather counter productive. This commit changes the 'standard' analyzer to use an empty stopword list for indices that are created from 1.0.0.Beta1 version onwards but will maintain backwards compatibility for older indices. Closes #3775	2013-11-05 21:07:21 +01:00
Ben McCann	cc4bc7d57d	Fix nonsensical sentence in standard analyzer documentation so that it is more understandable	2013-10-25 00:18:32 +02:00
Clinton Gormley	822043347e	Migrated documentation into the main repo	2013-08-29 01:24:34 +02:00

16 Commits