OpenSearch

Commit Graph

Author	SHA1	Message	Date
James Rodewig	43199a8c82	[DOCS] Remove double space in WDG docs	2020-03-23 17:18:04 -04:00
James Rodewig	553d8a9ca9	[DOCS] Fix "letter case" typo Changes "lettercase" to "letter case" in the `uppercase` token filter docs.	2020-03-23 17:11:59 -04:00
lgypro	be3090138e	[Docs] Fix typo in _analyze api docs (#53837 )	2020-03-20 12:02:40 +01:00
James Rodewig	8f4a3eb07f	[DOCS] Add token graph concept docs (#53339 ) Adds conceptual docs for token graphs. These docs cover: * How a token graph is constructed from a token stream * How synonyms and multi-position tokens impact token graphs * How token graphs are used during search * Why some token filters produce invalid token graphs Also makes the following supporting changes: * Adds anchors to the 'Anatomy of an Analyzer' docs for cross-linking * Adds several SVGs for token graph diagrams	2020-03-19 07:43:18 -04:00
James Rodewig	e4b7af11ab	[DOCS] Remove `light_bengali` stemmer (#53697 ) Only the `bengali` stemmer is available in Lucene and surfaced through Elasticsearch. This removes the incorrect `light_bengali` link in our docs.	2020-03-18 08:34:27 -04:00
James Rodewig	e1eebea846	[DOCS] Reformat `remove_duplicates` token filter (#53608 ) Makes the following changes to the `remove_duplicates` token filter docs: * Rewrites description and adds Lucene link * Adds detailed analyze example * Adds custom analyzer example	2020-03-16 11:37:06 -04:00
Jim Ferenczi	97621e7f65	Removes Lucene's old experimental flag from analyzer documentation (#53217 ) This change removes Lucene's experimental flag from the documentation of the following tokenizers/filters: * Simple Pattern Split Tokenizer * Simple Pattern tokenizer * Flatten Graph Token Filter * Word Delimiter Graph Token Filter The flag is still present in the Lucene codebase, but we have fully supported these tokenizers/filters in ES for a long time now, so the docs flag is misleading. Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-03-12 21:18:19 +01:00
James Rodewig	933a9c6fca	[DOCS] Reformat `word_delimiter` token filter (#53387 ) Makes the following changes to the `word_delimiter` token filter docs: * Adds a warning admonition recommending the `word_delimiter_graph` filter instead. This warning includes a link to the deprecated Lucene `WordDelimiterFilter`. * Updates the description * Adds detailed analyze snippet * Adds custom analyzer and custom filter snippets * Reorganizes and updates parameter documentation	2020-03-11 09:03:57 -04:00
James Rodewig	a9dd7773d2	[DOCS] Use keyword tokenizer in word delimiter graph examples (#53384 ) In a tip admonition, we recommend using the `keyword` tokenizer with the `word_delimiter_graph` token filter. However, we only use the `whitespace` tokenizer in the example snippets. This updates those snippets to use the `keyword` tokenizer instead. Also corrects several spacing issues for arrays in these docs.	2020-03-11 04:46:33 -04:00
James Rodewig	166b5a92f6	[DOCS] Correct anchor in word delimiter graph token filter docs	2020-03-10 10:32:42 -04:00
James Rodewig	28cb4a167d	[DOCS] Reformat `word_delimiter_graph` token filter (#53170 ) (#53272 ) Makes the following changes to the `word_delimiter_graph` token filter docs: * Updates the Lucene experimental admonition. * Updates description * Adds analyze snippet * Adds custom analyzer and custom filter snippets * Reorganizes and updates parameter list * Expands and updates section re: differences between `word_delimiter` and `word_delimiter_graph`	2020-03-09 06:45:44 -04:00
James Rodewig	9bb9f63364	[DOCS] Note that `trim` filter doesn't change offsets (#53220 ) The [word delimiter graph token filter docs][0] note that the `trim` filter changes the length of tokens without changing their offsets. This explicitly mentions that in the `trim` filter docs. [0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-word-delimiter-graph-tokenfilter.html	2020-03-06 07:32:35 -05:00
James Rodewig	4bc6d2dbec	[DOCS] Correct link for Lucene StopFilter	2020-03-05 14:52:25 -05:00
James Rodewig	0c4bf64095	[DOCS] Fix several Asciidoctor double arrow replacements (#52827 ) Per the [Asciidoctor docs][0], Asciidoctor replaces the following syntax with double arrows in the rendered HTML: * => renders as ⇒ * <= renders as ⇐ This escapes several unintended replacements, such as in the Painless docs. Where appropriate, it also replaces some double arrow instances with single arrows for consistency. [0]: https://asciidoctor.org/docs/user-manual/#replacements	2020-03-04 08:43:19 -05:00
James Rodewig	cf87724ff6	[DOCS] Reformat `stop` token filter (#53059 ) Makes the following changes to the `stop` token filter docs: * Updates description * Adds a link to the related Lucene filter * Adds detailed analyze snippet * Updates custom analyzer and custom filter snippets * Adds a list of predefined stop words by language Co-authored-by: ScottieL <36999642+ScottieL@users.noreply.github.com>	2020-03-03 13:22:52 -05:00
James Rodewig	d336faa0b0	[DOCS] Reformat trim token filter docs (#51649 ) Makes the following changes to the `trim` token filter docs: * Updates description * Adds a link to the related Lucene filter * Adds tip about removing whitespace using tokenizers * Adds detailed analyze snippets * Adds custom analyzer snippet	2020-03-02 07:48:23 -05:00
rhymes	7eb4c07f1f	[DOCS] Fix typo in index and search analysis docs (#52988 )	2020-03-02 07:25:01 -05:00
debadair	291713f284	[DOCS] Fixed typo in jump link. (#52302 )	2020-02-12 17:53:00 -08:00
James Rodewig	36b2663e98	[DOCS] Add attribute for Lucene analysis links (#51687 ) Adds a `lucene-analysis-docs` attribute for the Lucene `/analysis/` javadocs directory. This should prevent typos and keep the docs DRY.	2020-01-30 11:24:01 -05:00
James Rodewig	4fcf5a9de4	[DOCS] Rewrite analysis intro (#51184 ) * [DOCS] Rewrite analysis intro. Move index/search analysis content. * Rewrites 'Text analysis' page intro as high-level definition. Adds guidance on when users should configure text analysis * Rewrites and splits index/search analysis content: * Conceptual content -> 'Index and search analysis' under 'Concepts' * Task-based content -> 'Specify an analyzer' under 'Configure...' * Adds detailed examples for when to use the same index/search analyzer and when not. * Adds new example snippets for specifying search analyzers * clarifications * Add toc. Decrement headings. * Reword 'When to configure' section * Remove sentence from tip	2020-01-30 09:32:16 -05:00
James Rodewig	70e4ae3381	[DOCS] Reformat unique token filter docs (#50748 ) * Updates the description * Adds analyze, custom analyzer, and custom filter snippets * Adds parameter documentation	2020-01-28 10:42:25 -05:00
James Rodewig	23b65390ab	[DOCS] Add response snippets to 'Testing analyzers' page (#51427 ) Adds response snippets to the `POST _analyze` snippets in the 'Testing analyzers' page. Co-authored-by: Emmanuel DEMEY <demey.emmanuel@gmail.com>	2020-01-27 08:41:44 -05:00
James Rodewig	7ef906fde8	[DOCS] Add tutorials section to analysis topic (#50809 ) Adds a 'Configure text analysis' page to house tutorial content for the analysis topic. Also relocates the following pages as children of this new page: * 'Test an analyzer' * 'Configuring built-in analyzers' * 'Create a custom analyzer' I plan to add a tutorial for specifying index-time and search-time analyzers to this section as part of a future PR.	2020-01-16 13:12:06 -05:00
James Rodewig	ef26763ca9	[DOCS] Add concepts section to analysis topic (#50801 ) This helps the topic better match the structure of our machine learning docs, e.g. https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts' child page, but I plan to add other concepts, such as 'Index time vs. search time', with later PRs.	2020-01-16 13:00:39 -05:00
James Rodewig	1edaf2b101	[DOCS] Retitle analysis reference pages (#51071 ) * Changes titles to sentence case. * Appends pages with 'reference' to differentiate their content from conceptual overviews. * Moves the 'Normalizers' page to end of the Analysis topic pages.	2020-01-16 12:30:51 -05:00
PND	1d391f7113	[Docs] Fix example output of edge n-gram token filter. (#51085 )	2020-01-16 11:34:00 +01:00
James Rodewig	78c9eee5ea	[DOCS] Add section ID to analysis overview page	2020-01-08 14:43:41 -06:00
James Rodewig	9d1567b13b	[DOCS] Add overview page to analysis topic (#50515 ) Adds a 'text analysis overview' page to the analysis topic docs. The goals of this page are: * Concisely summarize the analysis process while avoiding in-depth concepts, tutorials, or API examples * Explain why analysis is important, largely through highlighting problems with full-text searches missing analysis * Highlight how analysis can be used to improve search results	2020-01-08 12:54:00 -06:00
James Rodewig	20eba1e410	[DOCS] Reformat reverse token filter docs (#50672 ) * Updates the description and adds a Lucene link * Adds analyze and custom analyzer snippets	2020-01-07 11:01:55 -06:00
James Rodewig	8009b07ccb	[DOCS] Reformat truncate token filter docs (#50687 ) * Updates the description and adds a Lucene link * Adds analyze, custom analyzer, and custom filter snippets * Adds parameter documentation	2020-01-07 10:33:57 -06:00
James Rodewig	e6a469cc74	[DOCS] Reformat uppercase token filter docs (#50555 ) * Updates the description and adds a Lucene link * Adds analyze and custom analyzer snippets	2020-01-03 08:39:08 -05:00
James Rodewig	7a14607a25	[DOCS] Abbreviate token filter titles (#50511 )	2019-12-27 11:01:52 -05:00
Nik Everett	01293ebad5	Fix docs typos (#50365 ) (#50464 ) Fixes a few typos in the docs. Co-authored-by: Xiang Dai <764524258@qq.com>	2019-12-23 12:38:17 -05:00
James Rodewig	cd04021961	[DOCS] Reformat token count limit filter docs (#49835 )	2019-12-13 08:44:39 -05:00
James Rodewig	1186a5dc09	[DOCS] Reformat lowercase token filter docs (#49935 )	2019-12-12 09:50:12 -05:00
James Rodewig	87a73b6bdf	[DOCS] Reformat length token filter docs (#49805 ) * Adds a title abbreviation * Updates the description and adds a Lucene link * Reformats the parameters section * Adds analyze, custom analyzer, and custom filter snippets Relates to #44726.	2019-12-04 09:59:08 -05:00
James Rodewig	ade72b97b7	[DOCS] Reformat keep types and keep words token filter docs (#49604 ) * Adds title abbreviations * Updates the descriptions and adds Lucene links * Reformats parameter definitions * Adds analyze and custom analyzer snippets * Adds explanations of token types to keep types token filter and tokenizer docs	2019-12-02 09:40:50 -05:00
James Rodewig	2fd58bb845	[DOCS] Add missing "_type" to delimited payload token filter docs	2019-11-25 16:16:05 -05:00
James Rodewig	c40449ac22	[DOCS] Reformat delimited payload token filter docs (#49380 ) * Adds a title abbreviation * Relocates the older name deprecation warning * Updates the description and adds a Lucene link * Adds a note to explain payloads and how to store them * Adds analyze and custom analyzer snippets * Adds a 'Return stored payloads' example	2019-11-25 15:40:05 -05:00
James Rodewig	d06c71eb82	[DOCS] Fix edge n-gram tokenizer nav Adds a missing float tag to the edge n-gram tokenizer docs. This tag ensures the edge n-gram tokenizer docs display on the same page.	2019-11-22 15:54:07 -05:00
James Rodewig	562607d3f5	[DOCS] Reformat n-gram token filter docs (#49438 ) Reformats the edge n-gram and n-gram token filter docs. Changes include: * Adds title abbreviations * Updates the descriptions and adds Lucene links * Reformats parameter definitions * Adds analyze and custom analyzer snippets * Adds notes explaining differences between the edge n-gram and n-gram filters Additional changes: * Switches titles to use "n-gram" throughout. * Fixes a typo in the edge n-gram tokenizer docs * Adds an explicit anchor for the `index.max_ngram_diff` setting	2019-11-22 10:38:50 -05:00
Christoph Büscher	4ffa050735	Allow custom characters in token_chars of ngram tokenizers (#49250 ) Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers only allows for a list of predefined character classes, which might not fit every use case. For example, including underscore "_" in a token would currently require the `punctuation` class, which comes with a lot of other characters. This change adds an additional "custom" option to the `token_chars` setting, which requires an additional `custom_token_chars` setting to be present and which will be interpreted as a set of characters to include in a token. Closes #25894	2019-11-20 10:37:12 +01:00
James Rodewig	a26916cc23	[DOCS] Reformat elision token filter docs (#49262 )	2019-11-19 10:55:22 -05:00
James Rodewig	8639ddab5e	[DOCS] Reformat fingerprint token filter docs (#49311 )	2019-11-19 10:55:21 -05:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
James Rodewig	095c34359f	[DOCS] Note limitations of `max_gram` param in `edge_ngram` tokenizer for index analyzers (#49007 ) The `edge_ngram` tokenizer limits tokens to the `max_gram` character length. Autocomplete searches for terms longer than this limit return no results. To prevent this, you can use the `truncate` token filter to truncate tokens to the `max_gram` character length. However, this could return irrelevant results. This commit adds some advisory text to make users aware of this limitation and outline the tradeoffs for each approach. Closes #48956.	2019-11-13 14:28:12 -05:00
James Rodewig	838af15d29	[DOCS] Reformat compound word token filters (#49006 ) * Separates the compound token filters doc pages into separate token filter pages: * Dictionary decompounder token filter * Hyphenation decompounder token filter * Adds analyze API examples for each compound token filter * Adds a redirect for the removed compound token filters page Co-Authored-By: debadair <debadair@elastic.co>	2019-11-13 09:36:52 -05:00
James Rodewig	dd92830801	[DOCS] Reformat condition token filter (#48775 )	2019-11-11 08:49:44 -05:00
Julian Simioni	5e4501eb3f	[Docs] Consolidate single example into a single line (#48904 ) The first example of splitting rules for the `word_delimiter` token filter was spread across two bullet points. This makes it look like they are two separate splitting rules.	2019-11-08 15:12:45 -05:00
James Rodewig	700a316bb3	[DOCS] Reformat decimal digit token filter docs (#48722 )	2019-11-01 12:38:14 -04:00

286 Commits