Commit Graph

250 Commits

James Rodewig ade72b97b7 [DOCS] Reformat keep types and keep words token filter docs (#49604)
* Adds title abbreviations
* Updates the descriptions and adds Lucene links
* Reformats parameter definitions
* Adds analyze and custom analyzer snippets
* Adds explanations of token types to keep types token filter and tokenizer docs
2019-12-02 09:40:50 -05:00
James Rodewig 2fd58bb845 [DOCS] Add missing "_type" to delimited payload token filter docs 2019-11-25 16:16:05 -05:00
James Rodewig c40449ac22 [DOCS] Reformat delimited payload token filter docs (#49380)
* Adds a title abbreviation
* Relocates the older name deprecation warning
* Updates the description and adds a Lucene link
* Adds a note to explain payloads and how to store them
* Adds analyze and custom analyzer snippets
* Adds a 'Return stored payloads' example
2019-11-25 15:40:05 -05:00
James Rodewig d06c71eb82 [DOCS] Fix edge n-gram tokenizer nav
Adds a missing float tag to the edge n-gram tokenizer docs. This tag
ensures the edge n-gram tokenizer docs display on the same page.
2019-11-22 15:54:07 -05:00
James Rodewig 562607d3f5 [DOCS] Reformat n-gram token filter docs (#49438)
Reformats the edge n-gram and n-gram token filter docs. Changes include:

* Adds title abbreviations
* Updates the descriptions and adds Lucene links
* Reformats parameter definitions
* Adds analyze and custom analyzer snippets
* Adds notes explaining differences between the edge n-gram and n-gram
  filters

Additional changes:
* Switches titles to use "n-gram" throughout.
* Fixes a typo in the edge n-gram tokenizer docs
* Adds an explicit anchor for the `index.max_ngram_diff` setting
2019-11-22 10:38:50 -05:00
Christoph Büscher 4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
whose value is interpreted as the set of characters to include in a token.

Closes #25894
2019-11-20 10:37:12 +01:00
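A minimal sketch of the new setting described in the commit above, assuming a local cluster at `http://localhost:9200`; the index and analyzer names are made up for illustration:

```python
import requests

ES = "http://localhost:9200"          # assumed local cluster
INDEX = "ngram-custom-chars-demo"     # hypothetical index name

settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_ngram": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit", "custom"],
                    # The new setting: extra characters treated as part of a token.
                    "custom_token_chars": "_",
                }
            },
            "analyzer": {
                "my_analyzer": {"type": "custom", "tokenizer": "my_ngram"}
            },
        }
    }
}

requests.put(f"{ES}/{INDEX}", json=settings).raise_for_status()

# "foo_bar" now keeps the underscore instead of being split on it.
resp = requests.post(
    f"{ES}/{INDEX}/_analyze",
    json={"analyzer": "my_analyzer", "text": "foo_bar"},
)
print([t["token"] for t in resp.json()["tokens"]])
```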
James Rodewig a26916cc23 [DOCS] Reformat elision token filter docs (#49262) 2019-11-19 10:55:22 -05:00
James Rodewig 8639ddab5e [DOCS] Reformat fingerprint token filter docs (#49311) 2019-11-19 10:55:21 -05:00
gpaimla 7d20b50f45 Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:24:21 +01:00
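A rough sketch of trying the new language support through the analyze API (assuming a local cluster; the sample text is arbitrary):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# The built-in `estonian` analyzer added by this change.
resp = requests.post(
    f"{ES}/_analyze",
    json={"analyzer": "estonian", "text": "Eestlased elavad Eestis"},
)
print([t["token"] for t in resp.json()["tokens"]])

# The stemming step on its own, as a `stemmer` token filter.
resp = requests.post(
    f"{ES}/_analyze",
    json={
        "tokenizer": "standard",
        "filter": ["lowercase", {"type": "stemmer", "language": "estonian"}],
        "text": "Eestlased elavad Eestis",
    },
)
print([t["token"] for t in resp.json()["tokens"]])
```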
James Rodewig 095c34359f [DOCS] Note limitations of `max_gram` param in `edge_ngram` tokenizer for index analyzers (#49007)
The `edge_ngram` tokenizer limits tokens to the `max_gram` character
length. Autocomplete searches for terms longer than this limit return
no results.

To prevent this, you can use the `truncate` token filter to truncate
tokens to the `max_gram` character length. However, this could return irrelevant results.

This commit adds some advisory text to make users aware of this limitation and outline the tradeoffs for each approach.

Closes #48956.
2019-11-13 14:28:12 -05:00
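A sketch of the trade-off described above (index, analyzer, and filter names are hypothetical): the index analyzer emits edge n-grams up to `max_gram`, while the search analyzer truncates query terms to the same length so longer autocomplete terms can still match.

```python
import requests

ES = "http://localhost:9200"   # assumed local cluster
INDEX = "autocomplete-demo"    # hypothetical index name

body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_edge_ngram": {
                    "type": "edge_ngram", "min_gram": 1, "max_gram": 10,
                },
                # Shorten search terms to max_gram so longer queries still
                # match, at the cost of possibly irrelevant results.
                "truncate_to_max_gram": {"type": "truncate", "length": 10},
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_edge_ngram"],
                },
                "autocomplete_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "truncate_to_max_gram"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "autocomplete_index",
                "search_analyzer": "autocomplete_search",
            }
        }
    },
}

requests.put(f"{ES}/{INDEX}", json=body).raise_for_status()
```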
James Rodewig 838af15d29 [DOCS] Reformat compound word token filters (#49006)
* Separates the compound token filters doc pages into separate token
  filter pages:
  * Dictionary decompounder token filter
  * Hyphenation decompounder token filter

* Adds analyze API examples for each compound token filter

* Adds a redirect for the removed compound token filters page

Co-Authored-By: debadair <debadair@elastic.co>
2019-11-13 09:36:52 -05:00
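A small analyze-API sketch of one of the compound filters covered by the commit above (the word list and sample text are made up):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

resp = requests.post(
    f"{ES}/_analyze",
    json={
        "tokenizer": "standard",
        "filter": [
            {
                "type": "dictionary_decompounder",
                # Sub-words found in this list are emitted as extra tokens
                # alongside the original compound word.
                "word_list": ["Donau", "dampf", "schiff"],
            }
        ],
        "text": "Donaudampfschiff",
    },
)
print([t["token"] for t in resp.json()["tokens"]])
# e.g. ["Donaudampfschiff", "Donau", "dampf", "schiff"]
```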
James Rodewig dd92830801 [DOCS] Reformat condition token filter (#48775) 2019-11-11 08:49:44 -05:00
Julian Simioni 5e4501eb3f [Docs] Consolidate single example into a single line (#48904)
The first example of splitting rules for the `word_delimiter` token filter was spread across two bullet points. This made it look like they were two separate splitting rules.
2019-11-08 15:12:45 -05:00
James Rodewig 700a316bb3 [DOCS] Reformat decimal digit token filter docs (#48722) 2019-11-01 12:38:14 -04:00
Peter Johnson 3f7aafa421 [DOCS] Fix typo in synonym token filter docs (#48691) 2019-10-31 09:12:24 -04:00
James Rodewig 3d5b1725a9 [DOCS] Remove unneeded filter from common grams analyze ex (#48748) 2019-10-31 09:08:14 -04:00
James Rodewig 77acbc4fa9 [DOCS] Reformat common grams token filter (#48426) 2019-10-30 08:40:56 -04:00
James Rodewig 06dc1fbd96 [DOCS] Reformat ASCII folding token filter docs (#48143) 2019-10-23 15:06:55 -05:00
James Rodewig 9c75f14a9f [DOCS] Reformat classic token filter docs (#48314) 2019-10-23 10:14:25 -05:00
James Rodewig a66bb2c7ed [DOCS] Reformat CJK bigram and CJK width token filter docs (#48210) 2019-10-21 08:44:49 -05:00
James Rodewig 8677653c5b [DOCS] Reformat apostrophe token filter docs (#48076) 2019-10-16 08:51:14 -04:00
Wilder Pereira 8c73e215b2 [DOCS] Remove unneeded spaces from custom analyzer snippet (#47332) 2019-10-15 15:53:16 -04:00
James Rodewig 601a88bede [DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:47:25 -04:00
James Rodewig af7aba18d4 Fixed sample code for minhash (#46385)
The sample code was wrong: a field type is required for the sample field.
The intention was presumably to give the sample field the name `fingerprint`, mapping it as `text` with the custom analyzer `my_analyzer`.
2019-09-12 13:29:44 -04:00
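Roughly what the corrected sample looks like; the field and analyzer names follow the commit message, while the shingle and min_hash settings are only illustrative:

```python
import requests

ES = "http://localhost:9200"   # assumed local cluster
INDEX = "minhash-demo"         # hypothetical index name

body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 5,
                    "max_shingle_size": 5,
                    "output_unigrams": False,
                },
                "my_minhash": {"type": "min_hash"},
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["my_shingles", "my_minhash"],
                }
            },
        }
    },
    "mappings": {
        # The fix: the sample field needs an explicit type.
        "properties": {
            "fingerprint": {"type": "text", "analyzer": "my_analyzer"}
        }
    },
}

requests.put(f"{ES}/{INDEX}", json=body).raise_for_status()
```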
Abhilash Bolla 20e93bca6b Fixed grammar in pattern replace char filter docs. (#46546)
Minor grammar fix in the pattern replace char filter docs.
2019-09-10 11:04:07 -07:00
James Rodewig b59ecde041
[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
James Rodewig f04573f8e8
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
James Rodewig bb7bff5e30
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result]" (#46295) (#46418) 2019-09-06 09:22:08 -04:00
James Rodewig 3e62cf9d74 [DOCS] Correct custom analyzer callouts (#46030) 2019-08-29 10:08:18 -04:00
James Rodewig d46545f729 [DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:18:23 -04:00
Christoph Büscher 2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently, changing resources (like dictionaries, synonym files, etc.) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
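A sketch of how the new endpoint is meant to be used (the index name and synonym file path are made up; the synonym file must exist in each node's config directory):

```python
import requests

ES = "http://localhost:9200"        # assumed local cluster
INDEX = "reloadable-synonyms-demo"  # hypothetical index name

body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym_graph",
                    # Illustrative path, relative to the node's config directory.
                    "synonyms_path": "analysis/synonyms.txt",
                    # Marks the filter as reloadable; such filters may only be
                    # used in search-time analyzers.
                    "updateable": True,
                }
            },
            "analyzer": {
                "synonyms_at_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",
                "search_analyzer": "synonyms_at_search",
            }
        }
    },
}

requests.put(f"{ES}/{INDEX}", json=body).raise_for_status()

# After editing analysis/synonyms.txt on disk, pick up the changes without
# closing and reopening the index.
requests.post(f"{ES}/{INDEX}/_reload_search_analyzers").raise_for_status()
```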
Alan Woodward 05a7333eca Require [articles] setting in elision filter (#43083)
We should throw an exception at construction time if a list of
articles is not provided; otherwise, we can get random NPEs during
indexing.

Relates to #43002
2019-06-27 09:02:36 +01:00
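For illustration, an elision filter now has to spell out its `articles`; the article list and sample text below are just an example:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Omitting the `articles` list now fails when the filter is built, instead of
# causing NPEs later during indexing.
resp = requests.post(
    f"{ES}/_analyze",
    json={
        "tokenizer": "standard",
        "filter": [
            {
                "type": "elision",
                "articles": ["l", "m", "t", "qu", "n", "s", "j"],
                "articles_case": True,
            }
        ],
        "text": "L'avion j'aime",
    },
)
print([t["token"] for t in resp.json()["tokens"]])  # e.g. ["avion", "aime"]
```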
Sachin Frayne 44aedcf97a Correct the description of generate_word_parts (#43026) 2019-06-10 11:36:31 +01:00
James Rodewig 5342616a23 [DOCS] Add explicit `articles_case` parameter to Elision Token Filter example (#42987) 2019-06-07 11:24:43 -04:00
Mayya Sharipova 5a76f46ac6 Fix error with mapping in docs
Related to #39630
2019-05-30 10:28:09 -04:00
Peter Dyson b84b5525e1 [DOCS] path_hierarchy tokenizer examples (#39630)
Closes #17138
2019-05-30 09:17:55 -04:00
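The kind of example the commit above adds, in analyze-API form (assuming a local cluster):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

resp = requests.post(
    f"{ES}/_analyze",
    json={"tokenizer": "path_hierarchy", "text": "/one/two/three"},
)
print([t["token"] for t in resp.json()["tokens"]])
# ["/one", "/one/two", "/one/two/three"]
```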
Alan Woodward 3a35427b6d Improvements to docs around multiplexer and synonyms (#41645)
This commit fixes a multiplexer doc error concerning synonyms, and adds
suggestions on how to combine the two filters.
2019-05-07 09:10:14 +01:00
James Rodewig d46f55f013 [DOCS] Add attribute to escape minimal pt token link in Asciidoctor (#41613) 2019-04-30 14:11:48 -04:00
James Rodewig 53702efddd [DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:20:17 -04:00
Guilherme Ferreira 48a17d5768 [Docs] Correct default stop list constant (#41342) 2019-04-23 19:13:51 +02:00
Guilherme Ferreira 23e40c040a [Docs] Correct spelling of "_none_" (#41192) 2019-04-15 15:12:28 +02:00
Guilherme Ferreira 414debd740 [Docs] Correct spelling the "_none_" stopwords element (#41191) 2019-04-15 14:12:26 +02:00
Christoph Büscher dfc70e6ef0 Correct indention in synonym docs (#40711)
The stopword filter should be on the same level as the synonym filter in the
example request. Correcting this for better readability.
2019-04-02 01:44:24 +02:00
Mayya Sharipova 671a209ed9 Correct errors in min_hash filter documentation
Related to #39671
2019-03-08 16:21:24 -05:00
Mayya Sharipova 54d41afac1 Add documentation for min_hash filter (#39671)
Closes #20757
2019-03-07 08:49:48 -05:00
jimczi ecb6df137c fix typo in synonym graph filter docs 2019-03-05 18:20:14 +01:00
Christoph Büscher 4b77d0434a Remove `nGram` and `edgeNGram` token filter names (#39070)
In #30209 we deprecated the camel-case `nGram` filter name in favour of `ngram`,
and likewise `edgeNGram` in favour of `edge_ngram`; those names are being removed in
8.0. This change disallows the deprecated names for new indices created in 7.0 by
throwing an error if these filters are used.

Relates to #38911
2019-02-21 16:55:40 +01:00
Jim Ferenczi 83402b1320 Remove beta marker from the synonym_graph docs (#38185) 2019-02-19 10:49:49 +01:00
Mayya Sharipova 0e1b1959fe
Correct rebuilt persian analyzer (#38724) (#38744)
Make substitution of \u200C with a space explicit

The problem is that the symbol `\u200C` in a test string **should** be
substituted with a space by the rebuilt Persian analyzer, but it is not.

Changing the line `"mappings": [ "\\u200C=> "] <1>` to
`"mappings": [ "\\u200C=>\\u0020"] <1>` solves the problem.
This change explicitly says to substitute ZWNJ with a space.

Closes #38188
2019-02-11 14:17:18 -05:00
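The corrected mapping, shown standalone through the analyze API (the sample text is arbitrary):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# The corrected mapping: ZWNJ (U+200C) is explicitly replaced with a space
# (U+0020) before tokenization.
resp = requests.post(
    f"{ES}/_analyze",
    json={
        "tokenizer": "standard",
        "char_filter": [
            {"type": "mapping", "mappings": ["\\u200C=>\\u0020"]}
        ],
        "text": "می\u200cخورد",
    },
)
print([t["token"] for t in resp.json()["tokens"]])  # e.g. ["می", "خورد"]
```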
Christoph Büscher 34f2d2ec91
Remove remaining occurrences of "include_type_name=true" in docs (#37646) 2019-01-22 15:13:52 +01:00