246 Commits

Author SHA1 Message Date
James Rodewig
562607d3f5 [DOCS] Reformat n-gram token filter docs (#49438)
Reformats the edge n-gram and n-gram token filter docs. Changes include:

* Adds title abbreviations
* Updates the descriptions and adds Lucene links
* Reformats parameter definitions
* Adds analyze and custom analyzer snippets
* Adds notes explaining differences between the edge n-gram and n-gram
  filters

Additional changes:
* Switches titles to use "n-gram" throughout.
* Fixes a typo in the edge n-gram tokenizer docs
* Adds an explicit anchor for the `index.max_ngram_diff` setting
2019-11-22 10:38:50 -05:00
Christoph Büscher
4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to inlcude into a token.

Closes #25894
2019-11-20 10:37:12 +01:00
James Rodewig
a26916cc23 [DOCS] Reformat elision token filter docs (#49262) 2019-11-19 10:55:22 -05:00
James Rodewig
8639ddab5e [DOCS] Reformat fingerprint token filter docs (#49311) 2019-11-19 10:55:21 -05:00
gpaimla
7d20b50f45 Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:24:21 +01:00
James Rodewig
095c34359f [DOCS] Note limitations of max_gram parm in edge_ngram tokenizer for index analyzers (#49007)
The `edge_ngram` tokenizer limits tokens to the `max_gram` character
length. Autocomplete searches for terms longer than this limit return
no results.

To prevent this, you can use the `truncate` token filter to truncate
tokens to the `max_gram` character length. However, this could return irrelevant results.

This commit adds some advisory text to make users aware of this limitation and outline the tradeoffs for each approach.

Closes #48956.
2019-11-13 14:28:12 -05:00
James Rodewig
838af15d29 [DOCS] Reformat compound word token filters (#49006)
* Separates the compound token filters doc pages into separate token
  filter pages:
  * Dictionary decompounder token filter
  * Hyphenation decompounder token filter

* Adds analyze API examples for each compound token filter

* Adds a redirect for the removed compound token filters page

Co-Authored-By: debadair <debadair@elastic.co>
2019-11-13 09:36:52 -05:00
James Rodewig
dd92830801 [DOCS] Reformat condition token filter (#48775) 2019-11-11 08:49:44 -05:00
Julian Simioni
5e4501eb3f [Docs] Consolidate single example into a single line (#48904)
The first example of splitting rules for the `word_delimiter` token filter was spread across two bullet points. This makes it look like they are two separate splitting rules.
2019-11-08 15:12:45 -05:00
James Rodewig
700a316bb3 [DOCS] Reformat decimal digit token filter docs (#48722) 2019-11-01 12:38:14 -04:00
Peter Johnson
3f7aafa421 [DOCS] Fix typo in synonym token filter docs (#48691) 2019-10-31 09:12:24 -04:00
James Rodewig
3d5b1725a9 [DOCS] Remove unneeded filter from common grams analyze ex (#48748) 2019-10-31 09:08:14 -04:00
James Rodewig
77acbc4fa9 [DOCS] Reformat common grams token filter (#48426) 2019-10-30 08:40:56 -04:00
James Rodewig
06dc1fbd96 [DOCS] Reformat ASCII folding token filter docs (#48143) 2019-10-23 15:06:55 -05:00
James Rodewig
9c75f14a9f [DOCS] Reformat classic token filter docs (#48314) 2019-10-23 10:14:25 -05:00
James Rodewig
a66bb2c7ed [DOCS] Reformat CJK bigram and CJK width token filter docs (#48210) 2019-10-21 08:44:49 -05:00
James Rodewig
8677653c5b [DOCS] Reformat apostrophe token filter docs (#48076) 2019-10-16 08:51:14 -04:00
Wilder Pereira
8c73e215b2 [DOCS] Remove unneeded spaces from custom analyzer snippet (#47332) 2019-10-15 15:53:16 -04:00
James Rodewig
601a88bede [DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:47:25 -04:00
James Rodewig
af7aba18d4 Fixed sample code for minhash (#46385)
The sample code is wrong. Field type is required for the sample field.
I guess the intention was to give the sample field the name ```fingerprint```, mapping it as ```text``` using the custom analyzer ```my_analyzer```
2019-09-12 13:29:44 -04:00
Abhilash Bolla
20e93bca6b Fixed grammar in pattern replace char filter docs. (#46546)
Minor grammar fix in the pattern replace char filter docs.
2019-09-10 11:04:07 -07:00
James Rodewig
b59ecde041
[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
James Rodewig
f04573f8e8
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
James Rodewig
bb7bff5e30
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) (#46418) 2019-09-06 09:22:08 -04:00
James Rodewig
3e62cf9d74 [DOCS] Correct custom analyzer callouts (#46030) 2019-08-29 10:08:18 -04:00
James Rodewig
d46545f729 [DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:18:23 -04:00
Christoph Büscher
2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
Alan Woodward
05a7333eca Require [articles] setting in elision filter (#43083)
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Relates to #43002
2019-06-27 09:02:36 +01:00
Sachin Frayne
44aedcf97a Correct the description of generate_word_parts (#43026) 2019-06-10 11:36:31 +01:00
James Rodewig
5342616a23 [DOCS] Add explicit articles_case parameter to Elision Token Filter example (#42987) 2019-06-07 11:24:43 -04:00
Mayya Sharipova
5a76f46ac6 Fix error with mapping in docs
Related to #39630
2019-05-30 10:28:09 -04:00
Peter Dyson
b84b5525e1 [DOCS] path_hierarchy tokenizer examples (#39630)
Closes #17138
2019-05-30 09:17:55 -04:00
Alan Woodward
3a35427b6d Improvements to docs around multiplexer and synonyms (#41645)
This commit fixes a multiplexer doc error concerning synonyms, and adds
suggestions on how to combine the two filters.
2019-05-07 09:10:14 +01:00
James Rodewig
d46f55f013 [DOCS] Add attribute to escape minimal pt token link in Asciidoctor (#41613) 2019-04-30 14:11:48 -04:00
James Rodewig
53702efddd [DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:20:17 -04:00
Guilherme Ferreira
48a17d5768 [Docs] Correct default stop list constant (#41342) 2019-04-23 19:13:51 +02:00
Guilherme Ferreira
23e40c040a [Docs] Correct spelling of "_none_" (#41192) 2019-04-15 15:12:28 +02:00
Guilherme Ferreira
414debd740 [Docs] Correct spelling the "_none_" stopwords element (#41191) 2019-04-15 14:12:26 +02:00
Christoph Büscher
dfc70e6ef0 Correct indention in synonym docs (#40711)
The stopword filter should be on the same level as the synonym filter in the
example request. Correcting this for better readability.
2019-04-02 01:44:24 +02:00
Mayya Sharipova
671a209ed9 Correct errors in min_hash filter documentation
Related to #39671
2019-03-08 16:21:24 -05:00
Mayya Sharipova
54d41afac1 Add documentation for min_hash filter (#39671)
Closes #20757
2019-03-07 08:49:48 -05:00
jimczi
ecb6df137c fix typo in synonym graph filter docs 2019-03-05 18:20:14 +01:00
Christoph Büscher
4b77d0434a Remove nGram and edgeNGram token filter names (#39070)
In #30209 we deprecated the camel case `nGram` filter name in favour of `ngram` and
did the same for `edgeNGram` and `edge_ngram` and we are removing those names in
8.0. This change disallows using the deprecated names for new indices created in 7.0 by
throwing an error if these filters are used.

Relates to #38911
2019-02-21 16:55:40 +01:00
Jim Ferenczi
83402b1320 Remove beta marker from the synonym_graph docs (#38185) 2019-02-19 10:49:49 +01:00
Mayya Sharipova
0e1b1959fe
Correct rebuilt persian analyzer (#38724) (#38744)
Make substitution of \u200C with a space explicit

The problem with this symbol `\u200C` in a test string, 
that **SHOULD** be substituted with space in the rebuilt Persian analyzer, but it is not.

Correcting this line `"mappings": [ "\\u200C=> "] <1>` to
 `"mappings": [ "\\u200C=>\\u0020"] <1>` in solves the problem.
This change explicitly says to substitute ZWNJ with a space.

Closes #38188
2019-02-11 14:17:18 -05:00
Christoph Büscher
34f2d2ec91
Remove remaining occurances of "include_type_name=true" in docs (#37646) 2019-01-22 15:13:52 +01:00
Christoph Büscher
3a96608b3f
Remove more include_type_name and types from docs (#37601) 2019-01-18 14:11:18 +01:00
Christoph Büscher
25aac4f77f
Remove include_type_name in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likey they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creating often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00
Julie Tibshirani
36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Josh Soref
edb48321ba [DOCS] Various spelling corrections (#37046) 2019-01-07 14:44:12 +01:00