Makes the following changes to the `stop` token filter docs:
* Updates description
* Adds a link to the related Lucene filter
* Adds detailed analyze snippet
* Updates custom analyzer and custom filter snippets
* Adds a list of predefined stop words by language
Co-authored-by: ScottieL <36999642+ScottieL@users.noreply.github.com>
* [DOCS] Rewrite analysis intro. Move index/search analysis content.
* Rewrites 'Text analysis' page intro as high-level definition.
Adds guidance on when users should configure text analysis
* Rewrites and splits index/search analysis content:
* Conceptual content -> 'Index and search analysis' under 'Concepts'
* Task-based content -> 'Specify an analyzer' under 'Configure...'
* Adds detailed examples for when to use the same index/search analyzer
and when not.
* Adds new example snippets for specifying search analyzers
* clarifications
* Add toc. Decrement headings.
* Reword 'When to configure' section
* Remove sentence from tip
Adds a 'Configure text analysis' page to house tutorial content for the
analysis topic.
Also relocates the following pages as children as this new page:
* 'Test an analyzer'
* 'Configuring built-in analyzers'
* 'Create a custom analyzer'
I plan to add a tutorial for specifying index-time and search-time
analyzers to this section as part of a future PR.
This helps the topic better match the structure of
our machine learning docs, e.g.
https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html
This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts'
child page, but I plan to add other concepts, such as 'Index time vs.
search time', with later PRs.
* Changes titles to sentence case.
* Appends pages with 'reference' to differentiate their content from
conceptual overviews.
* Moves the 'Normalizers' page to end of the Analysis topic pages.
Adds a 'text analysis overview' page to the analysis topic docs.
The goals of this page are:
* Concisely summarize the analysis process while avoiding in-depth concepts, tutorials, or API examples
* Explain why analysis is important, largely through highlighting problems with full-text searches missing analysis
* Highlight how analysis can be used to improve search results
The Analysis docs mention including a default analyzer in the index settings. However, no example snippet is included.
This adds an example snippet that users can easily copy and adjust.
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
There have been at least two PRs trying to fix the spelling of "lazi" because it
isn't very clear from the example that the english analyzer will stem each token
in the example. This adds a short description of the analysis process to make
this clearer.
Relates to #31797
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.
Closes#27750Closes#27751
This adds a new `normalizer` property to `keyword` fields that pre-processes the
field value prior to indexing, but without altering the `_source`. Note that
only the normalization components that work on a per-character basis are
applied, so for instance stemming filters will be ignored while lowercasing or
ascii folding will be applied.
Closes#18064
* Docs: First pass at improving analyzer docs
I've rewritten the intro to analyzers plus the docs
for all analyzers to provide working examples.
I've also removed:
* analyzer aliases (see #18244)
* analyzer versions (see #18267)
* snowball analyzer (see #8690)
Next steps will be tokenizers, token filters, char filters
* Fixed two typos