Adds a 'Configure text analysis' page to house tutorial content for the
analysis topic.
Also relocates the following pages as children as this new page:
* 'Test an analyzer'
* 'Configuring built-in analyzers'
* 'Create a custom analyzer'
I plan to add a tutorial for specifying index-time and search-time
analyzers to this section as part of a future PR.
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likey they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creating often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
* Docs: Improved tokenizer docs
Added descriptions and runnable examples
* Addressed Nik's comments
* Added TESTRESPONSEs for all tokenizer examples
* Added TESTRESPONSEs for all analyzer examples too
* Added docs, examples, and TESTRESPONSES for character filters
* Skipping two tests:
One interprets "$1" as a stack variable - same problem exists with the REST tests
The other because the "took" value is always different
* Fixed tests with "took"
* Fixed failing tests and removed preserve_original from fingerprint analyzer
* Docs: First pass at improving analyzer docs
I've rewritten the intro to analyzers plus the docs
for all analyzers to provide working examples.
I've also removed:
* analyzer aliases (see #18244)
* analyzer versions (see #18267)
* snowball analyzer (see #8690)
Next steps will be tokenizers, token filters, char filters
* Fixed two typos
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.
Also postition_offset_gaps less than 0 aren't allowed at all.
New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through
Closes#7268