The ICU plugin provides the building blocks of an analysis chain, but doesn't actually have a prebuilt analyzer. It would be a better for users if there was a simple analyzer that they could use out of the box, and also something we can point to from the CJK Analyzer docs as a superior alternative.
Relates to #34285
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
UnicodeSetFilter was only allowed in the icu_folding token filter.
It seems useful to expose this setting in icu_normalizer token filter
and char filter.
Adds a new "icu_collation" field type that exposes lucene's
ICUCollationDocValuesField. ICUCollationDocValuesField is the replacement
for ICUCollationKeyFilter which has been deprecated since Lucene 5.
and be much more stingy about what we consider a console candidate.
* Add `// CONSOLE` to check-running
* Fix version in some snippets
* Mark groovy snippets as groovy
* Fix versions in plugins
* Fix language marker errors
* Fix language parsing in snippets
This adds support for snippets who's language is written like
`[source, txt]` and `["source","js",subs="attributes,callouts"]`.
This also makes language required for snippets which is nice because
then we can be sure we can grep for snippets in a particular language.
We have 1074 snippets that look like they should be converted to
`// CONSOLE`. At least that is what `gradle docs:listConsoleCandidates`
says. This adds `// NOTCONSOLE` to explicitly mark snippets that
*shouldn't* be converted to `// CONSOLE`. After marking the blindingly
obvious ones this cuts the remaining snippet count to 1032.
We weren't doing it before because we weren't starting the plugins.
Now we are.
The hardest part of this was handling the files the tests expect
to be on the filesystem. extraConfigFiles was broken.
The syntax highlighter only supports [source,js].
Also adds a check to the rest test generator that runs during
the build that'll fail the build if it sees `[source,json]`.
Lucene allows to create a ICUTokenizer with a special config argument
enabling the customization of the rule based iterator by providing
custom rules files.
This commit enable this feature. Users could provide a list of RBBI rule
files to ICU tokenizer.
closes#13146
* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community
Closes#11734Closes#11724Closes#11636Closes#11635Closes#11632Closes#11630Closes#12046Closes#12438Closes#12579