[[analysis-stempel]]
=== Stempel Polish Analysis Plugin

The Stempel analysis plugin integrates Lucene's Stempel analysis
module for Polish into Elasticsearch.

It provides high-quality stemming for Polish, based on the
http://www.egothor.org/[Egothor project].

:plugin_name: analysis-stempel
include::install_remove.asciidoc[]

[[analysis-stempel-tokenizer]]
[discrete]
==== `stempel` analyzer and token filters

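The plugin provides the `polish` analyzer and the `polish_stem` and `polish_stop` token filters.
The analyzer and the `polish_stem` filter are not configurable; the `polish_stop` filter
accepts a custom stopword list (see <<analysis-polish-stop>>).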

==== Reimplementing and extending the analyzers

The `polish` analyzer can be reimplemented as a `custom` analyzer, which can
then be extended and configured as needed:

[source,console]
----------------------------------------------------
PUT /stempel_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_stempel": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "polish_stop",
            "polish_stem"
          ]
        }
      }
    }
  }
}
----------------------------------------------------
// TEST[s/\n$/\nstartyaml\n  - compare_analyzers: {index: stempel_example, first: polish, second: rebuilt_stempel}\nendyaml\n/]
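
To inspect what the rebuilt analyzer produces, you can run sample text through
the `_analyze` API. The request below is a minimal sketch: it assumes the
`stempel_example` index from the previous snippet already exists, and it reuses
the sample sentence from the `polish_stop` example in this document. The exact
stems depend on the Stempel stemming tables, so no output is shown here.

[source,console]
--------------------------------------------------
GET /stempel_example/_analyze
{
  "analyzer": "rebuilt_stempel",
  "text": "Gdzie kucharek sześć, tam nie ma co jeść."
}
--------------------------------------------------
// TEST[continued]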

[[analysis-polish-stop]]
==== `polish_stop` token filter

The `polish_stop` token filter removes Polish stopwords (`_polish_`) and
any custom stopwords specified by the user. The only predefined stopword
list it supports is `_polish_`. To use a different predefined list, use the
{ref}/analysis-stop-tokenfilter.html[`stop` token filter] instead.

[source,console]
--------------------------------------------------
PUT /polish_stop_example
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_stop": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "polish_stop"
            ]
          }
        },
        "filter": {
          "polish_stop": {
            "type": "polish_stop",
            "stopwords": [
              "_polish_",
              "jeść"
            ]
          }
        }
      }
    }
  }
}

GET polish_stop_example/_analyze
{
  "analyzer": "analyzer_with_stop",
  "text": "Gdzie kucharek sześć, tam nie ma co jeść."
}
--------------------------------------------------

The above request returns:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [
    {
      "token" : "kucharek",
      "start_offset" : 6,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "sześć",
      "start_offset" : 15,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
--------------------------------------------------
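
If a different predefined list is needed, the generic `stop` token filter
mentioned above can take the place of `polish_stop`. The following is a sketch
only: the index name `stop_filter_example`, the analyzer name
`analyzer_with_english_stop`, and the filter name `english_stop` are
hypothetical, and `_english_` stands in for whichever predefined list you want
to use.

[source,console]
--------------------------------------------------
PUT /stop_filter_example
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_english_stop": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "english_stop"
            ]
          }
        },
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          }
        }
      }
    }
  }
}
--------------------------------------------------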