[[configuring-analyzers]]
=== Configuring built-in analyzers

The built-in analyzers can be used directly without any configuration. Some
of them, however, support configuration options that alter their behavior. For
instance, the <<analysis-standard-analyzer,`standard` analyzer>> can be configured
to remove a list of stop words:

[source,console]
--------------------------------
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_english": { <1>
          "type":      "standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type":     "text",
        "analyzer": "standard", <2>
        "fields": {
          "english": {
            "type":     "text",
            "analyzer": "std_english" <3>
          }
        }
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "field": "my_text", <2>
  "text": "The old brown cow"
}

POST my-index-000001/_analyze
{
  "field": "my_text.english", <3>
  "text": "The old brown cow"
}

--------------------------------

<1> We define the `std_english` analyzer to be based on the `standard`
    analyzer, but configured to remove the predefined list of English stop words.
<2> The `my_text` field uses the `standard` analyzer directly, without
    any configuration.  No stop words will be removed from this field.
    The resulting terms are: `[ the, old, brown, cow ]`
<3> The `my_text.english` field uses the `std_english` analyzer, so
    English stop words will be removed.  The resulting terms are:
    `[ old, brown, cow ]`


/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "old",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 8,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "cow",
      "start_offset": 14,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
----------------------------

/////////////////////
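
The `stopwords` parameter of the `standard` analyzer also accepts an array of
words in place of the name of a predefined list. The following is a minimal
sketch of that variant, not a recommended configuration. The index name
`my-index-000002` and the analyzer name `std_custom_stop` are hypothetical and
chosen only for illustration. The analyzer is tested by name through the
`_analyze` API rather than through a mapped field:

[source,console]
--------------------------------
PUT my-index-000002
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_custom_stop": { <1>
          "type":      "standard",
          "stopwords": [ "the", "old" ]
        }
      }
    }
  }
}

POST my-index-000002/_analyze
{
  "analyzer": "std_custom_stop", <2>
  "text": "The old brown cow"
}
--------------------------------

<1> The hypothetical `std_custom_stop` analyzer is based on the `standard`
    analyzer, but configured with an explicit list of stop words.
<2> Because the `standard` analyzer lowercases terms before removing stop
    words, both `The` and `old` should be removed, leaving the terms
    `[ brown, cow ]`.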