Merge pull request #515 from alicejw-aws/ip_range-478

[issue 478] description for ip_range parameter + mappings guide
2022-05-03 14:31:40 -05:00 · 2022-05-03 14:31:40 -05:00 · 38d3ae6c4c
parent 9cc549f0d9 bf36e0f481
commit 38d3ae6c4c
9 changed files with 138 additions and 77 deletions
--- a/2
+++ b/2
@ -29,4 +29,4 @@ end
 gem "tzinfo-data", platforms: [:mingw, :mswin, :x64_mingw, :jruby]

 # Performance-booster for watching directories on Windows
-gem "wdm", "~> 0.1.0" if Gem.win_platform?
+gem "wdm", "~> 0.1.0" if Gem.win_platform?
--- a/_opensearch/bucket-agg.md
+++ b/_opensearch/bucket-agg.md
@ -709,7 +709,19 @@ GET opensearch_dashboards_sample_data_logs/_search
 }
 }
 ```
+If you add a document with malformed fields to an index that has `ignore_malformed` set to `false` in its mappings, OpenSearch rejects the entire document. To have OpenSearch ignore malformed fields instead, set `ignore_malformed` to `true`. The default is `false`.

+```json
+...
+"mappings": {
+  "properties": {
+    "ips": {
+      "type": "ip_range",
+      "ignore_malformed": true
+    }
+  }
+}
+```
 ## filter, filters

 A `filter` aggregation is a query clause, exactly like a search query — `match` or `term` or `range`. You can use the `filter` aggregation to narrow down the entire set of documents to a specific set before creating buckets.
--- a/_opensearch/cluster.md
+++ b/_opensearch/cluster.md
@ -34,14 +34,12 @@ After you assess all these requirements, we recommend you use a benchmark testin

 This page demonstrates how to work with the different node types. It assumes that you have a four-node cluster similar to the preceding illustration.

-
 ## Prerequisites

 Before you get started, you must install and configure OpenSearch on all of your nodes. For information about the available options, see [Install and configure OpenSearch]({{site.url}}{{site.baseurl}}/opensearch/install/).

 After you're done, use SSH to connect to each node, then open the `config/opensearch.yml` file. You can set all configurations for your cluster in this file.

-
 ## Step 1: Name a cluster

 Specify a unique name for the cluster. If you don't specify a cluster name, it's set to `opensearch` by default. Setting a descriptive cluster name is important, especially if you want to run multiple clusters inside a single network.
@ -60,12 +58,10 @@ cluster.name: opensearch-cluster

 Make the same change on all the nodes to make sure that they'll join to form a cluster.

-
 ## Step 2: Set node attributes for each node in a cluster

 After you name the cluster, set node attributes for each node in your cluster.

-
 #### Master node

 Give your master node a name. If you don't specify a name, OpenSearch assigns a machine-generated name that makes the node difficult to monitor and troubleshoot.
@ -80,7 +76,6 @@ You can also explicitly specify that this node is a master node. This is already
 node.roles: [ master ]
 ```

-
 #### Data nodes

 Change the name of two nodes to `opensearch-d1` and `opensearch-d2`, respectively:
@ -88,6 +83,7 @@ Change the name of two nodes to `opensearch-d1` and `opensearch-d2`, respectivel
 ```yml
 node.name: opensearch-d1
 ```
+
 ```yml
 node.name: opensearch-d2
 ```
@ -100,7 +96,6 @@ node.roles: [ data, ingest ]

 You can also specify any other attributes that you'd like to set for the data nodes.

-
 #### Coordinating node

 Change the name of the coordinating node to `opensearch-c1`:
@ -115,7 +110,6 @@ Every node is a coordinating node by default, so to make this node a dedicated c
 node.roles: []
 ```

-
 ## Step 3: Bind a cluster to specific IP addresses

 `network.host` defines the IP address used to bind the node. By default, OpenSearch listens on localhost, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6:
@ -132,7 +126,6 @@ network.host: <IP address of the node>

 Make sure to configure these settings on all of your nodes.

-
 ## Step 4: Configure discovery hosts for a cluster

 Now that you've configured the network hosts, you need to configure the discovery hosts.
@ -147,7 +140,6 @@ For example, for `opensearch-master` the line looks something like this:
 discovery.seed_hosts: ["<private IP of opensearch-d1>", "<private IP of opensearch-d2>", "<private IP of opensearch-c1>"]
 ```

-
 ## Step 5: Start the cluster

 After you set the configurations, start OpenSearch on all nodes:
@ -178,7 +170,6 @@ x.x.x.x           23          38   0    0.12    0.07     0.06 md        -      o

 To better understand and monitor your cluster, use the [cat API]({{site.url}}{{site.baseurl}}/opensearch/catapis/).

-
 ## (Advanced) Step 6: Configure shard allocation awareness or forced awareness

 If your nodes are spread across several geographical zones, you can configure shard allocation awareness to allocate all replica shards to a zone that’s different from their primary shard.
@ -190,6 +181,7 @@ To configure shard allocation awareness, add zone attributes to `opensearch-d1`
 ```yml
 node.attr.zone: zoneA
 ```
+
 ```yml
 node.attr.zone: zoneB
 ```
@ -230,7 +222,6 @@ If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the

 Choosing allocation awareness or forced awareness depends on how much space you might need in each zone to balance your primary and replica shards.

-
 ## (Advanced) Step 7: Set up a hot-warm architecture

 You can design a hot-warm architecture where you first index your data to hot nodes---fast and expensive---and after a certain period of time move it to warm nodes---slow and cheap.
@ -244,6 +235,7 @@ To configure a hot-warm storage architecture, add `temp` attributes to `opensear
 ```yml
 node.attr.temp: hot
 ```
+
 ```yml
 node.attr.temp: warm
 ```
@ -314,7 +306,6 @@ A popular approach is to configure your [index templates]({{site.url}}{{site.bas

 You can then use the [Index State Management (ISM)]({{site.url}}{{site.baseurl}}/im-plugin/) plugin to periodically check the age of an index and specify actions to take on it. For example, when the index reaches a specific age, change the `index.routing.allocation.require.temp` setting to `warm` to automatically move your data from hot nodes to warm nodes.

-
 ## Next steps

 If you are using the security plugin, the previous request to `_cat/nodes?v` might have failed with an initialization error. For full guidance around using the security plugin, see [Security configuration]({{site.url}}{{site.baseurl}}/security-plugin/configuration/index/).
--- a/_opensearch/index-alias.md
+++ b/_opensearch/index-alias.md
@ -6,13 +6,13 @@ nav_order: 12

 # Index aliases

-An alias is a virtual index name that can point to one or more indices.
+An alias is a virtual index name that can point to one or more indexes.

-If your data is spread across multiple indices, rather than keeping track of which indices to query, you can create an alias and query it instead.
+If your data is spread across multiple indexes, rather than keeping track of which indexes to query, you can create an alias and query it instead.

-For example, if you’re storing logs into indices based on the month and you frequently query the logs for the previous two months, you can create a `last_2_months` alias and update the indices it points to each month.
+For example, if you’re storing logs into indexes based on the month and you frequently query the logs for the previous two months, you can create a `last_2_months` alias and update the indexes it points to each month.

-Because you can change the indices an alias points to at any time, referring to indices using aliases in your applications allows you to reindex your data without any downtime.
+Because you can change the indexes an alias points to at any time, referring to indexes using aliases in your applications allows you to reindex your data without any downtime.

 ---

@ -63,7 +63,7 @@ To check if `alias1` refers to `index-1`, run the following command:
 GET alias1
 ```

-## Add or remove indices
+## Add or remove indexes

 You can perform multiple actions in the same `_aliases` operation.
 For example, the following command removes `index-1` and adds `index-2` to `alias1`:
@ -90,7 +90,7 @@ POST _aliases

 The `add` and `remove` actions occur atomically, which means that at no point will `alias1` point to both `index-1` and `index-2`.

-You can also add indices based on an index pattern:
+You can also add indexes based on an index pattern:

 ```json
 POST _aliases
@ -108,7 +108,7 @@ POST _aliases

 ## Manage aliases

-To list the mapping of aliases to indices, run the following command:
+To list the mapping of aliases to indexes, run the following command:

 ```json
 GET _cat/aliases?v
@ -121,7 +121,7 @@ alias     index   filter    routing.index   routing.search
 alias1    index-1   *             -                 -
 ```

-To check which indices an alias points to, run the following command:
+To check which indexes an alias points to, run the following command:

 ```json
 GET _alias/alias1
@ -166,7 +166,7 @@ PUT index-1

 ## Create filtered aliases

-You can create a filtered alias to access a subset of documents or fields from the underlying indices.
+You can create a filtered alias to access a subset of documents or fields from the underlying indexes.

 This command adds only a specific timestamp field to `alias1`:

--- a/_opensearch/index-data.md
+++ b/_opensearch/index-data.md
@ -68,16 +68,16 @@ PUT movies/_doc/1

 Because you must specify an ID, if you run this command 10 times, you still have just one document indexed with the `_version` field incremented to 10.

-Indices default to one primary shard and one replica. If you want to specify non-default settings, create the index before adding documents:
+Indexes default to one primary shard and one replica. If you want to specify non-default settings, create the index before adding documents:

 ```json
 PUT more-movies
 { "settings": { "number_of_shards": 6, "number_of_replicas": 2 } }
 ```

-## Naming restrictions for indices
+## Naming restrictions for indexes

-OpenSearch indices have the following naming restrictions:
+OpenSearch indexes have the following naming restrictions:

 - All letters must be lowercase.
 - Index names can't begin with underscores (`_`) or hyphens (`-`).
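
Reviewer sketch (not part of this change): the two naming rules visible in this hunk can be checked on the client before a create-index request is sent. The function name `shown_rule_violations` is hypothetical, and the check covers only the two rules shown here, not the full list of restrictions.

```python
def shown_rule_violations(name):
    """Return the rules from the list above that `name` breaks (empty list if none)."""
    problems = []
    # Rule 1: all letters must be lowercase.
    if any(ch.isupper() for ch in name):
        problems.append("letters must be lowercase")
    # Rule 2: names can't begin with an underscore or a hyphen.
    if name.startswith(("_", "-")):
        problems.append("cannot begin with an underscore or hyphen")
    return problems


print(shown_rule_violations("more-movies"))
print(shown_rule_violations("_Movies"))
```

An empty list means the name passes both rules; it may still break restrictions that fall outside this hunk.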
--- a/_opensearch/index-templates.md
+++ b/_opensearch/index-templates.md
@ -6,7 +6,7 @@ nav_order: 15

 # Index templates

-Index templates let you initialize new indices with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indices have the same number of shards and replicas.
+Index templates let you initialize new indexes with predefined mappings and settings. For example, if you continuously index log data, you can define an index template so that all of these indexes have the same number of shards and replicas.

 ### Create a template

@ -95,7 +95,7 @@ GET logs-2020-01-01
 }
 ```

-Any additional indices that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.
+Any additional indexes that match this pattern---`logs-2020-01-02`, `logs-2020-01-03`, and so on---will inherit the same mappings and settings.

 Index patterns cannot contain any of the following characters: `:`, `"`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, and `<`.

@ -127,7 +127,7 @@ HEAD _index_template/<name>

 ### Configure multiple templates

-You can create multiple index templates for your indices. If the index name matches more than one template, OpenSearch merges all mappings and settings from all matching templates and applies them to the index.
+You can create multiple index templates for your indexes. If the index name matches more than one template, OpenSearch merges all mappings and settings from all matching templates and applies them to the index.

 The settings from the more recently created index templates override the settings of older index templates. So, you can first define a few common settings in a generic template that can act as a catch-all and then add more specialized settings as required.
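
Reviewer sketch (not part of this change): the override rule described above, modeled as a plain dictionary merge. The `(created_at, settings)` tuple layout and the `merge_settings` helper are assumptions made for illustration; this is not how OpenSearch implements template resolution.

```python
def merge_settings(templates):
    """Merge settings from matching templates, oldest first.

    `templates` is a list of (created_at, settings) tuples. Later
    (more recently created) templates overwrite keys set by older ones.
    """
    merged = {}
    for _, settings in sorted(templates, key=lambda t: t[0]):
        merged.update(settings)
    return merged


generic = (1, {"number_of_shards": 1, "number_of_replicas": 1})   # catch-all template
specialized = (2, {"number_of_shards": 3})                        # created later
print(merge_settings([specialized, generic]))
```

The catch-all template supplies `number_of_replicas`, and the more recent template wins for `number_of_shards`.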

--- a/_opensearch/mappings.md
+++ b/_opensearch/mappings.md
@ -0,0 +1,105 @@
+---
+layout: default
+title: Mapping
+nav_order: 13
+---
+
+# About Mappings
+
+You can define how documents and their fields are stored and indexed by creating a mapping.
+
+If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
+
+For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. Using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers.
+
+This section provides an example of how to create an index mapping and how to add a document to it that is validated against the `ip_range` data type.
+
+#### Table of contents
+1. TOC
+{:toc}
+
+
+---
+## Dynamic mapping
+
+When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping.
+
+#### Dynamic mapping types
+
+Type | Description
+:--- | :---
+null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values.
+boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false`.
+float | A single-precision 32-bit floating point number.
+double | A double-precision 64-bit floating point number.
+integer | A signed 32-bit number.
+object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
+array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values.
+text | A string sequence of characters that represent full-text values.
+keyword | A string sequence of structured characters, such as an email address or ZIP code.
+date detection string | Enabled by default. If a new string field matches a date format, the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date.
+numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled.
+
+## Explicit mapping
+
+If you know exactly what your field data types need to be, you can specify them in your request body when creating your index.
+
+```json
+{
+  "mappings": {
+    "properties": {
+      "year":    { "type" : "text" },
+      "age":     { "type" : "integer" },
+      "director":{ "type" : "text" }
+    }
+  }
+}
+```
+
+### Response
+```json
+{
+    "acknowledged": true,
+    "shards_acknowledged": true,
+    "index": "sample-index1"
+}
+```
+
+---
+## Mapping example usage
+
+The following example shows how to create a mapping that tells OpenSearch to ignore malformed IP address values that do not conform to the `ip_range` data type, rather than rejecting the document. You accomplish this by setting the `ignore_malformed` parameter to `true`.
+
+### Create an index with an ip_range mapping
+
+To create an index, use a PUT request:
+
+```json
+PUT index_ip
+{
+  "mappings": {
+    "dynamic_templates": [
+      {
+        "ip_range": {
+          "match": "*ip_range",
+          "mapping": {
+            "type": "ip_range",
+            "ignore_malformed": true
+          }
+        }
+      }
+    ]
+  }
+}
+```
+
+You can add a document to your index that has an IP range specified:
+
+```json
+PUT index_ip/_doc/<id>
+{
+  "source_ip_range": "192.168.1.1/32"
+}
+```
+
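+Indexing this document does not throw an error: `192.168.1.1/32` is a valid IP range in CIDR notation. Because `ignore_malformed` is set to `true`, a document with a malformed value in this field would also be accepted, with the malformed field ignored.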
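
Reviewer sketch (not part of this change): a client-side model of what "malformed" means for a CIDR-style IP range and how the two `ignore_malformed` settings differ. It uses Python's standard `ipaddress` module; the helper names `is_valid_ip_range` and `fields_to_index` are hypothetical, and the logic only approximates the behavior described above rather than reproducing OpenSearch's own parser.

```python
import ipaddress


def is_valid_ip_range(value):
    """Return True if `value` parses as an IPv4 or IPv6 network in CIDR notation."""
    try:
        ipaddress.ip_network(value, strict=False)
        return True
    except ValueError:
        return False


def fields_to_index(doc, ip_range_fields, ignore_malformed=False):
    """Return the fields that would be indexed.

    With ignore_malformed=False, a malformed ip_range value rejects the
    whole document (modeled here as a ValueError). With ignore_malformed=True,
    only the malformed field is skipped.
    """
    kept = {}
    for name, value in doc.items():
        if name in ip_range_fields and not is_valid_ip_range(value):
            if not ignore_malformed:
                raise ValueError(f"malformed ip_range in field {name!r}: {value!r}")
            continue  # skip just this field
        kept[name] = value
    return kept


good = {"source_ip_range": "192.168.1.1/32"}
bad = {"source_ip_range": "not-an-ip", "message": "hello"}
print(fields_to_index(good, {"source_ip_range"}))
print(fields_to_index(bad, {"source_ip_range"}, ignore_malformed=True))
```

With `ignore_malformed=False`, the second call would raise instead of returning the remaining fields.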
--- a/_opensearch/rest-api/index-apis/create-index.md
+++ b/_opensearch/rest-api/index-apis/create-index.md
@ -113,52 +113,4 @@ index.routing.allocation.enable | Specifies options for the index’s shard allo
 index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow rebalancing for all shards), `primaries` (allow rebalancing only for primary shards), `replicas` (allow rebalancing only for replicas), and `none` (do not allow rebalancing). Default is `all`.
 index.gc_deletes | Amount of time to retain a deleted document's version number. Default is `60s`.
 index.default_pipeline | The default ingest node pipeline for the index. If the default pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
-index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
-
-
-### Mappings
-
-Mappings define how a documents and its fields are stored and indexed. If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and their fields. However, if you know exactly what types your data fall under and want to enforce that standard, then you can use explicit mappings.
-
-For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. Using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers.
-
-#### Dynamic mapping types
-
-Type | Description
-:--- | :---
-null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values.
-boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false.`
-float | A single-precision 32-bit floating point number.
-double | A double-precision 64-bit floating point number.
-integer | A signed 32-bit number.
-object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
-array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values.
-text | A string sequence of characters that represent full-text values.
-keyword | A string sequence of structured characters, such as an email or ZIP code.
-date detection string | Enabled by default, if new string fields match a date's format, then the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date.
-numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, `unsigned_long`. Default is disabled.
-
-#### Explicit mapping
-
-If you know exactly what your data's typings need to be, you can specify them in your request body when creating your index.
-
-```json
-{
-  "mappings": {
-    "properties": {
-      "year":    { "type" : "text" },
-      "age":     { "type" : "integer" },
-      "director":{ "type" : "text" }
-    }
-  }
-}
-```
-
-## Response
-```json
-{
-    "acknowledged": true,
-    "shards_acknowledged": true,
-    "index": "sample-index1"
-}
-```
+index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline.
--- a/_opensearch/rest-api/index-apis/put-mapping.md
+++ b/_opensearch/rest-api/index-apis/put-mapping.md
@ -55,6 +55,7 @@ Parameter | Data Type | Description
 allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is `true`.
 expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are `all` (match all indexes), `open` (match open indexes), `closed` (match closed indexes), `hidden` (match hidden indexes), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
 ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response.
+ignore_malformed | Boolean | Use this parameter with the `ip_range` data type to specify that OpenSearch should ignore malformed fields. If `true`, OpenSearch ignores field values that are not valid IP ranges instead of rejecting the document. Default is `false`.
 master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`.
 timeout | Time | How long to wait for the response to return. Default is `30s`.
 write_index_only | Boolean | Whether OpenSearch should apply mapping updates only to the write index.