[DOC] Add new documentation for IP2Geo (#4998)
* Approved through tech, doc, and editorial Signed-off-by: Melissa Vagi <vagimeli@amazon.com> * Publish documentation Signed-off-by: Melissa Vagi <vagimeli@amazon.com> --------- Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
This commit is contained in:
parent
8e44362c62
commit
0fcc7d7060
|
@ -0,0 +1,243 @@
|
|||
---
|
||||
layout: default
|
||||
title: IP2Geo
|
||||
parent: Ingest processors
|
||||
grand_parent: Ingest APIs
|
||||
nav_order: 130
|
||||
---
|
||||
|
||||
# IP2Geo
|
||||
Introduced 2.10
|
||||
{: .label .label-purple }
|
||||
|
||||
The `ip2geo` processor adds information about the geographical location of an IPv4 or IPv6 address. The `ip2geo` processor uses IP geolocation (GeoIP) data from an external endpoint and therefore requires an additional component, `datasource`, that defines from where to download GeoIP data and how frequently to update the data.
|
||||
|
||||
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/info-icon.png" class="inline-icon" alt="info icon"/>{:/} **NOTE**<br>The `ip2geo` processor maintains the GeoIP data mapping in system indexes. The GeoIP mapping is retrieved from these indexes during data ingestion to perform the IP-to-geolocation conversion on the incoming data. For optimal performance, it is preferable to have a node with both ingest and data roles, as this configuration avoids internode calls reducing latency. Also, as the `ip2geo` processor searches GeoIP mapping data from the indexes, search performance is impacted.
|
||||
{: .note}
|
||||
|
||||
## Getting started
|
||||
|
||||
To get started with the `ip2geo` processor, the `opensearch-geospatial` plugin must be installed. See [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) to learn more.
|
||||
|
||||
## Cluster settings
|
||||
|
||||
The IP2Geo data source and `ip2geo` processor node settings are listed in the following table.
|
||||
|
||||
| Key | Description | Default |
|
||||
|--------------------|-------------|---------|
|
||||
| plugins.geospatial.ip2geo.datasource.endpoint | Default endpoint for creating the data source API. | Defaults to https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json. |
|
||||
| plugins.geospatial.ip2geo.datasource.update_interval_in_days | Default update interval for creating the data source API. | Defaults to 3. |
|
||||
| plugins.geospatial.ip2geo.datasource.batch_size | Maximum number of documents to ingest in a bulk request during the IP2Geo data source creation process. | Defaults to 10,000. |
|
||||
| plugins.geospatial.ip2geo.processor.cache_size | Maximum number of results that can be cached. There is only one cache used for all IP2Geo processors in each node | Defaults to 1,000. |
|
||||
|-------------------|-------------|---------|
|
||||
|
||||
## Creating the IP2Geo data source
|
||||
|
||||
Before creating the pipeline that uses the `ip2geo` processor, create the IP2Geo data source. The data source defines the endpoint value that will download GeoIP data and specifies the update interval.
|
||||
|
||||
OpenSearch provides the following endpoints for GeoLite2 City, GeoLite2 Country, and GeoLite2 ASN databases from [MaxMind](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data), which is shared under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license:
|
||||
|
||||
* GeoLite2 City: https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json
|
||||
* GeoLite2 Country: https://geoip.maps.opensearch.org/v1/geolite2-country/manifest.json
|
||||
* GeoLite2 ASN: https://geoip.maps.opensearch.org/v1/geolite2-asn/manifest.json
|
||||
|
||||
If an OpenSearch cluster cannot update a data source from the endpoints within 30 days, the cluster does not add GeoIP data to the documents and instead adds `"error":"ip2geo_data_expired"`.
|
||||
|
||||
### Data source options
|
||||
|
||||
The following table lists the data source options for the `ip2geo` processor.
|
||||
|
||||
| Name | Required | Default | Description |
|
||||
|------|----------|---------|-------------|
|
||||
| `endpoint` | Optional | https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json | The endpoint that downloads the GeoIP data. |
|
||||
| `update_interval_in_days` | Optional | 3 | How frequently, in days, the GeoIP data is updated. The minimum value is 1. |
|
||||
|
||||
To create an IP2Geo data source, run the following query:
|
||||
|
||||
```json
|
||||
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource
|
||||
{
|
||||
"endpoint" : "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
|
||||
"update_interval_in_days" : 3
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
A `true` response means that the request was successful and that the server was able to process the request. A `false` response indicates that you should check the request to make sure it is valid, check the URL to make sure it is correct, or try again.
|
||||
|
||||
### Sending a GET request
|
||||
|
||||
To get information about one or more IP2Geo data sources, send a GET request:
|
||||
|
||||
```json
|
||||
GET /_plugins/geospatial/ip2geo/datasource/my-datasource
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
You'll receive the following response:
|
||||
|
||||
```json
|
||||
{
|
||||
"datasources": [
|
||||
{
|
||||
"name": "my-datasource",
|
||||
"state": "AVAILABLE",
|
||||
"endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
|
||||
"update_interval_in_days": 3,
|
||||
"next_update_at_in_epoch_millis": 1685125612373,
|
||||
"database": {
|
||||
"provider": "maxmind",
|
||||
"sha256_hash": "0SmTZgtTRjWa5lXR+XFCqrZcT495jL5XUcJlpMj0uEA=",
|
||||
"updated_at_in_epoch_millis": 1684429230000,
|
||||
"valid_for_in_days": 30,
|
||||
"fields": [
|
||||
"country_iso_code",
|
||||
"country_name",
|
||||
"continent_name",
|
||||
"region_iso_code",
|
||||
"region_name",
|
||||
"city_name",
|
||||
"time_zone",
|
||||
"location"
|
||||
]
|
||||
},
|
||||
"update_stats": {
|
||||
"last_succeeded_at_in_epoch_millis": 1684866730192,
|
||||
"last_processing_time_in_millis": 317640,
|
||||
"last_failed_at_in_epoch_millis": 1684866730492,
|
||||
"last_skipped_at_in_epoch_millis": 1684866730292
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Updating an IP2Geo data source
|
||||
|
||||
See the Creating the IP2Geo data source section for a list of endpoints and request field descriptions.
|
||||
|
||||
To update the date source, run the following query:
|
||||
|
||||
```json
|
||||
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource/_settings
|
||||
{
|
||||
"endpoint": https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json,
|
||||
"update_interval_in_days": 10
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
### Deleting the IP2Geo data source
|
||||
|
||||
To delete the IP2Geo data source, you must first delete all processors associated with the data source. Otherwise, the request fails.
|
||||
|
||||
To delete the data source, run the following query:
|
||||
|
||||
```json
|
||||
DELETE /_plugins/geospatial/ip2geo/datasource/my-datasource
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
## Creating the pipeline
|
||||
|
||||
Once the data source is created, you can create the pipeline. The following is the syntax for the `ip2geo` processor:
|
||||
|
||||
```json
|
||||
{
|
||||
"ip2geo": {
|
||||
"field":"ip",
|
||||
"datasource":"my-datasource"
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
### Configuration parameters
|
||||
|
||||
The following table lists the required and optional parameters for the `ip2geo` processor.
|
||||
|
||||
| Name | Required | Default | Description |
|
||||
|------|----------|---------|-------------|
|
||||
| `datasource` | Required | - | The data source name to use to retrieve geographical information. |
|
||||
| `field` | Required | - | The field that contains the IP address for geographical lookup. |
|
||||
| `ignore_missing` | Optional | false | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. |
|
||||
| `properties` | Optional | All fields in `datasource` | The field that controls which properties are added to `target_field` from `datasource`. |
|
||||
| `target_field` | Optional | ip2geo | The field that contains the geographical information retrieved from the data source. |
|
||||
|
||||
## Using the processor
|
||||
|
||||
Follow these steps to use the processor in a pipeline.
|
||||
|
||||
**Step 1: Create a pipeline.**
|
||||
|
||||
The following query creates a pipeline, named `my-pipeline`, that converts the IP address to geographical information:
|
||||
|
||||
```json
|
||||
PUT /_ingest/pipeline/my-pipeline
|
||||
{
|
||||
"description":"convert ip to geo",
|
||||
"processors":[
|
||||
{
|
||||
"ip2geo":{
|
||||
"field":"ip",
|
||||
"datasource":"my-datasource"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
**Step 2 (Optional): Test the pipeline.**
|
||||
|
||||
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/info-icon.png" class="inline-icon" alt="info icon"/>{:/} **NOTE**<br>It is recommended that you test your pipeline before you ingest documents.
|
||||
{: .note}
|
||||
|
||||
To test the pipeline, run the following query:
|
||||
|
||||
```json
|
||||
POST _ingest/pipeline/my-id/_simulate
|
||||
{
|
||||
"docs": [
|
||||
{
|
||||
"_index":"my-index",
|
||||
"_id":"my-id",
|
||||
"_source":{
|
||||
"my_ip_field":"172.0.0.1",
|
||||
"ip2geo":{
|
||||
"continent_name":"North America",
|
||||
"region_iso_code":"AL",
|
||||
"city_name":"Calera",
|
||||
"country_iso_code":"US",
|
||||
"country_name":"United States",
|
||||
"region_name":"Alabama",
|
||||
"location":"33.1063,-86.7583",
|
||||
"time_zone":"America/Chicago"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
**Step 3: Ingest a document.**
|
||||
|
||||
The following query ingests a document into an index named `my-index`:
|
||||
|
||||
```json
|
||||
PUT /my-index/_doc/my-id?pipeline=ip2geo
|
||||
{
|
||||
"ip": "172.0.0.1"
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
**Step 4 (Optional): Retrieve the document.**
|
||||
|
||||
To retrieve the document, run the following query:
|
||||
|
||||
```json
|
||||
GET /my-index/_doc/my-id
|
||||
```
|
||||
{% include copy-curl.html %}
|
Loading…
Reference in New Issue