Link checker plugin and some fixes to URLs

Signed-off-by: Miki <mehranb@amazon.com>
Miki 2021-08-10 11:54:40 -07:00
parent 187bccec6b
commit 634db90e9b
17 changed files with 252 additions and 18 deletions

View File

@@ -4,6 +4,8 @@ title: Logstash
nav_order: 200
has_children: true
has_toc: true
+redirect_from:
+  - /logstash/
---
# Logstash

View File

@@ -9,7 +9,7 @@ nav_order: 220
You can ship Logstash events to an OpenSearch cluster and then visualize your events with OpenSearch Dashboards.
-Make sure you have [Logstash]({{site.url}}{{site.baseurl}}/logstash/index/#install-logstash-on-mac--linux), [OpenSearch]({{site.url}}{{site.baseurl}}/opensearch/install/index/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/install/index/).
+Make sure you have [Logstash]({{site.url}}{{site.baseurl}}/clients/logstash/index/#install-logstash), [OpenSearch]({{site.url}}{{site.baseurl}}/opensearch/install/index/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/install/index/).
{: .note }
## OpenSearch output plugin

View File

@@ -94,7 +94,7 @@ For a list of available unit types, see [Supported units]({{site.url}}{{site.bas
ISM supports the following operations:
-- [force_merge](#forcemerge)
+- [force_merge](#force_merge)
- [read_only](#read_only)
- [read_write](#read_write)
- [replica_count](#replica_count)

View File

@@ -874,7 +874,7 @@ GET _plugins/_alerting/monitors/alerts
Introduced 1.0
{: .label .label-purple }
-[After getting your alerts](#get-alerts/), you can acknowledge any number of active alerts in one call. If the alert is already in an ERROR, COMPLETED, or ACKNOWLEDGED state, it appears in the `failed` array.
+[After getting your alerts](#get-alerts), you can acknowledge any number of active alerts in one call. If the alert is already in an ERROR, COMPLETED, or ACKNOWLEDGED state, it appears in the `failed` array.
#### Request
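The acknowledge call takes the alert IDs in the request body (a sketch; the monitor ID and alert ID are placeholders):

```json
POST _plugins/_alerting/monitors/<monitor_id>/_acknowledge/alerts
{
  "alerts": ["eQURa3gBKo1jAh6qUo49"]
}
```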

View File

@@ -34,7 +34,7 @@ Destination | A reusable location for an action, such as Amazon Chime, Slack, or
1. Specify a name for the destination so that you can identify it later.
1. For **Type**, choose Slack, Amazon Chime, custom webhook, or [email](#email-as-a-destination).
-For Email type, refer to the [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html).
+For Email type, refer to the [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. For more information about webhooks, see the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Amazon Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html).
For custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic <Base64-encoded-credential-string>`. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`.
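For instance, a custom webhook that requires basic authentication and JSON payloads might use headers like these (a sketch; the credential string is a Base64 placeholder for `user:password`):

```json
{
  "Authorization": "Basic dXNlcjpwYXNzd29yZA==",
  "Content-Type": "application/json"
}
```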
@@ -296,7 +296,7 @@ Variable | Data Type | Description
`ctx.trigger.actions.destination_id`| String | The alert destination's ID.
`ctx.trigger.actions.message_template.source` | String | The message to send in the alert.
`ctx.trigger.actions.message_template.lang` | String | The scripting language used to define the message. Must be Mustache.
-`ctx.trigger.actions.throttle_enabled` | Boolean | Whether throttling is enabled for this trigger. See [adding actions](#add-actions/) for more information about throttling.
+`ctx.trigger.actions.throttle_enabled` | Boolean | Whether throttling is enabled for this trigger. See [adding actions](#add-actions) for more information about throttling.
`ctx.trigger.actions.subject_template.source` | String | The message's subject in the alert.
`ctx.trigger.actions.subject_template.lang` | String | The scripting language used to define the subject. Must be Mustache.

View File

@@ -660,7 +660,7 @@ GET opensearch_dashboards_sample_data_logs/_search
```
The `ip_range` aggregation is for IP addresses.
-It works on `ip` type fields. You can define the IP ranges and masks in [CIDR](http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) notation.
+It works on `ip` type fields. You can define the IP ranges and masks in [CIDR](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) notation.
```json
GET opensearch_dashboards_sample_data_logs/_search
@@ -1026,7 +1026,7 @@ GET opensearch_dashboards_sample_data_logs/_search
The `geohash_grid` aggregation buckets documents for geographical analysis. It organizes a geographical region into a grid of smaller regions of different sizes or precisions. Lower values of precision represent larger geographical areas and higher values represent smaller, more precise geographical areas.
-The number of results returned by a query might be far too many to display each geo point individually on a map. The `geohash_grid` aggregation buckets nearby geo points together by calculating the Geohash for each point, at the level of precision that you define (between 1 and 12; the default is 5). To learn more about Geohash, see [Wikipedia](http://en.wikipedia.org/wiki/Geohash).
+The number of results returned by a query might be far too many to display each geo point individually on a map. The `geohash_grid` aggregation buckets nearby geo points together by calculating the Geohash for each point, at the level of precision that you define (between 1 and 12; the default is 5). To learn more about Geohash, see [Wikipedia](https://en.wikipedia.org/wiki/Geohash).
The web logs example data is spread over a large geographical area, so you can use a lower precision value. You can zoom in on this map by increasing the precision value:
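For example, a low-precision request over the sample web logs might look like the following (a sketch that assumes the sample data's `geo.coordinates` field):

```json
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "geo_hash": {
      "geohash_grid": {
        "field": "geo.coordinates",
        "precision": 4
      }
    }
  }
}
```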

View File

@@ -289,4 +289,4 @@ You can use wildcards to delete more than one data stream.
We recommend deleting data from a data stream using an ISM policy.
-You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/async/index/), [SQL]({{site.url}}{{site.baseurl}}/sql/index/), and [PPL]({{site.url}}{{site.baseurl}}/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
+You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/), [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/), and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.

View File

@@ -12,7 +12,7 @@ For example, if you use OpenSearch as a backend search engine for your applicati
When you're writing code to convert user input into OpenSearch queries, you can simplify your code with search templates. If you need to add fields to your search query, you can just modify the template without making changes to your code.
-Search templates use the Mustache language. For a list of all syntax options, see the [Mustache manual](http://mustache.github.io/mustache.5.html).
+Search templates use the Mustache language. For a list of all syntax options, see the [Mustache manual](https://mustache.github.io/mustache.5.html).
{: .note }
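For instance, a template can expose a single `{{query_value}}` variable that your code fills in at search time (a sketch; the field and parameter names are illustrative):

```json
GET _search/template
{
  "source": {
    "query": {
      "match": {
        "title": "{{query_value}}"
      }
    }
  },
  "params": {
    "query_value": "dashboards"
  }
}
```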
## Create search templates

_plugins/link-checker.rb Normal file
View File

@@ -0,0 +1,232 @@
# frozen_string_literal: true
require "jekyll/hooks"
require "jekyll/document"
require "json"
require "set"
require "net/http"
require "uri"
require "pathname"
##
# This singleton checks links during build to warn or fail upon finding dead links.
#
# `JEKYLL_CHECK_EXTERNAL_LINKS`, set on the environment, will cause verification of external links, irrespective of its
# value. Usage: `JEKYLL_CHECK_EXTERNAL_LINKS= bundle exec jekyll build --trace`
#
# `JEKYLL_FATAL_LINK_CHECKER`, set on the environment, will cause the build to fail if an internal dead link is found.
# If set as `JEKYLL_FATAL_LINK_CHECKER=2`, the build will fail for internal and external dead links; in this case, there
# is no need to set `JEKYLL_CHECK_EXTERNAL_LINKS`.
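#
# For example, to fail the build on any dead link, internal or external:
# `JEKYLL_FATAL_LINK_CHECKER=2 bundle exec jekyll build --trace`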
module Jekyll::LinkChecker
##
# The collection in which the gathered links are stored
@urls = {}
##
# Pattern to identify documents that should be excluded based on their URL
@excluded_paths = /(\.(css|js|json|map|xml|txt|yml)$)/i.freeze
##
# Pattern to extract the href target from anchor tags
@href_matcher = /<a[^>]+href=(['"])(.+?)\1/im.freeze
##
# Pattern to check for external URLs
@external_matcher = /^https?:\/\//.freeze
##
# List of domains to ignore
@ignored_domains = %w[localhost]
##
# Pattern of local paths to ignore
@ignored_paths = /(^\/javadocs\/)/.freeze
##
# Valid response codes for successful links
@success_codes = %w[200 302]
##
# Response codes that are treated as successful, but logged with a warning
@questionable_codes = %w[301 403 429]
##
# Holds the list of failures
@failures = []
##
# Driven by environment variables, it indicates a need to check external links
@check_external_links
##
# Driven by environment variables, it indicates the need to fail the build for dead links
@should_build_fatally
##
# Initializes the singleton by recording the site
def self.init(site)
@site = site
@urls = {}
@failures = []
end
##
# Processes a Document or Page and adds the links to a collection
# It also checks for anchors to parts of the same page/doc
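# For example, `<a href="#usage">` must match an `id="usage"` attribute within the
# same page, while `<a href="/docs/">` is collected into @urls for later checking.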
def self.process(page)
return if @excluded_paths.match(page.path)
hrefs = page.content.scan(@href_matcher)
hrefs.each do |(_, href)|
relative_path = page.path[0] == '/' ? Pathname.new(page.path).relative_path_from(Dir.getwd) : page.path
if href.start_with? '#'
@failures << "##{href[1..]}, linked in ./#{relative_path}" if (page.content =~ /<[a-z0-9-]+[^>]+id="#{href[1..]}"/i).nil?
else
@urls[href] = Set[] unless @urls.key?(href)
@urls[href] << relative_path
end
end
end
##
# Verifies all the gathered links once the site has been written
def self.verify(site)
if ENV.key?('JEKYLL_CHECK_EXTERNAL_LINKS')
@check_external_links = true
puts "LinkChecker: [Notice] Will verify external links"
end
if ENV.key?('JEKYLL_FATAL_LINK_CHECKER')
@should_build_fatally = true
if ENV['JEKYLL_FATAL_LINK_CHECKER'] == '2'
@check_external_links = true
puts "LinkChecker: [Notice] The build will fail if any dead links are found"
else
puts "LinkChecker: [Notice] The build will fail if a dead internal link is found"
end
end
@base_url_matcher = /^#{@site.config["url"]}#{@site.baseurl}(\/.*)$/.freeze
@urls.each do |url, pages|
@failures << "#{url}, linked to in ./#{pages.to_a.join(", ./")}" unless self.check(url)
end
unless @failures.empty?
msg = "Found #{@failures.size} dead link#{@failures.size > 1 ? 's' : ''}:\n#{@failures.join("\n")}"
raise msg if @should_build_fatally
puts "\nLinkChecker: [Warning] #{msg}\n"
end
end
##
# Check if URL is accessible
def self.check(url)
match = @base_url_matcher.match(url)
unless match.nil?
url = match[1]
end
if @external_matcher =~ url
return true unless @check_external_links
return self.check_external(url)
end
return self.check_internal(url)
end
##
# Check if an external URL is accessible by making a GET request
def self.check_external(url)
uri = URI(url)
return true if @ignored_domains.include? uri.host
http = Net::HTTP.new(uri.host, uri.port)
# use_ssl must be set before the session is started
http.use_ssl = (uri.scheme == "https")
http.start do
request = Net::HTTP::Get.new(uri)
http.request(request) do |response|
return true if @success_codes.include? response.code
puts "LinkChecker: [Warning] Got #{response.code} from #{url}"
return @questionable_codes.include? response.code
end
end
end
##
# Check if an internal link is accessible
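# For example, "/opensearch/install/" is looked up as
# "<destination>/opensearch/install/index.html" in the built site, and any
# "#hash" suffix is then matched against the ids in that file.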
def self.check_internal(url)
return true if @ignored_paths =~ url
path, hash = url.split('#')
unless path.end_with? 'index.html'
path << '/' unless path.end_with? '/'
path << 'index.html'
end
filename = File.join(@site.config["destination"], path)
return false unless File.file?(filename)
content = File.read(filename)
unless content.include? "<title>Redirecting"
return true if hash.nil? || hash.empty?
return !(content =~ /<[a-z0-9-]+[^>]+id="#{hash}"/i).nil?
end
match = content.match(@href_matcher)
if match.nil?
puts "LinkChecker: [Warning] Cannot check #{url} due to an unfollowable redirect"
return true
end
redirect = match[2]
redirect << '#' + hash unless hash.nil? || hash.empty?
return self.check(redirect)
end
end
# Before any Document or Page is processed, initialize the LinkChecker
Jekyll::Hooks.register :site, :pre_render do |site|
Jekyll::LinkChecker.init(site)
end
# Process a Page as soon as its content is ready
Jekyll::Hooks.register :pages, :post_convert do |page|
Jekyll::LinkChecker.process(page)
end
# Process a Document as soon as its content is ready
Jekyll::Hooks.register :documents, :post_convert do |document|
Jekyll::LinkChecker.process(document)
end
# Verify gathered links after Jekyll has finished writing all the output files
Jekyll::Hooks.register :site, :post_write do |site|
Jekyll::LinkChecker.verify(site)
end

View File

@@ -12,7 +12,7 @@ redirect_from: /knn/approximate-knn/
The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the Hierarchical Navigable Small World (HNSW) algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.
-The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
+The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
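For illustration, the warmup call mentioned above takes a comma-separated list of indices (a sketch; the index name is a placeholder):

```json
GET /_plugins/_knn/warmup/my-knn-index
```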

View File

@@ -14,7 +14,7 @@ With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance
## Get started with k-NN's Painless Scripting functions
-To use k-NN's Painless Scripting functions, first create an index with `knn_vector` fields like in [k-NN score script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script#getting-started-with-the-score-script). Once the index is created and you ingest some data, you can use the Painless extensions:
+To use k-NN's Painless Scripting functions, first create an index with `knn_vector` fields like in [k-NN score script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script#getting-started-with-the-score-script-for-vectors). Once the index is created and you ingest some data, you can use the Painless extensions:
```json
GET my-knn-index-2/_search

View File

@@ -12,7 +12,7 @@ The security plugin includes an internal user database. Use this database in pla
Roles are the core way of controlling access to your cluster. Roles contain any combination of cluster-wide permissions, index-specific permissions, document- and field-level security, and tenants. Then you map users to these roles so that users gain those permissions.
-Unless you need to create new [reserved or hidden users]({{site.url}}{{site.baseurl}}/security-plugin/access-control/api#read-only-and-hidden-resources), we **highly** recommend using OpenSearch Dashboards or the REST API to create new users, roles, and role mappings. The `.yml` files are for initial setup, not ongoing use.
+Unless you need to create new [reserved or hidden users]({{site.url}}{{site.baseurl}}/security-plugin/access-control/api#reserved-and-hidden-resources), we **highly** recommend using OpenSearch Dashboards or the REST API to create new users, roles, and role mappings. The `.yml` files are for initial setup, not ongoing use.
{: .warning }
---

View File

@@ -175,7 +175,7 @@ Use a date pattern in the index name to configure daily, weekly, or monthly roll
plugins.security.audit.config.index: "'auditlog-'YYYY.MM.dd"
```
-For a reference on the date pattern format, see the [Joda DateTimeFormat documentation](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html).
+For a reference on the date pattern format, see the [Joda DateTimeFormat documentation](https://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html).
## (Advanced) Tune the thread pool

View File

@@ -237,7 +237,7 @@ In this case, the header states that the message was signed using HMAC-SHA256.
### Payload
-The payload of a JSON web token contains the so-called [JWT Claims](http://self-issued.info/docs/draft-ietf-oauth-json-web-token.html#RegisteredClaimName). A claim can be any piece of information about the user that the application that created the token has verified.
+The payload of a JSON web token contains the so-called [JWT Claims](https://self-issued.info/docs/draft-ietf-oauth-json-web-token.html#RegisteredClaimName). A claim can be any piece of information about the user that the application that created the token has verified.
The specification defines a set of standard claims with reserved names ("registered claims"). These include, for example, the token issuer, the expiration date, or the creation date.
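A minimal payload with registered claims might look like the following (values are illustrative):

```json
{
  "iss": "https://idp.example.com",
  "sub": "jdoe",
  "iat": 1628607280,
  "exp": 1628610880
}
```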

View File

@@ -113,7 +113,7 @@ When an IdP generates and signs a JSON web token, it must add the ID of the key
}
```
-As per the [OpenID Connect specification](http://openid.net/specs/openid-connect-messages-1_0-20.html), the `kid` (key ID) is mandatory. Token verification does not work if an IdP fails to add the `kid` field to the JWT.
+As per the [OpenID Connect specification](https://openid.net/specs/openid-connect-messages-1_0-20.html), the `kid` (key ID) is mandatory. Token verification does not work if an IdP fails to add the `kid` field to the JWT.
If the security plugin receives a JWT with an unknown `kid`, it visits the IdP's `jwks_uri` and retrieves all available, valid keys. These keys are used and cached until a refresh is triggered by retrieving another unknown key ID.

View File

@@ -29,7 +29,7 @@ The operating system for each OpenSearch node handles encryption of data at rest
cryptsetup luksFormat --key-file <key> <partition>
```
-For full documentation on the command, see [the Linux man page](http://man7.org/linux/man-pages/man8/cryptsetup.8.html).
+For full documentation on the command, see [the Linux man page](https://man7.org/linux/man-pages/man8/cryptsetup.8.html).
{% comment %}
## Beats

View File

@@ -21,7 +21,7 @@ This page includes troubleshooting steps for configuring TLS certificates with t
## Validate YAML
-`opensearch.yml` and the files in `opensearch_security/securityconfig/` are in the YAML format. A linter like [YAML Lint](http://www.yamllint.com/) can help verify that you don't have any formatting errors.
+`opensearch.yml` and the files in `opensearch_security/securityconfig/` are in the YAML format. A linter like [YAML Validator](https://codebeautify.org/yaml-validator) can help verify that you don't have any formatting errors.
## View contents of PEM certificates
@@ -207,7 +207,7 @@ plugins.security.ssl.http.enabled_protocols:
TLS relies on the server and client negotiating a common cipher suite. Depending on your system, the available ciphers will vary. They depend on the JDK or OpenSSL version you're using, and whether or not the `JCE Unlimited Strength Jurisdiction Policy Files` are installed.
-For legal reasons, the JDK does not include strong ciphers like AES256. In order to use strong ciphers you need to download and install the [Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files](http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html). If you don't have them installed, you might see an error message on startup:
+For legal reasons, the JDK does not include strong ciphers like AES256. In order to use strong ciphers you need to download and install the [Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files](https://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html). If you don't have them installed, you might see an error message on startup:
```
[INFO ] AES-256 not supported, max key length for AES is 128 bit.