repository-s3 also works with S3-compatibles (#38524)

- Notes that you can adjust the `s3.client.*.endpoint` setting to point to a
  repository held on an S3-compatible service.
- Notes that the default is `s3.amazonaws.com` and not to auto-detect the
  endpoint.
- Reformats docs to width.

Closes #35925
David Turner 2019-02-19 10:00:46 +00:00
parent 2e2567e827
commit 4d820d5689
1 changed files with 131 additions and 83 deletions


@@ -1,21 +1,25 @@
[[repository-s3]]
=== S3 Repository Plugin
The S3 repository plugin adds support for using S3 as a repository for
The S3 repository plugin adds support for using AWS S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].
*If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.*
*If you are looking for a hosted solution of Elasticsearch on AWS, please visit
http://www.elastic.co/cloud.*
:plugin_name: repository-s3
include::install_remove.asciidoc[]
[[repository-s3-usage]]
==== Getting started with AWS
==== Getting Started
The plugin provides a repository type named `s3` which may be used when creating a repository.
The repository defaults to using https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS IAM Role] or
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2 IAM Role]
credentials for authentication. The only mandatory setting is the bucket name:
The plugin provides a repository type named `s3` which may be used when creating
a repository. The repository defaults to using
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS
IAM Role] or
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2
IAM Role] credentials for authentication. The only mandatory setting is the
bucket name:
[source,js]
----
@@ -34,10 +38,10 @@ PUT _snapshot/my_s3_repository
[[repository-s3-client]]
==== Client Settings
The client that you use to connect to S3 has a number of settings available. The
settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. The default client
name that is looked up by an `s3` repository is `default`. It can be modified
using the <<repository-s3-repository,repository setting>> `client`. For example:
The client that you use to connect to S3 has a number of settings available.
The settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. By default,
`s3` repositories use a client named `default`, but this can be modified using
the <<repository-s3-repository,repository setting>> `client`. For example:
[source,js]
----
@@ -51,7 +55,7 @@ PUT _snapshot/my_s3_repository
}
----
// CONSOLE
// TEST[skip:we don't have s3 setup while testing this]
// TEST[skip:we don't have S3 setup while testing this]
Most client settings can be added to the `elasticsearch.yml` configuration file
with the exception of the secure settings, which you add to the {es} keystore.
@@ -74,9 +78,9 @@ contents, will utilize the latest settings from the keystore. Any existing `s3`
repositories, as well as any newly created ones, will pick up the new values
stored in the keystore.
NOTE: In progress snapshot/restore tasks will not be preempted by a *reload*
of the client's secure settings. The task will complete using the client as it
was built when the operation started.
NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload* of
the client's secure settings. The task will complete using the client as it was
built when the operation started.
The following list contains the available client settings. Those that must be
stored in the keystore are marked as "secure" and are *reloadable*; the other
@@ -84,61 +88,86 @@ settings belong in the `elasticsearch.yml` file.
`access_key` ({ref}/secure-settings.html[Secure])::
An s3 access key. The `secret_key` setting must also be specified.
An S3 access key. The `secret_key` setting must also be specified.
`secret_key` ({ref}/secure-settings.html[Secure])::
An s3 secret key. The `access_key` setting must also be specified.
An S3 secret key. The `access_key` setting must also be specified.
`session_token`::
An s3 session token. The `access_key` and `secret_key` settings must also
be specified. (Secure)
An S3 session token. The `access_key` and `secret_key` settings must also be
specified. (Secure)
`endpoint`::
The s3 service endpoint to connect to. This will be automatically
figured out by the s3 client based on the bucket location, but
can be specified explicitly. See http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
The S3 service endpoint to connect to. This defaults to `s3.amazonaws.com`
but the
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region[AWS
documentation] lists alternative S3 endpoints. If you are using an
<<repository-s3-compatible-services,S3-compatible service>> then you should
set this to the service's endpoint.
`protocol`::
The protocol to use to connect to s3. Valid values are either `http`
or `https`. Defaults to `https`.
The protocol to use to connect to S3. Valid values are either `http` or
`https`. Defaults to `https`.
`proxy.host`::
The host name of a proxy to connect to s3 through.
The host name of a proxy to connect to S3 through.
`proxy.port`::
The port of a proxy to connect to s3 through.
The port of a proxy to connect to S3 through.
`proxy.username` ({ref}/secure-settings.html[Secure])::
The username to connect to the `proxy.host` with.
The username to connect to the `proxy.host` with.
`proxy.password` ({ref}/secure-settings.html[Secure])::
The password to connect to the `proxy.host` with.
The password to connect to the `proxy.host` with.
`read_timeout`::
The socket timeout for connecting to s3. The value should specify the unit. For example,
a value of `5s` specifies a 5 second timeout. The default value is 50 seconds.
The socket timeout for connecting to S3. The value should specify the unit.
For example, a value of `5s` specifies a 5 second timeout. The default value
is 50 seconds.
`max_retries`::
The number of retries to use when an s3 request fails. The default value is 3.
The number of retries to use when an S3 request fails. The default value is
`3`.
`use_throttle_retries`::
Whether retries should be throttled (ie use backoff). Must be `true` or `false`. Defaults to `true`.
Whether retries should be throttled (i.e. should back off). Must be `true`
or `false`. Defaults to `true`.
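
The client settings above that are not marked as secure go in
`elasticsearch.yml`. As a minimal sketch, assuming the `default` client and a
hypothetical proxy at `proxy.example.com:8080`, a configuration that routes S3
traffic through the proxy and adjusts the timeout and retry behaviour might
look like this:

[source,yaml]
----
# Illustrative values only; proxy.example.com is a placeholder host.
s3.client.default.proxy.host: proxy.example.com
s3.client.default.proxy.port: 8080
# Time values must include a unit.
s3.client.default.read_timeout: 30s
s3.client.default.max_retries: 5
----

The `proxy.username` and `proxy.password` settings are secure, so they belong
in the {es} keystore rather than in this file.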
[float]
[[repository-s3-compatible-services]]
===== S3-compatible services
There are a number of storage systems that provide an S3-compatible API, and
the `repository-s3` plugin allows you to use these systems in place of AWS S3.
To do so, you should set the `s3.client.CLIENT_NAME.endpoint` setting to the
system's endpoint. This setting accepts IP addresses and hostnames and may
include a port. For example, the endpoint may be `172.17.0.2` or
`172.17.0.2:9000`. You may also need to set `s3.client.CLIENT_NAME.protocol` to
`http` if the endpoint does not support HTTPS.
https://minio.io[Minio] is an example of a storage system that provides an
S3-compatible API. The `repository-s3` plugin allows {es} to work with
Minio-backed repositories as well as repositories stored on AWS S3. Other
S3-compatible storage systems may also work with {es}, but these are not tested
or supported.
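
As an illustration, assuming a Minio server listening on `172.17.0.2:9000`
without HTTPS (the example address used above) and the `default` client, the
relevant lines in `elasticsearch.yml` might look like this:

[source,yaml]
----
# Assumed address of the S3-compatible service; replace with your own.
s3.client.default.endpoint: 172.17.0.2:9000
# Only needed if the endpoint does not support HTTPS.
s3.client.default.protocol: http
----

The `access_key` and `secret_key` for the service are secure settings, so they
should be added to the {es} keystore rather than to this file.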
[[repository-s3-repository]]
==== Repository Settings
The `s3` repository type supports a number of settings to customize how data is stored in S3.
These can be specified when creating the repository. For example:
The `s3` repository type supports a number of settings to customize how data is
stored in S3. These can be specified when creating the repository. For example:
[source,js]
----
@@ -152,7 +181,7 @@ PUT _snapshot/my_s3_repository
}
----
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]
// TEST[skip:we don't have S3 set up while testing this]
The following settings are supported:
@@ -162,21 +191,21 @@ The following settings are supported:
`client`::
The name of the s3 client to use to connect to S3. Defaults to `default`.
The name of the <<repository-s3-client,S3 client>> to use to connect to S3.
Defaults to `default`.
`base_path`::
Specifies the path within bucket to repository data. Defaults to
value of `repositories.s3.base_path` or to root directory if not set.
Previously, the base_path could take a leading `/` (forward slash).
However, this has been deprecated and setting the base_path now should
omit the leading `/`.
Specifies the path within the bucket to the repository data. Defaults to the
value of `repositories.s3.base_path`, or to the root directory if not set.
Previously, the `base_path` could take a leading `/` (forward slash). However,
this has been deprecated and the `base_path` should now omit the leading `/`.
`chunk_size`::
Big files can be broken down into chunks during snapshotting if needed.
The chunk size can be specified in bytes or by using size value notation,
i.e. `1gb`, `10mb`, `5kb`. Defaults to `1gb`.
Big files can be broken down into chunks during snapshotting if needed. The
chunk size can be specified in bytes or by using size value notation, e.g.
`1gb`, `10mb`, `5kb`. Defaults to `1gb`.
`compress`::
@@ -191,41 +220,49 @@ The following settings are supported:
`buffer_size`::
Minimum threshold below which the chunk is uploaded using a single
request. Beyond this threshold, the S3 repository will use the
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS Multipart Upload API]
to split the chunk into several parts, each of `buffer_size` length, and
to upload each part in its own request. Note that setting a buffer
size lower than `5mb` is not allowed since it will prevent the use of the
Multipart API and may result in upload errors. It is also not possible to
set a buffer size greater than `5gb` as it is the maximum upload size
allowed by S3. Defaults to the minimum between `100mb` and `5%` of the heap size.
Minimum threshold below which the chunk is uploaded using a single request.
Beyond this threshold, the S3 repository will use the
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS
Multipart Upload API] to split the chunk into several parts, each of
`buffer_size` length, and to upload each part in its own request. Note that
setting a buffer size lower than `5mb` is not allowed since it will prevent
the use of the Multipart API and may result in upload errors. It is also not
possible to set a buffer size greater than `5gb` as it is the maximum upload
size allowed by S3. Defaults to the minimum between `100mb` and `5%` of the
heap size.
`canned_acl`::
The S3 repository supports all http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3 canned ACLs]
: `private`, `public-read`, `public-read-write`, `authenticated-read`, `log-delivery-write`,
`bucket-owner-read`, `bucket-owner-full-control`. Defaults to `private`.
You could specify a canned ACL using the `canned_acl` setting. When the S3 repository
creates buckets and objects, it adds the canned ACL into the buckets and objects.
The S3 repository supports all
http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3
canned ACLs]: `private`, `public-read`, `public-read-write`,
`authenticated-read`, `log-delivery-write`, `bucket-owner-read`,
`bucket-owner-full-control`. Defaults to `private`. You can specify a
canned ACL using the `canned_acl` setting. When the S3 repository creates
buckets and objects, it applies the canned ACL to those buckets and objects.
`storage_class`::
Sets the S3 storage class for objects stored in the snapshot repository.
Values may be `standard`, `reduced_redundancy`, `standard_ia`.
Defaults to `standard`. Changing this setting on an existing repository
only affects the storage class for newly created objects, resulting in a
mixed usage of storage classes. Additionally, S3 Lifecycle Policies can
be used to manage the storage class of existing objects.
Due to the extra complexity with the Glacier class lifecycle, it is not
currently supported by the plugin. For more information about the
different classes, see http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS Storage Classes Guide]
Values may be `standard`, `reduced_redundancy`, `standard_ia`. Defaults to
`standard`. Changing this setting on an existing repository only affects the
storage class for newly created objects, resulting in a mixed usage of
storage classes. Additionally, S3 Lifecycle Policies can be used to manage
the storage class of existing objects. Due to the extra complexity with the
Glacier class lifecycle, it is not currently supported by the plugin. For
more information about the different classes, see the
http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS
Storage Classes Guide].
NOTE: The option of defining client settings in the repository settings as documented below is considered deprecated:
NOTE: The option of defining client settings in the repository settings as
documented below is considered deprecated, and will be removed in a future
version.
In addition to the above settings, you may also specify all non-secure client settings in the repository settings.
In this case, the client settings found in the repository settings will be merged with those of the named client used by the repository.
Conflicts between client and repository settings are resolved by the repository settings taking precedence over client settings.
In addition to the above settings, you may also specify all non-secure client
settings in the repository settings. In this case, the client settings found in
the repository settings will be merged with those of the named client used by
the repository. Conflicts between client and repository settings are resolved
by the repository settings taking precedence over client settings.
For example:
@@ -244,16 +281,19 @@ PUT _snapshot/my_s3_repository
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]
This sets up a repository that uses all client settings from the client `my_client_named` except for the `endpoint` that is overridden
to `my.s3.endpoint` by the repository settings.
This sets up a repository that uses all client settings from the client
`my_client_name` except for the `endpoint` that is overridden to
`my.s3.endpoint` by the repository settings.
[[repository-s3-permissions]]
===== Recommended S3 Permissions
In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon
IAM in conjunction with pre-existing S3 buckets. Here is an example policy which will allow the snapshot access to an
S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console, by creating a Custom Policy,
and using a Policy Document similar to this (changing snaps.example.com to your bucket name).
In order to restrict the Elasticsearch snapshot process to the minimum required
resources, we recommend using Amazon IAM in conjunction with pre-existing S3
buckets. Here is an example policy which will allow the snapshot access to an S3
bucket named "snaps.example.com". This may be configured through the AWS IAM
console, by creating a Custom Policy, and using a Policy Document similar to
this (changing snaps.example.com to your bucket name).
[source,js]
----
@@ -290,7 +330,8 @@ IAM in conjunction with pre-existing S3 buckets. Here is an example policy which
----
// NOTCONSOLE
You may further restrict the permissions by specifying a prefix within the bucket, in this example, named "foo".
You may further restrict the permissions by specifying a prefix within the
bucket, in this example, named "foo".
[source,js]
----
@@ -334,16 +375,23 @@ You may further restrict the permissions by specifying a prefix within the bucke
----
// NOTCONSOLE
The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository
registration will fail.
The bucket needs to exist to register a repository for snapshots. If you did not
create the bucket then the repository registration will fail.
Note: Starting in version 7.0, all bucket operations are using the path style access pattern. In previous versions the decision to use virtual hosted style
or path style access was made by the AWS Java SDK.
Note: Starting in version 7.0, all bucket operations use the path style access
pattern. In previous versions, the decision to use virtual hosted style or path
style access was made by the AWS Java SDK.
[[repository-s3-aws-vpc]]
[float]
==== AWS VPC Bandwidth Settings
AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC then all traffic to S3 will go through that VPC's NAT instance. If your VPC's NAT instance is a smaller instance size (e.g. a t1.micro) or is handling a high volume of network traffic your bandwidth to S3 may be limited by that NAT instance's networking bandwidth limitations.
AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch
instances reside in a private subnet in an AWS VPC, then all traffic to S3 will
go through that VPC's NAT instance. If your VPC's NAT instance is a smaller
instance size (e.g. a t1.micro) or is handling a high volume of network
traffic, your bandwidth to S3 may be limited by that NAT instance's networking
bandwidth.
Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.
Instances residing in a public subnet in an AWS VPC will connect to S3 via the
VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.