[[repository-s3]]
=== S3 Repository Plugin
The S3 repository plugin adds support for using AWS S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].
*If you are looking for a hosted solution of Elasticsearch on AWS, please visit
http://www.elastic.co/cloud.*
:plugin_name: repository-s3
include::install_remove.asciidoc[]
[[repository-s3-usage]]
==== Getting Started
The plugin provides a repository type named `s3` which may be used when creating
a repository. The repository defaults to using
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS
IAM Role] or
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2
IAM Role] credentials for authentication. The only mandatory setting is the
bucket name:
[source,console]
----
PUT _snapshot/my_s3_repository
{
"type": "s3",
"settings": {
"bucket": "my_bucket"
}
}
----
// TEST[skip:we don't have s3 setup while testing this]
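As a rough sketch of how the request above could be prepared from a script, the following Python builds the same body and a matching `PUT` request using only the standard library. The cluster address `http://localhost:9200` and the helper names `build_repository_body` and `build_register_request` are assumptions made for illustration; they are not part of the plugin.

[source,python]
----
import json
import urllib.request


def build_repository_body(bucket, **settings):
    """Return the request body for registering an s3 repository."""
    if not bucket:
        raise ValueError("the bucket setting is mandatory")
    return {"type": "s3", "settings": {"bucket": bucket, **settings}}


def build_register_request(name, body, base_url="http://localhost:9200"):
    """Prepare (but do not send) the PUT _snapshot/<name> request."""
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_register_request(
    "my_s3_repository", build_repository_body("my_bucket")
)
# urllib.request.urlopen(request) would send it to a running cluster.
----

The request is only constructed here, so the sketch can be inspected without a cluster or an S3 bucket.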
[[repository-s3-client]]
==== Client Settings
The client that you use to connect to S3 has a number of settings available.
The settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. By default,
`s3` repositories use a client named `default`, but this can be modified using
the <<repository-s3-repository,repository setting>> `client`. For example:
[source,console]
----
PUT _snapshot/my_s3_repository
{
"type": "s3",
"settings": {
"bucket": "my_bucket",
"client": "my_alternate_client"
}
}
----
// TEST[skip:we don't have S3 setup while testing this]
Most client settings can be added to the `elasticsearch.yml` configuration file
with the exception of the secure settings, which you add to the {es} keystore.
For more information about creating and updating the {es} keystore, see
{ref}/secure-settings.html[Secure settings].
For example, if you want to use specific credentials to access S3 then run the
following commands to add these credentials to the keystore:
[source,sh]
----
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore add s3.client.default.session_token
----
If instead you want to use the instance role or container role to access S3
then you should leave these settings unset. You can switch from using specific
credentials back to the default of using the instance role or container role by
removing these settings from the keystore as follows:
[source,sh]
----
bin/elasticsearch-keystore remove s3.client.default.access_key
bin/elasticsearch-keystore remove s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore remove s3.client.default.session_token
----
*All* client secure settings of this plugin are
{ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you
reload the settings, the internal `s3` clients, used to transfer the snapshot
contents, will utilize the latest settings from the keystore. Any existing `s3`
repositories, as well as any newly created ones, will pick up the new values
stored in the keystore.
NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload* of
the client's secure settings. The task will complete using the client as it was
built when the operation started.
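A reload is typically triggered through the nodes reload secure settings API described on the linked secure settings page. The sketch below prepares such a call with the Python standard library; the cluster address and the helper name `build_reload_request` are assumptions, and a secured cluster would also need authentication that is not shown.

[source,python]
----
import urllib.request


def build_reload_request(base_url="http://localhost:9200"):
    """Prepare (but do not send) POST _nodes/reload_secure_settings."""
    return urllib.request.Request(
        url=f"{base_url}/_nodes/reload_secure_settings",
        method="POST",
    )


reload_request = build_reload_request()
# urllib.request.urlopen(reload_request) would ask every node to re-read
# its keystore; snapshots already in progress keep their existing client.
----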
The following list contains the available client settings. Those that must be
stored in the keystore are marked as "secure" and are *reloadable*; the other
settings belong in the `elasticsearch.yml` file.
`access_key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
An S3 access key. If set, the `secret_key` setting must also be specified.
If unset, the client will use the instance or container role instead.
`secret_key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
An S3 secret key. If set, the `access_key` setting must also be specified.
`session_token` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
An S3 session token. If set, the `access_key` and `secret_key` settings
must also be specified.
`endpoint`::
The S3 service endpoint to connect to. This defaults to `s3.amazonaws.com`
but the
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region[AWS
documentation] lists alternative S3 endpoints. If you are using an
<<repository-s3-compatible-services,S3-compatible service>> then you should
set this to the service's endpoint.
`protocol`::
The protocol to use to connect to S3. Valid values are either `http` or
`https`. Defaults to `https`.
`proxy.host`::
The host name of a proxy to connect to S3 through.
`proxy.port`::
The port of a proxy to connect to S3 through.
`proxy.username` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
The username to connect to the `proxy.host` with.
`proxy.password` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
The password to connect to the `proxy.host` with.
`read_timeout`::
The socket timeout for connecting to S3. The value should specify the unit.
For example, a value of `5s` specifies a 5-second timeout. The default value
is 50 seconds.
`max_retries`::
The number of retries to use when an S3 request fails. The default value is
`3`.
`use_throttle_retries`::
Whether retries should be throttled (i.e. should back off). Must be `true`
or `false`. Defaults to `true`.
`path_style_access`::
Whether to force the use of the path style access pattern. If `true`, the
path style access pattern will be used. If `false`, the access pattern will
be automatically determined by the AWS Java SDK (see the
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setPathStyleAccessEnabled-java.lang.Boolean-[AWS
documentation] for details). Defaults to `false`.
[[repository-s3-path-style-deprecation]]
NOTE: In versions `7.0`, `7.1`, `7.2` and `7.3` all bucket operations used the
https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/[now-deprecated]
path style access pattern. If your deployment requires the path style access
pattern then you should set this setting to `true` when upgrading.
`disable_chunked_encoding`::
Whether chunked encoding should be disabled or not. If `false`, chunked
encoding is enabled and will be used where appropriate. If `true`, chunked
encoding is disabled and will not be used, which may mean that snapshot
operations consume more resources and take longer to complete. It should
only be set to `true` if you are using a storage service that does not
support chunked encoding. See the
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#disableChunkedEncoding--[AWS
Java SDK documentation] for details. Defaults to `false`.
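To illustrate the `s3.client.CLIENT_NAME.SETTING_NAME` naming scheme and the split between keystore and `elasticsearch.yml` settings, here is a minimal Python sketch. The helper names and the sample values are hypothetical; the set of secure settings is taken from the list above.

[source,python]
----
# Settings the list above marks as secure; they belong in the keystore.
SECURE_SETTINGS = {
    "access_key", "secret_key", "session_token",
    "proxy.username", "proxy.password",
}


def client_setting_key(client_name, setting_name):
    """Build the full key: s3.client.CLIENT_NAME.SETTING_NAME."""
    return f"s3.client.{client_name}.{setting_name}"


def split_client_settings(client_name, settings):
    """Split settings into (yml_lines, keystore_keys) for one client."""
    yml_lines, keystore_keys = [], []
    for name, value in settings.items():
        key = client_setting_key(client_name, name)
        if name in SECURE_SETTINGS:
            keystore_keys.append(key)  # the value is never written to a file
        else:
            yml_lines.append(f"{key}: {value}")
    return yml_lines, keystore_keys


yml_lines, keystore_keys = split_client_settings(
    "my_alternate_client",
    {"endpoint": "s3.amazonaws.com", "read_timeout": "50s",
     "access_key": "placeholder"},
)
----

Each entry in `keystore_keys` would then be added with `bin/elasticsearch-keystore add`, while `yml_lines` holds the lines for the configuration file.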
[float]
[[repository-s3-compatible-services]]
===== S3-compatible services
There are a number of storage systems that provide an S3-compatible API, and
the `repository-s3` plugin allows you to use these systems in place of AWS S3.
To do so, you should set the `s3.client.CLIENT_NAME.endpoint` setting to the
system's endpoint. This setting accepts IP addresses and hostnames and may
include a port. For example, the endpoint may be `172.17.0.2` or
`172.17.0.2:9000`. You may also need to set `s3.client.CLIENT_NAME.protocol` to
`http` if the endpoint does not support HTTPS.
https://minio.io[Minio] is an example of a storage system that provides an
S3-compatible API. The `repository-s3` plugin allows {es} to work with
Minio-backed repositories as well as repositories stored on AWS S3. Other
S3-compatible storage systems may also work with {es}, but these are not tested
or supported.
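The `endpoint` and `protocol` settings for an S3-compatible service can be derived from the service URL. The following Python sketch shows one way to do that; the function name `s3_compatible_client_settings` is hypothetical, and the address is the example address used above.

[source,python]
----
from urllib.parse import urlsplit


def s3_compatible_client_settings(client_name, service_url):
    """Derive the endpoint and protocol client settings from a URL."""
    parts = urlsplit(service_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("protocol must be http or https")
    prefix = f"s3.client.{client_name}"
    return {
        f"{prefix}.endpoint": parts.netloc,  # host, optionally with :port
        f"{prefix}.protocol": parts.scheme,
    }


settings = s3_compatible_client_settings("default", "http://172.17.0.2:9000")
----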
[[repository-s3-repository]]
==== Repository Settings
The `s3` repository type supports a number of settings to customize how data is
stored in S3. These can be specified when creating the repository. For example:
[source,console]
----
PUT _snapshot/my_s3_repository
{
"type": "s3",
"settings": {
"bucket": "my_bucket_name",
"another_setting": "setting_value"
}
}
----
// TEST[skip:we don't have S3 set up while testing this]
The following settings are supported:
`bucket`::
The name of the bucket to be used for snapshots. (Mandatory)
`client`::
The name of the <<repository-s3-client,S3 client>> to use to connect to S3.
Defaults to `default`.
`base_path`::
Specifies the path within the bucket to the repository data. Defaults to the
value of `repositories.s3.base_path` or to the root directory if not set.
Previously, the `base_path` could take a leading `/` (forward slash). This is
now deprecated, so omit the leading `/` when setting `base_path`.
`chunk_size`::
Big files can be broken down into chunks during snapshotting if needed.
Specify the chunk size as a value and unit, for example:
`1GB`, `10MB`, `5KB`, `500B`. Defaults to `1GB`.
`compress`::
When set to `true`, metadata files are stored in compressed format. This
setting doesn't affect index files that are already compressed by default.
Defaults to `false`.
include::repository-shared-settings.asciidoc[]
`server_side_encryption`::
When set to `true`, files are encrypted on the server side using the AES256
algorithm. Defaults to `false`.
`buffer_size`::
Minimum threshold below which the chunk is uploaded using a single request.
Beyond this threshold, the S3 repository will use the
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS
Multipart Upload API] to split the chunk into several parts, each of
`buffer_size` length, and to upload each part in its own request. Note that
setting a buffer size lower than `5mb` is not allowed since it will prevent
the use of the Multipart API and may result in upload errors. It is also not
possible to set a buffer size greater than `5gb` as it is the maximum upload
size allowed by S3. Defaults to the smaller of `100mb` and `5%` of the
heap size.
`canned_acl`::
The S3 repository supports all
http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3
canned ACLs]: `private`, `public-read`, `public-read-write`,
`authenticated-read`, `log-delivery-write`, `bucket-owner-read`, and
`bucket-owner-full-control`. Defaults to `private`. When the S3 repository
creates buckets and objects, it applies the canned ACL specified by this
setting to them.
`storage_class`::
Sets the S3 storage class for objects stored in the snapshot repository.
Values may be `standard`, `reduced_redundancy`, `standard_ia`, `onezone_ia`
and `intelligent_tiering`. Defaults to `standard`.
Changing this setting on an existing repository only affects the
storage class for newly created objects, resulting in a mixed usage of
storage classes. Additionally, S3 Lifecycle Policies can be used to manage
the storage class of existing objects. Because of the extra complexity of the
Glacier class lifecycle, the Glacier class is not currently supported by the
plugin. For more information about the different classes, see the
http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS
Storage Classes Guide].
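The `buffer_size` rules above can be worked through with a little arithmetic. The Python sketch below follows only what is stated in this section: the 5mb and 5gb bounds, and a default equal to the smaller of 100mb and 5% of the heap. It assumes binary units (1mb = 1024 * 1024 bytes), and the function names are hypothetical rather than the plugin's own.

[source,python]
----
MB = 1024 * 1024
GB = 1024 * MB

MIN_BUFFER_SIZE = 5 * MB  # lower bound stated for buffer_size
MAX_BUFFER_SIZE = 5 * GB  # upper bound stated for buffer_size


def default_buffer_size(heap_bytes):
    """Smaller of 100mb and 5% of the heap, as described above."""
    return min(100 * MB, heap_bytes * 5 // 100)


def validate_buffer_size(size_bytes):
    """Reject values outside the documented 5mb..5gb range."""
    if not MIN_BUFFER_SIZE <= size_bytes <= MAX_BUFFER_SIZE:
        raise ValueError("buffer_size must be between 5mb and 5gb")
    return size_bytes


def part_count(chunk_bytes, buffer_bytes):
    """Requests needed for one chunk: one if it fits, else one per part."""
    return max(1, -(-chunk_bytes // buffer_bytes))  # ceiling division


small_heap_default = default_buffer_size(1 * GB)  # 5% of 1gb is below 100mb
large_heap_default = default_buffer_size(4 * GB)  # capped at 100mb
parts_for_1gb_chunk = part_count(1 * GB, 100 * MB)
----

With the default `chunk_size` of `1GB` and a `buffer_size` of `100mb`, a full chunk would be split into 11 parts: ten of 100mb and a final, smaller one.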
NOTE: The option of defining client settings in the repository settings as
documented below is considered deprecated, and will be removed in a future
version.
In addition to the above settings, you may also specify all non-secure client
settings in the repository settings. In this case, the client settings found in
the repository settings will be merged with those of the named client used by
the repository. Conflicts between client and repository settings are resolved
by the repository settings taking precedence over client settings.
For example:
[source,console]
----
PUT _snapshot/my_s3_repository
{
"type": "s3",
"settings": {
"client": "my_client_name",
"bucket": "my_bucket_name",
"endpoint": "my.s3.endpoint"
}
}
----
// TEST[skip:we don't have s3 set up while testing this]
This sets up a repository that uses all client settings from the client
`my_client_name` except for the `endpoint` that is overridden to
`my.s3.endpoint` by the repository settings.
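The precedence rule described above amounts to a dictionary merge in which repository values win. The Python sketch below illustrates it; the function name is hypothetical, the set of non-secure client settings is taken from the client settings list, and the values assumed for the client `my_client_name` are invented for the example.

[source,python]
----
# Non-secure client settings that may also appear in repository settings.
NON_SECURE_CLIENT_SETTINGS = {
    "endpoint", "protocol", "proxy.host", "proxy.port", "read_timeout",
    "max_retries", "use_throttle_retries", "path_style_access",
    "disable_chunked_encoding",
}


def effective_client_settings(client_settings, repository_settings):
    """Merge settings, letting repository values override client values."""
    overrides = {
        name: value
        for name, value in repository_settings.items()
        if name in NON_SECURE_CLIENT_SETTINGS
    }
    return {**client_settings, **overrides}


# Assumed settings of the client named my_client_name.
client = {"endpoint": "s3.amazonaws.com", "protocol": "https"}
repository = {
    "client": "my_client_name",
    "bucket": "my_bucket_name",
    "endpoint": "my.s3.endpoint",
}
effective = effective_client_settings(client, repository)
----

Here `effective` keeps the client's `protocol` but takes `endpoint` from the repository settings; `bucket` and `client` are repository-only settings and are not merged.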
[[repository-s3-permissions]]
===== Recommended S3 Permissions
To restrict the Elasticsearch snapshot process to the minimum required
resources, we recommend using Amazon IAM in conjunction with pre-existing S3
buckets. The following example policy allows snapshot access to an S3 bucket
named "snaps.example.com". You can configure it through the AWS IAM console by
creating a Custom Policy and using a Policy Document similar to this one
(changing snaps.example.com to your bucket name).
[source,js]
----
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com/*"
]
}
],
"Version": "2012-10-17"
}
----
// NOTCONSOLE
You can further restrict the permissions by specifying a prefix within the
bucket, named "foo" in this example.
[source,js]
----
{
"Statement": [
{
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions"
],
"Condition": {
"StringLike": {
"s3:prefix": [
"foo/*"
]
}
},
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::snaps.example.com/foo/*"
]
}
],
"Version": "2012-10-17"
}
----
// NOTCONSOLE
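Because the two policies above differ only in the bucket name and the optional prefix, they can be generated from a template. The following Python sketch reproduces both documents; the function name `snapshot_policy` is hypothetical, and the actions and structure are copied from the examples above.

[source,python]
----
import json


def snapshot_policy(bucket, prefix=None):
    """Build the example policy, optionally restricted to a prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    bucket_statement = {
        "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads",
            "s3:ListBucketVersions",
        ],
        "Effect": "Allow",
        "Resource": [bucket_arn],
    }
    object_resource = f"{bucket_arn}/*"
    if prefix:
        # Limit listing and object access to keys under the prefix.
        bucket_statement["Condition"] = {
            "StringLike": {"s3:prefix": [f"{prefix}/*"]}
        }
        object_resource = f"{bucket_arn}/{prefix}/*"
    object_statement = {
        "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload",
            "s3:ListMultipartUploadParts",
        ],
        "Effect": "Allow",
        "Resource": [object_resource],
    }
    return {
        "Statement": [bucket_statement, object_statement],
        "Version": "2012-10-17",
    }


policy_json = json.dumps(snapshot_policy("snaps.example.com", "foo"), indent=2)
----

Calling `snapshot_policy("snaps.example.com")` without a prefix yields the first, bucket-wide policy.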
The bucket must already exist before you register a repository for snapshots.
If the bucket does not exist, the repository registration fails.
[[repository-s3-aws-vpc]]
[float]
==== AWS VPC Bandwidth Settings
AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch
instances reside in a private subnet in an AWS VPC then all traffic to S3 will
go through that VPC's NAT instance. If your VPC's NAT instance is a smaller
instance size (e.g. a t1.micro) or is handling a high volume of network
traffic, your bandwidth to S3 may be limited by that NAT instance's network
bandwidth.
Instances residing in a public subnet in an AWS VPC will connect to S3 via the
VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.