[[repository-s3]]
=== S3 Repository Plugin

The S3 repository plugin adds support for using AWS S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

*If you are looking for a hosted solution of Elasticsearch on AWS, please visit
http://www.elastic.co/cloud.*

:plugin_name: repository-s3
include::install_remove.asciidoc[]

[[repository-s3-usage]]
==== Getting Started

The plugin provides a repository type named `s3` which may be used when creating
a repository. The repository defaults to using
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS
IAM Role] or
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2
IAM Role] credentials for authentication. The only mandatory setting is the
bucket name:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 setup while testing this]


[[repository-s3-client]]
==== Client Settings

The client that you use to connect to S3 has a number of settings available.
The settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. By default,
`s3` repositories use a client named `default`, but this can be modified using
the <<repository-s3-repository,repository setting>> `client`. For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 setup while testing this]

Most client settings can be added to the `elasticsearch.yml` configuration file
with the exception of the secure settings, which you add to the {es} keystore.
For more information about creating and updating the {es} keystore, see
{ref}/secure-settings.html[Secure settings].

For example, before you start the node, run these commands to add AWS access key
settings to the keystore:

[source,sh]
----
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
----

*All* client secure settings of this plugin are
{ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you
reload the settings, the internal `s3` clients, used to transfer the snapshot
contents, will use the latest settings from the keystore. Any existing `s3`
repositories, as well as any newly created ones, will pick up the new values
stored in the keystore.

NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload* of
the client's secure settings. The task will complete using the client as it was
built when the operation started.

The following list contains the available client settings. Those that must be
stored in the keystore are marked as "secure" and are *reloadable*; the other
settings belong in the `elasticsearch.yml` file.

`access_key` ({ref}/secure-settings.html[Secure])::

An S3 access key. The `secret_key` setting must also be specified.

`secret_key` ({ref}/secure-settings.html[Secure])::

An S3 secret key. The `access_key` setting must also be specified.

`session_token` ({ref}/secure-settings.html[Secure])::

An S3 session token. The `access_key` and `secret_key` settings must also be
specified.

`endpoint`::

The S3 service endpoint to connect to. This defaults to `s3.amazonaws.com`
but the
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region[AWS
documentation] lists alternative S3 endpoints. If you are using an
<<repository-s3-compatible-services,S3-compatible service>> then you should
set this to the service's endpoint.

`protocol`::

The protocol to use to connect to S3. Valid values are either `http` or
`https`. Defaults to `https`.

`proxy.host`::

The host name of a proxy to connect to S3 through.

`proxy.port`::

The port of a proxy to connect to S3 through.

`proxy.username` ({ref}/secure-settings.html[Secure])::

The username to connect to the `proxy.host` with.

`proxy.password` ({ref}/secure-settings.html[Secure])::

The password to connect to the `proxy.host` with.

`read_timeout`::

The socket timeout for connecting to S3. The value should specify the unit.
For example, a value of `5s` specifies a 5 second timeout. The default value
is 50 seconds.

`max_retries`::

The number of retries to use when an S3 request fails. The default value is
`3`.

`use_throttle_retries`::

Whether retries should be throttled (i.e. should back off). Must be `true`
or `false`. Defaults to `true`.

`path_style_access`::

Whether to force the use of the path style access pattern. If `true`, the
path style access pattern will be used. If `false`, the access pattern will
be automatically determined by the AWS Java SDK (see
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setPathStyleAccessEnabled-java.lang.Boolean-[AWS
documentation] for details). Defaults to `false`.

[[repository-s3-path-style-deprecation]]
NOTE: In versions `7.0`, `7.1`, `7.2` and `7.3` all bucket operations used the
https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/[now-deprecated]
path style access pattern. If your deployment requires the path style access
pattern then you should set this setting to `true` when upgrading.

[float]
[[repository-s3-compatible-services]]
===== S3-compatible services

There are a number of storage systems that provide an S3-compatible API, and
the `repository-s3` plugin allows you to use these systems in place of AWS S3.
To do so, you should set the `s3.client.CLIENT_NAME.endpoint` setting to the
system's endpoint. This setting accepts IP addresses and hostnames and may
include a port. For example, the endpoint may be `172.17.0.2` or
`172.17.0.2:9000`. You may also need to set `s3.client.CLIENT_NAME.protocol` to
`http` if the endpoint does not support HTTPS.

https://minio.io[Minio] is an example of a storage system that provides an
S3-compatible API. The `repository-s3` plugin allows {es} to work with
Minio-backed repositories as well as repositories stored on AWS S3. Other
S3-compatible storage systems may also work with {es}, but these are not tested
or supported.
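
As a rough illustration of the two settings described above, the following
Python sketch renders the `elasticsearch.yml` lines for a client that points at
an S3-compatible service. The helper name `client_settings_lines`, the client
name `minio_client` and the endpoint value are hypothetical and exist only for
this example; only the setting names `endpoint` and `protocol` come from this
page.

```python
def client_settings_lines(client_name, endpoint, use_https=True):
    """Render elasticsearch.yml lines for one S3 client (hypothetical helper)."""
    if not endpoint:
        raise ValueError("endpoint must be a host name or IP address, optionally with a port")
    prefix = f"s3.client.{client_name}"
    # Quote the value so that a host:port pair is read as a plain string.
    lines = [f'{prefix}.endpoint: "{endpoint}"']
    # `https` is the documented default, so `protocol` is only emitted for plain HTTP.
    if not use_https:
        lines.append(f"{prefix}.protocol: http")
    return lines


if __name__ == "__main__":
    # Example values only: a local service that does not support HTTPS.
    for line in client_settings_lines("minio_client", "172.17.0.2:9000", use_https=False):
        print(line)
```

A repository would then select this client through its `client` setting, in the
same way as the `my_alternate_client` example earlier on this page.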

[[repository-s3-repository]]
==== Repository Settings

The `s3` repository type supports a number of settings to customize how data is
stored in S3. These can be specified when creating the repository. For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "another_setting": "setting_value"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]

The following settings are supported:

`bucket`::

The name of the bucket to be used for snapshots. (Mandatory)

`client`::

The name of the <<repository-s3-client,S3 client>> to use to connect to S3.
Defaults to `default`.

`base_path`::

Specifies the path within the bucket to the repository data. Defaults to the
value of `repositories.s3.base_path`, or to the root directory if not set.
Previously, `base_path` could take a leading `/` (forward slash). However,
this has been deprecated, and `base_path` should now omit the leading `/`.

`chunk_size`::

Big files can be broken down into chunks during snapshotting if needed. The
chunk size can be specified in bytes or by using size value notation, e.g.
`1gb`, `10mb`, `5kb`. Defaults to `1gb`.

`compress`::

When set to `true`, metadata files are stored in compressed format. This
setting doesn't affect index files, which are already compressed by default.
Defaults to `false`.

include::repository-shared-settings.asciidoc[]

`server_side_encryption`::

When set to `true`, files are encrypted on the server side using the AES256
algorithm. Defaults to `false`.

`buffer_size`::

Minimum threshold below which the chunk is uploaded using a single request.
Beyond this threshold, the S3 repository will use the
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS
Multipart Upload API] to split the chunk into several parts, each of
`buffer_size` length, and to upload each part in its own request. Note that
setting a buffer size lower than `5mb` is not allowed since it will prevent
the use of the Multipart API and may result in upload errors. It is also not
possible to set a buffer size greater than `5gb` as it is the maximum upload
size allowed by S3. Defaults to the smaller of `100mb` and `5%` of the
heap size.

`canned_acl`::

The S3 repository supports all
http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3
canned ACLs]: `private`, `public-read`, `public-read-write`,
`authenticated-read`, `log-delivery-write`, `bucket-owner-read`,
`bucket-owner-full-control`. Defaults to `private`. You can specify a
canned ACL using the `canned_acl` setting. When the S3 repository creates
buckets and objects, it adds the canned ACL to the buckets and objects.

`storage_class`::

Sets the S3 storage class for objects stored in the snapshot repository.
Values may be `standard`, `reduced_redundancy`, `standard_ia`
and `intelligent_tiering`. Defaults to `standard`.
Changing this setting on an existing repository only affects the
storage class for newly created objects, resulting in a mixed usage of
storage classes. Additionally, S3 Lifecycle Policies can be used to manage
the storage class of existing objects. Due to the extra complexity of the
Glacier class lifecycle, it is not currently supported by the plugin. For
more information about the different classes, see
http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS
Storage Classes Guide].
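
The `buffer_size` rules listed above can be modelled roughly as follows. This
Python sketch only restates the limits and the default given on this page; it
is not the plugin's implementation, and the function names
`default_buffer_size` and `part_count` are hypothetical. It also assumes that
`mb` and `gb` are binary units (multiples of 1024).

```python
import math

MB = 1024 * 1024
GB = 1024 * MB
MIN_BUFFER = 5 * MB  # documented lower bound for buffer_size
MAX_BUFFER = 5 * GB  # documented upper bound for buffer_size


def default_buffer_size(heap_bytes):
    """Documented default: the smaller of 100mb and 5% of the heap size."""
    return min(100 * MB, heap_bytes * 5 // 100)


def part_count(chunk_bytes, buffer_bytes):
    """Number of upload parts for one chunk under this simplified model."""
    if not MIN_BUFFER <= buffer_bytes <= MAX_BUFFER:
        raise ValueError("buffer_size must be between 5mb and 5gb")
    if chunk_bytes <= buffer_bytes:
        return 1  # small enough to be uploaded in a single request
    # Otherwise the chunk is split into parts of buffer_size length.
    return math.ceil(chunk_bytes / buffer_bytes)


if __name__ == "__main__":
    buffer_bytes = default_buffer_size(4 * GB)
    print(buffer_bytes, part_count(1 * GB, buffer_bytes))
```

Under these assumptions, a node with a 4gb heap gets a 100mb buffer, so a
default `1gb` chunk is uploaded in 11 parts.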

NOTE: The option of defining client settings in the repository settings as
documented below is considered deprecated, and will be removed in a future
version.

In addition to the above settings, you may also specify all non-secure client
settings in the repository settings. In this case, the client settings found in
the repository settings will be merged with those of the named client used by
the repository. Conflicts between client and repository settings are resolved
by the repository settings taking precedence over client settings.

For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "client": "my_client_name",
    "bucket": "my_bucket_name",
    "endpoint": "my.s3.endpoint"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]

This sets up a repository that uses all client settings from the client
`my_client_name` except for the `endpoint` that is overridden to
`my.s3.endpoint` by the repository settings.
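
The precedence rule can be pictured with a small Python model. This is a sketch
of the behaviour described above, not the plugin's code: the function name
`effective_client_settings` is hypothetical, and the named client's values in
the usage lines are made up for illustration.

```python
# Non-secure client settings listed in the Client Settings section.
NON_SECURE_CLIENT_SETTINGS = (
    "endpoint", "protocol", "proxy.host", "proxy.port",
    "read_timeout", "max_retries", "use_throttle_retries", "path_style_access",
)


def effective_client_settings(named_client, repository_settings):
    """Merge client settings, letting repository settings win on conflict."""
    merged = dict(named_client)
    for key in NON_SECURE_CLIENT_SETTINGS:
        if key in repository_settings:
            merged[key] = repository_settings[key]
    return merged


if __name__ == "__main__":
    # Assumed values for the client named `my_client_name`.
    named_client = {"endpoint": "s3.amazonaws.com", "protocol": "https"}
    repository_settings = {
        "client": "my_client_name",
        "bucket": "my_bucket_name",
        "endpoint": "my.s3.endpoint",
    }
    print(effective_client_settings(named_client, repository_settings))
```

In this model, `bucket` and `client` stay repository-only settings, and only
`endpoint` is overridden.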

[[repository-s3-permissions]]
===== Recommended S3 Permissions

In order to restrict the Elasticsearch snapshot process to the minimum required
resources, we recommend using Amazon IAM in conjunction with pre-existing S3
buckets. Here is an example policy which will allow the snapshot access to an S3
bucket named "snaps.example.com". This may be configured through the AWS IAM
console, by creating a Custom Policy, and using a Policy Document similar to
this (changing snaps.example.com to your bucket name).

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE

You may further restrict the permissions by specifying a prefix within the
bucket, in this example, named "foo".

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "foo/*"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/foo/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE
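
If you manage several buckets or prefixes, the two policy documents above can
be generated from one template. The following Python sketch is one possible way
to do that; the function name `snapshot_policy` is hypothetical, and the
actions, condition and resource patterns are copied from the examples above.

```python
import json


def snapshot_policy(bucket, prefix=None):
    """Build the example policy for a bucket, optionally limited to a prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    bucket_statement = {
        "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads",
            "s3:ListBucketVersions",
        ],
        "Effect": "Allow",
        "Resource": [bucket_arn],
    }
    object_resource = f"{bucket_arn}/*"
    if prefix:
        # Restrict bucket-level actions and object access to the given prefix.
        bucket_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}/*"]}}
        object_resource = f"{bucket_arn}/{prefix}/*"
    object_statement = {
        "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload",
            "s3:ListMultipartUploadParts",
        ],
        "Effect": "Allow",
        "Resource": [object_resource],
    }
    return {"Statement": [bucket_statement, object_statement], "Version": "2012-10-17"}


if __name__ == "__main__":
    print(json.dumps(snapshot_policy("snaps.example.com", "foo"), indent=2))
```

Calling `snapshot_policy("snaps.example.com")` without a prefix yields the
first, bucket-wide policy.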

The bucket needs to exist to register a repository for snapshots. If the bucket
has not been created, the repository registration will fail.

[[repository-s3-aws-vpc]]
[float]
==== AWS VPC Bandwidth Settings

AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch
instances reside in a private subnet in an AWS VPC then all traffic to S3 will
go through that VPC's NAT instance. If your VPC's NAT instance is a smaller
instance size (e.g. a t1.micro) or is handling a high volume of network traffic,
your bandwidth to S3 may be limited by that NAT instance's networking bandwidth
limitations.

Instances residing in a public subnet in an AWS VPC will connect to S3 via the
VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.