311 lines
9.3 KiB
Plaintext
311 lines
9.3 KiB
Plaintext
[[repository-s3]]
|
|
=== S3 Repository Plugin
|
|
|
|
The S3 repository plugin adds support for using S3 as a repository for
|
|
{ref}/modules-snapshots.html[Snapshot/Restore].
|
|
|
|
*If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.*
|
|
|
|
:plugin_name: repository-s3
|
|
include::install_remove.asciidoc[]
|
|
|
|
[[repository-s3-usage]]
|
|
==== Getting started with AWS
|
|
|
|
The plugin provides a repository type named `s3` which may be used when creating a repository.
|
|
The repository defaults to using
|
|
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[IAM Role]
|
|
credentials for authentication. The only mandatory setting is the bucket name:
|
|
|
|
[source,js]
|
|
----
|
|
PUT _snapshot/my_s3_repository
|
|
{
|
|
"type": "s3",
|
|
"settings": {
|
|
"bucket": "my_bucket"
|
|
}
|
|
}
|
|
----
|
|
// CONSOLE
|
|
// TEST[skip:we don't have s3 setup while testing this]
|
|
|
|
|
|
[[repository-s3-client]]
|
|
==== Client Settings
|
|
|
|
The client used to connect to S3 has a number of settings available. Client setting names are of
|
|
the form `s3.client.CLIENT_NAME.SETTING_NAME` and specified inside `elasticsearch.yml`. The
|
|
default client name looked up by a `s3` repository is called `default`, but can be customized
|
|
with the repository setting `client`. For example:
|
|
|
|
[source,js]
|
|
----
|
|
PUT _snapshot/my_s3_repository
|
|
{
|
|
"type": "s3",
|
|
"settings": {
|
|
"bucket": "my_bucket",
|
|
"client": "my_alternate_client"
|
|
}
|
|
}
|
|
----
|
|
// CONSOLE
|
|
// TEST[skip:we don't have s3 setup while testing this]
|
|
|
|
Some settings are sensitive and must be stored in the {ref}/secure-settings.html[elasticsearch keystore].
|
|
For example, to use explicit AWS access keys:
|
|
|
|
[source,sh]
|
|
----
|
|
bin/elasticsearch-keystore add s3.client.default.access_key
|
|
bin/elasticsearch-keystore add s3.client.default.secret_key
|
|
----
|
|
|
|
The following are the available client settings. Those that must be stored in the keystore
|
|
are marked as `Secure`.
|
|
|
|
`access_key`::
|
|
|
|
An s3 access key. The `secret_key` setting must also be specified. (Secure)
|
|
|
|
`secret_key`::
|
|
|
|
An s3 secret key. The `access_key` setting must also be specified. (Secure)
|
|
|
|
`endpoint`::
|
|
|
|
The s3 service endpoint to connect to. This will be automatically
|
|
figured out by the s3 client based on the bucket location, but
|
|
can be specified explicitly. See http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
|
|
|
|
`protocol`::
|
|
|
|
The protocol to use to connect to s3. Valid values are either `http`
|
|
or `https`. Defaults to `https`.
|
|
|
|
`proxy.host`::
|
|
|
|
The host name of a proxy to connect to s3 through.
|
|
|
|
`proxy.port`::
|
|
|
|
The port of a proxy to connect to s3 through.
|
|
|
|
`proxy.username`::
|
|
|
|
The username to connect to the `proxy.host` with. (Secure)
|
|
|
|
`proxy.password`::
|
|
|
|
The password to connect to the `proxy.host` with. (Secure)
|
|
|
|
`read_timeout`::
|
|
|
|
The socket timeout for connecting to s3. The value should specify the unit. For example,
|
|
a value of `5s` specifies a 5 second timeout. The default value is 50 seconds.
|
|
|
|
`max_retries`::
|
|
|
|
The number of retries to use when an s3 request fails. The default value is 3.
|
|
|
|
`use_throttle_retries`::
|
|
|
|
Whether retries should be throttled (ie use backoff). Must be `true` or `false`. Defaults to `true`.
|
|
|
|
[[repository-s3-repository]]
|
|
==== Repository Settings
|
|
|
|
The `s3` repository type supports a number of settings to customize how data is stored in S3.
|
|
These can be specified when creating the repository. For example:
|
|
|
|
[source,js]
|
|
----
|
|
PUT _snapshot/my_s3_repository
|
|
{
|
|
"type": "s3",
|
|
"settings": {
|
|
"bucket": "my_bucket_name",
|
|
"another_setting": "setting_value"
|
|
}
|
|
}
|
|
----
|
|
// CONSOLE
|
|
// TEST[skip:we don't have s3 set up while testing this]
|
|
|
|
The following settings are supported:
|
|
|
|
`bucket`::
|
|
|
|
The name of the bucket to be used for snapshots. (Mandatory)
|
|
|
|
`client`::
|
|
|
|
The name of the s3 client to use to connect to S3. Defaults to `default`.
|
|
|
|
`base_path`::
|
|
|
|
Specifies the path within bucket to repository data. Defaults to
|
|
value of `repositories.s3.base_path` or to root directory if not set.
|
|
Previously, the base_path could take a leading `/` (forward slash).
|
|
However, this has been deprecated and setting the base_path now should
|
|
omit the leading `/`.
|
|
|
|
`chunk_size`::
|
|
|
|
Big files can be broken down into chunks during snapshotting if needed.
|
|
The chunk size can be specified in bytes or by using size value notation,
|
|
i.e. `1gb`, `10mb`, `5kb`. Defaults to `1gb`.
|
|
|
|
`compress`::
|
|
|
|
When set to `true` metadata files are stored in compressed format. This
|
|
setting doesn't affect index files that are already compressed by default.
|
|
Defaults to `false`.
|
|
|
|
`server_side_encryption`::
|
|
|
|
When set to `true` files are encrypted on server side using AES256
|
|
algorithm. Defaults to `false`.
|
|
|
|
`buffer_size`::
|
|
|
|
Minimum threshold below which the chunk is uploaded using a single
|
|
request. Beyond this threshold, the S3 repository will use the
|
|
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS Multipart Upload API]
|
|
to split the chunk into several parts, each of `buffer_size` length, and
|
|
to upload each part in its own request. Note that setting a buffer
|
|
size lower than `5mb` is not allowed since it will prevent the use of the
|
|
Multipart API and may result in upload errors. It is also not possible to
|
|
set a buffer size greater than `5gb` as it is the maximum upload size
|
|
allowed by S3. Defaults to the minimum between `100mb` and `5%` of the heap size.
|
|
|
|
`canned_acl`::
|
|
|
|
The S3 repository supports all http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3 canned ACLs]
|
|
: `private`, `public-read`, `public-read-write`, `authenticated-read`, `log-delivery-write`,
|
|
`bucket-owner-read`, `bucket-owner-full-control`. Defaults to `private`.
|
|
You could specify a canned ACL using the `canned_acl` setting. When the S3 repository
|
|
creates buckets and objects, it adds the canned ACL into the buckets and objects.
|
|
|
|
`storage_class`::
|
|
|
|
Sets the S3 storage class type for the backup files. Values may be
|
|
`standard`, `reduced_redundancy`, `standard_ia`. Defaults to `standard`.
|
|
Due to the extra complexity with the Glacier class lifecycle, it is not
|
|
currently supported by the plugin. For more information about the
|
|
different classes, see http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS Storage Classes Guide]
|
|
|
|
[[repository-s3-permissions]]
|
|
===== Recommended S3 Permissions
|
|
|
|
In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon
|
|
IAM in conjunction with pre-existing S3 buckets. Here is an example policy which will allow the snapshot access to an
|
|
S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console, by creating a Custom Policy,
|
|
and using a Policy Document similar to this (changing snaps.example.com to your bucket name).
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"Statement": [
|
|
{
|
|
"Action": [
|
|
"s3:ListBucket",
|
|
"s3:GetBucketLocation",
|
|
"s3:ListBucketMultipartUploads",
|
|
"s3:ListBucketVersions"
|
|
],
|
|
"Effect": "Allow",
|
|
"Resource": [
|
|
"arn:aws:s3:::snaps.example.com"
|
|
]
|
|
},
|
|
{
|
|
"Action": [
|
|
"s3:GetObject",
|
|
"s3:PutObject",
|
|
"s3:DeleteObject",
|
|
"s3:AbortMultipartUpload",
|
|
"s3:ListMultipartUploadParts"
|
|
],
|
|
"Effect": "Allow",
|
|
"Resource": [
|
|
"arn:aws:s3:::snaps.example.com/*"
|
|
]
|
|
}
|
|
],
|
|
"Version": "2012-10-17"
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
You may further restrict the permissions by specifying a prefix within the bucket, in this example, named "foo".
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"Statement": [
|
|
{
|
|
"Action": [
|
|
"s3:ListBucket",
|
|
"s3:GetBucketLocation",
|
|
"s3:ListBucketMultipartUploads",
|
|
"s3:ListBucketVersions"
|
|
],
|
|
"Condition": {
|
|
"StringLike": {
|
|
"s3:prefix": [
|
|
"foo/*"
|
|
]
|
|
}
|
|
},
|
|
"Effect": "Allow",
|
|
"Resource": [
|
|
"arn:aws:s3:::snaps.example.com"
|
|
]
|
|
},
|
|
{
|
|
"Action": [
|
|
"s3:GetObject",
|
|
"s3:PutObject",
|
|
"s3:DeleteObject",
|
|
"s3:AbortMultipartUpload",
|
|
"s3:ListMultipartUploadParts"
|
|
],
|
|
"Effect": "Allow",
|
|
"Resource": [
|
|
"arn:aws:s3:::snaps.example.com/foo/*"
|
|
]
|
|
}
|
|
],
|
|
"Version": "2012-10-17"
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository
|
|
registration will fail. If you want Elasticsearch to create the bucket instead, you can add the permission to create a
|
|
specific bucket like this:
|
|
|
|
[source,js]
|
|
----
|
|
{
|
|
"Action": [
|
|
"s3:CreateBucket"
|
|
],
|
|
"Effect": "Allow",
|
|
"Resource": [
|
|
"arn:aws:s3:::snaps.example.com"
|
|
]
|
|
}
|
|
----
|
|
// NOTCONSOLE
|
|
|
|
[[repository-s3-aws-vpc]]
|
|
[float]
|
|
==== AWS VPC Bandwidth Settings
|
|
|
|
AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC then all traffic to S3 will go through that VPC's NAT instance. If your VPC's NAT instance is a smaller instance size (e.g. a t1.micro) or is handling a high volume of network traffic your bandwidth to S3 may be limited by that NAT instance's networking bandwidth limitations.
|
|
|
|
Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC's internet gateway and not be bandwidth limited by the VPC's NAT instance.
|