[[repository-s3]]
=== S3 Repository Plugin

The S3 repository plugin adds support for using AWS S3 as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

*If you are looking for a hosted solution of Elasticsearch on AWS, please visit
http://www.elastic.co/cloud.*

:plugin_name: repository-s3
include::install_remove.asciidoc[]

[[repository-s3-usage]]
==== Getting Started

The plugin provides a repository type named `s3` which may be used when creating
a repository. The repository defaults to using
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html[ECS
IAM Role] or
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html[EC2
IAM Role] credentials for authentication. The only mandatory setting is the
bucket name:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 setup while testing this]
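
Once the repository is registered, it can be used with the standard
{ref}/modules-snapshots.html[Snapshot/Restore] APIs. As a minimal sketch, the
following request takes a snapshot into the repository registered above and
waits for it to complete. The snapshot name `snapshot_1` is an illustrative
placeholder:

[source,js]
----
PUT _snapshot/my_s3_repository/snapshot_1?wait_for_completion=true
----
// CONSOLE
// TEST[skip:we don't have s3 setup while testing this]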


[[repository-s3-client]]
==== Client Settings

The client that you use to connect to S3 has a number of settings available.
The settings have the form `s3.client.CLIENT_NAME.SETTING_NAME`. By default,
`s3` repositories use a client named `default`, but this can be modified using
the <<repository-s3-repository,repository setting>> `client`. For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 setup while testing this]

Most client settings can be added to the `elasticsearch.yml` configuration file
with the exception of the secure settings, which you add to the {es} keystore.
For more information about creating and updating the {es} keystore, see
{ref}/secure-settings.html[Secure settings].

For example, before you start the node, run these commands to add AWS access key
settings to the keystore:

[source,sh]
----
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
----

*All* client secure settings of this plugin are
{ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you
reload the settings, the internal `s3` clients that transfer snapshot contents
use the latest settings from the keystore. Both existing and newly created `s3`
repositories pick up the new values stored in the keystore.
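
For example, once the keystore has been updated on each node, a reload can be
triggered with the nodes reload secure settings API. This minimal sketch
assumes the keystore is not password-protected:

[source,js]
----
POST _nodes/reload_secure_settings
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]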

NOTE: In-progress snapshot/restore tasks will not be preempted by a *reload* of
the client's secure settings. The task will complete using the client as it was
built when the operation started.

The following list contains the available client settings. Those that must be
stored in the keystore are marked as "secure" and are *reloadable*; the other
settings belong in the `elasticsearch.yml` file.

`access_key` ({ref}/secure-settings.html[Secure])::

    An S3 access key. The `secret_key` setting must also be specified.

`secret_key` ({ref}/secure-settings.html[Secure])::

    An S3 secret key. The `access_key` setting must also be specified.

`session_token` ({ref}/secure-settings.html[Secure])::

    An S3 session token. The `access_key` and `secret_key` settings must also be
    specified.

`endpoint`::

    The S3 service endpoint to connect to. This defaults to `s3.amazonaws.com`
    but the
    http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region[AWS
    documentation] lists alternative S3 endpoints. If you are using an
    <<repository-s3-compatible-services,S3-compatible service>> then you should
    set this to the service's endpoint.

`protocol`::

    The protocol to use to connect to S3. Valid values are either `http` or
    `https`. Defaults to `https`.

`proxy.host`::

    The host name of a proxy to connect to S3 through.

`proxy.port`::

    The port of a proxy to connect to S3 through.

`proxy.username` ({ref}/secure-settings.html[Secure])::

    The username to connect to the `proxy.host` with.

`proxy.password` ({ref}/secure-settings.html[Secure])::

    The password to connect to the `proxy.host` with.

`read_timeout`::

    The socket timeout for connecting to S3. The value must include a time
    unit; for example, `5s` specifies a 5-second timeout. Defaults to `50s`.

`max_retries`::

    The number of retries to use when an S3 request fails. The default value is
    `3`.

`use_throttle_retries`::

    Whether retries should be throttled (i.e. should back off). Must be `true`
    or `false`. Defaults to `true`.

`path_style_access`::

    Whether to force the use of the path style access pattern. If `true`, the
    path style access pattern will be used. If `false`, the access pattern will
    be automatically determined by the AWS Java SDK (see the
    https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setPathStyleAccessEnabled-java.lang.Boolean-[AWS
    documentation] for details). Defaults to `false`.

[[repository-s3-path-style-deprecation]]
NOTE: In versions `7.0`, `7.1`, `7.2` and `7.3` all bucket operations used the
https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/[now-deprecated]
path style access pattern. If your deployment requires the path style access
pattern, set `path_style_access` to `true` when upgrading.

`disable_chunked_encoding`::

    Whether chunked encoding should be disabled or not. If `false`, chunked
    encoding is enabled and will be used where appropriate. If `true`, chunked
    encoding is disabled and will not be used, which may mean that snapshot
    operations consume more resources and take longer to complete. It should
    only be set to `true` if you are using a storage service that does not
    support chunked encoding. See the
    https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#disableChunkedEncoding--[AWS
    Java SDK documentation] for details. Defaults to `false`.

[float]
[[repository-s3-compatible-services]]
===== S3-compatible services

There are a number of storage systems that provide an S3-compatible API, and
the `repository-s3` plugin allows you to use these systems in place of AWS S3.
To do so, you should set the `s3.client.CLIENT_NAME.endpoint` setting to the
system's endpoint. This setting accepts IP addresses and hostnames and may
include a port. For example, the endpoint may be `172.17.0.2` or
`172.17.0.2:9000`. You may also need to set `s3.client.CLIENT_NAME.protocol` to
`http` if the endpoint does not support HTTPS.

https://minio.io[Minio] is an example of a storage system that provides an
S3-compatible API. The `repository-s3` plugin allows {es} to work with
Minio-backed repositories as well as repositories stored on AWS S3. Other
S3-compatible storage systems may also work with {es}, but these are not tested
or supported.
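
As a sketch of this setup, suppose a client named `minio` (a hypothetical name
chosen for this example) has been configured for a Minio server by setting
`s3.client.minio.endpoint` to `172.17.0.2:9000` and `s3.client.minio.protocol`
to `http` in `elasticsearch.yml`, and by adding `s3.client.minio.access_key` and
`s3.client.minio.secret_key` to the keystore. Depending on the service, you may
also need to set `s3.client.minio.path_style_access` to `true`. A repository
could then use that client as follows, where the repository and bucket names
are placeholders:

[source,js]
----
PUT _snapshot/my_minio_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_minio_bucket",
    "client": "minio"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]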

[[repository-s3-repository]]
==== Repository Settings

The `s3` repository type supports a number of settings to customize how data is
stored in S3. These can be specified when creating the repository. For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "another_setting": "setting_value"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]

The following settings are supported:

`bucket`::

    The name of the bucket to be used for snapshots. (Mandatory)

`client`::

    The name of the <<repository-s3-client,S3 client>> to use to connect to S3.
    Defaults to `default`.

`base_path`::

    Specifies the path within the bucket to the repository data. Defaults to
    the value of `repositories.s3.base_path`, or to the root of the bucket if
    that is not set. A leading `/` (forward slash) in `base_path` is
    deprecated, so omit it when setting this value.

`chunk_size`::

    Big files can be broken down into chunks during snapshotting if needed. The
    chunk size can be specified in bytes or by using size value notation, e.g.
    `1gb`, `10mb`, `5kb`. Defaults to `1gb`.

`compress`::

    When set to `true`, metadata files are stored in compressed format. This
    setting doesn't affect index files, which are already compressed by
    default. Defaults to `false`.

include::repository-shared-settings.asciidoc[]

`server_side_encryption`::

    When set to `true`, files are encrypted on the server side using the AES256
    algorithm. Defaults to `false`.

`buffer_size`::

    Minimum threshold below which the chunk is uploaded using a single request.
    Beyond this threshold, the S3 repository will use the
    http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html[AWS
    Multipart Upload API] to split the chunk into several parts, each of
    `buffer_size` length, and to upload each part in its own request. Note that
    setting a buffer size lower than `5mb` is not allowed since it will prevent
    the use of the Multipart API and may result in upload errors. It is also not
    possible to set a buffer size greater than `5gb` as it is the maximum upload
    size allowed by S3. Defaults to the smaller of `100mb` and `5%` of the
    heap size.

`canned_acl`::

    The S3 repository supports all
    http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl[S3
    canned ACLs]: `private`, `public-read`, `public-read-write`,
    `authenticated-read`, `log-delivery-write`, `bucket-owner-read`,
    `bucket-owner-full-control`. Defaults to `private`. You can specify a
    canned ACL using the `canned_acl` setting. When the S3 repository creates
    buckets and objects, it applies the canned ACL to them.

`storage_class`::

    Sets the S3 storage class for objects stored in the snapshot repository.
    Values may be `standard`, `reduced_redundancy`, `standard_ia`
    or `intelligent_tiering`. Defaults to `standard`.
    Changing this setting on an existing repository only affects the
    storage class for newly created objects, resulting in a mixed usage of
    storage classes. You can use S3 Lifecycle Policies to manage the storage
    class of existing objects. Due to the extra complexity of the Glacier
    class lifecycle, the Glacier storage class is not currently supported by
    the plugin. For more information about the different classes, see the
    http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html[AWS
    Storage Classes Guide].
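
To illustrate how these settings combine, the following request registers a
repository that stores its data under a prefix, compresses metadata, enables
server-side encryption, and writes new objects with the `standard_ia` storage
class. The bucket name, base path, sizes, and ACL are placeholders chosen for
this sketch, not recommended values:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "base_path": "snapshots/production",
    "chunk_size": "500mb",
    "compress": true,
    "server_side_encryption": true,
    "buffer_size": "100mb",
    "canned_acl": "bucket-owner-full-control",
    "storage_class": "standard_ia"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]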

NOTE: Defining client settings in the repository settings, as described below,
is deprecated and will be removed in a future version.

In addition to the above settings, you may also specify all non-secure client
settings in the repository settings. In this case, the client settings found in
the repository settings are merged with those of the named client used by the
repository. If a setting is defined in both places, the repository setting takes
precedence over the client setting.

For example:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "client": "my_client_name",
    "bucket": "my_bucket_name",
    "endpoint": "my.s3.endpoint"
  }
}
----
// CONSOLE
// TEST[skip:we don't have s3 set up while testing this]

This sets up a repository that uses all client settings from the client
`my_client_name` except for the `endpoint` that is overridden to
`my.s3.endpoint` by the repository settings.

[[repository-s3-permissions]]
===== Recommended S3 Permissions

To restrict the Elasticsearch snapshot process to the minimum required
resources, we recommend using Amazon IAM in conjunction with pre-existing S3
buckets. The following example policy grants snapshot access to an S3 bucket
named `snaps.example.com`. You can configure it through the AWS IAM console by
creating a custom policy with a policy document similar to this one, replacing
`snaps.example.com` with your bucket name.

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE

You can further restrict the permissions to a prefix within the bucket, named
`foo` in this example.

[source,js]
----
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "foo/*"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/foo/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
----
// NOTCONSOLE
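
A repository that stays within this prefix-restricted policy would typically set
its <<repository-s3-repository,`base_path`>> to the same prefix. As a sketch,
using the example bucket and prefix from the policy above:

[source,js]
----
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "snaps.example.com",
    "base_path": "foo"
  }
}
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]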

The bucket must exist before you register the repository. If it does not, the
repository registration fails.
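
To check later that all nodes can still access the bucket, for example after
changing an IAM policy, you can call the verify repository API. A minimal
example, reusing the repository name from the earlier examples:

[source,js]
----
POST _snapshot/my_s3_repository/_verify
----
// CONSOLE
// TEST[skip:we don't have S3 set up while testing this]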

[[repository-s3-aws-vpc]]
[float]
==== AWS VPC Bandwidth Settings

AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch
instances reside in a private subnet in an AWS VPC, then all traffic to S3 goes
through that VPC's NAT instance. If your VPC's NAT instance is a small instance
type (e.g. a t1.micro) or is handling a high volume of network traffic, your
bandwidth to S3 may be limited by that NAT instance's network bandwidth.

Instances residing in a public subnet in an AWS VPC connect to S3 through the
VPC's internet gateway and are not bandwidth-limited by the VPC's NAT instance.