Clarify gid used by docker image process and bind-mount method (#49632)

Fix reference about the uid:gid that Elasticsearch runs as inside
the Docker container and add a packaging test to ensure that bind
mounting a data dir with a random uid and gid:0 works as
expected.

Backport of #49529
Closes #47929
Dimitrios Liappis 2019-11-27 13:42:54 +02:00 committed by GitHub
parent 502873b144
commit 4b6915ea41
5 changed files with 162 additions and 45 deletions


@ -9,11 +9,11 @@ https://www.docker.elastic.co[www.docker.elastic.co]. The source files
are in
https://github.com/elastic/elasticsearch/blob/{branch}/distribution/docker[Github].
These images are free to use under the Elastic license. They contain open source
and free commercial features and access to paid commercial features.
{stack-ov}/license-management.html[Start a 30-day trial] to try out all of the
paid commercial features. See the
https://www.elastic.co/subscriptions[Subscriptions] page for information about
Elastic license levels.
==== Pulling the image
@ -35,9 +35,9 @@ ifeval::["{release-state}"!="unreleased"]
docker pull {docker-repo}:{version}
--------------------------------------------
Alternatively, you can download other Docker images that contain only features
available under the Apache 2.0 license. To download the images, go to
https://www.docker.elastic.co[www.docker.elastic.co].
endif::[]
@ -52,7 +52,7 @@ endif::[]
ifeval::["{release-state}"!="unreleased"]
To start a single-node {es} cluster for development or testing, specify
<<single-node-discovery,single-node discovery>> to bypass the <<bootstrap-checks,bootstrap checks>>:
[source,sh,subs="attributes"]
@ -65,7 +65,7 @@ endif::[]
[[docker-compose-file]]
==== Starting a multi-node cluster with Docker Compose
To get a three-node {es} cluster up and running in Docker,
you can use Docker Compose:
. Create a `docker-compose.yml` file:
@ -84,20 +84,20 @@ include::docker-compose.yml[]
--------------------------------------------
endif::[]
This sample Docker Compose file brings up a three-node {es} cluster.
Node `es01` listens on `localhost:9200` and `es02` and `es03` talk to `es01` over a Docker network.
The https://docs.docker.com/storage/volumes[Docker named volumes]
`data01`, `data02`, and `data03` store the node data directories so the data persists across restarts.
If they don't already exist, `docker-compose` creates them when you bring up the cluster.
--
. Make sure Docker Engine is allotted at least 4GiB of memory.
In Docker Desktop, you configure resource usage on the Advanced tab in Preferences (macOS)
or Settings (Windows).
+
NOTE: Docker Compose is not pre-installed with Docker on Linux.
See docs.docker.com for installation instructions:
https://docs.docker.com/compose/install[Install Compose on Linux]
. Run `docker-compose` to bring up the cluster:
+
@ -114,13 +114,13 @@ curl -X GET "localhost:9200/_cat/nodes?v&pretty"
--------------------------------------------------
// NOTCONSOLE
Log messages go to the console and are handled by the configured Docker logging driver.
By default you can access logs with `docker logs`.
To stop the cluster, run `docker-compose down`.
The data in the Docker volumes is preserved and loaded
when you restart the cluster with `docker-compose up`.
To **delete the data volumes** when you bring down the cluster,
specify the `-v` option: `docker-compose down -v`.
@ -137,7 +137,7 @@ The following requirements and recommendations apply when running {es} in Docker
===== Set `vm.max_map_count` to at least `262144`
The `vm.max_map_count` kernel setting must be set to at least `262144` for production use.
How you set `vm.max_map_count` depends on your platform:
@ -151,7 +151,7 @@ grep vm.max_map_count /etc/sysctl.conf
vm.max_map_count=262144
--------------------------------------------
To apply the setting on a live system, run:
[source,sh]
--------------------------------------------
@ -196,15 +196,15 @@ sudo sysctl -w vm.max_map_count=262144
===== Configuration files must be readable by the `elasticsearch` user
By default, {es} runs inside the container as user `elasticsearch` using
uid:gid `1000:0`.
IMPORTANT: One exception is https://docs.openshift.com/container-platform/3.6/creating_images/guidelines.html#openshift-specific-guidelines[Openshift],
which runs containers using an arbitrarily assigned user ID.
Openshift presents persistent volumes with the gid set to `0`, which works without any adjustments.
If you are bind-mounting a local directory or file, it must be readable by the `elasticsearch` user.
In addition, this user must have write access to the <<path-settings,data and log dirs>>.
A good strategy is to grant group access to gid `0` for the local directory.
For example, to prepare a local directory for storing data through a bind-mount:
@ -212,7 +212,7 @@ For example, to prepare a local directory for storing data through a bind-mount:
--------------------------------------------
mkdir esdatadir
chmod g+rwx esdatadir
chgrp 0 esdatadir
--------------------------------------------
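The recipe above can be verified programmatically. A minimal Python sketch, checking that a prospective bind-mount directory carries the group read/write/execute bits the `chmod g+rwx` / `chgrp 0` steps establish (the helper name `check_bind_mount_dir` is illustrative, not part of any {es} tooling):

```python
import os
import stat

def check_bind_mount_dir(path):
    # Inspect a prospective bind-mount directory: return its numeric gid
    # and whether the group has full read/write/execute access, which is
    # the property the chmod g+rwx / chgrp 0 recipe establishes.
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    group_bits = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return st.st_gid, (mode & group_bits) == group_bits
```

Running it against the `esdatadir` created above should report group access `True`; note that `chgrp 0` itself requires root on the host.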
As a last resort, you can force the container to mutate the ownership of
@ -223,10 +223,10 @@ uid:gid `1000:0`, which provides the required read/write access to the {es} proc
===== Increase ulimits for nofile and nproc
Increased ulimits for <<setting-system-settings,nofile>> and <<max-number-threads-check,nproc>>
must be available for the {es} containers.
Verify the https://github.com/moby/moby/tree/ea4d1243953e6b652082305a9c3cda8656edab26/contrib/init[init system]
for the Docker daemon sets them to acceptable values.
To check the Docker daemon defaults for ulimits, run:
@ -246,12 +246,12 @@ For example, when using `docker run`, set:
===== Disable swapping
Swapping needs to be disabled for performance and node stability.
For information about ways to do this, see <<setup-configuration-memory>>.
If you opt for the `bootstrap.memory_lock: true` approach,
you also need to define the `memlock: true` ulimit in the
https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimits[Docker Daemon],
or explicitly set for the container as shown in the <<docker-compose-file, sample compose file>>.
When using `docker run`, you can specify:
-e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1
@ -260,12 +260,12 @@ When using `docker run`, you can specify:
The image https://docs.docker.com/engine/reference/builder/#/expose[exposes]
TCP ports 9200 and 9300. For production clusters, randomizing the
published ports with `--publish-all` is recommended,
unless you are pinning one container per host.
===== Set the heap size
Use the `ES_JAVA_OPTS` environment variable to set the heap size.
For example, to use 16GB, specify `-e ES_JAVA_OPTS="-Xms16g -Xmx16g"` with `docker run`.
IMPORTANT: You must <<heap-size,configure the heap size>> even if you are
@ -277,7 +277,7 @@ memory access] to the container.
Pin your deployments to a specific version of the {es} Docker image. For
example +docker.elastic.co/elasticsearch/elasticsearch:{version}+.
===== Always bind data volumes
You should use a volume bound on `/usr/share/elasticsearch/data` for the following reasons:
@ -291,7 +291,7 @@ https://docs.docker.com/engine/extend/plugins/#volume-plugins[Docker volume plug
===== Avoid using `loop-lvm` mode
If you are using the devicemapper storage driver, do not use the default `loop-lvm` mode.
Configure docker-engine to use
https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-docker-with-devicemapper[direct-lvm].
===== Centralize your logs
@ -304,14 +304,14 @@ production use.
[[docker-configuration-methods]]
==== Configuring {es} with Docker
When you run in Docker, the <<config-files-location,{es} configuration files>> are loaded from
`/usr/share/elasticsearch/config/`.
To use custom configuration files, you <<docker-config-bind-mount, bind-mount the files>>
over the configuration files in the image.
You can set individual {es} configuration parameters using Docker environment variables.
The <<docker-compose-file, sample compose file>> and the
<<docker-cli-run-dev-mode, single-node example>> use this method.
To use the contents of a file to set an environment variable, suffix the environment
variable name with `_FILE`.
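The `_FILE` suffix convention can be sketched in Python. This resolver is illustrative only, not the actual entrypoint logic shipped in the {es} Docker image:

```python
def resolve_file_env_vars(env):
    # For every VAR_FILE entry, read the referenced file and expose its
    # contents under VAR. Illustrative sketch of the "_FILE" convention,
    # not the image's actual entrypoint script.
    resolved = dict(env)
    for name, path in env.items():
        if name.endswith("_FILE"):
            with open(path) as f:
                resolved[name[:-len("_FILE")]] = f.read().strip()
    return resolved
```

For example, an entry like `ELASTIC_PASSWORD_FILE=/run/secrets/password.txt` would yield an `ELASTIC_PASSWORD` variable holding the file's contents.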
@ -335,8 +335,8 @@ parameters as command line options. For example:
docker run <various parameters> bin/elasticsearch -Ecluster.name=mynewclustername
--------------------------------------------
While bind-mounting your configuration files is usually the preferred method in production,
you can also <<_c_customized_image, create a custom Docker image>>
that contains your configuration.
[[docker-config-bind-mount]]
@ -351,7 +351,7 @@ For example, to bind-mount `custom_elasticsearch.yml` with `docker run`, specify
--------------------------------------------
IMPORTANT: The container **runs {es} as user `elasticsearch` using
uid:gid `1000:0`**. Bind mounted host directories and files must be accessible by this user,
and the data and log directories must be writable by this user.
[[_c_customized_image]]
@ -373,7 +373,7 @@ docker build --tag=elasticsearch-custom .
docker run -ti -v /usr/share/elasticsearch/data elasticsearch-custom
--------------------------------------------
Some plugins require additional security permissions.
You must explicitly accept them either by:
* Attaching a `tty` when you run the Docker image and allowing the permissions when prompted.


@ -43,7 +43,9 @@ import static org.elasticsearch.packaging.util.Docker.assertPermissionsAndOwners
import static org.elasticsearch.packaging.util.Docker.copyFromContainer;
import static org.elasticsearch.packaging.util.Docker.ensureImageIsLoaded;
import static org.elasticsearch.packaging.util.Docker.existsInContainer;
import static org.elasticsearch.packaging.util.Docker.mkDirWithPrivilegeEscalation;
import static org.elasticsearch.packaging.util.Docker.removeContainer;
import static org.elasticsearch.packaging.util.Docker.rmDirWithPrivilegeEscalation;
import static org.elasticsearch.packaging.util.Docker.runContainer;
import static org.elasticsearch.packaging.util.Docker.runContainerExpectingFailure;
import static org.elasticsearch.packaging.util.Docker.verifyContainerInstallation;
@ -181,6 +183,27 @@ public class DockerTests extends PackagingTestCase {
assertThat(nodesResponse, containsString("\"using_compressed_ordinary_object_pointers\":\"false\""));
}
/**
* Check that the data directory can be bind-mounted to a host directory owned by an arbitrary uid with gid 0
*/
public void test71BindMountCustomPathWithDifferentUID() throws Exception {
final Path tempEsDataDir = tempDir.resolve("esDataDir");
// Make the local directory and contents accessible when bind-mounted
mkDirWithPrivilegeEscalation(tempEsDataDir, 1500, 0);
// Restart the container
final Map<Path, Path> volumes = singletonMap(tempEsDataDir.toAbsolutePath(), Paths.get("/usr/share/elasticsearch/data"));
runContainer(distribution(), volumes, null);
waitForElasticsearch(installation);
final String nodesResponse = makeRequest(Request.Get("http://localhost:9200/_nodes"));
assertThat(nodesResponse, containsString("\"_nodes\":{\"total\":1,\"successful\":1,\"failed\":0}"));
rmDirWithPrivilegeEscalation(tempEsDataDir);
}
/**
* Check that environment variables can be populated by setting variables with the suffix "_FILE",
* which point to files that hold the required values.


@ -24,6 +24,8 @@ import org.apache.commons.logging.LogFactory;
import org.elasticsearch.common.CheckedRunnable;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.util.ArrayList;
import java.util.List;
@ -35,6 +37,7 @@ import static java.nio.file.attribute.PosixFilePermissions.fromString;
import static org.elasticsearch.packaging.util.FileMatcher.p644;
import static org.elasticsearch.packaging.util.FileMatcher.p660;
import static org.elasticsearch.packaging.util.FileMatcher.p755;
import static org.elasticsearch.packaging.util.FileMatcher.p770;
import static org.elasticsearch.packaging.util.FileMatcher.p775;
import static org.elasticsearch.packaging.util.FileUtils.getCurrentVersion;
import static org.hamcrest.CoreMatchers.containsString;
@ -281,6 +284,76 @@ public class Docker {
return result.isSuccess();
}
/**
* Run privilege escalated shell command on the local file system via a bind mount inside a Docker container.
* @param shellCmd The shell command to execute on the localPath e.g. `mkdir /containerPath/dir`.
* @param localPath The local path that shellCmd operates on, via a bind mount inside the container.
* @param containerPath The path to mount localPath inside the container.
*/
private static void executePrivilegeEscalatedShellCmd(String shellCmd, Path localPath, Path containerPath) {
final List<String> args = new ArrayList<>();
args.add("docker run");
// Don't leave orphaned containers
args.add("--rm");
// Mount localPath to a known location inside the container, so that we can execute shell commands on it later
args.add("--volume \"" + localPath.getParent() + ":" + containerPath.getParent() + "\"");
// Use a lightweight musl libc based small image
args.add("alpine");
// And run inline commands via the POSIX shell
args.add("/bin/sh -c \"" + shellCmd + "\"");
final String command = String.join(" ", args);
logger.info("Running command: " + command);
sh.run(command);
}
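The command assembly above can be mirrored in Python. This is an illustrative sketch of how the helper builds the `docker run` invocation, not production code:

```python
import os.path

def build_privilege_escalated_cmd(shell_cmd, local_path, container_path):
    # Assemble the same docker invocation as the Java helper: bind-mount
    # the parent of local_path at the parent of container_path and run
    # shell_cmd in a throwaway Alpine container.
    args = [
        "docker run",
        "--rm",  # don't leave orphaned containers behind
        '--volume "{}:{}"'.format(
            os.path.dirname(local_path), os.path.dirname(container_path)
        ),
        "alpine",  # lightweight musl libc based image
        '/bin/sh -c "{}"'.format(shell_cmd),
    ]
    return " ".join(args)
```

Because the container runs as root by default, commands such as `chown 1500:0` succeed on the bind-mounted path without requiring root on the host test machine.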
/**
* Create a directory with specified uid/gid using Docker backed privilege escalation.
* @param localPath The path to the directory to create.
* @param uid The numeric user id to assign to localPath
* @param gid The numeric group id to assign to localPath
*/
public static void mkDirWithPrivilegeEscalation(Path localPath, int uid, int gid) {
final Path containerBasePath = Paths.get("/mount");
final Path containerPath = containerBasePath.resolve(Paths.get("/").relativize(localPath));
final List<String> args = new ArrayList<>();
args.add("mkdir " + containerPath.toAbsolutePath());
args.add("&&");
args.add("chown " + uid + ":" + gid + " " + containerPath.toAbsolutePath());
args.add("&&");
args.add("chmod 0770 " + containerPath.toAbsolutePath());
final String command = String.join(" ", args);
executePrivilegeEscalatedShellCmd(command, localPath, containerPath);
final PosixFileAttributes dirAttributes = FileUtils.getPosixFileAttributes(localPath);
final Map<String, Integer> numericPathOwnership = FileUtils.getNumericUnixPathOwnership(localPath);
assertEquals(localPath + " has wrong uid", numericPathOwnership.get("uid").intValue(), uid);
assertEquals(localPath + " has wrong gid", numericPathOwnership.get("gid").intValue(), gid);
assertEquals(localPath + " has wrong permissions", dirAttributes.permissions(), p770);
}
/**
* Delete a directory using Docker backed privilege escalation.
* @param localPath The path to the directory to delete.
*/
public static void rmDirWithPrivilegeEscalation(Path localPath) {
final Path containerBasePath = Paths.get("/mount");
final Path containerPath = containerBasePath.resolve(Paths.get("/").relativize(localPath));
final List<String> args = new ArrayList<>();
args.add("cd " + containerBasePath.toAbsolutePath());
args.add("&&");
args.add("rm -rf " + localPath.getFileName());
final String command = String.join(" ", args);
executePrivilegeEscalatedShellCmd(command, localPath, containerPath);
}
/**
* Checks that the specified path's permissions and ownership match those specified.
*/


@ -46,6 +46,7 @@ public class FileMatcher extends TypeSafeMatcher<Path> {
public enum Fileness { File, Directory }
public static final Set<PosixFilePermission> p775 = fromString("rwxrwxr-x");
public static final Set<PosixFilePermission> p770 = fromString("rwxrwx---");
public static final Set<PosixFilePermission> p755 = fromString("rwxr-xr-x");
public static final Set<PosixFilePermission> p750 = fromString("rwxr-x---");
public static final Set<PosixFilePermission> p660 = fromString("rw-rw----");


@ -33,6 +33,7 @@ import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
@ -42,7 +43,9 @@ import java.nio.file.attribute.PosixFileAttributes;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;
import java.util.stream.Stream;
@ -240,6 +243,23 @@ public class FileUtils {
}
}
/**
* Gets numeric ownership attributes that are supported by Unix filesystems
* @return a Map of the uid/gid integer values
*/
public static Map<String, Integer> getNumericUnixPathOwnership(Path path) {
Map<String, Integer> numericPathOwnership = new HashMap<>();
try {
numericPathOwnership.put("uid", (int) Files.getAttribute(path, "unix:uid", LinkOption.NOFOLLOW_LINKS));
numericPathOwnership.put("gid", (int) Files.getAttribute(path, "unix:gid", LinkOption.NOFOLLOW_LINKS));
} catch (IOException e) {
throw new RuntimeException(e);
}
return numericPathOwnership;
}
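A Python analogue of this helper, assuming a POSIX filesystem (illustrative, not part of the test suite):

```python
import os

def get_numeric_unix_path_ownership(path):
    # Python analogue of the Java helper: fetch the numeric uid/gid of
    # path without following symlinks (assumes a POSIX filesystem).
    st = os.stat(path, follow_symlinks=False)
    return {"uid": st.st_uid, "gid": st.st_gid}
```

Reading the raw numeric ids, rather than resolved user/group names, is what lets the test assert on ownership like `1500:0` even when no such user exists on the host.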
// vagrant creates /tmp for us in windows so we use that to avoid long paths
public static Path getTempDir() {
return Paths.get("/tmp");