clarify security requirements around HTTPInputSource (#10914)

* clarify security requirements around HTTPInputSource

* explicitly mention write/datasource in best practices. clarify that the ingestion task is the risk

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

Charles Smith 2021-02-26 09:37:47 -08:00 committed by GitHub
parent f930cf14d6
commit 573de3bc0d
2 changed files with 9 additions and 2 deletions

@@ -1133,8 +1133,14 @@ the [S3 input source](#s3-input-source) or the [Google Cloud Storage input sourc
### HTTP Input Source
The HTTP input source supports reading files directly from remote sites via HTTP.
> **NOTE:** Ingestion tasks run under the operating system account that runs the Druid processes, for example the Indexer, Middle Manager, and Peon. This means any user who can submit an ingestion task can point an `HTTPInputSource` at any location that the Druid process has permission to read. For example, using `HTTPInputSource`, a console user can reach internal network locations that they would otherwise be denied access to.
> **WARNING:** `HTTPInputSource` is not limited to the HTTP or HTTPS protocols. It uses the Java `URI` class that supports HTTP, HTTPS, FTP, file, and jar protocols by default. This means you should never run Druid under the `root` account, because a user can use the file protocol to access any files on the local disk.
For more information about security best practices, see [Security overview](../operations/security-overview.md#best-practices).
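
The protocol risk described in the warning can be illustrated with a minimal sketch of a scheme allow-list check, of the kind a gateway might apply to submitted URIs before they reach an ingestion task. The class `UriSchemeCheck`, the method `isAllowed`, and the choice of allowed schemes are hypothetical and are not part of Druid; only the standard `java.net.URI` class is assumed.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Druid: rejects URIs whose scheme is not allow-listed.
final class UriSchemeCheck {
    // Assumption: only HTTP and HTTPS are acceptable in this deployment.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    // Returns true only for absolute URIs with an allow-listed scheme.
    static boolean isAllowed(URI uri) {
        String scheme = uri.getScheme(); // null for relative URIs
        return scheme != null
            && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Example inputs covering the protocols named in the warning.
        List<String> candidates = List.of(
            "https://example.com/data.json",
            "file:///etc/passwd",
            "ftp://internal.example/data.csv",
            "jar:file:/tmp/app.jar!/secret.txt");
        for (String candidate : candidates) {
            boolean ok = isAllowed(URI.create(candidate));
            System.out.println(candidate + " -> " + (ok ? "allowed" : "rejected"));
        }
    }
}
```

A check like this only narrows which protocols can be requested. It does not stop an HTTP or HTTPS URI from targeting an internal network location, so the permission and account guidance above still applies.
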
The HTTP input source is _splittable_ and can be used by the [Parallel task](#parallel-task),
where each worker task of `index_parallel` will read only one file. This input source does not support Split Hint Spec.

@@ -41,6 +41,7 @@ This document gives you an overview of security features in Druid and how to con
## Best practices
* Do not expose the Druid Console without authentication on untrusted networks. Access to the console effectively confers access to the file system on the installation machine, via file browsers in the UI. You should use an API gateway that restricts who can connect from untrusted networks, allow-lists the specific APIs that your users need to access, and implements account lockout and throttling features.
* You should only grant `WRITE` permissions to a `DATASOURCE` to trusted users. Druid assumes that these users have the same privileges as the operating system user that runs the Druid process.
* Grant users the minimum permissions necessary to perform their functions. For instance, do not allow users who only need to query data to write to data sources or view state.
* Disable JavaScript, as noted in the [Security section](https://druid.apache.org/docs/latest/development/javascript.html#security) of the JavaScript guide.
* Run Druid as an unprivileged Unix user on the installation machine (not root).