HDDS-1639. Restructure documentation pages for better understanding

Closes #901
Márton Elek 2019-06-28 19:51:30 +02:00
parent f09c31a97e
commit 9fd3c702fc
GPG Key ID: D51EA8F00EE79B28
70 changed files with 2218 additions and 809 deletions

View File

@ -21,26 +21,6 @@ theme: "ozonedoc"
pygmentsCodeFences: true
uglyurls: true
relativeURLs: true
menu:
main:
- identifier: Starting
name: "Getting Started"
title: "Getting Started"
url: runningviadocker.html
weight: 1
- identifier: Client
name: Client
title: Client
url: commandshell.html
weight: 2
- identifier: Tools
name: Tools
title: Tools
url: dozone.html
weight: 3
- identifier: Recipes
name: Recipes
title: Recipes
url: prometheus.html
weight: 4
disableKinds:
- taxonomy
- taxonomyTerm

View File

@ -1,108 +0,0 @@
---
title: Architecture
date: "2017-10-10"
menu: main
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Ozone is a redundant, distributed object store built by
leveraging primitives present in HDFS. The primary design point of Ozone is scalability, and it aims to scale to billions of objects.
Ozone consists of volumes, buckets, and keys. A volume is similar to a home directory in the Ozone world. Only an administrator can create one. Volumes are used to store buckets. Once a volume is created, users can create as many buckets as needed. Ozone stores data as keys, which live inside these buckets.
The Ozone namespace is composed of many storage volumes. Storage volumes are also used as the basis for storage accounting.
To access a key, an Ozone URL has the following format:
```
http://servername:port/volume/bucket/key
```
Here the server name is the hostname of a data node, and the port is the data node's HTTP port. The volume is the name of an Ozone volume; the bucket is an Ozone bucket created by the user, and the key identifies the file.
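For instance, with a hypothetical data node `dn1`, volume `volume1`, bucket `bucket1`, and key `key1` (all names and the port below are illustrative), the URL would be:

```
http://dn1:9880/volume1/bucket1/key1
```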
Please look at the [command line interface]({{< ref "CommandShell.md#shell" >}}) for more info.
Ozone supports both (S3 compatible) REST and RPC protocols. Clients can choose either of these protocols to communicate with Ozone. Please see the [client documentation]({{< ref "JavaApi.md" >}}) for more details.
Ozone separates namespace management and block space management; this helps
Ozone to scale much better. The namespace is managed by a daemon called
[Ozone Manager]({{< ref "OzoneManager.md" >}}) (OM), and block space is
managed by [Storage Container Manager]({{< ref "Hdds.md" >}}) (SCM).
The data nodes provide replication and the ability to store blocks; these blocks are stored in groups to reduce the metadata pressure on SCM. These groups of blocks are called storage containers. Hence the block manager is called Storage Container
Manager.
Ozone Overview
--------------
The following diagram is a high-level overview of the core components of Ozone.
![Architecture diagram](../../OzoneOverview.svg)
The main elements of Ozone are:
### Ozone Manager
[Ozone Manager]({{< ref "OzoneManager.md" >}}) (OM) takes care of Ozone's namespace.
All Ozone objects like volumes, buckets, and keys are managed by OM. In short, OM is the metadata manager for Ozone.
OM talks to the block manager (SCM) to get blocks and passes them on to the Ozone
client. The Ozone client writes data to these blocks.
OM will eventually be replicated via Apache Ratis for High Availability.
### Storage Container Manager
[Storage Container Manager]({{< ref "Hdds.md" >}}) (SCM) is the block and cluster manager for Ozone.
SCM, along with the data nodes, offers a service called 'storage containers'.
A storage container is a group of unrelated blocks that are managed together as a single entity.
SCM offers the following abstractions.
![SCM Abstractions](../../SCMBlockDiagram.png)
### Blocks
Blocks are similar to blocks in HDFS. They are replicated stores of data. Clients write data to blocks.
### Containers
A collection of blocks replicated and managed together.
### Pipelines
SCM allows each storage container to choose its method of replication.
For example, a storage container might decide that it needs only one copy of a block
and might choose a stand-alone pipeline. Another storage container might want a very high level of reliability and pick a Ratis-based pipeline. In other words, SCM allows different kinds of replication strategies to co-exist. The client, while writing data, chooses a storage container with the required properties.
### Pools
A group of data nodes is called a pool. For scaling purposes,
we define a pool as a set of machines. This makes management of data nodes easier.
### Nodes
The data nodes are where the data is stored. SCM monitors these nodes via heartbeats.
### Clients
Ozone ships with a set of clients. The Ozone [CLI]({{< ref "CommandShell.md#shell" >}}) is the command line interface, like the 'hdfs' command. [Freon]({{< ref "Freon.md" >}}) is a load generation tool for Ozone.
## S3 gateway
Ozone provides an [S3 compatible REST gateway server]({{< ref "S3.md">}}). All of the main S3 features are supported, and any S3 compatible client library can be used.
### Ozone File System
[Ozone file system]({{< ref "OzoneFS.md">}}) is a Hadoop compatible file system. This allows Hadoop services and applications like Hive and Spark to run against
Ozone without any change. (For example: you can use `hdfs dfs -ls o3fs://...` instead of `hdfs dfs -ls hdfs://...`)
### Ozone Client
This is similar to DFSClient in HDFS. It is the standard client for talking to Ozone. All other components that we have discussed so far rely on the Ozone client. The Ozone client supports the RPC protocol.

View File

@ -1,65 +0,0 @@
---
title: "Hadoop Distributed Data Store"
date: "2017-09-14"
menu:
main:
parent: Architecture
weight: 10
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
SCM Overview
------------
Storage Container Manager (SCM) is a very important component of Ozone. SCM
offers block and container-based services to Ozone Manager. A container is a
collection of unrelated blocks under Ozone. SCM and data nodes work together
to maintain the replication levels needed by the cluster.
It is easier to look at a putKey operation to understand the role that SCM plays.
To put a key, a client makes a call to OM with the following arguments.
-- putKey(keyName, data, pipeline type, replication count)
1. keyName - refers to the file name.
2. data - The data that the client wants to write.
3. pipeline type - Allows the client to select the pipeline type. A pipeline
refers to the replication strategy used for replicating a block. Ozone
currently supports Stand Alone and Ratis as two different pipeline types.
4. replication count - This specifies how many copies of the block replica should be maintained.
In most cases, the client does not specify the pipeline type and replication
count. The default pipeline type and replication count are used.
When Ozone Manager receives the putKey call, it makes a call to SCM, asking
for a pipeline instance with the specified properties. So if the client asked
for the Ratis replication strategy and a replication count of three, then OM
requests SCM to return a set of data nodes that meet this capability.
If SCM can find a pipeline (that is, a set of data nodes) that can meet
the requirement from the client, then those nodes are returned to OM. OM will
persist this info and return a tuple consisting of {BlockID, ContainerName, and Pipeline}.
If SCM is not able to find an existing pipeline, then it creates a logical pipeline and returns it.
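The control flow above can be sketched with a toy model. The class and method names below are illustrative only; they are not the real Ozone APIs:

```python
# Toy model of the putKey flow: OM asks SCM for a pipeline matching the
# requested replication, then hands back {BlockID, ContainerName, Pipeline}.
from dataclasses import dataclass
from typing import List

@dataclass
class Pipeline:
    replication_type: str   # e.g. "RATIS" or "STAND_ALONE"
    nodes: List[str]        # datanodes participating in replication

@dataclass
class Block:
    block_id: int
    container: str
    pipeline: Pipeline

class ToySCM:
    """Hands out pipelines (sets of datanodes) meeting a replication request."""
    def get_pipeline(self, repl_type: str, repl_count: int) -> Pipeline:
        return Pipeline(repl_type, [f"dn{i}" for i in range(repl_count)])

class ToyOM:
    """Persists block info and returns it to the client."""
    def __init__(self, scm: ToySCM):
        self.scm = scm
        self.next_id = 0

    def allocate_block(self, repl_type: str = "RATIS", repl_count: int = 3) -> Block:
        pipeline = self.scm.get_pipeline(repl_type, repl_count)
        self.next_id += 1
        return Block(self.next_id, f"container-{self.next_id}", pipeline)

om = ToyOM(ToySCM())
block = om.allocate_block()
print(block.pipeline.nodes)  # three datanodes for the default replication count
```

The defaults mirror the text: if the client does not specify a pipeline type and replication count, Ratis with three replicas is used.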
SCM manages blocks, containers, and pipelines. To return healthy pipelines,
SCM also needs to understand the node health. So SCM listens to heartbeats
from data nodes and acts as the node manager too.

View File

@ -1,77 +0,0 @@
---
title: "Ozone Manager"
date: "2017-09-14"
menu:
main:
parent: Architecture
weight: 11
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
OM Overview
-------------
Ozone Manager (OM) is the namespace manager for Ozone. The clients (RPC clients, REST proxy, Ozone file system, etc.) communicate with OM to create and delete various Ozone objects.
Each Ozone volume is the root of a namespace under OM. This is very different from HDFS, which provides a single rooted file system.
Ozone's namespace is a collection of volumes, or a forest, instead of a
single rooted tree as in HDFS. This property makes it easy to deploy multiple
OMs for scaling; this feature is under development.
OM Metadata
-----------------
Conceptually, OM maintains a list of volumes, buckets, and keys. For each user, it maintains a list of volumes. For each volume, the list of buckets and for each bucket the list of keys.
Right now, OM is a single instance service. Ozone already relies on Apache Ratis (A Replicated State Machine based on Raft protocol). OM will be extended to replicate all its metadata via Ratis. With that, OM will be highly available.
OM UI
------------
OM supports a simple UI for the time being. The default port of OM is 9874. To access the OM UI, the user can connect to http://OM:port or for a concrete example,
```
http://omserver:9874/
```
The OM UI primarily tries to measure the load and latency of OM. The first section of the OM UI covers the number of operations seen by the cluster, broken down by object, operation, and whether the operation was successful.
The latter part of the UI focuses on the latency and the number of operations that OM is performing.
One of the hardest problems in the HDFS world is discovering the numerous settings offered to tune HDFS. Ozone solves that problem by tagging the configs. To discover settings, click on "Common Tools" -> Config. This will take you to the Ozone config UI.
Config UI
------------
The Ozone config UI is a matrix with rows representing the tags and columns representing All, OM, and SCM.
Suppose a user wants to discover the required settings for Ozone. Then the user can tick the checkbox that says "Required."
This will filter out all "Required" settings, along with a description of what each setting does.
The user can combine different checkboxes, and the UI will combine the results. That is, if you have more than one row selected, then all keys for those chosen tags are displayed together.
We are hopeful that this leads to a more straightforward way of discovering the settings that manage Ozone.
OM and SCM
-------------------
[Storage Container Manager]({{< ref "Hdds.md" >}}) (SCM) is the block manager
for Ozone. When a client asks OM for a set of data nodes to write data to, OM talks to SCM and gets a block.
A block returned by SCM contains a pipeline, which is the set of nodes that participate in that block's replication.
So OM is dependent on SCM for the reading and writing of keys. However, OM is independent of SCM for metadata operations like Ozone volume or bucket operations.

View File

@ -1,110 +0,0 @@
---
title: "Ozone Security Overview"
date: "2019-April-03"
menu:
main:
parent: Architecture
weight: 11
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Security in Ozone #
Starting with the Badlands release (ozone-0.4.0-alpha), an Ozone cluster can be secured against external threats. Specifically, it can be configured for the following security features:
1. Authentication
2. Authorization
3. Audit
4. Transparent Data Encryption (TDE)
## Authentication ##
### Kerberos ###
Similar to Hadoop, Ozone allows Kerberos-based authentication. So one way to set up identities for all the daemons and clients is to create Kerberos keytabs and configure them like any other service in Hadoop.
### Tokens ###
Tokens are widely used in Hadoop to achieve lightweight authentication without compromising on security. The main motivation for using tokens inside Ozone is to prevent unauthorized access while keeping the protocol lightweight and without sharing secrets over the wire. Ozone utilizes three types of tokens:
#### Delegation token ####
Once a client establishes its identity via Kerberos, it can request a delegation token from OzoneManager. This token can be used by the client to prove its identity until the token expires. Like Hadoop delegation tokens, an Ozone delegation token has three important fields:
1. **Renewer**: User responsible for renewing the token.
2. **Issue date**: Time at which token was issued.
3. **Max date**: Time after which the token can't be renewed.
Token operations like get, renew, and cancel can only be performed over a Kerberos-authenticated connection. Clients can use the delegation token to establish a connection with OzoneManager and perform any file system/object store operation, like listing the objects in a bucket or creating a volume.
#### Block Tokens ####
Block tokens are similar to delegation tokens in the sense that they are signed by OzoneManager. Block tokens are created by OM (OzoneManager) when a client request involves interaction with DataNodes, such as reading or writing Ozone keys.
Unlike delegation tokens, there is no client API to request block tokens. Instead, they are handed transparently to the client along with the key/block locations. Block tokens are validated by Datanodes when receiving read/write requests from clients. A block token can't be renewed explicitly by the client. A client with an expired block token will need to refetch the key/block locations to get new block tokens.
#### S3Token ####
Like block tokens, S3Tokens are handled transparently for clients. An S3Token is signed by the S3 secret created by the client. The S3 Gateway creates this token for every S3 client request. To create an S3Token, the user must have an S3 secret.
### Certificates ###
Apart from Kerberos and tokens, Ozone utilizes certificate-based authentication for Ozone service components. To enable this, SCM (StorageContainerManager) bootstraps itself as a Certificate Authority when security is enabled. This allows all daemons inside Ozone to have an SCM-signed certificate. Below is a brief description of the steps involved:
1. Datanodes and OzoneManagers submit a CSR (certificate signing request) to SCM.
2. SCM verifies the identity of the DN (Datanode) or OM via Kerberos and generates a certificate.
3. This certificate is used by OM and DN to prove their identities.
4. Datanodes use the OzoneManager certificate to validate block tokens. This is possible because both of them (OzoneManager and Datanodes) trust SCM-signed certificates.
## Authorization ##
Ozone provides a pluggable API to control the authorization of all client-related operations. The default implementation allows every request; clearly, it is not meant for production environments. To configure a more fine-grained policy, one may configure the Ranger plugin for Ozone. Since it is a pluggable module, clients can also implement their own custom authorization policy and configure it using `ozone.acl.authorizer.class`.
## Audit ##
Ozone provides the ability to audit all read and write operations to OM, SCM, and Datanodes. The Ozone audit leverages the log4j2 Marker feature, which enables users to selectively audit only READ or WRITE operations by a simple config change, without restarting the service(s).
To enable/disable the audit of READ operations, set filter.read.onMatch to NEUTRAL or DENY respectively. Similarly, the audit of WRITE operations can be controlled using filter.write.onMatch.
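For example, a MarkerFilter entry like the following in the audit log4j2 properties file disables the audit of READ operations (the property names follow the standard log4j2 MarkerFilter syntax; check the audit log4j2 properties file shipped with your release for the exact layout):

```
filter.read.type=MarkerFilter
filter.read.marker=READ
filter.read.onMatch=DENY
filter.read.onMismatch=NEUTRAL
```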
Generating audit logs is only half the job, so Ozone also provides AuditParser - a SQLite-based command line utility to parse/query audit logs with predefined templates (e.g. top 5 commands) and options for custom queries. Once the log file has been loaded into AuditParser, one can simply run a template as shown below:
```bash
ozone auditparser <path to db file> template top5cmds
```
Similarly, users can also execute custom query using:
```bash
ozone auditparser <path to db file> query "select * from audit where level=='FATAL'"
```
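Under the hood, AuditParser loads the parsed log entries into a SQLite table and runs plain SQL over them. A minimal sketch of the idea (the table name and columns here are assumptions for illustration, not the real AuditParser schema):

```python
import sqlite3

# Toy reconstruction of what AuditParser does: load audit entries into a
# SQLite table and run SQL over them.
rows = [
    ("INFO", "WRITE", "CREATE_VOLUME"),
    ("INFO", "READ",  "READ_KEY"),
    ("INFO", "WRITE", "CREATE_BUCKET"),
    ("INFO", "READ",  "READ_KEY"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (level TEXT, marker TEXT, op TEXT)")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", rows)

# Rough equivalent of a "top commands" template: most frequent operations first.
top = conn.execute(
    "SELECT op, COUNT(*) AS cnt FROM audit GROUP BY op ORDER BY cnt DESC"
).fetchall()
print(top[0])  # ('READ_KEY', 2)
```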
## Transparent Data Encryption ##
Ozone TDE setup process and usage are very similar to HDFS TDE. The major difference is that Ozone TDE is enabled at Ozone bucket level when a bucket is created.
To create an encrypted bucket, a client needs to:
* Create a bucket encryption key with the hadoop key CLI (same as you would for an HDFS encryption zone key)
```bash
hadoop key create key1
```
* Create an encrypted bucket with -k option
```bash
ozone sh bucket create -k key1 /vol1/ez1
```
After that, the usage will be transparent to the client and end users, i.e., all data written to the encrypted bucket is encrypted at the datanodes.
To know more about how to set up a secure Ozone cluster, refer to [How to setup secure Ozone cluster]({{< ref "SetupSecureOzone.md" >}}).
For a deeper dive into the Ozone security architecture, refer to the Ozone [security architecture document](https://issues.apache.org/jira/secure/attachment/12911638/HadoopStorageLayerSecurity.pdf).

View File

@ -1,74 +0,0 @@
---
title: Starting an Ozone Cluster
weight: 1
menu:
main:
parent: Starting
weight: 3
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Before we boot up the Ozone cluster, we need to initialize both SCM and Ozone Manager.
{{< highlight bash >}}
ozone scm --init
{{< /highlight >}}
This allows SCM to create the cluster identity and initialize its state.
The ```init``` command is similar to a Namenode format. The init command is executed only once; it allows SCM to create all the required on-disk structures to work correctly.
{{< highlight bash >}}
ozone --daemon start scm
{{< /highlight >}}
Once we know SCM is up and running, we can create an Object Store for our use. This is done by running the following command.
{{< highlight bash >}}
ozone om --init
{{< /highlight >}}
Once Ozone Manager has created the Object Store, we are ready to run the name
services.
{{< highlight bash >}}
ozone --daemon start om
{{< /highlight >}}
At this point Ozone's name service (the Ozone Manager) and the block service (SCM) are both running.
**Please note**: If SCM is not running, the
```om --init``` command will fail. The SCM start will fail if on-disk data structures are missing. So please make sure you have run both the ```scm --init``` and ```om --init``` commands.
Now we need to start the data nodes. Please run the following command on each datanode.
{{< highlight bash >}}
ozone --daemon start datanode
{{< /highlight >}}
At this point SCM, Ozone Manager, and the data nodes are up and running.
***Congratulations! You have set up a functional Ozone cluster.***
-------
If you want to make your life simpler, you can just run
{{< highlight bash >}}
ozone scm --init
ozone om --init
start-ozone.sh
{{< /highlight >}}
This assumes that you have set up the slaves file correctly and an ssh
configuration that allows ssh-ing to all data nodes. This is the same as the
HDFS configuration, so please refer to the HDFS documentation on how to set this
up.

View File

@ -1,98 +0,0 @@
---
title: "Setup secure ozone cluster"
date: "2019-April-03"
menu:
main:
parent: Architecture
weight: 11
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Setup secure ozone cluster #
To enable security in an Ozone cluster, **ozone.security.enabled** should be set to true.
Property|Value
----------------------|------
ozone.security.enabled| true
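As an `ozone-site.xml` fragment (the file location depends on your deployment):

```xml
<property>
  <name>ozone.security.enabled</name>
  <value>true</value>
</property>
```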
## Kerberos ##
Configuration for service daemons:
Property|Description
--------|------------------------------------------------------------
hdds.scm.kerberos.principal | The SCM service principal. Ex: `scm/_HOST@REALM.COM`
hdds.scm.kerberos.keytab.file |The keytab file used by the SCM daemon to login as its service principal.
ozone.om.kerberos.principal |The OzoneManager service principal. Ex: `om/_HOST@REALM.COM`
ozone.om.kerberos.keytab.file |The keytab file used by the OM daemon to login as its service principal.
hdds.scm.http.kerberos.principal|SCM http server service principal.
hdds.scm.http.kerberos.keytab.file|The keytab file used by SCM http server to login as its service principal.
ozone.om.http.kerberos.principal|OzoneManager http server principal.
ozone.om.http.kerberos.keytab.file|The keytab file used by OM http server to login as its service principal.
ozone.s3g.keytab.file |The keytab file used by the S3 gateway. Ex: `/etc/security/keytabs/HTTP.keytab`
ozone.s3g.authentication.kerberos.principal|The S3 Gateway principal. Ex: `HTTP/_HOST@EXAMPLE.COM`
## Tokens ##
## Delegation token ##
Delegation tokens are enabled by default when security is enabled.
## Block Tokens ##
Property|Value
-----------------------------|------
hdds.block.token.enabled | true
## S3Token ##
S3 tokens are enabled by default when security is enabled.
To use S3 tokens, users need to perform the following steps:
* S3 clients should get the access key id and secret from OzoneManager.
```
ozone s3 getsecret
```
* Setup secret in aws configs:
```
aws configure set default.s3.signature_version s3v4
aws configure set aws_access_key_id ${accessId}
aws configure set aws_secret_access_key ${secret}
aws configure set region us-west-1
```
## Certificates ##
Certificates are used internally inside Ozone. They are enabled by default when security is enabled.
## Authorization ##
The default access authorizer for Ozone approves every request. It is not suitable for production environments. It is recommended that clients use the Ranger plugin for Ozone to manage authorizations.
Property|Value
--------|------------------------------------------------------------
ozone.acl.enabled | true
ozone.acl.authorizer.class| org.apache.ranger.authorization.ozone.authorizer.RangerOzoneAuthorizer
## TDE ##
To use TDE, clients must set the KMS URI.
Property|Value
-----------------------------------|-----------------------------------------
hadoop.security.key.provider.path | KMS uri. Ex kms://http@kms-host:9600/kms
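As a config-file fragment, using the example URI from the table (the KMS host and port are placeholders):

```xml
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms-host:9600/kms</value>
</property>
```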

View File

@ -1,5 +1,5 @@
---
title: Ozone Overview
title: Overview
menu: main
weight: -10
---
@ -22,11 +22,17 @@ weight: -10
# Apache Hadoop Ozone
Ozone is a scalable, distributed object store for Hadoop. Applications like
Apache Spark, Hive and YARN, can run against Ozone without any
modifications. Ozone comes with a [Java client library]({{< ref "JavaApi.md"
>}}), a [S3]({{< ref "S3.md" >}}) and a [command line interface]
({{< ref "CommandShell.md#shell" >}}) which makes it easy to use Ozone.
<img src="ozone-usage.png" style="max-width: 60%;"/>
*_Ozone is a scalable, redundant, and distributed object store for Hadoop. <p>
Apart from scaling to billions of objects of varying sizes,
Ozone can function effectively in containerized environments
like Kubernetes._* <p>
Applications like Apache Spark, Hive and YARN work without any modifications when using Ozone. Ozone comes with a [Java client library]({{< ref "JavaApi.md" >}}), [S3 protocol support]({{< ref "S3.md" >}}), and a [command line interface]({{< ref "shell/_index.md" >}}) which make it easy to use Ozone.
Ozone consists of volumes, buckets, and Keys:
@ -34,6 +40,6 @@ Ozone consists of volumes, buckets, and Keys:
* Buckets are similar to directories. A bucket can contain any number of keys, but buckets cannot contain other buckets.
* Keys are similar to files. A bucket can contain any number of keys.
<a href="{{< ref "RunningViaDocker.md" >}}"><button class="btn btn-danger btn-lg">Getting started</button></a>
<a href="{{< ref "start/_index.md" >}}"> <button type="button"
class="btn btn-success btn-lg">Next >></button>
</div>

View File

@ -0,0 +1,217 @@
---
title: "Ozone Containers"
summary: Ozone uses containers extensively for testing. This page documents container usage and the related best practices.
weight: 2
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Docker is used heavily in Ozone development, with three principal use cases:
* __dev__:
  * We use docker to start local pseudo-clusters (docker provides a unified environment, but no image creation is required)
* __test__:
  * We create docker images from the dev branches to test Ozone in Kubernetes and other container orchestrator systems
  * We provide _apache/ozone_ images for each release to make it easier to evaluate Ozone. These images are __not__ created __for production__ usage.
<div class="alert alert-warning" role="alert">
We <b>strongly</b> recommend that you create your own custom images when you
deploy ozone into production using containers. Please treat all the standard
shipped container images and k8s resources as examples and guides to help you
customize your own deployment.
</div>
* __production__:
  * We document how you can create your own docker image for your production cluster.
Let's check out each of the use-cases in more detail:
## Development
The Ozone distribution contains example docker-compose directories to make it easier to start an Ozone cluster on your local machine.
From the distribution:
```
cd compose/ozone
docker-compose up -d
```
After a local build:
```
cd hadoop-ozone/dist/target/ozone-*/compose
docker-compose up -d
```
These environments are very important tools for starting different types of Ozone clusters at any time.
To be sure that the compose files are up-to-date, we also provide acceptance test suites which start the cluster and check its basic behaviour.
The acceptance tests are part of the distribution, and you can find the test definitions in the `./smoketest` directory.
You can start the tests from any compose directory:
For example:
```
cd compose/ozone
./test.sh
```
### Implementation details
The `./compose` tests are based on the apache/hadoop-runner docker image. The image itself doesn't contain any Ozone jar file or binary, just the helper scripts to start Ozone.
hadoop-runner provides a fixed environment to run Ozone everywhere, while the Ozone distribution itself is mounted from the enclosing directory:
(Example docker-compose fragment)
```
scm:
image: apache/hadoop-runner:jdk11
volumes:
- ../..:/opt/hadoop
ports:
- 9876:9876
```
The containers are configured via environment variables, and because the same environment variables should be set for each container, we maintain the list of environment variables in a separate file:
```
scm:
image: apache/hadoop-runner:jdk11
#...
env_file:
- ./docker-config
```
The docker-config file contains the list of the required environment variables:
```
OZONE-SITE.XML_ozone.om.address=om
OZONE-SITE.XML_ozone.om.http-address=om:9874
OZONE-SITE.XML_ozone.scm.names=scm
OZONE-SITE.XML_ozone.enabled=True
#...
```
As you can see, we use a naming convention: based on the name of the environment variable, the appropriate Hadoop config XML (`ozone-site.xml` in our case) will be generated by a [script](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts) which is included in the `hadoop-runner` base image.
The [entrypoint](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter.sh) of the `hadoop-runner` image contains a helper shell script which triggers this transformation and can do additional actions (e.g. initialize scm/om storage, download required keytabs, etc.) based on environment variables.
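The environment-variable-to-XML convention itself is simple to sketch. The following is an illustrative reimplementation, not the actual script shipped in `hadoop-runner`:

```python
# An environment variable OZONE-SITE.XML_<key>=<value> becomes a <property>
# entry in ozone-site.xml; variables without the prefix are ignored.
def to_hadoop_xml(env, prefix="OZONE-SITE.XML_"):
    props = [
        "  <property><name>{}</name><value>{}</value></property>".format(
            key[len(prefix):], value
        )
        for key, value in env.items()
        if key.startswith(prefix)
    ]
    return "<configuration>\n" + "\n".join(props) + "\n</configuration>"

env = {
    "OZONE-SITE.XML_ozone.om.address": "om",
    "OZONE-SITE.XML_ozone.scm.names": "scm",
    "PATH": "/usr/bin",  # ignored: does not match the prefix
}
print(to_hadoop_xml(env))
```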
## Test/Staging
The `docker-compose` based approach is recommended only for local testing, not for multi-node clusters. To use containers on a multi-node cluster we need a container orchestrator like Kubernetes.
Kubernetes example files are included in the `kubernetes` folder.
*Please note*: all the provided images are based on the `hadoop-runner` image, which contains all the required tools for testing in staging environments. For production we recommend creating your own hardened image, with your own base image.
### Test the release
The release can be tested by deploying any of the example clusters:
```
cd kubernetes/examples/ozone
kubectl apply -f .
```
Please note that in this case the latest released container image will be downloaded from Docker Hub.
### Test the development build
To test a development build you can create your own image and upload it to your own docker registry:
```
mvn clean install -f pom.ozone.xml -DskipTests -Pdocker-build,docker-push -Ddocker.image=myregistry:9000/name/ozone
```
The configured image will be used in all the generated kubernetes resource files (`image:` keys are adjusted during the build)
```
cd kubernetes/examples/ozone
kubectl apply -f .
```
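For illustration, after such a build the generated resource files reference your image. A fragment might look like the following (the `scm` container name and the exact file layout here are assumptions for the example):

```yaml
# Fragment of a generated kubernetes resource after the build;
# the image value matches the -Ddocker.image parameter above.
spec:
  containers:
    - name: scm
      image: myregistry:9000/name/ozone
```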
## Production
<div class="alert alert-danger" role="alert">
We <b>strongly</b> recommend using your own image in your production cluster
and adjusting the base image, umask, security settings, and user settings
according to your own requirements.
</div>
You can use the source of our development images as an example:
* Base image: https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile
* Docker image: https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/Dockerfile
Most of the elements are optional helper functions, but to use the provided example kubernetes resources you may need the scripts from [here](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts)
* The two python scripts convert environment variables to real hadoop XML config files
* The start.sh executes the python scripts (and other initialization) based on environment variables.
## Containers
Ozone related container images and source locations:
<table class="table table-dark">
<thead>
<tr>
<th scope="col">#</th>
<th scope="col">Container</th>
<th scope="col">Repository</th>
<th scope="col">Branch</th>
<th scope="col">Base</th>
<th scope="col">Tags</th>
<th scope="col">Comments</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">1</th>
<td>apache/ozone</td>
<td>https://github.com/apache/hadoop-docker-ozone</td>
<td>ozone-... </td>
<td>hadoop-runner</td>
<td>0.3.0,0.4.0,0.4.1</td>
<td>For each Ozone release we create a new release tag.</td>
</tr>
<tr>
<th scope="row">2</th>
<td>apache/hadoop-runner </td>
<td>https://github.com/apache/hadoop</td>
<td>docker-hadoop-runner</td>
<td>centos</td>
<td>jdk11,jdk8,latest</td>
<td>This is the base image used for testing Hadoop Ozone.
This is a set of utilities that makes it easy for us to run Ozone.</td>
</tr>
<tr>
<th scope="row">3</th>
<td>apache/ozone:build (WIP)</td>
<td>https://github.com/apache/hadoop-docker-ozone</td>
<td>ozone-build </td>
<td> </td>
<td> </td>
<td>TODO: Add more documentation here.</td>
</tr>
</tbody>
</table>
@ -1,9 +1,8 @@
---
title: "Dozone & Dev Tools"
title: "Docker Cheat Sheet"
date: 2017-08-10
menu:
main:
parent: Tools
summary: Docker Compose cheat sheet to help you remember the common commands to control an Ozone cluster running on top of Docker.
weight: 4
---
<!---
@ -23,43 +22,22 @@ menu:
limitations under the License.
-->
Dozone stands for docker for ozone. Ozone supports docker to make it easy to develop and test ozone. Starting a docker-based ozone container is simple.
In the `compose` directory of the ozone distribution there are multiple pseudo-cluster setups which can be used to run Ozone in different ways (for example as a secure cluster, with tracing enabled, with prometheus, etc.).
In the `compose/ozone` directory there are two files that define the docker and ozone settings.
If the usage is not documented in a specific directory, the default usage is the following:
Developers can
{{< highlight bash >}}
```bash
cd compose/ozone
{{< /highlight >}}
and simply run
{{< highlight bash >}}
docker-compose up -d
{{< /highlight >}}
```
to run an ozone cluster on docker.
The data of the container is ephemeral and deleted together with the docker volumes. To force the deletion of existing data you can always delete all the temporary data:
This command will launch OM, SCM and a data node.
To access the OM UI, one can view http://localhost:9874.
_Please note_: dozone does not map the data node ports to the 9864. Instead, it maps to the ephemeral port range. So many examples in the command shell will not work if you run those commands from the host machine. To find out where the data node port is listening, you can run the `docker ps` command or always ssh into a container before running ozone commands.
To shutdown a running docker-based ozone cluster, please run
{{< highlight bash >}}
```bash
docker-compose down
{{< /highlight >}}
```
Adding more config settings
---------------------------
The file called `docker-config` contains all ozone specific config settings. This file is processed to create the ozone-site.xml.
Useful Docker & Ozone Commands
------------------------------
## Useful Docker & Ozone Commands
If you make any modifications to ozone, the simplest way to test it is to run freon and unit tests.
@ -102,7 +80,7 @@ You can start multiple data nodes with:
docker-compose scale datanode=3
{{< /highlight >}}
You can test the commands from the [Ozone CLI]({{< ref "CommandShell.md#shell" >}}) after opening a new bash shell in one of the containers:
You can test the commands from the [Ozone CLI]({{< ref "shell/_index.md" >}}) after opening a new bash shell in one of the containers:
{{< highlight bash >}}
docker-compose exec datanode bash
@ -1,10 +1,8 @@
---
title: Running concurrently with HDFS
linktitle: Running with HDFS
weight: 1
menu:
main:
parent: Starting
weight: 5
summary: Ozone is designed to run concurrently with HDFS. This page explains how to deploy Ozone in an existing HDFS cluster.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -26,15 +24,10 @@ menu:
Ozone is designed to work with HDFS. So it is easy to deploy ozone in an
existing HDFS cluster.
Ozone does *not* support security today. It is a work in progress and tracked
in
[HDDS-4](https://issues.apache.org/jira/browse/HDDS-4). If you enable ozone
in a secure HDFS cluster, for your own protection Ozone will refuse to work.
The container manager part of Ozone can run inside DataNodes as a pluggable module
or as a standalone component. This document describes how it can be started as
an HDFS datanode plugin.
In other words, till Ozone security work is done, Ozone will not work in any
secure clusters.
The container manager part of Ozone runs inside DataNodes as a pluggable module.
To activate ozone you should define the service plugin implementation class.
<div class="alert alert-warning" role="alert">
@ -1,10 +1,8 @@
---
title: Ozone CLI
menu:
main:
parent: Client
weight: 1
identifier: OzoneShell
title: "Tools"
date: "2017-10-10"
summary: Ozone supports a set of tools that are handy for developers. Here is a quick list of command line tools.
weight: 3
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -49,55 +47,3 @@ The commands supported by ozone are:
* **version** - Prints the version of Ozone and HDDS.
* **genconf** - Generate minimally required ozone configs and output to
ozone-site.xml.
## Understanding Ozone command shell
The most used command when working with Ozone is the Ozone command shell.
Ozone command shell gives a command shell interface to work against
Ozone.
The Ozone shell commands take the following format.
> _ozone sh object action url_
**ozone** script is used to invoke all Ozone sub-commands. The ozone shell is
invoked via ```sh``` command.
The object can be a volume, bucket or a key. The action is various verbs like
create, list, delete etc.
Ozone URL can point to a volume, bucket or keys in the following format:
_\[scheme\]\[server:port\]/volume/bucket/key_
Where,
1. Scheme - This should be `o3` which is the native RPC protocol to access
Ozone API. The usage of the scheme is optional.
2. Server:Port - This is the address of the Ozone Manager. This can be server
only, in that case, the default port is used. If this value is omitted
then the defaults specified in the ozone-site.xml will be used for Ozone
Manager address.
Depending on the call, the volume/bucket/key names will be part of the URL.
Please see volume commands, bucket commands, and key commands section for more
detail.
## Invoking help
Ozone shell help can be invoked at _object_ level or at _action_ level.
For example:
{{< highlight bash >}}
ozone sh volume --help
{{< /highlight >}}
This will show all possible actions for volumes.
or it can be invoked to explain a specific action like
{{< highlight bash >}}
ozone sh volume create --help
{{< /highlight >}}
This command will give you command line options of the create command.
@ -0,0 +1,30 @@
---
title: "Beyond Basics"
date: "2017-10-10"
menu: main
weight: 7
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="Beyond Basics">}}
Beyond Basics pages go into custom configurations of Ozone, including how
to run Ozone concurrently with an existing HDFS cluster. These pages also
take a deep dive into how to run profilers and leverage the tracing support
built into Ozone.
{{</jumbotron>}}
@ -0,0 +1,75 @@
---
title: "Datanodes"
date: "2017-09-14"
weight: 4
summary: Datanodes are the worker bees of Ozone. All data is stored on datanodes in storage containers.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Datanodes are the worker bees of Ozone. All data is stored on data nodes.
Clients write data in terms of blocks. A datanode aggregates these blocks into
storage containers. A storage container holds the data streams and the metadata
about the blocks written by the clients.
## Storage Containers
![FunctionalOzone](ContainerMetadata.png)
A storage container is a self-contained super block. It has a list of Ozone
blocks that reside inside it, as well as on-disk files which contain the
actual data streams. This is the default storage container format. From
Ozone's perspective, a container is a protocol spec; the actual storage layout
does not matter. In other words, it is trivial to extend or bring in new
container layouts. Hence this should be treated as a reference implementation
of containers under Ozone.
## Understanding Ozone Blocks and Containers
When a client wants to read a key from Ozone, the client sends the name of
the key to the Ozone Manager. Ozone manager returns the list of Ozone blocks
that make up that key.
An Ozone block contains the container ID and a local ID. The figure below
shows the logical layout of an Ozone block.
![OzoneBlock](OzoneBlock.png)
The container ID lets the clients discover the location of the container. The
authoritative information about where a container is located is with the
Storage Container Manager or SCM. In most cases, the container location will
be cached by Ozone Manager and will be returned along with the Ozone blocks.
Once the client is able to locate the container, that is, understand which
data nodes host this container, the client will connect to the datanode and
read the data stream specified by container ID:local ID. In other words, the
local ID serves as an index into the container, describing which data stream
we want to read.
### Discovering the Container Locations
How does SCM know where the containers are located? This is very similar to
what HDFS does; the data nodes regularly send container reports like block
reports. Container reports are far more concise than block reports. For
example, an Ozone deployment with a 196 TB data node will have around 40
thousand containers. Compare that with an HDFS block count of roughly one and
a half million blocks that get reported. That is a 40x reduction in the block
report size.
This extra indirection helps tremendously with scaling Ozone: SCM has far
less block data to process, and the fact that the namespace is handled by a
different service is critical to scaling Ozone.
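The arithmetic behind these numbers can be sketched as follows (the 5 GB container size and 128 MB block size used here are illustrative assumptions, not values stated on this page):

```shell
# Back-of-envelope check of the container report claim above.
datanode_tb=196
container_gb=5      # assumed storage container size
block_mb=128        # assumed HDFS block size

containers=$(( datanode_tb * 1024 / container_gb ))
blocks=$(( datanode_tb * 1024 * 1024 / block_mb ))
echo "containers=$containers blocks=$blocks ratio=$(( blocks / containers ))"
# → containers=40140 blocks=1605632 ratio=40
```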
@ -0,0 +1,52 @@
---
title: "Storage Container Manager"
date: "2017-09-14"
weight: 3
summary: Storage Container Manager or SCM is the core metadata service of Ozone. SCM provides a distributed block layer for Ozone.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Storage container manager provides multiple critical functions for the Ozone
cluster. SCM acts as the cluster manager, the certificate authority, the
block manager, and the replica manager.
{{<card title="Cluster Management" icon="tasks">}}
SCM is in charge of creating an Ozone cluster. When an SCM is booted up via the <kbd>init</kbd> command, SCM creates the cluster identity and the root certificates needed for the SCM certificate authority. SCM manages the life cycle of a data node in the cluster.
{{</card>}}
{{<card title="Service Identity Management" icon="eye-open">}}
SCM's certificate authority is in
charge of issuing identity certificates for each and every
service in the cluster. This certificate infrastructure makes
it easy to enable mTLS at the network layer, and the block
token infrastructure also depends on it.
{{</card>}}
{{<card title="Block Management" icon="th">}}
SCM is the block manager. SCM
allocates blocks and assigns them to data nodes. Clients
read and write these blocks directly.
{{</card>}}
{{<card title="Replica Management" icon="link">}}
SCM keeps track of all the block
replicas. If a data node or a disk is lost, SCM
detects it and instructs data nodes to make copies of the
missing blocks to ensure high availability.
{{</card>}}
@ -0,0 +1,81 @@
---
title: Overview
date: "2017-10-10"
weight: 1
summary: An overview of Ozone and the components that make up Ozone.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Ozone is a redundant, distributed object store optimized for Big data
workloads. The primary design point of ozone is scalability, and it aims to
scale to billions of objects.
Ozone separates namespace management and block space management; this helps
ozone to scale much better. The namespace is managed by a daemon called
[Ozone Manager]({{< ref "OzoneManager.md" >}}) (OM), and block space is
managed by [Storage Container Manager]({{< ref "Hdds.md" >}}) (SCM).
Ozone consists of volumes, buckets, and keys.
A volume is similar to a home directory in the ozone world.
Only an administrator can create it.
Volumes are used to store buckets.
Once a volume is created users can create as many buckets as needed.
Ozone stores data as keys which live inside these buckets.
Ozone namespace is composed of many storage volumes.
Storage volumes are also used as the basis for storage accounting.
The block diagram shows the core components of Ozone.
![Architecture diagram](ozoneBlockDiagram.png)
The Ozone Manager is the name space manager, Storage Container Manager
manages the physical and data layer and Recon is the management interface for
Ozone.
## Different Perspectives
![FunctionalOzone](FunctionalOzone.png)
Any distributed system can be viewed from different perspectives. One way to
look at Ozone is to imagine Ozone Manager as a name space service built on
top of HDDS, a distributed block store.
Another way to visualize Ozone is to look at the functional layers; we have a
metadata data management layer, composed of Ozone Manager and Storage
Container Manager.
We have a data storage layer, which is basically the data nodes and they are
managed by SCM.
The replication layer, provided by Ratis, is used to replicate metadata (Ozone
Manager and SCM) and also used for consistency when data is modified at the
data nodes.
We have a management server called Recon, that talks to all other components
of Ozone and provides a unified management API and UX for Ozone.
We have a protocol bus that allows Ozone to be extended via other
protocols. We currently only have S3 protocol support built via Protocol bus.
Protocol Bus provides a generic notion that you can implement new file system
or object store protocols that call into O3 Native protocol.
@ -0,0 +1,87 @@
---
title: "Ozone Manager"
date: "2017-09-14"
weight: 2
summary: Ozone Manager is the principal name space service of Ozone. OM manages the life cycle of volumes, buckets and Keys.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Ozone Manager or OM is the namespace manager for Ozone.
This means that when you want to write some data, you ask Ozone
manager for a block and Ozone Manager gives you a block and remembers that
information. When you want to read that file back, you need to find the
address of the block, and Ozone Manager returns it to you.
Ozone manager also allows users to organize keys under a volume and bucket.
Volumes and buckets are part of the namespace and managed by Ozone Manager.
Each ozone volume is the root of an independent namespace under OM.
This is very different from HDFS which provides a single rooted file system.
Ozone's namespace is a collection of volumes or is a forest instead of a
single rooted tree as in HDFS. This property makes it easy to deploy multiple
OMs for scaling.
## Ozone Manager Metadata
OM maintains a list of volumes, buckets, and keys.
For each user, it maintains a list of volumes.
For each volume, it maintains the list of buckets, and for each bucket, the list of keys.
Ozone Manager will use Apache Ratis (a Raft protocol implementation) to
replicate Ozone Manager state. This will ensure High Availability for Ozone.
## Ozone Manager and Storage Container Manager
The relationship between Ozone Manager and Storage Container Manager is best
understood if we trace what happens during a key write and key read.
### Key Write
* To write a key to Ozone, a client tells Ozone manager that it would like to
write a key into a bucket that lives inside a specific volume. Once Ozone
manager determines that you are allowed to write a key to the specified bucket,
OM needs to allocate a block for the client to write data.
* To allocate a block, Ozone manager sends a request to Storage Container
Manager or SCM; SCM is the manager of data nodes. SCM picks three data nodes
into which the client can write data. SCM allocates the block and returns the
block ID to Ozone Manager.
* Ozone manager records this block information in its metadata and returns the
block and a block token (a security permission to write data to the block) to
the client.
* The client uses the block token to prove that it is allowed to write data to
the block and writes data to the data node.
* Once the write is complete on the data node, the client will update the block
information on
Ozone manager.
### Key Reads
* Key reads are simpler: the client requests the block list from the Ozone
Manager.
* Ozone manager will return the block list and block tokens, which
allow the client to read the data from the data nodes.
* Client connects to the data node and presents the block token and reads
the data from the data node.
@ -0,0 +1,33 @@
---
title: Concepts
date: "2017-10-10"
menu: main
weight: 6
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="Ozone Architecture">}}
Ozone's architectural elements are explained in the following pages: the
metadata layer, data layer, protocol bus, replication layer, and Recon.
These concepts are useful if you want to understand how Ozone works in
depth.
{{</jumbotron>}}
@ -1,9 +1,8 @@
---
title: "Java API"
date: "2017-09-14"
menu:
main:
parent: "Client"
weight: 1
summary: Ozone has a set of Native RPC based APIs. This is the lowest-level API on which all other protocols are built, and the most performant and feature-rich of all Ozone protocols.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -22,11 +21,8 @@ menu:
limitations under the License.
-->
Introduction
-------------
Ozone ships with its own client library that supports RPC. For generic use cases the S3
compatible REST interface also can be used instead of the Ozone client.
Ozone ships with its own client library that supports RPC. For generic use cases the S3
compatible REST interface also can be used instead of the Ozone client.
## Creating an Ozone client
@ -1,12 +1,8 @@
---
title: Ozone File System
weight: 1
date: 2017-09-14
menu: main
menu:
main:
parent: Starting
weight: 4
weight: 2
summary: Hadoop Compatible file system allows any application that expects an HDFS like interface to work against Ozone with zero changes. Frameworks like Apache Spark, YARN and Hive work against Ozone without needing any change.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -25,7 +21,9 @@ menu:
limitations under the License.
-->
There are many Hadoop compatible files systems under Hadoop. Hadoop compatible file systems ensures that storage backends like Ozone can easily be integrated into Hadoop eco-system.
The Hadoop compatible file system interface allows storage backends like Ozone
to be easily integrated into the Hadoop eco-system. The Ozone file system is a
Hadoop compatible file system.
## Setting up the Ozone file system
@ -102,21 +100,21 @@ The second one contains all the dependency in an internal, separated directory,
With this method the hadoop-ozone-filesystem-lib-legacy.jar can be used from
any older hadoop version (eg. hadoop 3.1, hadoop 2.7 or spark+hadoop 2.7)
Similar to the dependency jar, there are two OzoneFileSystem implementations.
For hadoop 3.0 and newer, you can use `org.apache.hadoop.fs.ozone.OzoneFileSystem`
For hadoop 3.0 and newer, you can use `org.apache.hadoop.fs.ozone.OzoneFileSystem`
which is a full implementation of the Hadoop compatible File System API.
For Hadoop 2.x you should use the Basic version: `org.apache.hadoop.fs.ozone.BasicOzoneFileSystem`.
This is the same implementation but doesn't include the features/dependencies which are added with
This is the same implementation but doesn't include the features/dependencies which are added with
Hadoop 3.0. (eg. FS statistics, encryption zones).
### Summary
The following table summarizes which jar files and implementations should be used:
Hadoop version | Required jar | OzoneFileSystem implementation
---------------|-------------------------|----------------------------------------------------
3.2 | filesystem-lib-current | org.apache.hadoop.fs.ozone.OzoneFileSystem
@ -1,9 +1,7 @@
---
title: S3
menu:
main:
parent: Client
weight: 1
title: S3 Protocol
weight: 3
summary: Ozone supports Amazon's Simple Storage Service (S3) protocol. In fact, you can use S3 clients and S3 SDK based applications with Ozone without any modifications.
---
<!---
@ -23,6 +21,7 @@ menu:
limitations under the License.
-->
Ozone provides an S3 compatible REST interface so that the object store data can be used with any S3 compatible tools.
## Getting started
@ -0,0 +1,27 @@
---
title: "Programming Interfaces"
menu:
main:
weight: 4
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="Multi-Protocol Support">}}
Ozone is a multi-protocol file system. There are different protocols by which
users can access data on Ozone.
{{</jumbotron>}}
@ -1,8 +1,7 @@
---
title: Monitoring with Prometheus
menu:
main:
parent: Recipes
summary: A simple recipe to monitor Ozone using Prometheus
linktitle: Prometheus
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -23,12 +22,12 @@ menu:
[Prometheus](https://prometheus.io/) is an open-source monitoring server developed under the [Cloud Native Computing Foundation](https://www.cncf.io/).
Ozone supports Prometheus out of the box. The servers start a prometheus
Ozone supports Prometheus out of the box. The servers start a prometheus
compatible metrics endpoint where all the available hadoop metrics are published in prometheus exporter format.
## Prerequisites
1. [Install the and start]({{< ref "RunningViaDocker.md" >}}) an Ozone cluster.
1. [Install and start]({{< ref "start/RunningViaDocker.md" >}}) an Ozone cluster.
2. [Download](https://prometheus.io/download/#prometheus) the prometheus binary.
## Monitoring with prometheus
@ -1,8 +1,7 @@
---
title: Spark in Kubernetes with OzoneFS
menu:
main:
parent: Recipes
linktitle: Spark
summary: How to use Apache Spark with Ozone on K8s?
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -21,9 +20,6 @@ menu:
limitations under the License.
-->
Using Ozone from Apache Spark
===
This recipe shows how Ozone object store can be used from Spark using:
- OzoneFS (Hadoop compatible file system)
@ -34,7 +30,7 @@ This recipe shows how Ozone object store can be used from Spark using:
## Requirements
Download latest Spark and Ozone distribution and extract them. This method is
Download the latest Spark and Ozone distributions and extract them. This method is
tested with the `spark-2.4.0-bin-hadoop2.7` distribution.
You also need the following:
@ -47,7 +43,7 @@ You also need the following:
### Create the base Spark driver/executor image
First of all create a docker image with the Spark image creator.
First of all, create a docker image with the Spark image creator.
Execute the following from the Spark distribution
```
@ -0,0 +1,28 @@
---
title: Recipes
date: "2017-10-10"
menu: main
weight: 8
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="Recipes of Ozone">}}
Standard how-to documents which describe how to use Ozone with other software, for example how to use Ozone with Apache Spark.
{{</jumbotron>}}
@ -0,0 +1,43 @@
---
title: "Apache Ranger"
date: "2019-April-03"
weight: 5
summary: Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
icon: user
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Apache Ranger™ is a framework to enable, monitor and manage comprehensive data
security across the Hadoop platform. The next version (any version after 1.20)
of Apache Ranger is aware of Ozone, and can manage an Ozone cluster.
To use Apache Ranger, you must have Apache Ranger installed in your Hadoop
cluster. For installation instructions, please take a look
at the [Apache Ranger website](https://ranger.apache.org/index.html).
If you have a working Apache Ranger installation that is aware of Ozone, then
configuring Ozone to work with Apache Ranger is trivial. You have to enable
ACL support and set the ACL authorizer class inside Ozone to the Ranger
authorizer. Please add the following properties to ozone-site.xml.
Property|Value
--------|------------------------------------------------------------
ozone.acl.enabled | true
ozone.acl.authorizer.class| org.apache.ranger.authorization.ozone.authorizer.RangerOzoneAuthorizer
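As an ozone-site.xml fragment, the same two settings from the table above look like this:

```xml
<property>
  <name>ozone.acl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ozone.acl.authorizer.class</name>
  <value>org.apache.ranger.authorization.ozone.authorizer.RangerOzoneAuthorizer</value>
</property>
```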

View File

@ -0,0 +1,177 @@
---
title: "Securing Ozone"
date: "2019-April-03"
summary: Overview of Ozone security concepts and steps to secure Ozone Manager and SCM.
weight: 1
icon: tower
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Kerberos
Ozone depends on [Kerberos](https://web.mit.edu/kerberos/) to make the
clusters secure. Historically, HDFS has supported running in isolated
secure networks, where it is possible to deploy without securing the cluster.
This release of Ozone follows that model, but will soon move to _secure by
default._ Today, to enable security in an Ozone cluster, we need to set the
configuration **ozone.security.enabled** to true.
Property|Value
----------------------|---------
ozone.security.enabled| **true**
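As an ozone-site.xml fragment, this is simply:

```xml
<property>
  <name>ozone.security.enabled</name>
  <value>true</value>
</property>
```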
# Tokens #
Ozone uses a notion of tokens to avoid overburdening the Kerberos server.
When you serve thousands of requests per second, involving Kerberos might not
work well. Hence, once authentication is done, Ozone issues delegation
tokens and block tokens to the clients. These tokens allow applications to do
specified operations against the cluster, as if they had Kerberos tickets
with them. Ozone supports the following kinds of tokens.
### Delegation Token ###
Delegation tokens allow an application to impersonate a user's Kerberos
credentials. This token is based on verification of Kerberos identity and is
issued by the Ozone Manager. Delegation tokens are enabled by default when
security is enabled.
### Block Token ###
Block tokens allow a client to read or write a block. This is needed so that
data nodes know that the user/client has permission to read or make
modifications to the block.
### S3Token ###
S3 uses a very different shared-secret security scheme. Ozone supports the AWS Signature Version 4 protocol,
and from the end user's perspective Ozone's S3 support feels exactly like AWS S3.
The S3 credential tokens are called S3 tokens in the code. These tokens are
also enabled by default when security is enabled.
Each of the service daemons that make up Ozone needs a Kerberos service
principal name and a corresponding [kerberos key tab](https://web.mit.edu/kerberos/krb5-latest/doc/basic/keytab_def.html) file.
All these settings should be made in ozone-site.xml.
<div class="card-group">
<div class="card">
<div class="card-body">
<h3 class="card-title">Storage Container Manager</h3>
<p class="card-text">
<br>
SCM requires two Kerberos principals, and the corresponding key tab files
for both of these principals.
<br>
<table class="table table-dark">
<thead>
<tr>
<th scope="col">Property</th>
<th scope="col">Description</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">hdds.scm.kerberos.principal</th>
<td>The SCM service principal. e.g. scm/_HOST@REALM.COM</td>
</tr>
<tr>
<th scope="row">hdds.scm.kerberos.keytab.file</th>
<td>The keytab file used by SCM daemon to login as its service principal.</td>
</tr>
<tr>
<th scope="row">hdds.scm.http.kerberos.principal</th>
<td>SCM http server service principal.</td>
</tr>
<tr>
<th scope="row">hdds.scm.http.kerberos.keytab.file</th>
<td>The keytab file used by SCM http server to login as its service principal.</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="card">
<div class="card-body">
<h3 class="card-title">Ozone Manager</h3>
<p class="card-text">
<br>
Like SCM, OM also requires two Kerberos principals, and the
corresponding key tab files for both of these principals.
<br>
<table class="table table-dark">
<thead>
<tr>
<th scope="col">Property</th>
<th scope="col">Description</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">ozone.om.kerberos.principal </th>
<td>The OzoneManager service principal. e.g. om/_HOST@REALM.COM</td>
</tr>
<tr>
<th scope="row">ozone.om.kerberos.keytab.file</th>
<td>The keytab file used by the OM daemon to login as its service principal.</td>
</tr>
<tr>
<th scope="row">ozone.om.http.kerberos.principal</th>
<td>Ozone Manager http server service principal.</td>
</tr>
<tr>
<th scope="row"> ozone.om.http.kerberos.keytab.file</th>
<td>The keytab file used by OM http server to login as its service principal.</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="card">
<div class="card-body">
<h3 class="card-title">S3 Gateway</h3>
<p class="card-text">
<br>
S3 gateway requires one service principal. Here are the configuration values
needed in ozone-site.xml.
<br>
<table class="table table-dark">
<thead>
<tr>
<th scope="col">Property</th>
<th scope="col">Description</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">ozone.s3g.keytab.file</th>
<td>The keytab file used by S3 gateway</td>
</tr>
<tr>
<th scope="row">ozone.s3g.authentication.kerberos.principal</th>
<td>S3 Gateway principal. e.g. HTTP/_HOST@EXAMPLE.COM</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
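Putting the tables above together, an ozone-site.xml sketch for the SCM and OM service principals could look like the following; the EXAMPLE.COM realm and the keytab paths are illustrative placeholders that must match your own Kerberos setup, and the http principals are configured analogously:

```xml
<!-- Illustrative values only: adjust the realm and keytab locations. -->
<property>
  <name>hdds.scm.kerberos.principal</name>
  <value>scm/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hdds.scm.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/scm.service.keytab</value>
</property>
<property>
  <name>ozone.om.kerberos.principal</name>
  <value>om/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>ozone.om.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/om.service.keytab</value>
</property>
```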

View File

@ -0,0 +1,70 @@
---
title: "Securing Datanodes"
date: "2019-April-03"
weight: 2
summary: Explains different modes of securing data nodes. These range from kerberos to auto approval.
icon: th
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Datanodes under Hadoop are traditionally secured by creating a keytab file on
the datanodes. With Ozone, we have moved to using datanode
certificates. That is, Kerberos on datanodes is not needed in case of a
secure Ozone cluster.
However, we support the legacy Kerberos-based authentication to make it easy
for the current set of users. The following HDFS configuration keys are set
up in hdfs-site.xml.
Property|Example Value|Comment
--------|--------------|--------------
dfs.datanode.keytab.file| /keytab/dn.service.keytab| Keytab file.
dfs.datanode.kerberos.principal| dn/_HOST@REALM.TLD| principal name.
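As an hdfs-site.xml fragment, using the example values from the table above:

```xml
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/keytab/dn.service.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@REALM.TLD</value>
</property>
```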
## How a data node becomes secure.
Under Ozone, when a data node boots up and discovers SCM's address, the first
thing that data node does is to create a private key and send a certificate
request to the SCM.
<h3>Certificate Approval via Kerberos <span class="badge badge-secondary">Current Model</span></h3>
SCM has a built-in CA, and SCM has to approve this request. If the data node
already has a Kerberos key tab, then SCM will trust Kerberos credentials and
issue a certificate automatically.
<h3>Manual Approval <span class="badge badge-primary">In Progress</span></h3>
If these are brand new datanodes and Kerberos keytabs are not present at the
datanodes, then this request for the datanode's identity certificate is
queued up for approval from the administrator (this is work in progress,
not committed in Ozone yet). In other words, the web of trust is established
by the administrator of the cluster.
<h3>Automatic Approval <span class="badge badge-secondary">In Progress</span></h3>
If you are running under a container orchestrator like Kubernetes, we rely on
Kubernetes to create a one-time token that will be given to the datanode during
boot time to prove the identity of the datanode container (this is also work
in progress).
Once a certificate is issued, a datanode is secure and the Ozone Manager can
issue block tokens. If there are no datanode certificates or the SCM's root
certificate is not present on the datanode, then the datanode will register
itself and download the SCM's root certificate, as well as get certificates
for itself.

View File

@ -0,0 +1,61 @@
---
title: "Securing S3"
date: "2019-April-03"
summary: Ozone supports S3 protocol, and uses AWS Signature Version 4 protocol which allows a seamless S3 experience.
weight: 4
icon: cloud
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
To access an S3 bucket, users need an AWS access key ID and an AWS secret. Both of
these are generated by going to the AWS website. When you use Ozone's S3
protocol, you need the same AWS access key and secret.
Under Ozone, clients can download the access key directly from Ozone.
The user needs to `kinit` first, and once they have authenticated via Kerberos
they can download the S3 access key ID and AWS secret. Just like AWS S3,
both of these are secrets that need to be protected by the client since they
give full access to the S3 buckets.
* S3 clients can get the secret access id and user secret from OzoneManager.
```
ozone s3 getsecret
```
This command will talk to Ozone, validate the user via Kerberos and generate
the AWS credentials. The values will be printed out on the screen. You can
set these values up in your .aws file for automatic access while working
against Ozone S3 buckets.
<div class="alert alert-danger" role="alert">
Please note: These S3 credentials are like your Kerberos passwords
that give complete access to your buckets.
</div>
* Now you can proceed to set up these secrets in AWS configs:
```
aws configure set default.s3.signature_version s3v4
aws configure set aws_access_key_id ${accessId}
aws configure set aws_secret_access_key ${secret}
aws configure set region us-west-1
```
Please refer to the AWS S3 documentation on how to use S3 via the command line
or via the S3 API.

View File

@ -0,0 +1,66 @@
---
title: "Transparent Data Encryption"
date: "2019-April-03"
summary: TDE allows data on the disks to be encrypted-at-rest and automatically decrypted during access. You can enable this per key or per bucket.
weight: 3
icon: lock
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Transparent Data Encryption
Ozone TDE setup process and usage are very similar to HDFS TDE.
The major difference is that Ozone TDE is enabled at Ozone bucket level
when a bucket is created.
### Setting up the Key Management Server
To use TDE, clients must set up a Key Management Server and provide its URI to
Ozone/HDFS. Since Ozone and HDFS can use the same Key Management Server, this
configuration can be provided via *hdfs-site.xml*.
Property| Value
-----------------------------------|-----------------------------------------
hadoop.security.key.provider.path | KMS uri. e.g. kms://http@kms-host:9600/kms
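For example, in *hdfs-site.xml* (kms-host is a placeholder for your KMS host):

```xml
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms-host:9600/kms</value>
</property>
```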
### Using Transparent Data Encryption
If this is already configured for your cluster, then you can simply proceed
to create the encryption key and enable encrypted buckets.
To create an encrypted bucket, clients need to:
* Create a bucket encryption key with hadoop key CLI, which is similar to
how you would use HDFS encryption zones.
```bash
hadoop key create encKey
```
The above command creates an encryption key for the bucket you want to protect.
Once the key is created, you can tell Ozone to use that key when you are
reading and writing data into a bucket.
* Assign the encryption key to a bucket.
```bash
ozone sh bucket create -k encKey /vol/encryptedBucket
```
After this command, all data written to the _encryptedBucket_ will be encrypted
with encKey, and while reading, the clients will talk to the Key Management
Server, fetch the key and decrypt the data. In other words, the data stored
inside Ozone is always encrypted. The fact that data is encrypted at rest
will be completely transparent to the clients and end users.

View File

@ -0,0 +1,85 @@
---
title: "Ozone ACLs"
date: "2019-April-03"
weight: 6
summary: Native ACL support provides ACL functionality without Ranger integration.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Ozone supports a set of native ACLs. These ACLs can be used independently or
along with Ranger. If Apache Ranger is enabled, then the ACL will be checked
first with Ranger and then Ozone's internal ACLs will be evaluated.
Ozone ACLs are a superset of POSIX and S3 ACLs.
The general format of an ACL is _object_:_who_:_rights_.
Where an _object_ can be:
1. **Volume** - An Ozone volume. e.g. /volume
2. **Bucket** - An Ozone bucket. e.g. /volume/bucket
3. **Key** - An object key or an object. e.g. /volume/bucket/key
4. **Prefix** - A path prefix for a specific key. e.g. /volume/bucket/prefix1/prefix2
Where a _who_ can be:
1. **User** - A user in the Kerberos domain. Users, like in the POSIX world,
can be named or unnamed.
2. **Group** - A group in the Kerberos domain. Groups, like in the POSIX world,
can be named or unnamed.
3. **World** - All authenticated users in the Kerberos domain. This maps to
others in the POSIX domain.
4. **Anonymous** - Ignore the user field completely. This is an extension to
the POSIX semantics, needed for the S3 protocol, where we express that
we have no way of knowing who the user is or we don't care.
<div class="alert alert-success" role="alert">
An S3 user accessing Ozone via the AWS v4 signature protocol will be translated
to the appropriate Kerberos user by Ozone Manager.
</div>
Where a _right_ can be:
1. **Create** - This ACL provides a user the ability to create buckets in a
volume and keys in a bucket. Please note: under Ozone, only admins can create volumes.
2. **List** - This ACL allows listing of buckets and keys. This ACL is attached
to the volume and buckets which allow listing of the child objects. Please note: the user and admins can list the volumes owned by the user.
3. **Delete** - Allows the user to delete a volume, bucket or key.
4. **Read** - Allows the user to read the metadata of a volume and bucket and
the data stream and metadata of a key (object).
5. **Write** - Allows the user to write the metadata of a volume and bucket and
allows the user to overwrite an existing Ozone key (object).
6. **Read_ACL** - Allows a user to read the ACL on a specific object.
7. **Write_ACL** - Allows a user to write the ACL on a specific object.
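As a purely illustrative sketch of the triple described above (the concrete ACL string syntax accepted by Ozone tooling may differ; the object, user and rights values here are hypothetical), the _object_:_who_:_rights_ shape can be decomposed like this:

```shell
#!/usr/bin/env bash
# Hypothetical ACL string in the object:who:rights shape described above.
acl="/volume1/bucket1:bilbo:READ,WRITE"
# Split on the first two colons into the three components.
object="${acl%%:*}"
rest="${acl#*:}"
who="${rest%%:*}"
rights="${rest#*:}"
echo "object=${object} who=${who} rights=${rights}"
```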
<h3>Ozone Native ACL APIs <span class="badge badge-secondary">Work in
progress</span></h3>
The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs
supported are:
1. **SetAcl** - This API will take the user principal, the name of the object,
the type of the object and a list of ACLs.
2. **GetAcl** - This API will take the name of an Ozone object and the type of
the object and will return a list of ACLs.
3. **RemoveAcl** - It is possible that we might support an API called RemoveAcl
as a convenience API, but in reality it is just a GetAcl followed by SetAcl
with an etag to avoid conflicts.

View File

@ -0,0 +1,36 @@
---
title: Security
name: Security
identifier: SecureOzone
menu: main
weight: 5
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="Securing Ozone">}}
Ozone is an enterprise-class, secure storage system. There are many
optional security features in Ozone. The following pages discuss how
you can leverage the security features of Ozone.
{{</jumbotron>}}
<div class="alert alert-warning" role="alert">
If you would like to understand Ozone's security architecture at a greater
depth, please take a look at <a href="https://issues.apache.org/jira/secure/attachment/12911638/HadoopStorageLayerSecurity.pdf">Ozone security architecture.</a>
</div>
Depending on your needs, there are multiple optional steps in securing ozone.

View File

@ -1,9 +1,7 @@
---
title: Bucket Commands
menu:
main:
parent: OzoneShell
weight: 2
summary: Bucket commands help you to manage the life cycle of a bucket.
weight: 2
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more

View File

@ -0,0 +1,69 @@
---
title: Shell Overview
summary: Explains the command syntax used by shell command.
weight: 1
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Ozone shell help can be invoked at _object_ level or at _action_ level.
For example:
{{< highlight bash >}}
ozone sh volume --help
{{< /highlight >}}
This will show all possible actions for volumes.
Or it can be invoked to explain a specific action, like:
{{< highlight bash >}}
ozone sh volume create --help
{{< /highlight >}}
This command will give you command line options of the create command.
### General Command Format
The Ozone shell commands take the following format.
> _ozone sh object action url_
The **ozone** script is used to invoke all Ozone sub-commands. The Ozone shell is
invoked via the ```sh``` sub-command.
The object can be a volume, bucket or a key. The action is various verbs like
create, list, delete etc.
Ozone URL can point to a volume, bucket or keys in the following format:
_\[scheme\]\[server:port\]/volume/bucket/key_
Where,
1. **Scheme** - This should be `o3`, which is the native RPC protocol to access
the Ozone API. The usage of the scheme is optional.
2. **Server:Port** - This is the address of the Ozone Manager. If the port is
omitted the default port from ozone-site.xml will be used.
Depending on the call, the volume/bucket/key names will be part of the URL.
Please see volume commands, bucket commands, and key commands section for more
detail.
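The URL format above can be illustrated with a small self-contained snippet; om-host and port 9862 are hypothetical placeholders for your Ozone Manager address, and the snippet assumes the optional scheme is present:

```shell
#!/usr/bin/env bash
# Hypothetical Ozone URL in the [scheme][server:port]/volume/bucket/key shape.
url="o3://om-host:9862/volume1/bucket1/key1"
rest="${url#o3://}"      # strip the optional o3 scheme
server="${rest%%/*}"     # server:port part
path="/${rest#*/}"       # /volume/bucket/key part
echo "server=${server} path=${path}"
```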

View File

@ -1,9 +1,8 @@
---
title: Key Commands
menu:
main:
parent: OzoneShell
weight: 3
summary: Key commands help you to manage the life cycle of keys / objects.
weight: 4
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -22,6 +21,7 @@ menu:
limitations under the License.
-->
Ozone shell supports the following key commands.
* [get](#get)
@ -137,8 +137,5 @@ ozone sh key rename /hive/jan sales.orc new_name.orc
{{< /highlight >}}
The above command will rename `sales.orc` to `new_name.orc` in the bucket `/hive/jan`.
You can try out these commands from the docker instance of the [Alpha
Cluster](runningviadocker.html).

View File

@ -1,9 +1,7 @@
---
title: Volume Commands
menu:
main:
parent: OzoneShell
weight: 1
weight: 2
summary: Volume commands help you to manage the life cycle of a volume.
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more

View File

@ -0,0 +1,28 @@
---
title: Command Line Interface
menu:
main:
weight: 3
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="OzoneShell">}}
Ozone shell is the primary command line interface to interact with Ozone.
{{</jumbotron>}}

View File

@ -1,10 +1,5 @@
---
title: Building from Sources
weight: 1
menu:
main:
parent: Starting
weight: 6
title: From Source
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -23,11 +18,17 @@ menu:
limitations under the License.
-->
{{< requirements >}}
* Java 1.8
* Maven
* Protoc (2.5)
{{< /requirements >}}
<div class="alert alert-info" role="alert">This is a guide on how to build the ozone sources. If you are <font
color="red">not</font>
planning to build sources yourself, you can safely skip this page.</div>
If you are a Hadoop ninja, and wise in the ways of Apache, you already know
that a real Apache release is a source release.
If you want to build from sources, please untar the source tarball and run
@ -45,9 +46,10 @@ You can copy this tarball and use this instead of binary artifacts that are
provided along with the official release.
## How to test the build
You can run the acceptance tests in the hadoop-ozone directory to make sure
that your build is functional. To launch the acceptance tests, please follow
the instructions in the **README.md** in the `smoketest` directory.
```bash
cd smoketest
@ -61,6 +63,5 @@ cd smoketest
./test.sh --env ozone basic
```
Acceptance tests will start a small ozone cluster and verify that the ozone
shell and ozone file system are fully functional.

View File

@ -0,0 +1,52 @@
---
title: Ozone on Kubernetes
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{< requirements >}}
* Working kubernetes cluster (LoadBalancer, PersistentVolume are not required)
* kubectl
{{< /requirements >}}
As the _apache/ozone_ docker images are available from the dockerhub, the deployment process is very similar to the Minikube deployment. The only big difference is that we have a dedicated set of k8s files for hosted clusters (for example, we can use one datanode per host).
## Deploy to Kubernetes
The `kubernetes/examples` folder of the ozone distribution contains kubernetes deployment resource files for multiple use cases.
To deploy to a hosted cluster use the ozone subdirectory:
```
cd kubernetes/examples/ozone
kubectl apply -f .
```
And you can check the results with
```
kubectl get pod
```

## Access the services
Now you can access any of the services. By default the services are not published, but you can access them with port-forward rules.
```
kubectl port-forward s3g-0 9878:9878
kubectl port-forward scm-0 9876:9876
```
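If you prefer to publish a service permanently instead of port-forwarding, a NodePort service can be added. The following is only a sketch; the `app: s3g` selector label is an assumption and must match the labels actually used by your deployment files:

```yaml
# Hypothetical NodePort service exposing the S3 gateway on a generated node port.
apiVersion: v1
kind: Service
metadata:
  name: s3g-public
spec:
  type: NodePort
  selector:
    app: s3g          # assumed label; check your s3g resource definition
  ports:
    - port: 9878
      targetPort: 9878
```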

View File

@ -0,0 +1,69 @@
---
title: Minikube & Ozone
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{< requirements >}}
* Working minikube setup
* kubectl
{{< /requirements >}}
The `kubernetes/examples` folder of the ozone distribution contains kubernetes deployment resource files for multiple use cases. By default the kubernetes resource files are configured to use the `apache/ozone` image from the dockerhub.
To deploy it to minikube use the minikube configuration set:
```
cd kubernetes/examples/minikube
kubectl apply -f .
```
And you can check the results with
```
kubectl get pod
```
Note: the kubernetes/examples/minikube resource set is optimized for minikube usage:
* You can have multiple datanodes even if you have only one host (in a real production cluster usually you need one datanode per physical host)
* The services are published with node port
## Access the services
Now you can access any of the services. For each web endpoint an additional NodePort service is defined in the minikube k8s resource set. NodePort services are available via a generated port on any of the host nodes:
```bash
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
datanode ClusterIP None <none> <none> 27s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 118m
om ClusterIP None <none> 9874/TCP 27s
om-public NodePort 10.108.48.148 <none> 9874:32649/TCP 27s
s3g ClusterIP None <none> 9878/TCP 27s
s3g-public NodePort 10.97.133.137 <none> 9878:31880/TCP 27s
scm ClusterIP None <none> 9876/TCP 27s
scm-public NodePort 10.105.231.28 <none> 9876:32171/TCP 27s
```
Minikube contains a convenience command to access any of the NodePort services:
```
minikube service s3g-public
Opening kubernetes service default/s3g-public in default browser...
```

View File

@ -1,10 +1,6 @@
---
title: Configuration
weight: 1
menu:
main:
parent: Starting
weight: 2
title: Ozone On Premise Installation
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -23,9 +19,6 @@ menu:
limitations under the License.
-->
If you are feeling adventurous, you can set up Ozone in a real cluster.
Setting up a real cluster requires us to understand the components of Ozone.
Ozone is designed to work concurrently with HDFS. However, Ozone is also
@ -38,9 +31,6 @@ capable of running independently. The components of ozone are the same in both a
requests blocks from SCM, to which clients can write data.
3. Datanodes - Ozone data node code runs inside the HDFS datanode or in the independent deployment case runs an ozone datanode daemon.
## Setting up an Ozone only cluster
* Please untar the ozone-<version> to the directory where you are going
@ -113,7 +103,7 @@ Here is an example,
{{< highlight xml >}}
<property>
<name>ozone.scm.datanode.id.dir</name>
<value>/data/disk1/meta/node/datanode.id</value>
<value>/data/disk1/meta/node</value>
</property>
{{< /highlight >}}
@ -129,7 +119,7 @@ Here is an example,
{{< /highlight >}}
### Ozone Settings Summary
## Ozone Settings Summary
| Setting | Value | Comment |
|--------------------------------|------------------------------|------------------------------------------------------------------|
@ -140,3 +130,58 @@ Here is an example,
| ozone.scm.client.address | SCM server name and port | Used by client-side |
| ozone.scm.datanode.address | SCM server name and port | Used by datanode to talk to SCM |
| ozone.om.address | OM server name | Used by Ozone handler and Ozone file system. |
## Startup the cluster
Before we boot up the Ozone cluster, we need to initialize both SCM and Ozone Manager.
{{< highlight bash >}}
ozone scm --init
{{< /highlight >}}
This allows SCM to create the cluster identity and initialize its state.
The ```init``` command is similar to Namenode format. It is executed only once and allows SCM to create all the required on-disk structures to work correctly.
{{< highlight bash >}}
ozone --daemon start scm
{{< /highlight >}}
Once we know SCM is up and running, we can create an Object Store for our use. This is done by running the following command.
{{< highlight bash >}}
ozone om --init
{{< /highlight >}}
Once the Ozone Manager has created the Object Store, we are ready to run the name
services.
{{< highlight bash >}}
ozone --daemon start om
{{< /highlight >}}
At this point both of Ozone's services are running: the Ozone Manager (the name service) and SCM (the block service).
**Please note**: If SCM is not running, the
```om --init``` command will fail. Likewise, SCM fails to start if its on-disk data structures are missing. So please make sure you have run both the ```scm --init``` and ```om --init``` commands.
Now we need to start the data nodes. Please run the following command on each datanode.
{{< highlight bash >}}
ozone --daemon start datanode
{{< /highlight >}}
At this point SCM, Ozone Manager and data nodes are up and running.
***Congratulations! You have set up a functional ozone cluster.***
## Shortcut
If you want to make your life simpler, you can just run
{{< highlight bash >}}
ozone scm --init
ozone om --init
start-ozone.sh
{{< /highlight >}}
This assumes that you have set up the slaves file correctly and an ssh
configuration that allows password-less ssh to all data nodes. This is the same as the
HDFS configuration, so please refer to the HDFS documentation on how to set this
up.
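For reference, the slaves file is a plain list of datanode hostnames, one per line. The hostnames below are placeholders, not values from this guide:

{{< highlight bash >}}
# One datanode hostname per line (placeholder names)
datanode1.example.com
datanode2.example.com
datanode3.example.com
{{< /highlight >}}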

View File

@ -1,10 +1,6 @@
---
title: Alpha Cluster
weight: 1
menu:
main:
parent: Starting
weight: 1
title: Pseudo-cluster
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
@ -23,21 +19,11 @@ menu:
limitations under the License.
-->
{{< requirements >}}
* docker and docker-compose
{{< /requirements >}}
***This is an alpha release of Ozone. Please don't use this release in
production.*** Please check the road map page for features under
development.
The easiest way to run ozone is to download the release tarball and launch
ozone via Docker. Docker will create a small ozone cluster on your machine,
including the data nodes and ozone services.
## Running Ozone via Docker
**This assumes that you have Docker installed on the machine.**
* Download the Ozone tarball and untar it.
* Download the Ozone binary tarball and untar it.
* Go to the directory where the docker compose files exist and tell
`docker-compose` to start Ozone in the background. This will start a small
@ -70,4 +56,5 @@ While you are there, please don't forget to check out the ozone configuration ex
To shutdown the cluster, please run
{{< highlight bash >}}
docker-compose down
{{< /highlight >}}
{{< /highlight >}}

View File

@ -0,0 +1,111 @@
---
title: Simple Single Ozone
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{< requirements >}}
* Working docker setup
* AWS CLI (optional)
{{< /requirements >}}
# Ozone in a Single Container
The easiest way to start up an all-in-one ozone container is to use the latest
docker image from docker hub:
```bash
docker run -p 9878:9878 -p 9876:9876 apache/ozone
```
This command will pull down the ozone image from docker hub and start all
ozone services in a single container. <br>
This container will run the required metadata servers (Ozone Manager, Storage
Container Manager), one datanode, and the S3-compatible REST server
(S3 Gateway).
# Local multi-container cluster
If you would like to use a more realistic pseudo-cluster where each component
runs in its own container, you can start it with a docker-compose file.
We have shipped a docker-compose file and an environment file as part of the
container image that is uploaded to docker hub.
The following commands can be used to extract these files from the image in the docker hub.
```bash
docker run apache/ozone cat docker-compose.yaml > docker-compose.yaml
docker run apache/ozone cat docker-config > docker-config
```
Now you can start the cluster with docker-compose:
```bash
docker-compose up -d
```
If you need multiple datanodes, you can just scale it up:
```bash
docker-compose scale datanode=3
```
# Running S3 Clients
Once the cluster is booted up and ready, you can verify it is running by
connecting to the SCM's UI at [http://localhost:9876](http://localhost:9876).
The S3 gateway endpoint will be exposed at port 9878. You can use Ozone's S3
support as if you are working against the real S3.
Here is how you create buckets from command line:
```bash
aws s3api --endpoint http://localhost:9878/ create-bucket --bucket=bucket1
```
The only notable difference in the above command line is that you have
to pass the _endpoint_ address to the aws s3api command.
Now let us put a simple file into the S3 Bucket hosted by Ozone. We will
start by creating a temporary file that we can upload to Ozone via S3 support.
```bash
ls -1 > /tmp/testfile
```
This command creates a temporary file that
we can upload to Ozone. The next command actually uploads to Ozone's S3
bucket using the standard aws s3 command line interface.
```bash
aws s3 --endpoint http://localhost:9878 cp --storage-class REDUCED_REDUNDANCY /tmp/testfile s3://bucket1/testfile
```
<div class="alert alert-info" role="alert">
Note: REDUCED_REDUNDANCY is required for the single container ozone, since it
has a single datanode. </div>
We can now verify that the file got uploaded by running the list command against
our bucket.
```bash
aws s3 --endpoint http://localhost:9878 ls s3://bucket1/testfile
```
<div class="alert alert-info" role="alert"> You can also check the internal
bucket browser supported by the Ozone S3 interface by opening the link below.
<br>
</div>
http://localhost:9878/bucket1?browser

View File

@ -0,0 +1,88 @@
---
title: Getting Started
name: Getting Started
identifier: Starting
menu: main
weight: 1
cards: "false"
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{<jumbotron title="Installing Ozone">}}
There are many ways to install and run Ozone, ranging from simple docker
deployments on
local nodes to full-scale multi-node cluster deployments on
Kubernetes or bare metal.
{{</jumbotron>}}
<section class="row cardgroup">
<span class="label label-warning label-">Easy Start</span>
<h2>Running Ozone from Docker Hub</h2>
You can try out Ozone using docker hub without downloading the official release. This makes it easy to explore Ozone.
{{<card title="Starting ozone inside a single container" link="start/StartFromDockerHub.md" link-text="Ozone In Docker" image="start/docker.png">}}
The simplest and easiest way to start an ozone cluster
and explore what it can do is to start ozone via docker.
{{</card>}}
</section>
<section class="row cardgroup">
<span class="label label-success">Recommended</span>
<h2>Running Ozone from an Official Release</h2>
Apache Ozone can also be run from the official release packages. Along with the official source releases, we also release a set of convenience binary packages. It is easy to run these binaries in different configurations.
{{<card title="Deploying Ozone on a physical cluster" link="start/OnPrem" link-text="On-Prem Ozone Cluster" image="start/hadoop.png">}}
Ozone is designed to work concurrently with HDFS. The physical cluster instructions explain each component of Ozone and how to deploy with maximum control.
{{</card>}}
{{<card title="Deploying Ozone on K8s" link="start/Kubernetes" link-text="Kubernetes" image="start/k8s.png">}}
Ozone is designed to work well under Kubernetes. These are instructions on how to deploy Ozone on the K8s platform. Ozone provides a replicated storage solution for K8s-based applications.
{{</card>}}
{{<card title="Deploy Ozone using MiniKube" link="start/Minikube" link-text="Minikube cluster" image="start/minikube.png">}}
Ozone comes with a standard set of K8s resources. You can deploy them to MiniKube and experiment with the K8s based deployments.
{{</card>}}
{{<card title="An Ozone cluster in Local Node." link="start/RunningViaDocker.md" link-text="docker-compose" image="start/docker.png">}}
We also ship standard docker files with the official release, if you want to use them. These are part of the official release and do not depend upon Docker Hub.
{{</card>}}
</section>
<section class="row cardgroup">
<span class="label label-danger">Hadoop Ninja</span>
<h2>Building From Sources </h2>
Instructions to build Ozone from source to create deployment packages.
{{<card title="Building From Sources" link="start/FromSource.md" link-text="Build ozone from source" image="start/hadoop.png">}}
If you are a Hadoop ninja, and wise in the ways of Apache, you already know that a real Apache release is a source release. We believe that even ninjas need help at times.
{{</card>}}
</section>

View File

@ -1,9 +1,7 @@
---
title: "Audit Parser"
date: 2018-12-17
menu:
main:
parent: Tools
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more

View File

@ -1,9 +1,7 @@
---
title: Freon
date: "2017-09-02T23:58:17-07:00"
menu:
main:
parent: Tools
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more

View File

@ -1,9 +1,7 @@
---
title: "Generate Configurations"
date: 2018-12-18
menu:
main:
parent: Tools
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more

View File

@ -1,9 +1,7 @@
---
title: "SCMCLI"
date: 2017-08-10
menu:
main:
parent: Tools
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more

View File

@ -0,0 +1,228 @@
---
title: "Testing tools"
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Testing is one of the most important parts of developing a distributed system. We have the following types of tests.
This page describes our existing test tools which are part of the Ozone source base.
Note: we have more tests (like TPC-DS and TPC-H tests via Spark or Hive) which are not included here because they use external tools only.
## Unit test
As with almost every Java project, we have the good old unit tests inside each of our projects.
## Integration test (JUnit)
Traditional unit tests are supposed to test only one unit, but we also have higher level unit tests. They use `MiniOzoneCluster` which is a helper method to start real daemons (scm,om,datanodes) during the unit test.
From the maven/java point of view they are just simple unit tests (the JUnit library is used) but to separate them (and solve some dependency problems) we moved all of these tests to `hadoop-ozone/integration-test`
## Smoketest
We use docker-compose based pseudo-cluster to run different configuration of Ozone. To be sure that the different configuration can be started we implemented _acceptance_ tests with the help of https://robotframework.org/.
The smoketests are available from the distribution (`./smoketest`) but the robot files define only the tests: usually they start a CLI and check the output.
To run the tests in different environments (docker-compose, kubernetes) you need a definition to start the containers and execute the right tests in the right containers.
These test definitions are included in the `compose` directory (check `./compose/*/test.sh` or `./compose/test-all.sh`).
For example, a simple way to test the distribution package:
```
cd compose/ozone
./test.sh
```
## Blockade
[Blockade](https://github.com/worstcase/blockade) is a tool to test network failures and partitions (it's inspired by the legendary [Jepsen tests](https://jepsen.io/analyses)).
Blockade tests are implemented with the help of pytest and can be started from the `./blockade` directory of the distribution.
```
cd blockade
pip install pytest==2.8.7 blockade
python -m pytest -s .
```
See the README in the blockade directory for more details.
## MiniChaosOzoneCluster
This is a way to get [chaos](https://en.wikipedia.org/wiki/Chaos_engineering) on your machine. It can be started from the source code: a MiniOzoneCluster (which starts real daemons) is launched, and its components are killed randomly.
## Freon
Freon is a command line application which is included in the Ozone distribution. It's a load generator which is used in our stress tests.
For example:
```
ozone freon randomkeys --numOfVolumes=10 --numOfBuckets 10 --numOfKeys 10 --replicationType=RATIS --factor=THREE
```
```
***************************************************
Status: Success
Git Base Revision: 48aae081e5afacbb3240657556b26c29e61830c3
Number of Volumes created: 10
Number of Buckets created: 100
Number of Keys added: 1000
Ratis replication factor: THREE
Ratis replication type: RATIS
Average Time spent in volume creation: 00:00:00,035
Average Time spent in bucket creation: 00:00:00,319
Average Time spent in key creation: 00:00:03,659
Average Time spent in key write: 00:00:10,894
Total bytes written: 10240000
Total Execution time: 00:00:16,898
***********************
```
For more information check the [documentation page](https://hadoop.apache.org/ozone/docs/0.4.0-alpha/freon.html)
## Genesis
Genesis is a microbenchmarking tool. It's also included in the distribution (`ozone genesis`) but it doesn't require a real cluster. It measures different parts of the code in an isolated way (e.g. the code which saves the data to the local RocksDB based key-value stores)
Example run:
```
ozone genesis -benchmark=BenchMarkRocksDbStore
# JMH version: 1.19
# VM version: JDK 11.0.1, VM 11.0.1+13-LTS
# VM invoker: /usr/lib/jvm/java-11-openjdk-11.0.1.13-3.el7_6.x86_64/bin/java
# VM options: -Dproc_genesis -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender
# Warmup: 2 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.apache.hadoop.ozone.genesis.BenchMarkRocksDbStore.test
# Parameters: (backgroundThreads = 4, blockSize = 8, maxBackgroundFlushes = 4, maxBytesForLevelBase = 512, maxOpenFiles = 5000, maxWriteBufferNumber = 16, writeBufferSize = 64)
# Run progress: 0.00% complete, ETA 00:00:22
# Fork: 1 of 1
# Warmup Iteration 1: 213775.360 ops/s
# Warmup Iteration 2: 32041.633 ops/s
Iteration 1: 196342.348 ops/s
·stack: <delayed till summary>
Iteration 2: 41926.816 ops/s
·stack: <delayed till summary>
Iteration 3: 210433.231 ops/s
·stack: <delayed till summary>
Iteration 4: 46941.951 ops/s
·stack: <delayed till summary>
Iteration 5: 212825.884 ops/s
·stack: <delayed till summary>
Iteration 6: 145914.351 ops/s
·stack: <delayed till summary>
Iteration 7: 141838.469 ops/s
·stack: <delayed till summary>
Iteration 8: 205334.438 ops/s
·stack: <delayed till summary>
Iteration 9: 163709.519 ops/s
·stack: <delayed till summary>
Iteration 10: 162494.608 ops/s
·stack: <delayed till summary>
Iteration 11: 199155.793 ops/s
·stack: <delayed till summary>
Iteration 12: 209679.298 ops/s
·stack: <delayed till summary>
Iteration 13: 193787.574 ops/s
·stack: <delayed till summary>
Iteration 14: 127004.147 ops/s
·stack: <delayed till summary>
Iteration 15: 145511.080 ops/s
·stack: <delayed till summary>
Iteration 16: 223433.864 ops/s
·stack: <delayed till summary>
Iteration 17: 169752.665 ops/s
·stack: <delayed till summary>
Iteration 18: 165217.191 ops/s
·stack: <delayed till summary>
Iteration 19: 191038.476 ops/s
·stack: <delayed till summary>
Iteration 20: 196335.579 ops/s
·stack: <delayed till summary>
Result "org.apache.hadoop.ozone.genesis.BenchMarkRocksDbStore.test":
167433.864 ±(99.9%) 43530.883 ops/s [Average]
(min, avg, max) = (41926.816, 167433.864, 223433.864), stdev = 50130.230
CI (99.9%): [123902.981, 210964.748] (assumes normal distribution)
Secondary result "org.apache.hadoop.ozone.genesis.BenchMarkRocksDbStore.test:·stack":
Stack profiler:
....[Thread state distributions]....................................................................
78.9% RUNNABLE
20.0% TIMED_WAITING
1.1% WAITING
....[Thread state: RUNNABLE]........................................................................
59.8% 75.8% org.rocksdb.RocksDB.put
16.5% 20.9% org.rocksdb.RocksDB.get
0.7% 0.9% java.io.UnixFileSystem.delete0
0.7% 0.9% org.rocksdb.RocksDB.disposeInternal
0.3% 0.4% java.lang.Long.formatUnsignedLong0
0.1% 0.2% org.apache.hadoop.ozone.genesis.BenchMarkRocksDbStore.test
0.1% 0.1% java.lang.Long.toUnsignedString0
0.1% 0.1% org.apache.hadoop.ozone.genesis.generated.BenchMarkRocksDbStore_test_jmhTest.test_thrpt_jmhStub
0.0% 0.1% java.lang.Object.clone
0.0% 0.0% java.lang.Thread.currentThread
0.4% 0.5% <other>
....[Thread state: TIMED_WAITING]...................................................................
20.0% 100.0% java.lang.Object.wait
....[Thread state: WAITING].........................................................................
1.1% 100.0% jdk.internal.misc.Unsafe.park
# Run complete. Total time: 00:00:38
Benchmark (backgroundThreads) (blockSize) (maxBackgroundFlushes) (maxBytesForLevelBase) (maxOpenFiles) (maxWriteBufferNumber) (writeBufferSize) Mode Cnt Score Error Units
BenchMarkRocksDbStore.test 4 8 4 512 5000 16 64 thrpt 20 167433.864 ± 43530.883 ops/s
BenchMarkRocksDbStore.test:·stack 4 8 4 512 5000 16 64 thrpt NaN ---
```

View File

@ -0,0 +1,19 @@
---
title: "Testing tools"
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

View File


BIN
hadoop-hdds/docs/static/ozone-usage.png vendored Normal file


View File

@ -0,0 +1,60 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{{ partial "header.html" . }}
<body>
{{ partial "navbar.html" . }}
<div class="container-fluid">
<div class="row">
{{ partial "sidebar.html" . }}
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main">
<h1>{{ .Title }}</h1>
<div class="col-md-9">
{{ .Content }}
{{.Params.card}}
{{ if not (eq .Params.cards "false")}}
{{ range .Pages }}
<div class="col-sm-6">
<div class="card">
<div class="card-body">
<h2 class="card-title">
{{ with .Params.Icon}}
<span class="glyphicon glyphicon-{{.}}" aria-hidden="true"></span>
{{end}}
{{ .LinkTitle }}
</h2>
<p class="card-text">{{.Summary}}</p>
<a href="{{.Permalink}}" class=" btn btn-primary btn-lg">{{.LinkTitle}}</a>
</div>
</div>
</div>
{{ end }}
{{end}}
</div>
</div>
</div>
</div>
{{ partial "footer.html" . }}
</body>
</html>

View File

@ -16,23 +16,40 @@
-->
{{ partial "header.html" . }}
<body>
<body>
{{ partial "navbar.html" . }}
{{ partial "navbar.html" . }}
<div class="container-fluid">
<div class="row">
{{ partial "sidebar.html" . }}
<div class="col-sm-9 col-sm-offset-3 col-md-10 col-md-offset-2 main">
<h1>{{ .Title }}</h1>
<div class="col-md-9">
{{ .Content }}
</div>
<div class="container-fluid">
<div class="row">
{{ partial "sidebar.html" . }}
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main">
<div class="col-md-9">
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="/">Home</a></li>
<li class="breadcrumb-item" aria-current="page"><a href="{{.CurrentSection.Permalink}}">{{.CurrentSection.Title}}</a></li>
<li class="breadcrumb-item active" aria-current="page">{{ .Title }}</li>
</ol>
</nav>
<h1>{{.Title}}</h1>
{{ .Content }}
{{ with .PrevInSection }}
<a class="btn btn-success btn-lg" href="{{ .Permalink }}">Next >></a>
{{ end }}
</div>
</div>
</div>
</div>
{{ partial "footer.html" . }}
{{ partial "footer.html" . }}
</body>
</html>
</body>
</html>

View File

@ -23,7 +23,7 @@
<div class="container-fluid">
<div class="row">
{{ partial "sidebar.html" . }}
<div class="col-sm-9 col-sm-offset-3 col-md-10 col-md-offset-2 main">
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main">
{{ .Content }}

View File

@ -26,9 +26,9 @@
<title>Documentation for Apache Hadoop Ozone</title>
<!-- Bootstrap core CSS -->
<link href="css/bootstrap.min.css" rel="stylesheet">
<link href="{{ "css/bootstrap.min.css" | relURL}}" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="css/ozonedoc.css" rel="stylesheet">
<link href="{{ "css/ozonedoc.css" | relURL}}" rel="stylesheet">
</head>

View File

@ -14,14 +14,13 @@
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-sm-3 col-md-2 sidebar" id="sidebar">
<img src="ozone-logo.png" style="max-width: 100%;"/>
<div class="col-sm-2 col-md-2 sidebar" id="sidebar">
<ul class="nav nav-sidebar">
{{ $currentPage := . }}
{{ range .Site.Menus.main }}
{{ if .HasChildren }}
<li class="{{ if $currentPage.IsMenuCurrent "main" . }}active{{ end }}">
<a href="{{ .URL }}">
<a href="{{ .URL | relURL}}">
{{ .Pre }}
<span>{{ .Name }}</span>
</a>
@ -29,14 +28,14 @@
{{ range .Children }}
<li class="{{ if $currentPage.IsMenuCurrent "main" . }}active{{ end }}">
{{ if .HasChildren }}
<a href="{{ .URL }}">
<a href="{{ .URL | relURL}}">
{{ .Pre }}
<span>{{ .Name }}</span>
</a>
<ul class="nav">
{{ range .Children }}
<li class="{{ if $currentPage.IsMenuCurrent "main" . }}active{{ end }}">
<a href="{{ .URL }}">{{ .Name }}</a>
<a href="{{ .URL | relURL}}">{{ .Name }}</a>
</li>
{{ end }}
</ul>
@ -50,9 +49,9 @@
{{ else }}
<li class="{{ if $currentPage.IsMenuCurrent "main" . }}active{{ end }}">
{{ if eq .URL "/" }}
<a href="index.html">
<a href="{{ "index.html" | relURL }}">
{{ else }}
<a href="{{ .URL }}">
<a href="{{ .URL | relURL }}">
{{ end }}
{{ .Pre }}

View File

@ -0,0 +1,20 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<a class="btn btn-primary">
{{ .Get "ref" }}
{{ .Inner }}
</a>

View File

@ -0,0 +1,40 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="col-sm-6">
<div class="media">
{{with .Get "image"}}
<div class="media-left media-top">
<img src="{{.}}"></img>
</div>
{{end}}
<div class="media-body">
<h4 class="media-title">
{{ if .Get "icon" }}
<span class="glyphicon glyphicon-{{ .Get "icon"}}"></span>
{{end}}
{{ .Get "title" }}
</h4>
{{ .Inner }}
{{ if .Get "link" }}
<p><a href="{{ .Get "link" | ref .}}" class=" btn btn-primary btn-lg">{{.Get "link-text" }}</a></p>
{{end}}
</div>
</div>
</div>

View File

@ -0,0 +1,25 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="jumbotron jumbotron-fluid">
<div class="container">
<h3 class="display-4">{{ .Get "title"}} </h3>
<p class="lead">
{{ .Inner }}
</p>
</div>
</div>

View File

@ -0,0 +1,22 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="panel panel-default">
<div class="panel-heading">Requirements</div>
<div class="panel-body">
{{ .Inner | markdownify}}
</div>
</div>

View File

@ -146,4 +146,21 @@ a:hover {
h4 {
font-weight: bold;
}
.cardgroup {
margin-bottom: 50px;
}
.cardgroup .card {
padding: 20px;
}
.cardgroup .media {
padding: 30px;
}
.card {
padding: 20px;
}