---
id: cloudfiles
title: "Rackspace Cloud Files"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements. See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership. The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License. You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied. See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-cloudfiles-extensions` in the extensions load list.

## Deep Storage

[Rackspace Cloud Files](http://www.rackspace.com/cloud/files/) can be used as deep storage. Doing so requires the additional Druid configuration listed below.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|cloudfiles||Must be set.|
|`druid.storage.region`||Rackspace Cloud Files region.|Must be set.|
|`druid.storage.container`||Rackspace Cloud Files container name.|Must be set.|
|`druid.storage.basePath`||Rackspace Cloud Files base path to use in the container.|Must be set.|
|`druid.storage.operationMaxRetries`||Number of attempts before a Rackspace operation is cancelled.|10|
|`druid.cloudfiles.userName`||Rackspace Cloud username.|Must be set.|
|`druid.cloudfiles.apiKey`||Rackspace Cloud API key.|Must be set.|
|`druid.cloudfiles.provider`|rackspace-cloudfiles-us,rackspace-cloudfiles-uk|Name of the provider depending on the region.|Must be set.|
|`druid.cloudfiles.useServiceNet`|true,false|Whether to use the internal service net.|true|

## Firehose

<a name="firehose"></a>

#### StaticCloudFilesFirehose

This firehose ingests events from Rackspace Cloud Files, similar to the StaticAzureBlobStoreFirehose.

Data is newline-delimited, with one JSON object per line, and is parsed according to the `InputRowParser` configuration.

The storage account is shared with the one used for Rackspace Cloud Files deep storage, but blobs can be in a different region and container.

As with the Azure blob store, an object is assumed to be gzipped if its name ends in `.gz`.

This firehose is _splittable_ and can be used by [native parallel index tasks](../../ingestion/native-batch.md).
Since each split represents an object in this firehose, each worker task of `index_parallel` will read an object.

Sample spec:

```json
"firehose" : {
    "type" : "static-cloudfiles",
    "blobs": [
        {
          "region": "DFW",
          "container": "container",
          "path": "/path/to/your/file.json"
        },
        {
          "region": "ORD",
          "container": "anothercontainer",
          "path": "/another/path.json"
        }
    ]
}
```

This firehose provides caching and prefetching features. In IndexTask, a firehose can be read twice if intervals or
shardSpecs are not specified, and in this case caching can be useful. Prefetching is preferred when a direct scan of objects is slow.

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be `static-cloudfiles`.|N/A|yes|
|blobs|JSON array of Cloud Files blobs.|N/A|yes|
|maxCacheCapacityBytes|Maximum size of the cache space in bytes. 0 disables the cache. Cached files are not removed until the ingestion task completes.|1073741824|no|
|maxFetchCapacityBytes|Maximum size of the fetch space in bytes. 0 disables prefetching. Prefetched files are removed immediately once they are read.|1073741824|no|
|fetchTimeout|Timeout for fetching a Cloud Files object.|60000|no|
|maxFetchRetry|Maximum number of retries for fetching a Cloud Files object.|3|no|

Cloud Files Blobs:

|property|description|default|required?|
|--------|-----------|-------|---------|
|container|Name of the Cloud Files container.|N/A|yes|
|path|The path where data is located.|N/A|yes|

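
As an illustration of how the optional properties combine with the blob list, the following sketch disables the cache and adjusts the prefetch settings. The container name, path, and all numeric values are placeholders chosen for the example, not recommendations, and the `fetchTimeout` value assumes the timeout is expressed in milliseconds, as the default of 60000 suggests.

```json
"firehose" : {
    "type" : "static-cloudfiles",
    "blobs": [
        {
          "region": "DFW",
          "container": "container",
          "path": "/path/to/your/file.json.gz"
        }
    ],
    "maxCacheCapacityBytes": 0,
    "maxFetchCapacityBytes": 536870912,
    "fetchTimeout": 120000,
    "maxFetchRetry": 5
}
```

Disabling the cache in this way may be reasonable when intervals and shardSpecs are specified, since the firehose is then not expected to be read twice.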