druid/docs/development/extensions-core/google.md

59 lines
2.8 KiB
Markdown

---
id: google
title: "Google Cloud Storage"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
## Google Cloud Storage Extension
This extension allows you to do 2 things:
* [Ingest data](#reading-data-from-google-cloud-storage) from files stored in Google Cloud Storage.
* Write segments to [deep storage](#deep-storage) in GCS.
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-google-extensions` in the extensions load list.
### Required Configuration
To configure connectivity to google cloud, run druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` in the environment.
### Reading data from Google Cloud Storage
The [Google Cloud Storage input source](../../ingestion/native-batch-input-source.md) is supported by the [Parallel task](../../ingestion/native-batch.md)
to read objects directly from Google Cloud Storage. If you use the [Hadoop task](../../ingestion/hadoop.md),
you can read data from Google Cloud Storage by specifying the paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
Objects can also be read directly from Google Cloud Storage via the [StaticGoogleBlobStoreFirehose](../../ingestion/native-batch-firehose.md#staticgoogleblobstorefirehose)
### Deep Storage
Deep storage can be written to Google Cloud Storage either via this extension or the [druid-hdfs-storage extension](../extensions-core/hdfs.md).
#### Configuration
To configure connectivity to google cloud, run druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` in the environment.
|Property|Description|Possible Values|Default|
|--------|---------------|-----------|-------|
|`druid.storage.type`|google||Must be set.|
|`druid.google.bucket`||Google Storage bucket name.|Must be set.|
|`druid.google.prefix`|A prefix string that will be prepended to the blob names for the segments published to Google deep storage| |""|
|`druid.google.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024|