---
id: google
title: "Google Cloud Storage"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
## Google Cloud Storage Extension

This extension allows you to do two things:
* [Ingest data](#reading-data-from-google-cloud-storage) from files stored in Google Cloud Storage.
* Write segments to [deep storage](#deep-storage) in GCS.

To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-google-extensions` in the extensions load list.
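As a minimal sketch, the relevant line in `common.runtime.properties` might look like the following (any other extensions used by your deployment would be listed alongside it):

```properties
# Load the Google extension; other entries in this list are deployment-specific.
druid.extensions.loadList=["druid-google-extensions"]
```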
### Required Configuration
To configure connectivity to Google Cloud, run Druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` set in the environment.
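A minimal sketch, assuming a hypothetical key file location and start command for your deployment:

```bash
# Hypothetical path; point this at your service account's JSON key file.
export GOOGLE_APPLICATION_CREDENTIALS=/etc/druid/google/service_account_keyfile.json

# Then start the Druid processes in this environment, e.g.:
bin/start-micro-quickstart
```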
### Reading data from Google Cloud Storage
The [Google Cloud Storage input source](../../ingestion/native-batch-input-source.md) is supported by the [Parallel task](../../ingestion/native-batch.md)
to read objects directly from Google Cloud Storage. If you use the [Hadoop task](../../ingestion/hadoop.md),
you can read data from Google Cloud Storage by specifying the paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
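For example, a minimal sketch of the `ioConfig` portion of a Parallel task spec reading JSON objects from GCS (the bucket and object path are hypothetical):

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "google",
    "uris": ["gs://example-bucket/path/to/file1.json"]
  },
  "inputFormat": {
    "type": "json"
  }
}
```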
Objects can also be read directly from Google Cloud Storage via the [StaticGoogleBlobStoreFirehose](../../ingestion/native-batch-firehose.md#staticgoogleblobstorefirehose).
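A minimal sketch of the firehose portion of such a spec (the bucket and path are hypothetical):

```json
"firehose": {
  "type": "static-google-blobstore",
  "blobs": [
    {
      "bucket": "example-bucket",
      "path": "/path/to/file1.json"
    }
  ]
}
```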
### Deep Storage
Deep storage can be written to Google Cloud Storage either via this extension or the [druid-hdfs-storage extension](../extensions-core/hdfs.md).
#### Configuration
To configure connectivity to Google Cloud, run Druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` set in the environment.

|Property|Description|Possible Values|Default|
|--------|-----------|---------------|-------|
|`druid.storage.type`||`google`|Must be set.|
|`druid.google.bucket`|Google Storage bucket name.||Must be set.|
|`druid.google.prefix`|A prefix string that will be prepended to the blob names for the segments published to Google deep storage.||`""`|
|`druid.google.maxListingLength`|Maximum number of input files matching a given prefix to retrieve at a time.||1024|
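For example, a minimal sketch of the deep storage section of `common.runtime.properties` (the bucket name and prefix are hypothetical):

```properties
druid.storage.type=google
# Hypothetical bucket and prefix; substitute your own.
druid.google.bucket=example-druid-bucket
druid.google.prefix=druid/segments
```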