diff --git a/docs/content/dependencies/metadata-storage.md b/docs/content/dependencies/metadata-storage.md
index ddf8855e7b7..d847bbc1d28 100644
--- a/docs/content/dependencies/metadata-storage.md
+++ b/docs/content/dependencies/metadata-storage.md
@@ -17,13 +17,13 @@ Derby is not suitable for production use as a metadata store. Use MySQL or Postg
 
 ## Using derby
 
- Add the following to your Druid configuration.
+Add the following to your Druid configuration.
+
+```properties
+druid.metadata.storage.type=derby
+druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527//opt/var/druid_state/derby;create=true
+```
 
- ```properties
- druid.metadata.storage.type=derby
- druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527//home/y/var/druid_state/derby;create=true
- ```
-
 ## MySQL
 
 See [mysql-metadata-storage extension documentation](../development/extensions-core/mysql.html).
diff --git a/docs/content/ingestion/batch-ingestion.md b/docs/content/ingestion/batch-ingestion.md
index d6da72f3ec0..921cbb383fc 100644
--- a/docs/content/ingestion/batch-ingestion.md
+++ b/docs/content/ingestion/batch-ingestion.md
@@ -239,7 +239,7 @@ classification=yarn-site,properties=[mapreduce.reduce.memory.mb=6144,mapreduce.r
 ```
 
 - Follow the instructions under "[Configure Hadoop for data
-loads](cluster.html#configure-cluster-for-hadoop-data-loads)" using the XML files from
+loads](../tutorials/cluster.html#configure-cluster-for-hadoop-data-loads)" using the XML files from
 `/etc/hadoop/conf` on your EMR master.
 
 #### Loading from S3 with EMR
@@ -269,7 +269,7 @@ Druid works out of the box with many Hadoop distributions.
 
 If you are having dependency conflicts between Druid and your version of Hadoop, you can try
 searching for a solution in the [Druid user groups](https://groups.google.com/forum/#!forum/druid-
-user), or reading the Druid [Different Hadoop Versions](..//operations/other-hadoop.html) documentation.
+user), or reading the Druid [Different Hadoop Versions](../operations/other-hadoop.html) documentation.
 
 ## Command Line Hadoop Indexer
diff --git a/docs/content/ingestion/stream-pull.md b/docs/content/ingestion/stream-pull.md
index 8785dcd427e..c2d2fb125d3 100644
--- a/docs/content/ingestion/stream-pull.md
+++ b/docs/content/ingestion/stream-pull.md
@@ -293,9 +293,6 @@ results.
 Is this always a problem? No. If your data is small enough to fit on a single Kafka
 partition, you can replicate without issues. Otherwise, you can run real-time nodes without replication.
 
-There is now also an [experimental low level Kafka firehose](../development/kafka-simple-consumer-firehose.html) which
-solves the issues described above with using the high level Kafka consumer.
-
 Please note that druid will skip over event that failed its checksum and it is corrupt.
 
 ### Locking
diff --git a/docs/content/ingestion/update-existing-data.md b/docs/content/ingestion/update-existing-data.md
index 26bdc005aaf..c709ff504a7 100644
--- a/docs/content/ingestion/update-existing-data.md
+++ b/docs/content/ingestion/update-existing-data.md
@@ -28,7 +28,7 @@ segments and avoid the overhead of rebuilding new segments with reindexing, you
 ### Reindexing and Delta Ingestion with Hadoop Batch Ingestion
 
 This section assumes the reader understands how to do batch ingestion using Hadoop. See
-[batch-ingestion](batch-ingestion.md) for more information. Hadoop batch-ingestion can be used for reindexing and delta ingestion.
+[batch-ingestion](batch-ingestion.html) for more information. Hadoop batch-ingestion can be used for reindexing and delta ingestion.
 
 Druid uses an `inputSpec` in the `ioConfig` to know where the data to be ingested is located and how to read it.
 For simple Hadoop batch ingestion, `static` or `granularity` spec types allow you to read data stored in deep storage.
diff --git a/docs/content/querying/dimensionspecs.md b/docs/content/querying/dimensionspecs.md
index ab6185daa85..2150c01745c 100644
--- a/docs/content/querying/dimensionspecs.md
+++ b/docs/content/querying/dimensionspecs.md
@@ -353,7 +353,7 @@ For example if you want to concat "[" and "]" before and after the actual dimens
 
 ### Filtered DimensionSpecs
 
-These are only valid for multi-value dimensions. If you have a row in druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filter.html) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
+These are only valid for multi-value dimensions. If you have a row in druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filters.html) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
 
 It happens because "query filter" is internally used on the bitmaps and only used to match the row to be included in the query result processing. With multi-value dimensions, "query filter" behaves like a contains check, which will match the row with dimension value ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.html) for more details.
 Then groupBy/topN processing pipeline "explodes" all multi-value dimensions resulting 3 rows for "v1", "v2" and "v3" each.
diff --git a/docs/content/tutorials/cluster.md b/docs/content/tutorials/cluster.md
index 3a865d788be..7f10de6ab15 100644
--- a/docs/content/tutorials/cluster.md
+++ b/docs/content/tutorials/cluster.md
@@ -68,13 +68,13 @@ In this package, you'll find:
 
 * `LICENSE` - the license files.
-* `bin/` - scripts related to the [single-machine quickstart](quickstart.md).
+* `bin/` - scripts related to the [single-machine quickstart](quickstart.html).
 * `conf/*` - template configurations for a clustered setup.
-* `conf-quickstart/*` - configurations for the [single-machine quickstart](quickstart.md).
+* `conf-quickstart/*` - configurations for the [single-machine quickstart](quickstart.html).
 * `extensions/*` - all Druid extensions.
 * `hadoop-dependencies/*` - Druid Hadoop dependencies.
 * `lib/*` - all included software packages for core Druid.
-* `quickstart/*` - files related to the [single-machine quickstart](quickstart.md).
+* `quickstart/*` - files related to the [single-machine quickstart](quickstart.html).
 
 We'll be editing the files in `conf/` in order to get things running.
diff --git a/docs/content/tutorials/quickstart.md b/docs/content/tutorials/quickstart.md
index d0bcb3bd08c..2223c7ae7b8 100644
--- a/docs/content/tutorials/quickstart.md
+++ b/docs/content/tutorials/quickstart.md
@@ -174,7 +174,7 @@ bin/tranquility server -configFile /conf-quickstart/tranqu
 
 This section shows you how to load data using Tranquility Server, but Druid also supports a wide
-variety of other streaming ingestion options, including from
+variety of other streaming ingestion options, including from popular streaming systems like Kafka, Storm, Samza, and Spark Streaming.
@@ -229,7 +229,7 @@ visualize and explore data in Druid. We recommend trying [Pivot](https://github.
 [Panoramix](https://github.com/mistercrunch/panoramix), or
 [Metabase](https://github.com/metabase/metabase) to start visualizing the data you just ingested.
 
-If you installed Pivot for example, you should be able to view your data in your browser at [localhost:9090](localhost:9090).
+If you installed Pivot for example, you should be able to view your data in your browser at [localhost:9090](http://localhost:9090/).
 
 ### SQL and other query libraries
diff --git a/docs/content/tutorials/tutorial-batch.md b/docs/content/tutorials/tutorial-batch.md
index cc4d9f3e621..3d14a9a8449 100644
--- a/docs/content/tutorials/tutorial-batch.md
+++ b/docs/content/tutorials/tutorial-batch.md
@@ -16,7 +16,7 @@ Once that's complete, you can load your own dataset by writing a custom ingestio
 
 ## Writing an ingestion spec
 
-When loading files into Druid, you will use Druid's [batch loading](ingestion-batch.html) process.
+When loading files into Druid, you will use Druid's [batch loading](../ingestion/batch-ingestion.html) process.
 There's an example batch ingestion spec in `quickstart/wikiticker-index.json` that you can modify for your own needs.
diff --git a/docs/content/tutorials/tutorial-kafka.md b/docs/content/tutorials/tutorial-kafka.md
index eddd927047b..ceb741c7e54 100644
--- a/docs/content/tutorials/tutorial-kafka.md
+++ b/docs/content/tutorials/tutorial-kafka.md
@@ -45,7 +45,7 @@ Run this command to create a Kafka topic called *metrics*, to which we'll send d
 
 ## Enable Druid Kafka ingestion
 
-Druid includes configs for [Tranquility Kafka](ingestion-streams.md#kafka) to support loading data from Kafka.
+Druid includes configs for [Tranquility Kafka](../ingestion/stream-pull.html#kafka) to support loading data from Kafka.
 To enable this in the quickstart-based configuration:
 
 - Stop your Tranquility command (CTRL-C) and then start it up again.