Commit Graph

1200 Commits

Author SHA1 Message Date
Rishabh Singh a83d1cdf26
fix var name (#13657) 2023-01-11 21:15:30 +05:30
317brian d9c27d6102
docs: add index page and related stuff for jupyter tutorials (#13342) 2022-12-16 13:33:50 -08:00
Rishabh Singh f42722e627
Set monotonically increasing worker capacity in start-druid-main (#13581)
This commit updates the task memory allocation logic.
- min task count is 2 and max task count is the number of CPUs on the machine
- task count increases with total task memory
- task memory increases from 512m to 2g
2022-12-16 15:34:30 +05:30
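#13581 states the goal (task count never drops as task memory grows, bounded by 2 and the CPU count) but not the exact formula. The following is a minimal sketch of one way to get that property. It assumes a task memory budget in MiB and the 512m/1g/2g tiers named in the commits. `compute_task_config` and the tier list are hypothetical, not the actual start-druid-main code.

```python
# Hypothetical tier sizes in MiB, taken from the commit text (512m up to 2g).
TASK_MEMORY_TIERS_MIB = [512, 1024, 2048]
MIN_TASK_COUNT = 2


def compute_task_config(task_memory_budget_mib, num_cpus):
    """Return (task_count, memory_per_task_mib) for a task memory budget.

    Slots are filled at the smallest tier first, so the count grows with the
    budget until it reaches the CPU cap; only then does per-task memory grow.
    """
    max_task_count = max(MIN_TASK_COUNT, num_cpus)
    smallest = TASK_MEMORY_TIERS_MIB[0]

    task_count = task_memory_budget_mib // smallest
    if task_count < MIN_TASK_COUNT:
        raise ValueError("budget too small for %d tasks" % MIN_TASK_COUNT)
    task_count = min(task_count, max_task_count)

    # Pick the largest tier that still fits every slot within the budget.
    memory_per_task = smallest
    for tier in TASK_MEMORY_TIERS_MIB:
        if tier * task_count <= task_memory_budget_mib:
            memory_per_task = tier
    return task_count, memory_per_task
```

Filling slots before growing them is what keeps both outputs monotonic. If the code picked a tier first and then divided the budget by that tier, crossing a tier boundary could reduce the task count.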
Clint Wylie d9e5245ff0
allow string dimension indexer to handle byte[] as base64 strings (#13573)
This PR expands `StringDimensionIndexer` to handle conversion of `byte[]` to base64 encoded strings, rather than the current behavior of calling java `toString`. 

This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to directly convert stuff to java types, resulting in `bytes` typed values being converted as `byte[]` instead of a base64 string which the previous JSON based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as complex types serde), when fed to a `StringDimensionIndexer`, it resulted in an ugly java `toString` because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)` which does do this coercion (and why many tests on binary types appeared to be doing the expected thing).

I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly and call `getDimension` to validate stuff. The parser-based version still uses the old conversion mechanisms, so when not using a flattener it incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now; if we either update the parser to use the new flattener or just... remove parsers, we can remove this test stuff.
2022-12-16 14:50:17 +05:30
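The coercion in #13573 can be sketched in Python. Python's `str()` on bytes produces `b'...'`, much as Java's `toString` on a `byte[]` produces something like `[B@<hash>`. The function name `to_dimension_string` is hypothetical, and the sketch assumes standard base64 with padding.

```python
import base64


def to_dimension_string(value):
    """Coerce a raw row value into a string dimension value.

    Binary values are base64-encoded (standard alphabet, with padding)
    instead of being rendered with str(), which would give "b'...'".
    """
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    return str(value)
```

For example, `to_dimension_string(b"hello")` yields `"aGVsbG8="`, which a consumer can decode back to the original bytes. The `str()` form cannot be decoded this way.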
317brian 668d1fad6b
docs: notebook only for API tutorial (#13345)
* docs: notebook for API tutorial

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* address the other comments

* typo

* add commentary to outputs

* address feedback from will

* delete unnecessary comment

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-15 13:16:07 -08:00
Rishabh Singh 97bc0220c7
Update task memory computation in start-druid (#13563)
Changes:
* Use 80% of memory specified for running services (versus 50% earlier).
* Tasks now get 512m, 1024m, or 2048m (versus 512m or 2048m earlier).
* Add direct memory for router.
2022-12-15 11:06:16 +05:30
Vadim Ogievetsky 2729e25295
Link to java docs (#13478)
* add link to page about selecting a JRE

* add link to script also

* simplify text
2022-12-14 11:45:23 -08:00
Rishabh Singh 8e386072e9
Druid automated quickstart: zookeeper in service list (#13550) 2022-12-12 10:29:43 -08:00
Rishabh Singh 4ebdfe226d
Druid automated quickstart (#13365)
* Druid automated quickstart

* remove conf/druid/single-server/quickstart/_common/historical/jvm.config

* Minor changes in python script

* Add lower bound memory for some services

* Additional runtime properties for services

* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py

* File end newline

* Limit the ability to start multiple instances of a service, documentation changes

* simplify script arguments

* restore changes in medium profile

* run-druid refactor

* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging

* Remove extra quotes from mm task javaopts array

* Update logic to compute minimum memory

* simplify run-druid

* remove debug options from run-druid

* resolve the config_path provided

* comment out service specific runtime properties which are computed in the code

* simplify run-druid

* clean up docs, naming changes

* Throw ValueError exception on illegal state

* update docs

* rename args, compute_only -> compute, run_zk -> zk

* update help documentation

* update help documentation

* move task memory computation into separate method

* Add validation checks

* remove print

* Add validations

* remove start-druid bash script, rename start-druid-main

* Include tasks in lower bound memory calculation

* Fix test

* 256m instead of 256g

* caffeine cache uses 5% of heap

* ensure min task count is 2, task count is monotonic

* update configs and documentation for runtime props in conf/druid/single-server/quickstart

* Update docs

* Specify memory argument for each profile in single-server.md

* Update middleManager runtime.properties

* Move quickstart configs to conf/druid/base, add bash launch script, support python2

* Update supervise script

* rename base config directory to auto

* rename python script, changes to pass repeated args to supervise

* remove examples/conf/druid/base dir

* add docs

* restore changes in conf dir

* update start-druid-auto

* remove hashref for commands in supervise script

* start-druid-main java_opts array is comma separated

* update entry point script name in python script

* Update help docs

* documentation changes

* docs changes

* update docs

* add support for running indexer

* update supported services list

* update help

* Update python.md

* remove dir

* update .spelling

* Remove dependency on psutil and pathlib

* update docs

* Update get_physical_memory method

* Update help docs

* update docs

* update method to get physical memory on python

* update spelling

* update .spelling

* minor change

* Minor change

* memory computation for indexer

* update start-druid

* Update python.md

* Update single-server.md

* Update python.md

* run python3 --version to check if python is installed

* Update supervise script

* start-druid: echo message if python not found

* update anchor text

* minor change

* Update condition in supervise script

* JVM not jvm in docs
2022-12-09 11:04:02 -08:00
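#13365 includes "Remove dependency on psutil and pathlib" and later reworks the method that reads physical memory. The following is a minimal sketch of reading total memory with only the standard library. It assumes a POSIX platform where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES` (Linux, for example). The real script's helper, including any macOS-specific handling, may differ.

```python
import os


def get_physical_memory_bytes():
    """Total physical memory via POSIX sysconf, without psutil.

    Assumes os.sysconf exposes SC_PAGE_SIZE and SC_PHYS_PAGES on this
    platform; raises OSError otherwise (for example on Windows).
    """
    names = getattr(os, "sysconf_names", {})
    if "SC_PAGE_SIZE" not in names or "SC_PHYS_PAGES" not in names:
        raise OSError("sysconf memory keys unavailable on this platform")
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```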
Gian Merlino c61313f4c4
Quieter streaming supervisors. (#13392)
Eliminates two common sources of noise with Kafka supervisors that have
large numbers of tasks and partitions:

1) Log the report at DEBUG rather than INFO level at each run cycle.
   It can get quite large, and can be retrieved via API when needed.

2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals
   package. Avoids a log message per-partition per-minute as part of seeking
   to the latest offset in the reporting thread. In the tasks, where this
   sort of logging might be more useful, we have another log message with
   the same information: "Seeking partition[%s] to[%s]".
2022-11-20 23:53:17 -08:00
AmatyaAvadhanula 41e51b21c3
Make http options the default configurations (#13092)
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
2022-10-05 05:35:17 +05:30
Frank Chen eff7c64228
export com.sun.management.internal (#13068) 2022-09-12 09:03:22 -07:00
Vadim Ogievetsky 2a039e7e6a
Add CTA and fix typo (#13009)
* Add CTA and fix typo

* resolve hostname better
2022-09-06 11:16:50 -07:00
zemin 6805a7f9c2
Ease of hiding sensitive properties from /status/proper… (#12950)
* apache#12063 Ease of hiding sensitive properties from /status/properties endpoint

* apache#12063 Ease of hiding sensitive properties from /status/properties endpoint

* apache#12063 Ease of hiding sensitive properties from /status/properties endpoint

using one property for hiding properties, updated the index.md to document hiddenProperties

* apache#12063 Ease of hiding sensitive properties from /status/properties endpoint

Added java docs

* apache#12063 Ease of hiding sensitive properties from /status/properties endpoint

Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties

fixed typo and removed redundant space

Co-authored-by: zemin <zemin.piao@adyen.com>
2022-09-02 08:51:25 -05:00
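The following sketch shows what `druid.server.hiddenProperties` does to the /status/properties output in #12950. It uses the defaults listed in that commit and assumes that names are matched case-insensitively as substrings. Check the configuration docs for your Druid version. `visible_properties` is a hypothetical name.

```python
# Defaults listed in the commit for druid.server.hiddenProperties.
DEFAULT_HIDDEN = ["password", "key", "token", "pwd"]


def visible_properties(props, hidden=DEFAULT_HIDDEN):
    """Drop any property whose name contains a hidden substring (case-insensitive)."""
    lowered = [h.lower() for h in hidden]
    return {
        name: value
        for name, value in props.items()
        if not any(h in name.lower() for h in lowered)
    }
```

Substring matching is deliberately broad. Any name containing `key` (such as `druid.s3.accessKey`) is hidden without needing an exact list of sensitive properties, at the cost of occasionally hiding a harmless one.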
Gian Merlino 48ceab2153
Add Java 17 information to documentation. (#12990)
The docs say Java 17 support is experimental, and give tips on running
successfully with Java 17.

This patch also removes java.base/jdk.internal.perf and
jdk.management/com.sun.management.internal from the list of required
exports and opens, because they were formerly needed for JvmMonitor,
which was rewritten in #12481 to use MXBeans instead.
2022-08-30 12:32:49 -07:00
Gian Merlino d7d15ba51f
Add druid-multi-stage-query extension. (#12918)
* Add druid-multi-stage-query extension.

* Adjustments from CI.

* Task ID validation.

* Various changes from code review.

* Remove unnecessary code.

* LGTM-related.
2022-08-23 18:44:01 -07:00
Adarsh Sanjeev 3755f30bc4
Add export parameters for Java 11 (#12859)
* Add exports for Java 11 parameters

* Add parameters for data sketches
2022-08-16 13:05:45 +05:30
vimil-saju 4d65c08576
changes to run examples when CDPATH environment variable is set where cd command returns current dir… (#12877)
* changes to run examples on macos where cd command returns current directory

* Update examples/bin/run-druid

Co-authored-by: Frank Chen <frankchen@apache.org>

* merging

* sending output of cd command to /dev/null

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-08-14 13:15:24 +08:00
Gian Merlino ef6811ef88
Improved Java 17 support and Java runtime docs. (#12839)
* Improved Java 17 support and Java runtime docs.

1) Add a "Java runtime" doc page with information about supported
   Java versions, garbage collection, and strong encapsulation.

2) Update asm and equalsverifier to versions that support Java 17.

3) Add additional "--add-opens" lines to surefire configuration, so
   tests can pass successfully under Java 17.

4) Switch openjdk15 tests to openjdk17.

5) Update FrameFile to specifically mention Java runtime incompatibility
   as the cause of not being able to use Memory.map.

6) Update SegmentLoadDropHandler to log an error for Errors too, not
   just Exceptions. This is important because an IllegalAccessError is
   encountered when the correct "--add-opens" line is not provided,
   which would otherwise be silently ignored.

7) Update example configs to use druid.indexer.runner.javaOptsArray
   instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)

* Adjustments.

* Use run-java in more places.

* Add run-java.

* Update .gitignore.

* Exclude hadoop-client-api.

Brought in when building on Java 17.

* Swap one more usage of java.

* Fix the run-java script.

* Fix flag.

* Include link to Temurin.

* Spelling.

* Update examples/bin/run-java

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2022-08-03 23:16:05 -07:00
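Item 7 of #12839 moves the example configs from `druid.indexer.runner.javaOpts` (one string) to `druid.indexer.runner.javaOptsArray` (a JSON array). The following sketch assumes the string form is split on whitespace and shows why the array form handles option values that contain spaces. Both helper names are hypothetical, and this is not the runner's actual parsing code.

```python
import json


def split_java_opts(java_opts):
    """Assumed model of the string form: split on whitespace."""
    return java_opts.split()


def parse_java_opts_array(java_opts_array):
    """Parse the array form, a JSON array of strings."""
    opts = json.loads(java_opts_array)
    if not isinstance(opts, list) or not all(isinstance(o, str) for o in opts):
        raise ValueError("javaOptsArray must be a JSON array of strings")
    return opts
```

With `-Djava.io.tmpdir=/tmp/druid tasks`, the string form splits the path into two arguments, while the array form keeps it as a single option.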
Gian Merlino 0ca37c20a6
Python 3 support for post-index-task. (#12841)
* Python 3 support for post-index-task.

Useful when running on macOS or any other system that
doesn't have Python 2.

* Encode JSON returned by read_task_file.

* Adjust.

* Skip needless loads.

* Add a decode.

* Additional decodes needed.
2022-08-02 17:53:34 -07:00
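Under Python 3, `urllib` request bodies and response bodies are `bytes` rather than `str`. That is the likely reason #12841 adds the encodes and decodes. The following two hypothetical helpers illustrate the pattern. The names are illustrative and do not come from post-index-task.

```python
import json


def encode_task(task):
    """Serialize a task spec to UTF-8 bytes, as a Python 3 POST body requires."""
    return json.dumps(task).encode("utf-8")


def decode_response(raw):
    """Decode an HTTP response body (bytes on Python 3) and parse it as JSON."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

Because `json.dumps` escapes non-ASCII characters by default, the encoded body is plain ASCII, and the same helpers also behave on Python 2.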
Charles Smith efbb58e90e
docs: remove maxRowsPerSegment where appropriate (#12071)
* remove maxRowsPerSegment where appropriate

* fix tutorial, accept suggestions

* Update docs/design/coordinator.md

* additional tutorial file

* fix initial index spec

* accept comments

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add back comment on maxrows per segment

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rm duplicate entry

* Update native-batch-simple-task.md

remove ref to `maxrowspersegment`

* Update native-batch.md

remove ref to `maxrowspersegment`

* final tenticles

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-28 16:52:13 +05:30
Gian Merlino 0099940808
Add TIME_IN_INTERVAL SQL operator. (#12662)
* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.
2022-06-21 13:05:37 -07:00
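Implementing TIME_IN_INTERVAL as a convertlet (#12662) means it reduces to `>= start` and `< end`: the start is inclusive and the end is exclusive. The following Python sketch shows that semantics. It assumes a plain `start/end` interval string. Timestamps without an offset are treated as UTC, and ISO 8601 durations are not handled.

```python
from datetime import datetime, timezone


def _as_utc(text):
    dt = datetime.fromisoformat(text)
    # Treat timestamps without an offset as UTC (an assumption of this sketch).
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def time_in_interval(ts, interval):
    """Same result as `ts >= start AND ts < end` for a 'start/end' interval."""
    start_text, end_text = interval.split("/")
    return _as_utc(start_text) <= ts < _as_utc(end_text)
```

A timestamp exactly at the end of the interval falls outside it. This lets adjacent intervals such as consecutive days tile the timeline without overlap.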
Gian Merlino a27f4f5740
Service stdout log files, move logs to log/. (#12570)
* Service stdout log files, move logs to log/.

Two changes that make log behavior cleaner:

1) Redirect messages from the Java runtime to their own log files.
   Otherwise, they would get jumbled up in the output of the all-in-one
   start command.

2) Use log/ instead of bin/log/ for the default log directory. Makes them
   easier to find.

Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.

* Spelling.

* See if code formatting affects spelling.
2022-06-03 10:44:29 +05:30
Tiffany Yeh 665c926824
Fix zulu8 set-up Dockerfile for hadoop and hadoop3 in hadoop ingestion tutorial (#12248)
Fix errors related to zulu8 installation for building the Hadoop Docker image in the Load From Apache Hadoop tutorial.

The steps to download zulu8 in the Dockerfile and setup-zulu-repo.sh were replaced with the steps in the Dockerfile released by zulu-openjdk: be45d20302/centos/8u282-8.52.0.23/Dockerfile.
2022-04-11 20:28:09 +05:30
AmatyaAvadhanula 7bf1d8c5c0
Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298)
Add config for eager / lazy connection initialization in ResourcePool

Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.

This is acceptable when the number of eager connections (== maxSize) is small. However, in environments where it is large (say 1000) and multiple tasks launch simultaneously, a very large number of connections can be opened to the coordinator at once, overwhelming it.

Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.

A config parameter, eagerInitialization, is added. When it is set to true, ResourcePool creates the maximum permitted number of connections when it is initialized.

If set to false, lazy initialization of connection resources takes place.

NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR

Algorithm
The current implementation relies on the creation of maxSize resources eagerly.

The new implementation's behaviour is as follows:

1. If a resource has been previously created and is available, lend it.
2. Else, if the number of created resources is less than the maximum allowed (maxSize), create one and lend it.
3. Else, wait for one of the lent resources to be returned.
2022-03-09 23:17:43 +05:30
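The following Python sketch shows the lending algorithm from #12298, built on a condition variable. `LazyResourcePool`, `take`, and `give_back` are hypothetical names. Druid's actual ResourcePool is Java, and this sketch ignores any per-key structure the real pool may have.

```python
import threading


class LazyResourcePool(object):
    """Sketch of lazy vs eager resource creation (not Druid's ResourcePool)."""

    def __init__(self, factory, max_size, eager_initialization=False):
        self._factory = factory
        self._max_size = max_size
        self._available = []
        self._created = 0
        self._cond = threading.Condition()
        if eager_initialization:
            # Old behaviour: create max_size resources up front.
            self._available = [factory() for _ in range(max_size)]
            self._created = max_size

    def take(self):
        with self._cond:
            while not self._available and self._created >= self._max_size:
                # Every resource is lent out: wait for one to be returned.
                self._cond.wait()
            if self._available:
                # Lend a previously created resource.
                return self._available.pop()
            # Under the limit: reserve a slot, then create outside the lock.
            self._created += 1
        try:
            return self._factory()
        except Exception:
            # Creation failed: release the reserved slot and wake a waiter.
            with self._cond:
                self._created -= 1
                self._cond.notify()
            raise

    def give_back(self, resource):
        with self._cond:
            self._available.append(resource)
            self._cond.notify()

    @property
    def created(self):
        with self._cond:
            return self._created
```

With lazy initialization, a task that only ever needs one connection opens one connection rather than maxSize of them. This is the effect the commit relies on to avoid overwhelming the coordinator.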
Gian Merlino 3b373114dc
Officially support Java 11. (#12232)
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.

The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
2022-03-04 14:15:45 -08:00
Karan Kumar a080fcdd7b
Fixing hadoop 3 Dockerfile (#12284) 2022-02-26 19:18:29 +05:30
Suneet Saldanha 159f97dcb0
Update docs for druid.processing.numThreads in brokers (#12231)
* Update docs for druid.processing.numThreads

* error msg

* one more reference
2022-02-04 17:34:21 -08:00
Laksh Singla dc1703d5f9
Change value of `druid.sql.planner.useGroupingSetForExactDistinct` in common.runtime.properties (#12182)
This PR changes the value of the property `druid.sql.planner.useGroupingSetForExactDistinct` from `false` to `true` in the runtime.properties files, so that newer installations have this property as `true`, while the default still remains as `false`.

The flag determines how Calcite plans queries that contain an aggregation over `DISTINCT`, like `SELECT COUNT(DISTINCT foo.dim1) FILTER(WHERE foo.cnt = 1), SUM(foo.cnt) FROM druid.foo`. When the flag is false, the query is planned using joins; when it is true, the query is planned using grouping sets.

There is a known issue with Calcite (https://github.com/apache/druid/issues/7953), where an NPE is thrown while planning the above query with joins. There is no such issue while planning the query using grouping sets.
2022-01-24 14:00:03 +05:30
Michka Popoff 590cf993c0
Replace source call to make scripts more portable (#12014)
Fixes #10744

Fixes:
./bin/node.sh: 44: ./bin/node.sh: source: not found
Could not find java - please run /opt/druid/apache-druid-0.20.0/bin/verify-java to confirm it is installed.
2021-12-06 13:41:25 +05:30
Frank Chen 4631a66723
Support rolling log files (#10147)
* apply log file rolling strategy

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Use absolute log path and allow spaces in log path

* Update log4j2 configuration

* apply FileAppender to ZooKeeper

* DO NOT redirect application's console log to file in supervisor
2021-12-03 21:32:01 +08:00
Clint Wylie 84b4bf56d8
vectorize logical operators and boolean functions (#11184)
changes:
* adds new config, druid.expressions.useStrictBooleans, which makes longs the official boolean type of all expressions
* vectorize logical operators and boolean functions, some only if useStrictBooleans is true
2021-12-02 16:40:23 -08:00
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Add support for hadoop 3 profiles. Most of the details are captured in #11791.
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
2021-10-30 22:46:24 +05:30
Daniel Koepke 497f2a1051
Allow spaces in java home. (#11407)
Quote the $java_exec var in examples/bin/verify-java to support spaces in DRUID_JAVA_HOME/JAVA_HOME. At present, the steps before and after the version check properly quote the path, but the version check spuriously fails when pointing to a Java 8 install that has a space in its path.
2021-07-05 18:50:36 +05:30
Charles Smith d69533dbd9
First refactor of compaction (#10935)
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc

* fix links, typos, some reorganization

* fix spelling. TBD still there for work in progress

* updates tutorial examples, adds more clarification around compaction use cases

* add granularity spec to automatic compaction config

* final edits

* spelling fixes

* apply suggestions from review

* upadtes from review

* last edits

* move note

* clarify null

* fix links & spelling

* latest review

* edits to auto-compaction config

* add back rollup

* fix links & spelling

* Update compaction.md

add granularityspec to example
2021-03-24 11:41:44 -07:00
Vyatcheslav Mogilevsky b0432be07a
Apache archive mirror (#10979)
* Ability to use mirror of archive.apache.org

* Ability to use mirror of archive.apache.org: documentation

* Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction
2021-03-11 09:07:51 -08:00
misqos e684b83e29
Add the ability to supply client certificate to dsql command line tool. (#10765) 2021-02-11 20:16:47 -08:00
Harini Rajendran c2e26d2e1c
Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679)
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support the status/selfDiscovered endpoint, so this change brings the indexer in line with that expected behavior.

Also added example config files for indexer process that can be used to spin up the indexer process.
2020-12-14 19:04:14 -08:00
Vyatcheslav Mogilevsky 5324785eac
integration tests fix: update base image for hadoop containers to centos 7 (#10638)
LGTM
2020-12-08 11:00:51 -08:00
Atul Mohan b6ad790dc7
Support combining inputsource for parallel ingestion (#10387)
* Add combining inputsource

* Fix documentation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 16:25:35 -07:00
Atul Mohan 06539bc828
Set default server.maxsize to the sum of segment cache (#10255)
* Default server.maxsize

* Remove maxsize refs from config

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-10 09:21:22 -07:00
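The new default in #10255 sets `server.maxSize` to the total capacity of the segment cache. The following minimal sketch computes that total from a `druid.segmentCache.locations` value, which is a JSON array of objects with `path` and `maxSize`. It assumes `maxSize` values are plain byte counts without unit suffixes. `default_server_max_size` is a hypothetical name.

```python
import json


def default_server_max_size(segment_cache_locations_json):
    """Sum maxSize (in bytes) over every segment cache location."""
    locations = json.loads(segment_cache_locations_json)
    return sum(int(loc["maxSize"]) for loc in locations)
```

Deriving the value this way keeps the two settings from drifting apart. Before this change, a server could advertise more capacity than its cache locations could actually hold, or less.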
frank chen 646fa84d04
Support unit on byte-related properties (#10203)
* support unit suffix on byte-related properties

* add doc

* change default value of byte-related properties in example files

* fix coding style

* fix doc

* fix CI

* suppress spelling errors

* improve code according to comments

* rename Bytes to HumanReadableBytes

* add getBytesInInt to get value safely

* improve doc

* fix problem reported by CI

* fix problem reported by CI

* resolve code review comments

* improve error message

* improve code & doc according to comments

* fix CI problem

* improve doc

* suppress spelling check errors
2020-07-31 09:58:48 +08:00
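The following Python sketch illustrates the unit suffixes from #10203 and the overflow-checked `getBytesInInt`. It assumes decimal suffixes (K, M, G, T, P, 1000-based) and binary suffixes (KiB through PiB, 1024-based), matched case-insensitively. This is an assumed subset, so check the Druid docs for the exact set accepted. `parse_human_readable_bytes` and `get_bytes_in_int` are hypothetical Python analogues, not the Java `HumanReadableBytes` class.

```python
import re

# Assumed suffix set: decimal (1000-based) and binary (1024-based) units.
_MULTIPLIERS = {
    "": 1,
    "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4, "P": 1000 ** 5,
    "KIB": 1024, "MIB": 1024 ** 2, "GIB": 1024 ** 3,
    "TIB": 1024 ** 4, "PIB": 1024 ** 5,
}
_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")
INT_MAX = 2 ** 31 - 1  # largest value a Java int can hold


def parse_human_readable_bytes(text):
    """Parse values like '512', '1K', or '2GiB' into a byte count."""
    match = _PATTERN.match(text)
    if not match:
        raise ValueError("invalid byte value: %r" % text)
    number, unit = match.groups()
    multiplier = _MULTIPLIERS.get(unit.upper())
    if multiplier is None:
        raise ValueError("unknown unit: %r" % unit)
    return int(number) * multiplier


def get_bytes_in_int(text):
    """Like parse_human_readable_bytes, but reject values that overflow an int."""
    value = parse_human_readable_bytes(text)
    if value > INT_MAX:
        raise ValueError("%s exceeds the int range" % text)
    return value
```

The overflow check matters for properties stored as 32-bit ints in Java. For example, `2GiB` is 2147483648 bytes, one more than the int maximum.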
Gian Merlino 479c290fb9
Add QueryResource to log4j2 template. (#9735) 2020-04-22 09:18:45 -07:00
Clint Wylie 4d277dbf99
Fix double count ssl connection metrics (#9594)
* fix double counted jetty/numOpenConnections metric for ssl connections

* tests

* more better

* style
2020-04-03 23:29:23 -07:00
Suneet Saldanha af3337dac8
DruidInputSource can add new dimensions during re-ingestion (#9590)
* WIP integration tests

* Add integration test for ingestion with transformSpec

* WIP almost working tests

* Add ignored tests

* checkstyle stuff

* remove newPage from index task ingestion spec

* more test cleanup

* still not quite working

* Actually disable the tests

* working tests

* fix codestyle

* dont use junit in integration tests

* actually fix the bug

* fix checkstyle

* bring index tests closer to reindex tests
2020-04-02 17:32:31 -07:00
Maytas Monsereenusorn e9888f41cb
Modify check java version script to indicate experimental support for Java 11 (#9455)
* Modify check java version script to indicate experimental support for Java 11

* update docs
2020-03-11 09:22:39 -07:00
Chi Cao Minh 26eeba636a
Make java version check work on all shells (#9376)
* Make java version check work on all shells

Previously, "perl verify-java" would fail on shells like zsh, which
would cause the quickstart scripts (e.g., bin/start-micro-quickstart) to
fail unless the DRUID_SKIP_JAVA_CHECK environment variable is set.

* Support dash (ubuntu)
2020-02-19 13:44:00 -08:00
Clint Wylie b55657cc26
fix protobuf extension packaging and docs (#9320)
* fix protobuf extension packaging and docs

* fix paths

* Update protobuf.md

* Update protobuf.md
2020-02-07 09:26:52 -08:00
Suneet Saldanha 180c622e0f
Minor doc updates (#9217)
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
2020-01-20 11:34:37 -08:00
Suneet Saldanha 85a3d416b0
Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task

Instead of using a parserSpec, dimensionsSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
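The following sketch shows the spec shape that #9155 moves the tutorials to. It is a hypothetical Python builder for a minimal `index_parallel` task. `timestampSpec` and `dimensionsSpec` sit directly in `dataSchema`, and `inputSource` plus `inputFormat` in `ioConfig` replace the firehose and parser. The local input source, the JSON format, the `time` column, and the granularity values are assumptions chosen for illustration.

```python
import json


def build_index_parallel_task(data_source, base_dir, file_filter, dimensions):
    """Build a minimal index_parallel task using inputSource + inputFormat."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                # No parser: timestampSpec and dimensionsSpec live here directly.
                "timestampSpec": {"column": "time", "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # inputSource + inputFormat replace the firehose + parser pair.
                "inputSource": {"type": "local", "baseDir": base_dir,
                                "filter": file_filter},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                # A single sub-task behaves like the older index task.
                "maxNumConcurrentSubTasks": 1,
            },
        },
    }


if __name__ == "__main__":
    task = build_index_parallel_task("example", "/tmp/data", "*.json", ["page"])
    print(json.dumps(task, indent=2))
```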