* add protobuf inputformat
* repair pom
* alter intermediateRow to type of Dynamicmessage
* add document
* refine test
* fix document
* add protoBytesDecoder
* refine document and add ser test
* add hash
* add schema registry ser test
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
* Add ability to wait for segment availability for batch jobs
* IT updates
* fix queries in legacy hadoop IT
* Fix broken indexing integration tests
* address an lgtm flag
* spell checker still flagging for hadoop doc. adding under that file header too
* fix compaction IT
* Updates to wait for availability method
* improve unit testing for patch
* fix bad indentation
* refactor waitForSegmentAvailability
* Fixes based off of review comments
* cleanup to get compile after merging with master
* fix failing test after previous logic update
* add back code that must have gotten deleted during conflict resolution
* update some logging code
* fixes to get compilation working after merge with master
* reset interrupt flag in catch block after code review pointed it out
* small changes following self-review
* fixup some issues brought on by merge with master
* small changes after review
* cleanup a little bit after merge with master
* Fix potential resource leak in AbstractBatchIndexTask
* syntax fix
* Add a Compcation TuningConfig type
* add docs stipulating the lack of support by Compaction tasks for the new config
* Fixup compilation errors after merge with master
* Remove erreneous newline
* DruidInputSource: Fix issues in column projection, timestamp handling.
DruidInputSource, DruidSegmentReader changes:
1) Remove "dimensions" and "metrics". They are not necessary, because we
can compute which columns we need to read based on what is going to
be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
the timestamp of the returned InputRows was set to the `__time` column
of the input datasource.
(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.
(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.
(1) and (3) are breaking changes.
Web console changes:
1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
compatibility with the new behavior.
Other changes:
1) Add ColumnsFilter, a new class that allows input readers to determine
which columns they need to read. Currently, it's only used by the
DruidInputSource, but it could be used by other columnar input sources
in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.
* Various fixups.
* Uncomment incorrectly commented lines.
* Move TransformSpecTest to the proper module.
* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.
* Fix.
* Fix build.
* Checkstyle.
* Misc fixes.
* Fix test.
* Move config.
* Fix imports.
* Fixup.
* Fix ShuffleResourceTest.
* Add import.
* Smarter exclusions.
* Fixes based on tests.
Also, add TIME_COLUMN constant in the web console.
* Adjustments for tests.
* Reorder test data.
* Update docs.
* Update docs to say Druid 0.22.0 instead of 0.21.0.
* Fix test.
* Fix ITAutoCompactionTest.
* Changes from review & from merging.
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc
* fix links, typos, some reorganization
* fix spelling. TBD still there for work in progress
* updates tutorial examples, adds more clarification around compaction use cases
* add granularity spec to automatic compaction config
* final edits
* spelling fixes
* apply suggestions from review
* upadtes from review
* last edits
* move note
* clarify null
* fix links & spelling
* latest review
* edits to auto-compaction config
* add back rollup
* fix links & spelling
* Update compaction.md
add granularityspec to example
* prometheus-emitter
* use existing jetty server to expose prometheus collection endpoint
* unused variables
* better variable names
* removed unused dependencies
* more metric definitions
* reorganize
* use prometheus HTTPServer instead of hooking into Jetty server
* temporary empty help string
* temporary non-empty help. fix incorrect dimension value in JSON (also updated statsd json)
* added full help text. added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation
* added documentation for prometheus emitter
* safety for invalid labelNames
* fix travis checks
* Unit test and better sanitization of metrics names and label values
* add precondition to check namespace against regex
* use precompiled regex
* remove static imports. fix metric types
* better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start()
* Update regex for label-value replacements to allow internal numeric values. Additional tests
* Adds missing license header
updates website/.spelling to add words used in prometheus-emitter docs.
updates docs/operations/metrics.md to correct the spelling of
bufferPoolName
* fixes version in extensions-contrib/prometheus-emitter
* fix style guide errors
* update import ordering
* add another word to website/.spelling
* remove unthrown declared exception
* remove unused import
* Pushgateway strategy for metrics
* typo
* Format fix and nullable strategy
* Update pom file for prometheus-emitter
* code review comments. Counter to gauge for cache metrics, periodical task to pushGateway
* Syntax fix
* Dimension label regex include numeric character back, fix previous commit
* bump prometheus-emitter pom dev version
* Remove scheduled task inside poen that push metrics
* Fix checkstyle
* Unit test coverage
* Unit test coverage
* Spelling
* Doc fix
* spelling
Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com>
Co-authored-by: Michael Schiff <schiff.michael@gmail.com>
Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com>
Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
* Add config and header support for confluent schema registry. (porting code from https://github.com/apache/druid/pull/9096)
* Add Eclipse Public License 2.0 to license check
* Update licenses.yaml, revert changes to check-licenses.py and dependencies for integration-tests
* Add spelling exception and remove unused dependency
* Use non-deprecated getSchemaById() and remove duplicated license entry
* Update docs/ingestion/data-formats.md
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* Added check for schema being null, as per Confluent code
* Missing imports and whitespace
* Updated unit tests with AvroSchema
Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@7-tv.de>
Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@joyn.de>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* fix checkstyle
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* fix test
* fix test
* add log
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
* address comments
* fix checkstyle
* fix checkstyle
* add config to skip overhead memory calculation
* add test for the skipBytesInMemoryOverheadCheck config
* add docs
* fix checkstyle
* fix checkstyle
* fix spelling
* address comments
* fix travis
* address comments
* Multiphase merge for IndexMergerV9
* JSON fix
* Cleanup temp files
* Docs
* Address logging and add IT
* Fix spelling and test unloader datasource name
* integration test for coordinator and overlord leadership, added sys.servers is_leader column
* docs
* remove not needed
* fix comments
* fix compile heh
* oof
* revert unintended
* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster
* fixes
* style
* dang
* heh
* scripts are hard
* fix spelling
* fix thing that must not matter since was already wrong ip, log when test fails
* needs more heap
* fix merge
* less aggro
* fix JSON format
* Change all columns in sys segments to be JSON
* Change all columns in sys segments to be JSON
* add tests
* fix failing tests
* fix failing tests
* Add shuffle metrics for parallel indexing
* javadoc and concurrency test
* concurrency
* fix javadoc
* Feature flag
* doc
* fix doc and add a test
* checkstyle
* add tests
* fix build and address comments
* Fix Avro OCF detection prefix and run formation detection on raw input
* Support Avro Fixed and Enum types correctly
* Check Avro version byte in format detection
* Add test for AvroOCFReader.sample
Ensures that the Sampler doesn't receive raw input that it can't
serialize into JSON.
* Document Avro type handling
* Add TS unit tests for guessInputFormat
* Adding more worker metrics to Druid Overlord
* Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better
* Few more instance of worker usage replaced with peon
* Modifying the peon idle count logic to only use eligible workers available capacity
* Changing the naming to task slot count instead of peon
* Adding some unit test coverage for the new test runner apis
* Addressing Review Comments
* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics
* Fixing the spelling issue in the docs
* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided
* query context
* fix tests; add more test
* javadoc
* docs and more tests
* remove default and hadoop tests
* consistent name and fix javadoc
* spelling and field name
* default function for partitionsSpec
* other comments
* address comments
* fix tests and spelling
* test
* doc
* Working
* add test
* doc
* fix test
* split other integration test
* exclude other-index from other tests
* doc anchor fix
* adjust task slots and number of merge tasks
* spell check
* reduce maxNumConcurrentSubTasks to 1
* maxNumConcurrentSubtasks for range partitinoing
* reduce memory for historical
* change group name
* support redis cluster
* add 'password', 'database' properties
* test cases passed
* update doc
* some improvements
* fix CI
* add more test cases to improve branch coverage
* fix dependency check for test
* resolve review comments
* support unit suffix on byte-related properties
* add doc
* change default value of byte-related properites in example files
* fix coding style
* fix doc
* fix CI
* suppress spelling errors
* improve code according to comments
* rename Bytes to HumanReadableBytes
* add getBytesInInt to get value safely
* improve doc
* fix problem reported by CI
* fix problem reported by CI
* resolve code review comments
* improve error message
* improve code & doc according to comments
* fix CI problem
* improve doc
* suppress spelling check errors
* fix mvn website build to use mvn supplied nodejs, fix broken redirects, move block from custom.css to custom.scss so will be correctly generated
* sidebar
* fix lol
* init commit, all tests passed
* fix format
Signed-off-by: frank chen <frank.chen021@outlook.com>
* data stored successfully
* modify config path
* add doc
* add aliyun-oss extension to project
* remove descriptor deletion code to avoid warning message output by aliyun client
* fix warnings reported by lgtm-com
* fix ci warnings
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix errors reported by intellj inspection check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc spelling check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix dependency warnings reported by ci
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix warnings reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add package configuration to support showing extension info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add IT test cases and fix bugs
Signed-off-by: frank chen <frank.chen021@outlook.com>
* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add license info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc
Signed-off-by: frank chen <frank.chen021@outlook.com>
* exclude execution of IT testcases of OSS extension from CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* put the extensions under contrib group and add to distribution
* fix names in test cases
* add unit test to cover OssInputSource
* fix names in test cases
* fix dependency problem reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Druid user permissions apply in the console
* Update index.md
* noting user warning in console page; some minor shuffling
* noting user warning in console page; some minor shuffling 1
* touchups
* link checking fixes
* Updated per suggestions
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.
- Add REGEXP_LIKE function that returns a boolean, and is useful in
WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
(previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
* Changes based on PR review.
* Fix arg check.
* Important fixes!
* Add speller.
* wip
* Additional tests.
* Fix up tests.
* Add validation error tests.
* Additional tests.
* Remove useless call.
* Add AvroOCFInputFormat
* Support supplying a reader schema in AvroOCFInputFormat
* Add docs for Avro OCF input format
* Address review comments
* Address second round of review
* Update data-formats.md
Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"
* Light text cleanup
* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.
* Update index.md
* original quickstart full first pass
* original quickstart full first pass
* first pass all the way through
* straggler
* image touchups and finished old tutorial
* a bit of finishing up
* Review comments
* fixing links
* spell checking gymnastics
* Adding support for autoscaling in GCE
* adding extra google deps also in gce pom
* fix link in doc
* remove unused deps
* adding terms to spelling file
* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
* GCEXyz -> GceXyz in naming for consistency
* add preconditions
* add VisibleForTesting annotation
* typos in comments
* use StringUtils.format instead of String.format
* use custom exception instead of exit
* factorize interval time between retries
* making literal value a constant
* iter all network interfaces
* use provided on google (non api) deps
* adding missing dep
* removing unneded this and use Objects methods instead o 3-way if in hash and comparison
* adding import
* adding retries around getRunningInstances and adding limit for operation end waiting
* refactor GceEnvironmentConfig.hashCode
* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
* removing unused config
* adding tests to hash and equals
* adding nullable to waitForOperationEnd
* adding testTerminate
* adding unit tests for createComputeService
* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
* reverting queryResponseTemplate change
* adding comment for Compute.Builder.build() returning null
* Refresh query docs.
Larger changes:
- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.
Smaller changes:
- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.
* Updates.
* Fix numbering.
* Fix glitches.
* Add new words to spellcheck file.
* Assorted changes.
* Further adjustments.
* Add missing punctuation.
* Document possible vulnerabilities for the druid-ranger-security
In certain configurations the ranger plugin can expose vulnerabilities due
to some of its dependencies having CVEs.
* Spelling checker is a bit tight
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication
* update version in druid-pac4j pom
* introducing unauthorized resource filter
* authenticated but authorized /unified-webconsole.html
* use httpReq.getRequestURI() for matching callback path
* add documentation
* minor doc addition
* licesne file updates
* make dependency analyze succeed
* fix doc build
* hopefully fixes doc build
* hopefully fixes license check build
* yet another try on fixing license build
* revert unintentional changes to website folder
* update version to 0.18.0-SNAPSHOT
* check session and its expiry on each request
* add crypto service
* code for encrypting the cookie
* update doc with cookiePassphrase
* update license yaml
* make sessionstore in Pac4jFilter private non static
* make Pac4jFilter fields final
* okta: use sha256 for hmac
* remove incubating
* add UTs for crypto util and session store impl
* use standard charsets
* add license header
* remove unused file
* add org.objenesis.objenesis to license.yaml
* a bit of nit changes in CryptoService and embedding EncryptionResult for clarity
* rename alg to cipherAlgName
* take cipher alg name, mode and padding as input
* add java doc for CryptoService and make it more understandable
* another UT for CryptoService
* cache pac4j Config
* use generics clearly in Pac4jSessionStore
* update cookiePassphrase doc to mention PasswordProvider
* mark stuff Nullable where appropriate in Pac4jSessionStore
* update doc to mention jdbc
* add error log on reaching callback resource
* javadoc for Pac4jCallbackResource
* introduce NOOP_HTTP_ACTION_ADAPTER
* add correct module name in license file
* correct extensions folder name in licenses.yaml
* replace druid-kubernetes-extensions to druid-pac4j
* cache SecureRandom instance
* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion
* fix build failure
* fix failing build
* fix failing build
* Code cleanup
* fix failing test
* Removed CloudConfigProperties and make specific class for each cloudInputSource
* Removed CloudConfigProperties and make specific class for each cloudInputSource
* pass s3ConfigProperties for split
* lazy init s3client
* update docs
* fix docs check
* address comments
* add ServerSideEncryptingAmazonS3.Builder
* fix failing checkstyle
* fix typo
* wrap the ServerSideEncryptingAmazonS3.Builder in a provider
* added java docs for S3InputSource constructor
* added java docs for S3InputSource constructor
* remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider
* Move Azure extension into Core
Moving the azure extension into Core.
* * Fix build failure
* * Add The MIT License (MIT) to list of compatible licenses
* * Address review comments
* * change reference to contrib azure to core azure
* * Fix spelling mistakes.
* Add config option for namespacePrefix
opentsdb emitter sends metric names to opentsdb verbatim as what druid
names them, for example "query.count", this doesn't fit well with a
central opentsdb server which might have namespaced metrics, for example
"druid.query.count". This adds support for adding an optional prefix.
The prefix also gets a trailing dot (.), after it, so the metric name
becomes <namespacePrefix>.<metricname>
configureable as "druid.emitter.opentsdb.namespacePrefix", as
documented.
Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com>
* Spelling for PR #9372
Added "namespacePrefix" to .spelling exceptions, it's a variable name
used in documentation for opentsdb-emitter.
* fixing tests for PR #9372
changed naming of variables to be more descriptive
added test of prefix being an empty string: "".
added a conditional to buildNamespacePrefix to check for empty string
being fed if EventConverter called without OpentsdbEmitterConfig
instance.
* fixing checkstyle errors for PR #9372
used == to compare literal string, should be equals()
* cleaned up and updated PR #9372
Created a buildMetric function as suggested by clintropolis, and
removed redundant tests for empty strings as they're only used when
calling EventConverter directly without going through
OpentsdbEmitterConfig.
* consistent naming of tests PR #9372
Changed names of tests in files to match better with what it was
actually testing
changed check for Strings.isNullOrEmpty to just check for `null`, as
empty string valued `namespacePrefix` is handled in
OpentsdbEmitterConfig.
Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>
* Doc update for new input source and input format.
- The input source and input format are promoted in all docs under docs/ingestion
- All input sources including core extension ones are located in docs/ingestion/native-batch.md
- All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md
- New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md
* parquet
* add warning for range partitioning with sequential mode
* hdfs + s3, gs
* add fs impl for gs
* address comments
* address comments
* gcs
* add parquet support to native batch
* cleanup
* implement toJson for sampler support
* better binaryAsString test
* docs
* i hate spellcheck
* refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest
* add comment, fix some stuff
* adjustments
* fix accident
* tweaks
* sketch of broker parallel merges done in small batches on fork join pool
* fix non-terminating sequences, auto compute parallelism
* adjust benches
* adjust benchmarks
* now hella more faster, fixed dumb
* fix
* remove comments
* log.info for debug
* javadoc
* safer block for sequence to yielder conversion
* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool
* smooth yield rate adjustment, more logs to help tune
* cleanup, less logs
* error handling, bug fixes, on by default, more parallel, more tests
* remove unused var
* comments
* timeboundary mergeFn
* simplify, more javadoc
* formatting
* pushdown config
* use nanos consistently, move logs back to debug level, bit more javadoc
* static terminal result batch
* javadoc for nullability of createMergeFn
* cleanup
* oops
* fix race, add docs
* spelling, remove todo, add unhandled exception log
* cleanup, revert unintended change
* another unintended change
* review stuff
* add ParallelMergeCombiningSequenceBenchmark, fixes
* hyper-threading is the enemy
* fix initial start delay, lol
* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2
* fix those important style issues with the benchmarks code
* lazy sequence creation for benchmarks
* more benchmark comments
* stable sequence generation time
* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs
* add jmh thread based benchmarks, cleanup some stuff
* oops
* style
* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose
* retool benchmark to allow modeling more typical heterogenous heavy workloads
* spelling
* fix
* refactor benchmarks
* formatting
* docs
* add maxThreadStartDelay parameter to threaded benchmark
* why does catch need to be on its own line but else doesnt