* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
will be respected i.e. the first one to go above threshold will trigger persist
* Fix check style and remove a comment
* Add overlord unsecured paths to coordinator when using combined service (#5579)
* Add overlord unsecured paths to coordinator when using combined service
* PR comment
* More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* Allow getDomain to return disjointed intervals (#5570)
* Allow getDomain to return disjointed intervals
* Indentation issues
* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)
* Adding feature thetaSketchConstant to do some set operation in PostAggregator
* Updated review comments for PR #5551 - Adding thetaSketchConstant
* Fixed CI build issue
* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant
* Fix taskDuration docs for KafkaIndexingService (#5572)
* With incremental handoff the changed line is no longer true.
* Add doc for automatic pendingSegments (#5565)
* Add missing doc for automatic pendingSegments
* address comments
* Fix indexTask to respect forceExtendableShardSpecs (#5509)
* Fix indexTask to respect forceExtendableShardSpecs
* add comments
* Deprecate spark2 profile in pom.xml (#5581)
Deprecated due to https://github.com/druid-io/druid/pull/5382
* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)
Also switch various firehoses to the new method.
Fixes#5585.
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks
Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.
* The default value is 1/3(Runtime.maxMemory())
* To maintain the current behaviour set 'maxBytesInMemory' to -1
* If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
will be respected i.e. the first one to go above threshold will trigger persist
* Address code review comments
* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues
* Address more code review comments
* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase
* Fix some style checks
* Merge conflicts
* Fix failing tests
Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex
* Address PR comments
* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods
* Fix TeamCity inspection warnings
* Added maxBytesInMemory config to HadoopTuningConfig
* Updated the docs and examples
* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples
* Set maxBytesInMemory to 0 until used
Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts
* Update toString in KafkaSupervisorTuningConfig
* Use correct maxBytesInMemory value in AppenderatorImpl
* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory
Experimenting with various defaults, 1/3 jvm memory causes OOM
* Update docs to correct maxBytesInMemory default value
* Minor to rename and add comment
* Add more details in docs
* Address new PR comments
* Address PR comments
* Fix spelling typo
* Use mergeBuffer instead of processingBuffer in parallelCombiner
* Fix test
* address comments
* fix test
* Fix test
* Update comment
* address comments
* fix build
* Fix test failure
* Adding feature thetaSketchConstant to do some set operation in PostAggregator
* Updated review comments for PR #5551 - Adding thetaSketchConstant
* Fixed CI build issue
* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant
* Add more indexing task status and error reporting
* PR comments, add support in AppenderatorDriverRealtimeIndexTask
* Use TaskReport instead of metrics/context
* Fix tests
* Use TaskReport uploads
* Refactor fire department metrics retrieval
* Refactor input row serde in hadoop task
* Refactor hadoop task loader names
* Truncate error message in TaskStatus, add errorMsg to task report
* PR comments
* Use the official aws-sdk instead of jet3t
* fix compile and serde tests
* address comments and fix test
* add http version string
* remove redundant dependencies, fix potential NPE, and fix test
* resolve TODOs
* fix build
* downgrade jackson version to 2.6.7
* fix test
* resolve the last TODO
* support proxy and endpoint configurations
* fix build
* remove debugging log
* downgrade hadoop version to 2.8.3
* fix tests
* remove unused log
* fix it test
* revert KerberosAuthenticator change
* change hadoop-aws scope to provided in hdfs-storage
* address comments
* address comments
* Future-proof some Guava usage
* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20
* Use `Collections.emptyIterator()`
* Pretty formatting
* Make listenable future transforms a thing in default druid
* Format fix
* Add forbidden guava apis
* Make the ListenableFutrues.transformAsync have comments
* Undo intellij bad pattern matching in comments
* Futrues --> Futures
* Add empty iterators forbidding
* Fix extra `A`
* Correct method signature
* Address review comments
* Finish Gian review comments
* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
* Fix round robining in router.
Say that ten times fast.
For query endpoints, AsyncQueryForwardingServlet called hostFinder.getDefaultServer()
to set a default server, followed by hostFinder.getServer(inputQuery) to override it
with query-specific routing. Since hostFinder is round-robin, this skips a server.
When there are only two servers, one server is _always_ skipped and the router sends
all queries to the same broker.
* Adjust spacing.
* SegmentMetadataQuery: Fix default interval handling.
PR #4131 introduced a new copy builder for segmentMetadata that did
not retain the value of usingDefaultInterval. This led to it being
dropped and the default-interval handling not working as expected.
Instead of using the default 1 week history when intervals are not
provided, the segmentMetadata query would query _all_ segments,
incurring an unexpected performance hit.
This patch fixes the bug and adds a test for the copy builder.
* Intervals
* Support for disabling bitmap indexes.
Can save space for columns where bitmap indexes are pointless (like
free-form text).
* Remove import.
* Fix CompactionTaskTest.
* Update for review comments.
* Review comments, tests.
* Fix test.
* Fix two improper casts in HavingSpecMetricComparator.
Fixes two things:
1. An improper double-to-long cast when comparing double metrics to any
kind of value, which was a regression from #4883.
2. An improper double-to-long cast when comparing a long/int metric to a
double/float value: the value was cast to long/int, drawing strange
conclusions like int 100 matching a havingSpec of equalTo(100.5).
* Add comments.
* Remove extraneous comment.
* Simplify code a bit.
* Properly set "identity" in query metrics.
This patch adds an "identity" field to QueryPlus and sets it in
QueryLifecycle when the query starts executing. This is important
because it allows it to be used for future QueryMetrics created
by that QueryPlus object.
We also add "identity" to the request-level QueryMetrics object
created in emitLogsAndMetrics.
* Remove unused method.
* Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager.
Both were susceptible to the following conditions:
1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the
other was writing, or by writing to the file at the same time.
2. One JVM could partially write a file, then crash, leaving a truncated file.
* Use StringUtils.format
* Use both Joad Ids and Java IDs as Timezone to string readers
Change-Id: Ieb5c18559879f3f3a0104912ce2f0a354ad0aac3
* move the function to DateTimes and add org.joda.time.DateTimeZone#forID as part of forbidden api
Change-Id: Iff97fa044758019ed0c231587d10e31a9cc18da0
* exclude class and remove other usage
Change-Id: Ib458c2caaa1865535767e1009fbf017a92c8f615
* remove it from test classes
Change-Id: I9b576324f6c7e17a74bd8b13879232c9a8cd40b4
* remove unused
Change-Id: If1c5b70c26c2b7c83c20434cb72b2060653f5052
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
Code changes:
- In the lookup-based extractionFns, inherit injective property from
the lookup itself if not specified.
Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
They are necessary and important since they work with filters and
function cascades -- two things that the dimension specs do not do.
They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
more commonly used, so the docs read better this way.
* timewarp and timezones
changes:
* `TimewarpOperator` will now compensate for daylight savings time shifts between date translation ranges for queries using a `PeriodGranularity` with a timezone defined
* introduces a new abstract query type `TimeBucketedQuery` for all queries which have a `Granularity` (100% not attached to this name). `GroupByQuery`, `SearchQuery`, `SelectQuery`, `TimeseriesQuery`, and `TopNQuery` all extend `TimeBucke
tedQuery`, cutting down on some duplicate code and providing a mechanism for `TimewarpOperator` (and anything else) that needs to be aware of granularity
* move precondition check to TimeBucketedQuery, add Granularities.nullToAll, add getTimezone to TimeBucketQuery
* formatting
* more formatting
* unused import
* changes:
* add 'getGranularity' and 'getTimezone' to 'Query' interface
* merge 'TimeBucketedQuery' into 'BaseQuery'
* fixup tests from resulting serialization changes
* dedupe
* fix after merge
* suppress warning
* Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier; slight optimization of buffer.get() in block layout indexed suppliers
* Fix byte order
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
* Fix DataSegmentTest
* Refinements
* Try to fix
* Fix the second DataSegmentTest
* Nullability
* Fix tests
* Fix tests, unify to use TestHelper.getJsonMapper()
* Revert TestUtil as ServerTestHelper, fix tests
* Add newline
* Fix indexing tests
* Fix s3 tests
* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()
* Fix HDFS tests
* Fix HdfsDataSegmentPusherTest
* Capitalize constant names
* numeric quantiles sketch aggregator
* it seems that we need to synchronize all methods, which modify the state
* Seems like a false positive with -Pstrict
* code style fix
* code style fix
* use sketches-core-0.10.3
* moved cache ids to the central place
* better class names
* support large columns
* explained autodetection, added exception
* added comments regarding sketches moving on heap
* support reindexing
* implemented suggestions from jihoonson
* style fix
* use max(k, other.k) for better accuracy
* check for NilColumnValueSelector instead of null
* throw exceptions instead of providing no-op comparators
* SQL: Improve translation of time floor expressions.
The main change is to TimeFloorOperatorConversion.applyTimestampFloor.
- Prefer timestamp_floor expressions to timeFormat extractionFns, to
avoid turning things into strings when it isn't necessary.
- Collapse CAST(FLOOR(X TO Y) AS DATE) to FLOOR(X TO Y) if appropriate.
* Fix tests.
* ExpressionSelectors: Add caching selectors.
- SingleLongInputCaching selector for expressions on the __time column,
using a similar optimization to SingleScanTimeDimSelector
- SingleStringInputDimensionSelector for expressions on string columns
that return strings, using a similar optimization to ExtractionFn
based DimensionSelectors.
- SingleStringInputCaching selector for expressions on string columns
that return primitives.
Also, in the SQL planner, prefer expressions for time operations
rather than extractionFns.
* Code review comments.
* maxQueryTimeout property in runtime properties.
* extra line
* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.
* Fix initialize method.
* remove unused import.
* doc update.
* some more details in doc about query failure..
* minor fix.
* decorating QueryRunner to set and verify context. Added by servers.
* remove whitespace.
* SQL: Improved behavior when implicitly casting strings to date/time literals.
- Handle all flavors of ISO8601 and SQL literals.
- Throw errors on other literals instead of silently transforming them to 0.
* Respect timeZone when format is null.
* Add retries for coordinator fetch and lookup start in LookupReferencesManager
* Fix LookupConfigTest
* Address comments
* Address more comments
* And address more comments
* Address comms
* Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin
* Add compaction task
* added doc
* use combining aggregators
* address comments
* add support for dimensionsSpec
* fix getUniqueDims and getUniqueMetics
* find unique dimensionsSpec
* fix compilation
* add unit test
* fix test
* fix test
* test for different dimension orderings and types, and doc for type and ordering
* add control for custom ordering and type
* update doc
* fix compile
* fix compile
* add segments param
* fix serde error
* fix build
* Fix havingSpec on complex aggregators.
- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
related classes from druid-server to druid-processing. They probably make
more sense there anyway.
- Adds a SQL query test.
Fixes#4957.
* Remove unused import.
* Fix improper handling of empty arrays in StringDimensionIndexer.
This bug was able to introduce data errors: if the input rows to an
IncrementalIndex contained entirely empty arrays and single values, then
upon persisting to disk, the empty arrays would be replaced with the
lexicographically smallest single value, rather than nulls like they
should have been.
* Style fix.
* Add tests for bitmap indexes too.
* Introduce System wide property to select how to store double.
Set the default to store as float
Change-Id: Id85cca04ed0e7ecbce78624168c586dcc2adafaa
* fix tests
Change-Id: Ib42db724b8a8f032d204b58c366caaeabdd0d939
* Change the property name
Change-Id: I3ed69f79fc56e3735bc8f3a097f52a9f932b4734
* add tests and make default distribution store doubles as 64bits
Change-Id: I237b07829117ac61e247a6124423b03992f550f2
* adding mvn argument to parallel-test profile
Change-Id: Iae5d1328f901c4876b133894fa37e0d9a4162b05
* move property name and helper function to io.druid.segment.column.Column
Change-Id: I62ea903d332515de2b7ca45c02587a1b015cb065
* fix docs and clean style
Change-Id: I726abb8f52d25dc9dc62ad98814c5feda5e4d065
* fix docs
Change-Id: If10f4cf1e51a58285a301af4107ea17fe5e09b6d
* greater-than/less-than/equal-to havingSpec to call AggregatorFactory.finalizeComputation(..)
* fix the unit test and expect having to work on hyperUnique agg
* test fix
* fix style errors
* Changes for lookup synchronization
* Refactor of Lookup classes
* Minor refactors and doc update
* Change coordinator instance to be retrieved by DruidLeaderClient
* Wait before thread shutdown
* Make disablelookups flag true by default
* Update docs
* Rename flag
* Move executorservice shutdown to finally block
* Update LookupConfig
* Refactoring and doc changes
* Remove lookup config constructor
* Revert Lookupconfig constructor changes
* Add tests to LookupConfig
* Make executorservice local
* Update LRM
* Move ListeningScheduledExecutorService to ExecutorCompletionService
* Move exception to outer block
* Remove check to see future is done
* Remove unnecessary assignment
* Add logging
* QueryableIndexStorageAdapter: Lift column cache to Cursor sequence.
This is where it was before #4710, when its was moved to the individual
Cursors, leading to higher than expected memory usage. It could be
extreme for finer query granularities like "second".
* Comment.
* SQL: Upgrade to Calcite 1.14.0, some refactoring of internals.
This brings benefits:
- Ability to do GROUP BY and ORDER BY with ordinals.
- Ability to support IN filters beyond 19 elements (fixes#4203).
Some refactoring of druid-sql internals:
- Builtin aggregators and operators are implemented as SqlAggregators
and SqlOperatorConversions rather being special cases. This simplifies
the Expressions and GroupByRules code, which were becoming complex.
- SqlAggregator implementations are no longer responsible for filtering.
Added new functions:
- Expressions: strpos.
- SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR,
and DATE_TRUNC.
* Add missing @Override annotation.
* Adjustments for forbidden APIs.
* Adjustments for forbidden APIs.
* Disable GROUP BY alias.
* Doc reword.
* adding new post aggregators of test stats to druid-stats extension
* changes to address code review comments
* fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master
* add @Override annotation per CI log
* make changes per review comments/discussions
* remove some blocks per review comments
* Add identity to query metrics, logs.
Also fix a bug where unauthorized requests would not emit any logs or metrics,
and instead would log a "Tried to emit logs and metrics twice" warning.
Also rename QueryResource's "getServer" to "cancelQuery", because that's what
it does.
* Do not emit identity by default.
* Added org.joda.time.DateTime#(java.lang.String) to forbidden API.
* Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API.
* Add additional APIs that may create DateTime with default time zone
* Add helper function that accepts formatter to parse String.
* Add additional forbidden APIs
* Replace existing usage of forbidden APIs
* Use wrapper class to enforce Chronology on DateTimeFormatter.
* Creates constant UtcFormatter for constant ISODateTimeFormat.
* Move scan-query from a contrib extension into core.
Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion
This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.
This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.
Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.
* Adjustments from review.
* Add back Select query.
* Adjust SQL docs.
* Restore SelectQuery link.
* SQL: Full TRIM support.
- Support trimming arbitrary characters
- Support BOTH, LEADING, and TRAILING
* Remove unused import.
* Fix tests, add RTRIM / LTRIM.
* Remove unused imports.
* BTRIM and docs.
* Replace for with foreach.
* BufferHashGrouperTest: Better behavior with regard to large buffers.
1) Free buffers after each test
2) Avoid mmaping past the end of a file
* Use CloserRule.
When historical caching is enabled, and a select or topN query is
issued, and then a following query with "descending": true is set, the
cached query returns the ascending result (or vice versa), often
resulting in invalid paging identifiers.
The CacheKey for these queries doesn't include the "descending" flag;
this change adds it, and fixes the problem.
* Factor QueryableIndexColumnSelectorFactory and IncrementalIndexColumnSelectorFactory out of QueryableIndexStorageAdapter and IncrementalIndexStorageAdapter; Add Offset.getBaseReadableOffset(); Remove OffsetHolder interface; Replace Cursor extends ColumnSelectorFactory with composition; Reduce indirection in ColumnValueSelectors created by QueryableIndexColumnSelectorFactory
* Don't override clone() in FilteredOffset (the prev. implementation was broken); Some warnings fixed
* Simplify Cursors in QueryableIndexStorageAdapter
* Address comments
* Remove unused and unimplemented methods from GenericColumn interface
* Comments
* Add "round" option to cardinality and hyperUnique aggregators.
Also turn it on by default in SQL, to make math on distinct counts
work more as expected.
* Fix some compile errors.
* Fix test.
* Formatting.
* Add @ExtensionPoint and @PublicApi annotations.
* Clean up wording.
* Remove unused import.
* Remove unused imports.
* Only types can be extension points.
* Adjust annotations some more.
* Remove unused import.
* Make ServletFilterHolder an extension point.
* Add a couple extension points, and update docs.
* Fix dimension selectors with extractionFns on missing columns.
This patch properly applies the requested extractionFn to missing columns.
It's important when the extractionFn maps null to something other than null.
* Extract helper method.
* Change contracts of VirtualColumns and VirtualColumn methods based on review comments.
* Remove unused import.
* Remove unused method.
* Adjust helper function.
* Adjustments
* Replace Guava Enum.getIfPresent with builtin version.
This is useful for running in Hadoop environments that use Guava 11. Some
code is also simplified.
* Code review
* Don't use limit push down with having spec
* Throw exception when forcing limit push down with having
* Tests for having and limit push down
* Fix pool sizes in unit test
* Add HistoricalCursor.getReadableOffset() to access unwrapped offset in selectors, when the 'main' offset if FilteredOffset (fixes#4628)
* Stack overflow test
* NPE thrown when empty/null is passes to TimeDimExtractionFn
* Add @Nullable where ever applicable
* Add @Nullable to SearchQuerySpec.apply()
* Remove unused
* Remove some unnecessary use of boxed types.
* Fix some incorrect format strings.
* Enable IDEA's MalformedFormatString inspection.
* Add a Checkstyle check for finding uses of incorrect logging packages.
* Fix some incorrect usages of the metamx logger.
* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.
* Fix some more places where the wrong number of arguments are provided to format strings.
* Suppress `MalformedFormatString` inspection on legacy logging test.
* Use @SuppressWarnings rather than a noinspection suppression comment.
* Fix some more incorrect format strings.
* Suppress some more incorrect format string warnings where the incorrect string is intentional.
* Log the aggregator when closing it fails.
* Remove some unneeded log lines.
* Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY instead of Double.MIN_VALUE and Double.MAX_VALUE, same for Float
* Replace usages in comments
* Fix RTree
* Remove commented code
* Add tests
* Fix GroupBy type cast error when ChainedExecutionQueryRunner merges multiple runners
* Move conversion step to separate method
* Remove unnecessary comment
* Use compute to update map
* Avoid usages of Default system Locale and printing to System.out or System.err in production code
* Fix Charset in DruidKerberosUtil
* Remove redundant string format in GenericIndexed
* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()
* Fix testSafeFormat()
* More fixes of redundant StringUtils.format() inside ISE
* Rename unimportantSafeFormat() to nonStrictFormat()
* Add some new expression functions and macros.
See misc/math-expr.md for the list of added functions, except for
"like", which previously existed but was not documented.
* Add easymock to datasketches tests.
* Add easymock to distinctcount tests.
* Add easymock to virtual-columns tests.
* Code review comments.
* Clean up code a bit.
* Add easymock to scan-query tests.
* Rework ExprMacros that have multiple impls.
* Improve test coverage.
* Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router
* Fix examples
* Fixes
* Revert Peon configs and add comments
* Remove qualifier
* Remove ability to create segments in v8 format
* Fix IndexGeneratorJobTest
* Fix parameterized test name in IndexMergerTest
* Remove extra legacy merging stuff
* Remove legacy serializer builders
* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
* ExpressionPostAggregator: Automatically finalize inputs.
Raw HyperLogLogCollectors and such aren't very useful. When writing
expressions like `x / y` users will expect `x` and `y` to be finalized.
* Fix un-merge.
* Code review comments.
* Remove unnecessary ImmutableMap.copyOf.
* Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe.
* - Use of computeIfAbsent over putIfAbsent
- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex
* - Optimization on computeIfAbsent
- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
- Remove setting of default values
- Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders
* - Removing unused imports.
* - Helper method for testing with IncrementalIndex.Builder
* - Correction on javadoc.
* Style fix
* Limit random access in compressed column tests.
Random access leads to lots of block decompressions for reading single elements,
which is time prohibitive for the large column tests. For those tests, limit the
number of randomly accessed elements to 1000.
* Random -> ThreadLocalRandom
* Expressions work better with strings.
- ExpressionObjectSelector able to read from string columns, and able to
return strings.
- ExpressionVirtualColumn able to offer string (and long for that matter)
as its native type.
- ExpressionPostAggregator able to return strings.
- groupBy, topN: Allow post-aggregators to accept dimensions as inputs,
making ExpressionPostAggregator more useful.
- topN: Use DimExtractionTopNAlgorithm for STRING columns that do not
have dictionaries, allowing it to work with STRING-type expression
virtual columns.
- Adjusts null handling to better match the rest of Druid: null and
empty string treated the same; nulls implicitly treated as zeroes in
numeric context.
* Code review comments.
* More code review.
* Fix test.
* Adjust annotations.
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.
ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.
* Address code review comments.
* Enable most IntelliJ 'Probable bugs' inspections
* Fix in RemoteTestNG
* Fix IndexSpec's equals() and hashCode() to include longEncoding
* Fix inspection errors
* Extract global isntance of natural().nullsFirst(); address comments
* Fix
* Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections
* Prohibit Ordering.natural().nullsFirst() using Checkstyle
* Make using implicit system charset an error
* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()
* Use English locale in StringUtils.safeFormat()
* Restore comment
* fix TestKafkaExtractionCluster fail due to port already used
* explicitly unmap hydrant files when abandonSegment to recyle mmap memory
* address the comments
* apply to AppenderatorImpl
* move ProtoBufInputRowParser from processing module to protobuf extensions
* Ported PR #3509
* add DynamicMessage
* fix local test stuff that slipped in
* add license header
* removed redundant type name
* removed commented code
* fix code style
* rename ProtoBuf -> Protobuf
* pom.xml: shade protobuf classes, handle .desc resource file as binary file
* clean up error messages
* pick first message type from descriptor if not specified
* fix protoMessageType null check. add test case
* move protobuf-extension from contrib to core
* document: add new configuration keys, and descriptions
* update document. add examples
* move protobuf-extension from contrib to core (2nd try)
* touch
* include protobuf extensions in the distribution
* fix whitespace
* include protobuf example in the distribution
* example: create new pb obj everytime
* document: use properly quoted json
* fix whitespace
* bump parent version to 0.10.1-SNAPSHOT
* ignore Override check
* touch
Jackson deserializes integers sometimes as int and sometimes as long,
depending on how big they are. This leads to ClassCastException
when comparing deserialized values as part of groupBy merging on the
broker.
* Don't pass QueryMetrics down in concurrent and async QueryRunners
* Rename QueryPlus.threadSafe() to withoutThreadUnsafeState(); Update QueryPlus.withQueryMetrics() Javadocs; Fix generics in MetricsEmittingQueryRunner and CpuTimeMetricQueryRunner; Make DefaultQueryMetrics to fail fast on modifications from concurrent threads
* Remove cache keys from HavingSpecs.
They weren't used, since they aren't part of the groupBy cache key.
Also, it's good that they weren't used, since many of them had
value truncation bugs.
* Fix imports.
* Fix test.
* Monomorphic processing of topN queries with simple double aggregators and historical segments
* Add CalledFromHotLoop annocations to specialized methods in SimpleDoubleBufferAggregator
* Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype
* Fix a bug in SpecializationService
* In SpecializationService, emit maxSpecializations warning only once
* Make GenericIndexed.theBuffer final
* Address comments
* Newline
* Reapply 439c906 (Make GenericIndexed.theBuffer final)
* Remove extra PooledTopNAlgorithm.capabilities field
* Improve CachingIndexed.inspectRuntimeShape()
* Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape()
* Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts
* Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop
* Make ValueMatcher to implement HotLoopCallee
* Doc fix
* Fix inspectRuntimeShape() impl in ExpressionSelectors
* INFO logging of specialization events
* Remove modificator
* Fix OrFilter
* Fix AndFilter
* Refactor PooledTopNAlgorithm.scanAndAggregate()
* Small refactoring
* Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations
* Don't care about runtime shape in tests
* Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests
* Doc wording
* Address comments
* Remove MagicAccessorBridge and ensure Offset subclasses are public
* Attach error message to element
* Make Errorprone the default compiler
* Address comments
* Make Error Prone's ClassCanBeStatic rule a error
* Preconditions allow only %s pattern
* Fix DruidCoordinatorBalancerTester
* Try to give the compiler more memory
* Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now
* Don't show compiler warnings
* Try different travis script
* Fix travis.yml
* Make Error Prone optional again
* For error-prone compiler
* Increase compiler's maxmem
* Don't run Error Prone for benchmarks because of OOM
* Skip install step in Travis
* Remove MetricHolder.writeToChannel()
* In travis.yml, check compilation before tests, because it may fail faster
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).
* Fix GroupByMergingQueryRunnerV2
* Fix QueryResourceTest
* Expand the comment to Query.run(walker, context)
* Remove legacy version of BySegmentSkippingQueryRunner.doRun()
* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
* optionally add extensions to explicitly specified hadoopContainerClassPath
* note extensions always pushed in hadoop container when druid.extensions.hadoopContainerDruidClasspath is not provided explicitly
* coordinator lookups mgmt improvements
* revert replaces removal, deprecate it instead
* convert and use older specs stored in db
* more tests and updates
* review comments
* add behavior for 0.10.0 to 0.9.2 downgrade
* incorporating more review comments
* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well
* wip on LookupCoordinatorManager
* lifecycle lock
* refactor thread creation into utility method
* more review comments addressed
* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2
* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop
* run lookup mgmt on leader coordinator only
* wip: changes to do multiple start() and stop() on LookupCoordinatorManager
* lifecycleLock fix usage in LookupReferencesManagerTest
* add LifecycleLock back
* fix license hdr
* some fixes
* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless
* address review comments
* addressing leventov's comments
* address charle's comments
* add IOE.java
* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt
* move thread creation utility method to Execs
* fix names
* add tests for LookupCoordinatorManager.lookupManagementLoop()
* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager
* address leventov comments
* remove LookupsStateWithMap and parameterize LookupsState
* address review comments
* address more review comments
* misc fixes
* Add queryMetrics property to Query interface; Fix bugs and removed unused code in Druids
* Fix a bug in TimeBoundaryQuery.getFilter() and remove TimeBoundaryQuery.getDimensionsFilter()
* Don't reassign query's queryMetrics if already present in CPUTimeMetricQueryRunner and MetricsEmittingQueryRunner
* Add compatibility constructor to BaseQuery
* Remove Query.queryMetrics property
* Move nullToNoopLimitSpec() method to LimitSpec interface
* Rename GroupByQuery.applyLimit() to postProcess(); Fix inconsistencies in GroupByQuery.Builder
* Make timeout behavior consistent to document
* Refactoring BlockingPool and add more methods to QueryContexts
* remove unused imports
* Addressed comments
* Address comments
* remove unused method
* Make default query timeout configurable
* Fix test failure
* Change timeout from period to millis
* don't do postAgg in TimeseriesQueryQueryToolChest when not necessary
* set postAggs to empty list in TimeseriesQueryQueryToolChest instead of checking finalizing fn
* fix ut
* fix ut again
* Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream
* Fixes
* Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils
* Move constructors
* Add GenericIndexedBenchmark
* Comments
* Typo
* Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe
* Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap
* Optimize TableLongEncodingWriter
* Add checks to SerializerUtils methods
* Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream()
* Update GenericIndexedBenchmark
* SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian
* Add GenericIndexedBenchmark.indexOf()
* More checks in methods in SerializerUtils
* Use helperBuffer.arrayOffset()
* Optimizations in SerializerUtils
The query caches generally store dimensions and aggregators positionally, so
appendCacheablesIgnoringOrder could lead to incorrect results being pulled
from the cache.
* Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing.
* Use Execs.singleThreaded()
* RuntimeShapeInspector to support nullable fields
* Make CalledFromHotLoop annotation Inherited
* Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory
* Close InputStream in SpecializationService
* Formatting
* Test specialized PooledTopNScanners
* Set flags in PooledTopNAlgorithm directly
* Fix tests, dependent on CountAggragatorFactory toString() form
* Fix
* Revert CountAggregatorFactory changes
* Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector
* Remove duplicate RoaringBitmap dependency in the extendedset pom.xml
* Fix
* Treat ByteBuffers specially in StringRuntimeShape
* Doc fix
* Annotate BufferAggregator.init() with CalledFromHotLoop
* Make triggerSpecializationIterationsThreshold an int
* Remove SpecializationService.PerPrototypeClassState.of()
* Add comments
* Limit the amount of specializations that SpecializationService could make
* Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions
* Use more efficient ConcurrentMap's idioms in SpecializationService
* Allow compilation as Java8 source and target for everything except API
* Remove conditions in tests which assume that we may run with Java 7
* Update easymock to 3.4
* Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration
* Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity()
* Remove java7 special for druid-api
This fixes#4020 because it means the timestamp will always be included for outermost
queries. Historicals receiving queries from older brokers will think they're
outermost (because CTX_KEY_OUTERMOST isn't set to "false"), so they'll include a
timestamp, so the older brokers will be OK.
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.
Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".
Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
be more accurate.
* Remove unused import.
* Restore buffer size.
Their "convertUnsortedEncodedKeyComponentToActualArrayOrList" methods didn't respect the contract,
which says they should return single values (not array/list) if there is only a single value
to return. This affects the behavior of ObjectColumnSelectors on realtime segments.
* In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir
* Use FileUtils.forceMkdir() across the codebase and remove some unused code
* Fix test
* Fix PullDependencies.run()
* Unused import
* No more singleton. Reduce iterations
* Granularities
* Fix the delay in the test
* Add license header
* Remove unused imports
* Lot more unused imports from all the rearranging
* CR feedback
* Move javadoc to constructor
* Refactor Segment Granularity
* Beginning of one granularity
* Copy the fix for custom periods in segment-grunalrity over here.
* Remove the custom serialization for now.
* Compilation cleanup
* Reformat code
* Fixing unit tests
* Unify to use a single iterable
* Backward compatibility for rolling upgrade
* Minor check style. Cosmetic changes.
* Rename length and millis to duration
* CR feedback
* Minor changes.
* initial commits for finalizeFieldAccess #2433
* fix some bugs to run a query
* change name of method Queries.verifyAggregations to Queries.prepareAggregations
* add Uts
* fix Ut failures
* rebased to master
* address comments and add a Ut for arithmetic post aggregators
* rebased to the master
* address the comment of injection within arithmetic post aggregator
* address comments and introduce decorate() in the PostAggregator interface.
* Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name.
* Fix a code style mismatch.
* Rebased to the master
* Removing Integer.MAX column size limit.
* On demand creation of headerLong, use v2 instead of v3
* Avoid reusing the same object from a previous test.
* Avoid reusing the same object from a previous test part#2
* code formatting.
* GenericIndexed/Writer code review changes.
* GenericIndexed/writer code review requested changes.
* checkIndex() to static
* native endianess for genericIndexedV2, code review requested changes.
* Formatting
* Hll fix.
* use native endianess during bag size calculation.
* Code review requested changes.
* IOPeon close() changes.
* use different tmp directory path for testing.
* Code review requested changes.
* Less use of File.deleteOnExit()
* removed deleteOnExit from most of the tests/benchmarks/iopeon
* Made IOpeon closable
* Formatting.
* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
* Close all aggregators when closing onHeapIncrementalIndex
* Aggregators are now handled as Closeables, remove unnecessary mock in test
* Fix variable shadowing
* Add PostAggregators to generator cache keys for top-n queries
* Add tests for strings
* Remove debug comments
* Add type keys and list sizes to cache key
* Make post aggregators used for sort are considered for cache key generation
* Use assertArrayEquals()
* Improve findPostAggregatorsForSort()
* Address comments
* fix test failure
* address comments
* Simplify LikeFilter implementation of getBitmapIndex, estimateSelectivity.
LikeFilter:
- Reduce code duplication, and simplify methods, at the cost of incurring an extra box
of ImmutableBitmap into a SingletonImmutableList. I think this is fine, since this
should be cheap and the code path is not hot (just once per filter).
Filters:
- Make estimateSelectivity public since it seems intended that they be used by Filter
implementations, and Filters from extensions may want to use them too. Removed
@VisibleForTesting for the same reason.
- Rename one of the estimatePredicateSelectivity overloads to estimateSelectivity, since
predicates aren't involved.
* Address PR comments.
* Remove unused import
* Change List to Collection
* Add filter selectivity estimation for auto search strategy
* Addressed comments
* Lazy bitmap materialization for bitmap sampling and java docs
* Addressed comments.
- Fix wrong non-overlap ratio computation and added unit tests.
- Change Iterable<Integer> to IntIterable
- Remove unnecessary Iterable<Integer>
* Addressed comments
- Split a long ternary operation into if-else blocks
- Add IntListUtils.fromTo()
* Fix test failure and add a test for RangeIntList
* fix code style
* Diabled selectivity estimation for multi-valued dimensions
* Address comment
* Introduce SegmentizerFactory
- that knows how to deserialize specific type of segment
- Default implementation is MMappedQueryableSegmentizerFactory which creates QueryableIndexSegment
- Unit test for the default behavior
* review comments
* Add virtual column types, holder serde, and safety features.
Virtual columns:
- add long, float, dimension selectors
- put cache IDs in VirtualColumnCacheHelper
- adjust serde so VirtualColumns can be the holder object for Jackson
- add fail-fast validation for cycle detection and duplicates
- add expression virtual column in core
Storage adapters:
- move virtual column hooks before checking base columns, to prevent surprises
when a new base column is added that happens to have the same name as a
virtual column.
* Fix ExtractionDimensionSpecs with virtual dimensions.
* Fix unused imports.
* CR comments
* Merge one more time, with feeling.
* * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils;
* Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec to be able to decorate DimensionSelectors with unknown cardinality;
* Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers;
* Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector;
* Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher;
* Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn
* Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher
* Make ArrayBasedIndexedInts constructor private, replace it's usages with of() static factory method
* Cache baseIdLookup in ForwardingFilteredDimensionSelector
* Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull)
* Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available
* Doc fixes
* Addressed comments
* Fix
* Fix
* Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector
* throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm
* Removing unused code from io.druid.java.util.common.guava package; fix#3563 (more consistent and paranoiac resource handing in Sequences subsystem); Add Sequences.wrap() for DRY in MetricsEmittingQueryRunner, CPUTimeMetricQueryRunner and SpecificSegmentQueryRunner; Catch MissingSegmentsException in SpecificSegmentQueryRunner's yielder.next() method (follow up on #3617)
* Make Sequences.withEffect() execute the effect if the wrapped sequence throws exception from close()
* Fix strange code in MetricsEmittingQueryRunner
* Add comment on why YieldingSequenceBase is used in Sequences.withEffect()
* Use Closer in OrderedMergeSequence and MergeSequence to close multiple yielders
* streaming version of select query
* use columns instead of dimensions and metrics;prepare for valueVector;remove granularity
* respect query limit within historical
* use constant
* fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner
* add some test for scan query
* add scan query document
* fix merge conflicts
* add compactedList resultFormat, this format is better for json ser/der
* respect query timeout
* respect query limit on broker
* use static consts and remove unused code
* SQL support for nested groupBys.
Allows, for example, doing exact count distinct by writing:
SELECT COUNT(*) FROM (SELECT DISTINCT col FROM druid.foo)
Contrast with approximate count distinct, which is:
SELECT COUNT(DISTINCT col) FROM druid.foo
* Add deeply-nested groupBy docs, tests, and maxQueryCount config.
* Extract magic constants into statics.
* Rework rules to put preconditions in the "matches" method.
* Add an option to SearchQuery to choose a search query execution strategy.
Supported strategies are
1) Index-only query execution
2) Cursor-based scan
3) Auto: choose an efficient strategy for a given query
* Add SearchStrategy and SearchQueryExecutor
* Address comments
* Rename strategies and set UseIndexesStrategy as the default strategy
* Add a cost-based planner for auto strategy
* Add document
* Fix code style
* apply code style
* apply comments
* Filters: Use ColumnSelectorFactory directly for building row-based matchers.
* Adjustments based on code review.
- BoundDimFilter: fewer volatiles, rename matchesAnything to !matchesNothing.
- HavingSpecs: Clarify that they are not thread-safe, and make DimFilterHavingSpec
not thread safe.
- Renamed rowType to rowSignature.
- Added specializations for time-based vs non-time-based DimensionSelector in RBCSF.
- Added convenience method DimensionHanderUtils.createColumnSelectorPlus.
- Added singleton ZeroIndexedInts.
- Added test cases for DimFilterHavingSpec.
* Make ValueMatcherColumnSelectorStrategy actually use the associated selector.
* Add RangeIndexedInts.
* DimFilterHavingSpec: Fix concurrent usage guard on jdk7.
* Add assertion to ZeroIndexedInts.
* Rename no-longer-volatile members.
* add first and last aggregator
* add test and fix
* moving around
* separate aggregator valueType
* address PR comment
* add finalize inner query and adjust v1 inner indexing
* better test and fixes
* java-util import fixes
* PR comments
* Add first/last aggs to ITWikipediaQueryTest
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.
Also use more iterations.
* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.
Specifically:
- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.