* Apply 'power of 2' optimization to BlockLayoutIndexedDoubleSupplier; slight optimization of buffer.get() in block layout indexed suppliers
* Fix byte order
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
* Fix DataSegmentTest
* Refinements
* Try to fix
* Fix the second DataSegmentTest
* Nullability
* Fix tests
* Fix tests, unify to use TestHelper.getJsonMapper()
* Revert TestUtil as ServerTestHelper, fix tests
* Add newline
* Fix indexing tests
* Fix s3 tests
* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()
* Fix HDFS tests
* Fix HdfsDataSegmentPusherTest
* Capitalize constant names
* numeric quantiles sketch aggregator
* it seems that we need to synchronize all methods, which modify the state
* Seems like a false positive with -Pstrict
* code style fix
* code style fix
* use sketches-core-0.10.3
* moved cache ids to the central place
* better class names
* support large columns
* explained autodetection, added exception
* added comments regarding sketches moving on heap
* support reindexing
* implemented suggestions from jihoonson
* style fix
* use max(k, other.k) for better accuracy
* check for NilColumnValueSelector instead of null
* throw exceptions instead of providing no-op comparators
* SQL: Improve translation of time floor expressions.
The main change is to TimeFloorOperatorConversion.applyTimestampFloor.
- Prefer timestamp_floor expressions to timeFormat extractionFns, to
avoid turning things into strings when it isn't necessary.
- Collapse CAST(FLOOR(X TO Y) AS DATE) to FLOOR(X TO Y) if appropriate.
* Fix tests.
* ExpressionSelectors: Add caching selectors.
- SingleLongInputCaching selector for expressions on the __time column,
using a similar optimization to SingleScanTimeDimSelector
- SingleStringInputDimensionSelector for expressions on string columns
that return strings, using a similar optimization to ExtractionFn
based DimensionSelectors.
- SingleStringInputCaching selector for expressions on string columns
that return primitives.
Also, in the SQL planner, prefer expressions for time operations
rather than extractionFns.
* Code review comments.
* maxQueryTimeout property in runtime properties.
* extra line
* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.
* Fix initialize method.
* remove unused import.
* doc update.
* some more details in doc about query failure..
* minor fix.
* decorating QueryRunner to set and verify context. Added by servers.
* remove whitespace.
* SQL: Improved behavior when implicitly casting strings to date/time literals.
- Handle all flavors of ISO8601 and SQL literals.
- Throw errors on other literals instead of silently transforming them to 0.
* Respect timeZone when format is null.
* Add retries for coordinator fetch and lookup start in LookupReferencesManager
* Fix LookupConfigTest
* Address comments
* Address more comments
* And address more comments
* Address comms
* Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin
* Add compaction task
* added doc
* use combining aggregators
* address comments
* add support for dimensionsSpec
* fix getUniqueDims and getUniqueMetics
* find unique dimensionsSpec
* fix compilation
* add unit test
* fix test
* fix test
* test for different dimension orderings and types, and doc for type and ordering
* add control for custom ordering and type
* update doc
* fix compile
* fix compile
* add segments param
* fix serde error
* fix build
* Fix havingSpec on complex aggregators.
- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
related classes from druid-server to druid-processing. They probably make
more sense there anyway.
- Adds a SQL query test.
Fixes#4957.
* Remove unused import.
* Fix improper handling of empty arrays in StringDimensionIndexer.
This bug was able to introduce data errors: if the input rows to an
IncrementalIndex contained entirely empty arrays and single values, then
upon persisting to disk, the empty arrays would be replaced with the
lexicographically smallest single value, rather than nulls like they
should have been.
* Style fix.
* Add tests for bitmap indexes too.
* Introduce System wide property to select how to store double.
Set the default to store as float
Change-Id: Id85cca04ed0e7ecbce78624168c586dcc2adafaa
* fix tests
Change-Id: Ib42db724b8a8f032d204b58c366caaeabdd0d939
* Change the property name
Change-Id: I3ed69f79fc56e3735bc8f3a097f52a9f932b4734
* add tests and make default distribution store doubles as 64bits
Change-Id: I237b07829117ac61e247a6124423b03992f550f2
* adding mvn argument to parallel-test profile
Change-Id: Iae5d1328f901c4876b133894fa37e0d9a4162b05
* move property name and helper function to io.druid.segment.column.Column
Change-Id: I62ea903d332515de2b7ca45c02587a1b015cb065
* fix docs and clean style
Change-Id: I726abb8f52d25dc9dc62ad98814c5feda5e4d065
* fix docs
Change-Id: If10f4cf1e51a58285a301af4107ea17fe5e09b6d
* greater-than/less-than/equal-to havingSpec to call AggregatorFactory.finalizeComputation(..)
* fix the unit test and expect having to work on hyperUnique agg
* test fix
* fix style errors
* Changes for lookup synchronization
* Refactor of Lookup classes
* Minor refactors and doc update
* Change coordinator instance to be retrieved by DruidLeaderClient
* Wait before thread shutdown
* Make disablelookups flag true by default
* Update docs
* Rename flag
* Move executorservice shutdown to finally block
* Update LookupConfig
* Refactoring and doc changes
* Remove lookup config constructor
* Revert Lookupconfig constructor changes
* Add tests to LookupConfig
* Make executorservice local
* Update LRM
* Move ListeningScheduledExecutorService to ExecutorCompletionService
* Move exception to outer block
* Remove check to see future is done
* Remove unnecessary assignment
* Add logging
* QueryableIndexStorageAdapter: Lift column cache to Cursor sequence.
This is where it was before #4710, when its was moved to the individual
Cursors, leading to higher than expected memory usage. It could be
extreme for finer query granularities like "second".
* Comment.
* SQL: Upgrade to Calcite 1.14.0, some refactoring of internals.
This brings benefits:
- Ability to do GROUP BY and ORDER BY with ordinals.
- Ability to support IN filters beyond 19 elements (fixes#4203).
Some refactoring of druid-sql internals:
- Builtin aggregators and operators are implemented as SqlAggregators
and SqlOperatorConversions rather being special cases. This simplifies
the Expressions and GroupByRules code, which were becoming complex.
- SqlAggregator implementations are no longer responsible for filtering.
Added new functions:
- Expressions: strpos.
- SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR,
and DATE_TRUNC.
* Add missing @Override annotation.
* Adjustments for forbidden APIs.
* Adjustments for forbidden APIs.
* Disable GROUP BY alias.
* Doc reword.
* adding new post aggregators of test stats to druid-stats extension
* changes to address code review comments
* fix checkstyle violations using druid_intellij_formatting.xml after merge upstream/master
* add @Override annotation per CI log
* make changes per review comments/discussions
* remove some blocks per review comments
* Add identity to query metrics, logs.
Also fix a bug where unauthorized requests would not emit any logs or metrics,
and instead would log a "Tried to emit logs and metrics twice" warning.
Also rename QueryResource's "getServer" to "cancelQuery", because that's what
it does.
* Do not emit identity by default.
* Added org.joda.time.DateTime#(java.lang.String) to forbidden API.
* Added org.joda.time.DateTime#(java.lang.String, org.joda.time.format.DateTimeFormatter) to forbidden API.
* Add additional APIs that may create DateTime with default time zone
* Add helper function that accepts formatter to parse String.
* Add additional forbidden APIs
* Replace existing usage of forbidden APIs
* Use wrapper class to enforce Chronology on DateTimeFormatter.
* Creates constant UtcFormatter for constant ISODateTimeFormat.
* Move scan-query from a contrib extension into core.
Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion
This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.
This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.
Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.
* Adjustments from review.
* Add back Select query.
* Adjust SQL docs.
* Restore SelectQuery link.
* SQL: Full TRIM support.
- Support trimming arbitrary characters
- Support BOTH, LEADING, and TRAILING
* Remove unused import.
* Fix tests, add RTRIM / LTRIM.
* Remove unused imports.
* BTRIM and docs.
* Replace for with foreach.
* BufferHashGrouperTest: Better behavior with regard to large buffers.
1) Free buffers after each test
2) Avoid mmaping past the end of a file
* Use CloserRule.
When historical caching is enabled, and a select or topN query is
issued, and then a following query with "descending": true is set, the
cached query returns the ascending result (or vice versa), often
resulting in invalid paging identifiers.
The CacheKey for these queries doesn't include the "descending" flag;
this change adds it, and fixes the problem.
* Factor QueryableIndexColumnSelectorFactory and IncrementalIndexColumnSelectorFactory out of QueryableIndexStorageAdapter and IncrementalIndexStorageAdapter; Add Offset.getBaseReadableOffset(); Remove OffsetHolder interface; Replace Cursor extends ColumnSelectorFactory with composition; Reduce indirection in ColumnValueSelectors created by QueryableIndexColumnSelectorFactory
* Don't override clone() in FilteredOffset (the prev. implementation was broken); Some warnings fixed
* Simplify Cursors in QueryableIndexStorageAdapter
* Address comments
* Remove unused and unimplemented methods from GenericColumn interface
* Comments
* Add "round" option to cardinality and hyperUnique aggregators.
Also turn it on by default in SQL, to make math on distinct counts
work more as expected.
* Fix some compile errors.
* Fix test.
* Formatting.
* Add @ExtensionPoint and @PublicApi annotations.
* Clean up wording.
* Remove unused import.
* Remove unused imports.
* Only types can be extension points.
* Adjust annotations some more.
* Remove unused import.
* Make ServletFilterHolder an extension point.
* Add a couple extension points, and update docs.
* Fix dimension selectors with extractionFns on missing columns.
This patch properly applies the requested extractionFn to missing columns.
It's important when the extractionFn maps null to something other than null.
* Extract helper method.
* Change contracts of VirtualColumns and VirtualColumn methods based on review comments.
* Remove unused import.
* Remove unused method.
* Adjust helper function.
* Adjustments
* Replace Guava Enum.getIfPresent with builtin version.
This is useful for running in Hadoop environments that use Guava 11. Some
code is also simplified.
* Code review
* Don't use limit push down with having spec
* Throw exception when forcing limit push down with having
* Tests for having and limit push down
* Fix pool sizes in unit test
* Add HistoricalCursor.getReadableOffset() to access unwrapped offset in selectors, when the 'main' offset if FilteredOffset (fixes#4628)
* Stack overflow test
* NPE thrown when empty/null is passes to TimeDimExtractionFn
* Add @Nullable where ever applicable
* Add @Nullable to SearchQuerySpec.apply()
* Remove unused
* Remove some unnecessary use of boxed types.
* Fix some incorrect format strings.
* Enable IDEA's MalformedFormatString inspection.
* Add a Checkstyle check for finding uses of incorrect logging packages.
* Fix some incorrect usages of the metamx logger.
* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.
* Fix some more places where the wrong number of arguments are provided to format strings.
* Suppress `MalformedFormatString` inspection on legacy logging test.
* Use @SuppressWarnings rather than a noinspection suppression comment.
* Fix some more incorrect format strings.
* Suppress some more incorrect format string warnings where the incorrect string is intentional.
* Log the aggregator when closing it fails.
* Remove some unneeded log lines.
* Use Double.NEGATIVE_INFINITY and Double.POSITIVE_INFINITY instead of Double.MIN_VALUE and Double.MAX_VALUE, same for Float
* Replace usages in comments
* Fix RTree
* Remove commented code
* Add tests
* Fix GroupBy type cast error when ChainedExecutionQueryRunner merges multiple runners
* Move conversion step to separate method
* Remove unnecessary comment
* Use compute to update map
* Avoid usages of Default system Locale and printing to System.out or System.err in production code
* Fix Charset in DruidKerberosUtil
* Remove redundant string format in GenericIndexed
* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()
* Fix testSafeFormat()
* More fixes of redundant StringUtils.format() inside ISE
* Rename unimportantSafeFormat() to nonStrictFormat()
* Add some new expression functions and macros.
See misc/math-expr.md for the list of added functions, except for
"like", which previously existed but was not documented.
* Add easymock to datasketches tests.
* Add easymock to distinctcount tests.
* Add easymock to virtual-columns tests.
* Code review comments.
* Clean up code a bit.
* Add easymock to scan-query tests.
* Rework ExprMacros that have multiple impls.
* Improve test coverage.
* Remove DruidProcessingModule, QueryableModule and QueryRunnerFactoryModule from DI for coordinator, overlord, middle-manager. Add RouterDruidProcessing not to allocate processing resources on router
* Fix examples
* Fixes
* Revert Peon configs and add comments
* Remove qualifier
* Remove ability to create segments in v8 format
* Fix IndexGeneratorJobTest
* Fix parameterized test name in IndexMergerTest
* Remove extra legacy merging stuff
* Remove legacy serializer builders
* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
* ExpressionPostAggregator: Automatically finalize inputs.
Raw HyperLogLogCollectors and such aren't very useful. When writing
expressions like `x / y` users will expect `x` and `y` to be finalized.
* Fix un-merge.
* Code review comments.
* Remove unnecessary ImmutableMap.copyOf.
* Adding a flag to indicate when ObjectCachingColumnSelectorFactory need not be threadsafe.
* - Use of computeIfAbsent over putIfAbsent
- Replace Maps.newXXXMap() with normal instantiation
- Documentations on when is thread-safe required.
- Use Builders for On/OffheapIncrementalIndex
* - Optimization on computeIfAbsent
- Constant EMPTY DimensionsSpec
- Improvement on IncrementalIndexSchema.Builder
- Remove setting of default values
- Use var args for metrics
- Correction on On/OffheapIncrementalIndex Builders
- Combine On/OffheapIncrementalIndex Builders
* - Removing unused imports.
* - Helper method for testing with IncrementalIndex.Builder
* - Correction on javadoc.
* Style fix
* Limit random access in compressed column tests.
Random access leads to lots of block decompressions for reading single elements,
which is time prohibitive for the large column tests. For those tests, limit the
number of randomly accessed elements to 1000.
* Random -> ThreadLocalRandom
* Expressions work better with strings.
- ExpressionObjectSelector able to read from string columns, and able to
return strings.
- ExpressionVirtualColumn able to offer string (and long for that matter)
as its native type.
- ExpressionPostAggregator able to return strings.
- groupBy, topN: Allow post-aggregators to accept dimensions as inputs,
making ExpressionPostAggregator more useful.
- topN: Use DimExtractionTopNAlgorithm for STRING columns that do not
have dictionaries, allowing it to work with STRING-type expression
virtual columns.
- Adjusts null handling to better match the rest of Druid: null and
empty string treated the same; nulls implicitly treated as zeroes in
numeric context.
* Code review comments.
* More code review.
* Fix test.
* Adjust annotations.
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.
ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.
* Address code review comments.
* Enable most IntelliJ 'Probable bugs' inspections
* Fix in RemoteTestNG
* Fix IndexSpec's equals() and hashCode() to include longEncoding
* Fix inspection errors
* Extract global isntance of natural().nullsFirst(); address comments
* Fix
* Use noinspection comments instead of SuppressWarnings on method for IntelliJ-specific inspections
* Prohibit Ordering.natural().nullsFirst() using Checkstyle
* Make using implicit system charset an error
* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()
* Use English locale in StringUtils.safeFormat()
* Restore comment
* fix TestKafkaExtractionCluster fail due to port already used
* explicitly unmap hydrant files when abandonSegment to recyle mmap memory
* address the comments
* apply to AppenderatorImpl
* move ProtoBufInputRowParser from processing module to protobuf extensions
* Ported PR #3509
* add DynamicMessage
* fix local test stuff that slipped in
* add license header
* removed redundant type name
* removed commented code
* fix code style
* rename ProtoBuf -> Protobuf
* pom.xml: shade protobuf classes, handle .desc resource file as binary file
* clean up error messages
* pick first message type from descriptor if not specified
* fix protoMessageType null check. add test case
* move protobuf-extension from contrib to core
* document: add new configuration keys, and descriptions
* update document. add examples
* move protobuf-extension from contrib to core (2nd try)
* touch
* include protobuf extensions in the distribution
* fix whitespace
* include protobuf example in the distribution
* example: create new pb obj everytime
* document: use properly quoted json
* fix whitespace
* bump parent version to 0.10.1-SNAPSHOT
* ignore Override check
* touch
Jackson deserializes integers sometimes as int and sometimes as long,
depending on how big they are. This leads to ClassCastException
when comparing deserialized values as part of groupBy merging on the
broker.
* Don't pass QueryMetrics down in concurrent and async QueryRunners
* Rename QueryPlus.threadSafe() to withoutThreadUnsafeState(); Update QueryPlus.withQueryMetrics() Javadocs; Fix generics in MetricsEmittingQueryRunner and CpuTimeMetricQueryRunner; Make DefaultQueryMetrics to fail fast on modifications from concurrent threads
* Remove cache keys from HavingSpecs.
They weren't used, since they aren't part of the groupBy cache key.
Also, it's good that they weren't used, since many of them had
value truncation bugs.
* Fix imports.
* Fix test.
* Monomorphic processing of topN queries with simple double aggregators and historical segments
* Add CalledFromHotLoop annocations to specialized methods in SimpleDoubleBufferAggregator
* Fix a bug in Historical1SimpleDoubleAggPooledTopNScannerPrototype
* Fix a bug in SpecializationService
* In SpecializationService, emit maxSpecializations warning only once
* Make GenericIndexed.theBuffer final
* Address comments
* Newline
* Reapply 439c906 (Make GenericIndexed.theBuffer final)
* Remove extra PooledTopNAlgorithm.capabilities field
* Improve CachingIndexed.inspectRuntimeShape()
* Fix CompressedVSizeIntsIndexedSupplier.inspectRuntimeShape()
* Don't override inspectRuntimeShape() in subclasses of CompressedVSizeIndexedInts
* Annotate methods in specializations of DimensionSelector and FloatColumnSelector with @CalledFromHotLoop
* Make ValueMatcher to implement HotLoopCallee
* Doc fix
* Fix inspectRuntimeShape() impl in ExpressionSelectors
* INFO logging of specialization events
* Remove modificator
* Fix OrFilter
* Fix AndFilter
* Refactor PooledTopNAlgorithm.scanAndAggregate()
* Small refactoring
* Add 'nothing to inspect' messages in empty HotLoopCallee.inspectRuntimeShape() implementations
* Don't care about runtime shape in tests
* Fix accessor bugs in Historical1SimpleDoubleAggPooledTopNScannerPrototype and HistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopNScannerPrototype, cover them with tests
* Doc wording
* Address comments
* Remove MagicAccessorBridge and ensure Offset subclasses are public
* Attach error message to element