* Moved Scan Builder to Druids class and started on Scan Benchmark setup
* Need to form queries
* It runs.
* Remove todos
* Change number of benchmark iterations
* Changed benchmark params
* More param changes
* Made Jon's changes and removed TODOs
* Broke some long lines into two lines
* Decrease segment size for less memory usage
* Committing a param change to kick teamcity
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool
* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java
* Remove unnecessary comment
* Checkstyle
* Fix getFromExtensions()
* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization
* Fix UriCacheGeneratorTest
* Workaround issue with MaterializedViewQueryQueryToolChest
* Strengthen Appenderator's contract regarding concurrency
* fix issue with SingleLongInputCachingExpressionColumnValueSelector when sql compatible null handling enabled
* add test with doubles to show same behavior for floats/doubles that lack the optimization of longs
* simplify
* fix import
* blooming aggs
* partially address review
* fix docs
* minor test refactor after rebase
* use copied bloomkfilter
* add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests
* add methods to BloomKFilter to get number of set bits, use in comparator, fixes
* more docs
* fix
* fix style
* simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets
* oof, more fixes
* more sane docs example
* fix it
* do the right thing in the right place
* formatting
* fix
* avoid conflict
* typo fixes, faster comparator, docs for comparator behavior
* unused imports
* use buffer comparator instead of deserializing
* striped readwrite lock for buffer agg, null handling comparator, other review changes
* style fixes
* style
* remove sync for now
* oops
* consistency
* inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception
* CardinalityBufferAggregator inspect selectors instead of selectorPluses
* fix style
* refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments
* adjustment
* fix teamcity error?
* rename nil aggs to empty, change empty agg constructor signature, add comments
* use stringutils base64 stuff to be chill with master
* add aggregate combiner, comment
* * Add few methods about base64 into StringUtils
* Use `java.util.Base64` instead of others
* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis
* Rename encodeBase64String & decodeBase64String
* Update druid-forbidden-apis
* Replacing Math.random() with ThreadLocalRandom.current().nextDouble()
* Added java.lang.Math#random() in forbidden-apis.txt
* Minor change in the message - druid-forbidden-apis.txt
* use SqlLifecyle to manage sql execution, add sqlId
* add sql request logger
* fix UT
* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc
* add docs and more sql request logger impls
* add UT for http and jdbc
* fix forbidden use of com.google.common.base.Charsets
* fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId
* do not use default method in QueryMetrics interface
* capitalize 'sql' everywhere in the non-property parts of the docs
* use RequestLogger interface to log sql query
* minor bugfixes and add switching request logger
* add filePattern configs for FileRequestLogger
* address review comments, adjust sql request log format
* fix inspection error
* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
* Use multi-guava version friendly direct executor implementation
* Don't use a singleton
* Fix strict compliation complaints
* Copy Guava's DirectExecutor
* Fix javadoc
* Imports are the devil
* make logs that are only useful for debugging be at debug level so log volume is much more chill
* info level messages for total merge buffer allocated/free
* more chill compaction logs
* FileUtils: Sync directory entry too on writeAtomically.
See the fsync(2) man page for why this is important:
https://linux.die.net/man/2/fsync
This also plumbs CompressionUtils's "zip" function through
writeAtomically, so the code for handling atomic local filesystem
writes is all done in the same place.
* Remove unused import.
* Avoid FileOutputStream.
* Allow non-atomic writes to overwrite.
* Add some comments. And no need to flush an unbuffered stream.
* Double-checked locking bug is fixed.
* @Nullable is removed since there is no need to use along with @MonotonicNonNull.
* Static import is removed.
* Lazy initialization is implemented.
* Local variables used instead of volatile ones.
* Local variables used instead of volatile ones.
* Fix travis timeout in BufferHashGrouperTest
* adjust buffer size
* adjust bufferSize and loadFactor
* increase memory
* add debug code
* cat error
* after script
* print logs
* print per 2 min
* use direct mem
* clean up
* autosize processing buffers based on direct memory sizing
* remove oops, more test
* max 1gb autosize buffers, test, start of docs
* fix oops
* revert accidental change
* print buffer size in exception
* change the things
Not putting this to 0.13 milestone because the found bugs are not critical (one is a harmless DI config duplicate, and another is in a benchmark.
Change in `DumpSegment` is just an indentation change.
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* Expressions: Fix improper supplier reuse with missing columns.
ExpressionSelectors has an optimization that skips building a Map
when there is only one input supplier. However, this optimization
should not be used in the case where the is one input supplier but
more than one input identifier (which can happen when only one
input identifier corresponds to an actual column).
Fixes#6556.
* Add underscores to statics.
* Optimization for expressions that hit a single long column.
There was previously a single-long-input optimization that applied only
to the time column. These have been combined together. Also adds
type-specific value caching to ExprEval, which allowed simplifying
the SingleLongInputCachingExpressionColumnValueSelector code.
* Add more benchmarks.
* Don't use LRU cache for __time.
* Simplify a bit.
* Let the cache grow.
* Prohibit some guava collection APIs and use JDK APIs directly
* reset files that changed by accident
* sort codestyle/druid-forbidden-apis.txt alphabetically
* add PrefixFilteredDimensionSpec for multi-value dimensions
* add docs for PrefixFilteredDimensionSpec
* remove unnecessary null handling
* add null check to the result of NullHandling
* Add optional `name` to top level of FilteredAggregatorFactory
* Add compat constructor for tests
* Address comments
* Add equals and hash code updates
* Rename test
* Fix imports and code style
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* Added backpressure metric
* Updated channelReadable to AtomicBoolean and fixed broken test
* Moved backpressure metric logic to NettyHttpClient
* Fix placement of calculating backPressureDuration
Possibly related to https://github.com/apache/incubator-druid/issues/4937
--------
There is currently a race condition in IncrementalIndexStorageAdapter that can lead to exceptions like the following, when running queries with filters on String dimensions that hit realtime tasks:
```
org.apache.druid.java.util.common.ISE: id[5] >= maxId[5]
at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591)
at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562)
at org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284)
```
When the `filterMatcher` is created in the constructor of `IncrementalIndexStorageAdapter.IncrementalIndexCursor`, `StringDimensionIndexer.makeDimensionSelector` gets called eventually, which calls:
```
final int maxId = getCardinality();
...
@Override
public int getCardinality()
{
return dimLookup.size();
}
```
So `maxId` is set to the size of the dictionary at the time that the `filterMatcher` is created.
However, the `maxRowIndex` which is meant to prevent the Cursor from returning rows that were added after the Cursor was created (see https://github.com/apache/incubator-druid/pull/4049) is set after the `filterMatcher` is created.
If rows with new dictionary values are added after the `filterMatcher` is created but before `maxRowIndex` is set, then it is possible for the Cursor to return rows that contain the new values, which will have `id >= maxId`.
This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent rows with unknown dictionary IDs from being passed to the `filterMatcher`.
-----------
The included test triggers the error with a custom Filter + DruidPredicateFactory.
The DimensionSelector for predicate-based filter matching is created here in `Filters.makeValueMatcher`:
```
public static ValueMatcher makeValueMatcher(
final ColumnSelectorFactory columnSelectorFactory,
final String columnName,
final DruidPredicateFactory predicateFactory
)
{
final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(columnName);
// This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.
if (capabilities != null && capabilities.getType() == ValueType.LONG) {
return getLongPredicateMatcher(
columnSelectorFactory.makeColumnValueSelector(columnName),
predicateFactory.makeLongPredicate()
);
}
final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector =
DimensionHandlerUtils.createColumnSelectorPlus(
ValueMatcherColumnSelectorStrategyFactory.instance(),
DefaultDimensionSpec.of(columnName),
columnSelectorFactory
);
return selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), predicateFactory);
}
```
The test Filter adds a row to the IncrementalIndex in the test when the predicateFactory creates a new String predicate, after `DimensionHandlerUtils.createColumnSelectorPlus` is called.
* Broker backpressure.
Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.
Fixes#4933.
* Fix query context doc.
* make COMPLEX column filterable in Druid code
* Revert "make COMPLEX column filterable in Druid code"
This reverts commit 9fc6ec768c.
* complex columns can be optionally made filterable
* some types are always filterable
* add ColumnCapabilitiesImpl serde tests
* add SuppresedWarnings annotation
* Rename io.druid to org.apache.druid.
* Fix META-INF files and remove some benchmark results.
* MonitorsConfig update for metrics package migration.
* Reorder some dimensions in inner queries for some reason.
* Fix protobuf tests.
* add subtotalsSpec attribute to groupBy query
* dont sent subtotalsSpec to downstream nodes from broker and other updates
* address review comment
* fix checkstyle issues after merge to master
* add docs for subtotalsSpec feature
* address doc review comments