* Add published segment cache in broker
* Change the DataSegment interner so it's not based on DataSEgment's equals only and size is preserved if set
* Added a trueEquals to DataSegment class
* Use separate interner for realtime and historical segments
* Remove trueEquals as it's not used anymore, change log message
* PR comments
* PR comments
* Fix tests
* PR comments
* Few more modification to
* change the coordinator api
* removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments
* Change the poll behaviour to avoid multiple poll execution at same time
* minor changes
* PR comments
* PR comments
* Make the segment cache in broker off by default
* Added a config to PlannerConfig
* Moved MetadataSegmentView to sql module
* Add doc for new planner config
* Update documentation
* PR comments
* some more changes
* PR comments
* fix test
* remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR
* minor changes
* some changes to initialization
* use pollPeriodInMS
* Add boolean cachePopulated to check if first poll succeeds
* Remove poll from start()
* take the log message out of condition in stop()
* use SqlLifecyle to manage sql execution, add sqlId
* add sql request logger
* fix UT
* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc
* add docs and more sql request logger impls
* add UT for http and jdbc
* fix forbidden use of com.google.common.base.Charsets
* fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId
* do not use default method in QueryMetrics interface
* capitalize 'sql' everywhere in the non-property parts of the docs
* use RequestLogger interface to log sql query
* minor bugfixes and add switching request logger
* add filePattern configs for FileRequestLogger
* address review comments, adjust sql request log format
* fix inspection error
* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
The documentation for Bound filter's lowerStrict/upperStrict is incorrect. It is not consistent with the examples provided and actual behaviour of the bound filter. Correct this.
Also add a "fromIndex" argument to the strpos expression function. There
are some -1 and +1 adjustment terms due to the fact that the strpos
expression behaves like Java indexOf (0-indexed), but the POSITION SQL
function is 1-indexed.
* add PrefixFilteredDimensionSpec for multi-value dimensions
* add docs for PrefixFilteredDimensionSpec
* remove unnecessary null handling
* add null check to the result of NullHandling
* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header
* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header
o- if Accept header is absent, it defaults to Content-Type header
* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' PR #4033
Minor change to a comment - restoring to previous wording
* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' PR #4033
o- minor change to check for empty string
* Added SystemSchema with following tables (#5989)
* SEGMENTS table provides details on served and published segments
* SERVERS table provides details on data servers
* SERVERSEGMETS table is the JOIN of SEGMENTS and SERVERS
* TASKS table provides details on tasks
* Add documentation for system schema
* Fix static-analysis warnings
* Address PR comments
*Add unit tests
* Fix a test
* Try to fix a test
* Fix a bug around replica count
* rename io.druid to org.apache.druid
* Major change is to make tasks and segment queries streaming
* Made tasks/segments stream to calcite instead of storing it in memory
* Add num_rows to segments table
* Refactor JsonParserIterator
* Replace with closeable iterator
* Fix docs, make num_rows column nullable, some unit test changes
* make num_rows column type long, allow it to be null
fix a compile error after merge, add TrafficCop param to InputStreamResponseHandler
* Filter null rows for segments table from Linq4j enumerable
* change num_replicas datatype to long in segments table
* Fix some tests and address comments
* Doc updates, other PR comments
* Update tests
* Address comments
* Add auth check
* Update docs
* Refactoring
* Fix teamcity warning, change the getQueryableServer in TimelineServerView
* Fix compilation after rebase
* Use the stream API from AuthorizationUtils
* Added LeaderClient interface and NoopDruidLeaderClient class
* Revert "Added LeaderClient interface and NoopDruidLeaderClient class"
This reverts commit 100fa46e39.
* Make the naming consistent to server_segments for the join table
* Add ForbiddenException on auth check failure
* Remove static block from SystemSchema
* Try to fix a test in CalciteQueryTest due to rename of server_segments
* Fix the json output format in the coordinator API
* Add auth check in the segments API
* Add null check to avoid NPE
* Use annonymous class object instead of mock for DruidLeaderClient in SqlBenchmark
* Fix test failures, type long/BIGINT can be nullable
* Revert long nullability to fix tests
* Fix style for tests
* PR comments
* Address PR comments
* Add the missing BytesAccumulatingResponseHandler class
* Use Sequences.withBaggage in DruidPlanner
* Fix docs, add comments
* Close the iterator if hasNext returns false
* Broker backpressure.
Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.
Fixes#4933.
* Fix query context doc.
* add subtotalsSpec attribute to groupBy query
* dont sent subtotalsSpec to downstream nodes from broker and other updates
* address review comment
* fix checkstyle issues after merge to master
* add docs for subtotalsSpec feature
* address doc review comments
* SQL: Support more result formats, add columns header.
- Add result formats for line-based JSON and CSV.
- Add X-Druid-Sql-Columns header with a list of all columns that
the response will contain.
- Add more comprehensive documentation on what callers should expect
when making Druid SQL queries.
* Fix some tests.
* Adjust tests.
* Adjust trailer, add types header.
* Fix trailers.
* Add lastString and firstString aggregators extension
* Remove duplicated class
* Move first-last-string doc page to extensions-contrib
* Fix ObjectStrategy compare method
* Fix doc bad aggregatos type name
* Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery
* Add getMaxStringBytes() method to support JSON serialization
* Fix null pointer exception at segment creation phase when the string value is null
* Control the valueSelector object class on BufferAggregators
* Perform all improvements
* Add java doc on SerializablePairLongStringSerde
* Refactor ObjectStraty compare method
* Remove unused ;
* Add aggregateCombiner unit tests. Rename BufferAggregators unit tests
* Remove unused imports
* Add license header
* Add class name to java doc class serde
* Throw exception if value is unsupported class type
* Move first-last-string extension into druid core
* Update druid core docs
* Fix null pointer exception when pair->string is null
* Add null control unit tests
* Remove unused imports
* Add first/last string folding aggregator on AggregatorsModule to support segment metadata query
* Change SerializablePairLongString to extend SerializablePair
* Change vars from public to private
* Convert vars to primitive type
* Clarify compare comment
* Change IllegalStateException to ISE
* Remove TODO comments
* Control possible null pointer exception
* Add @Nullable annotation
* Remove empty line
* Remove unused parameter type
* Improve AggregatorCombiner javadocs
* Add filterNullValues option at StringLast and StringFirst aggregators
* Add filterNullValues option at agg documentation
* Fix checkstyle
* Update header license
* Fix StringFirstAggregatorFactory.VALUE_COMPARATOR
* Fix StringFirstAggregatorCombiner
* Fix if condition at StringFirstAggregateCombiner
* Remove filterNullValues from string first/last aggregators
* Add isReset flag in FirstAggregatorCombiner
* Change Arrays.asList to Collections.singletonList
* Update defaultHadoopCoordinates in documentation.
To match changes applied in #5382.
* Remove a parameter with defaults from example configuration file.
If it has reasonable defaults, then why would it be in an example config file?
Also, it is yet another place that has been forgotten to be updated and will be forgotten in the future.
Also, if someone is running different hadoop version, then there's much more work to be done than just changing this property, so why give users false hopes?
* Fix typo in documentation.
* Use mergeBuffer instead of processingBuffer in parallelCombiner
* Fix test
* address comments
* fix test
* Fix test
* Update comment
* address comments
* fix build
* Fix test failure
Originally written by @AlexanderSaydakov in druid-io/druid-io.github.io#448.
I also added redirects and updated links to point to the new
datasketches-extension.html landing page for the extension, rather than to
the old page about theta sketches.
* Update post-aggregations.md
I think this is more clear. I am not sure how multiplying by 100 is involved in averaging...
* Update post-aggregations.md
adding additional aggregator
* Update post-aggregations.md
Code changes:
- In the lookup-based extractionFns, inherit injective property from
the lookup itself if not specified.
Doc changes:
- Add a "Query execution" section to the lookups doc explaining how
injective lookups and their optimizations work.
- Remove scary warnings against using registeredLookup extractionFns.
They are necessary and important since they work with filters and
function cascades -- two things that the dimension specs do not do.
They deserve to be first class citizens.
- Move the "registeredLookup" fn above the "lookup" fn. It's probably
more commonly used, so the docs read better this way.
* Add retries for coordinator fetch and lookup start in LookupReferencesManager
* Fix LookupConfigTest
* Address comments
* Address more comments
* And address more comments
* Address comms
* Recognize 'not found' lookups in LookupReferencesManager.tryGetLookupListFromCoordinator(), by @egor-ryashin
* Changes for lookup synchronization
* Refactor of Lookup classes
* Minor refactors and doc update
* Change coordinator instance to be retrieved by DruidLeaderClient
* Wait before thread shutdown
* Make disablelookups flag true by default
* Update docs
* Rename flag
* Move executorservice shutdown to finally block
* Update LookupConfig
* Refactoring and doc changes
* Remove lookup config constructor
* Revert Lookupconfig constructor changes
* Add tests to LookupConfig
* Make executorservice local
* Update LRM
* Move ListeningScheduledExecutorService to ExecutorCompletionService
* Move exception to outer block
* Remove check to see future is done
* Remove unnecessary assignment
* Add logging
* SQL: Upgrade to Calcite 1.14.0, some refactoring of internals.
This brings benefits:
- Ability to do GROUP BY and ORDER BY with ordinals.
- Ability to support IN filters beyond 19 elements (fixes#4203).
Some refactoring of druid-sql internals:
- Builtin aggregators and operators are implemented as SqlAggregators
and SqlOperatorConversions rather being special cases. This simplifies
the Expressions and GroupByRules code, which were becoming complex.
- SqlAggregator implementations are no longer responsible for filtering.
Added new functions:
- Expressions: strpos.
- SQL: TRUNCATE, TRUNC, LENGTH, CHAR_LENGTH, STRLEN, STRPOS, SUBSTR,
and DATE_TRUNC.
* Add missing @Override annotation.
* Adjustments for forbidden APIs.
* Adjustments for forbidden APIs.
* Disable GROUP BY alias.
* Doc reword.
* Move scan-query from a contrib extension into core.
Based on a proposal at: https://groups.google.com/d/topic/druid-development/ME_OatUDnbk/discussion
This patch also adds support for virtual columns to the Scan query,
and updates Druid SQL to use Scan instead of Select.
This patch also makes some behavioral changes to handling of the __time
column. In particular, it is now is returned as "__time" rather than
"timestamp"; it is no longer included if you do not specifically ask for
it in your "columns"; and it is returned as a long rather than a string.
Users can revert time handling to the legacy extension behavior by
setting "legacy" : true in their queries, or setting the property
druid.query.scan.legacy = true. This is meant to provide a migration
path for users that were formerly using the contrib extension.
* Adjustments from review.
* Add back Select query.
* Adjust SQL docs.
* Restore SelectQuery link.
* SQL: Full TRIM support.
- Support trimming arbitrary characters
- Support BOTH, LEADING, and TRAILING
* Remove unused import.
* Fix tests, add RTRIM / LTRIM.
* Remove unused imports.
* BTRIM and docs.
* Replace for with foreach.
* Add "round" option to cardinality and hyperUnique aggregators.
Also turn it on by default in SQL, to make math on distinct counts
work more as expected.
* Fix some compile errors.
* Fix test.
* Formatting.
* Improved SQL support for floats and doubles.
- Use Druid FLOAT for SQL FLOAT, and Druid DOUBLE for SQL DOUBLE, REAL,
and DECIMAL.
- Use float* aggregators when appropriate.
- Add tests involving both float and double columns.
- Adjust documentation accordingly.
* CR comments.
* Fix braces.
* SQL + Expressions = Best friends forever.
- Use expressions as a projection layer for anything that can't be
expressed using traditional Druid extractionFns. Sometimes they're
embedded directly (like "expression" filters, builtin aggregators,
or "expression" post-aggregators). Sometimes they're referenced
through virtual columns (like dimensionSpecs, which can't innately
reference functions of more than one column without the virtual
column layer).
- Add many new functions and operators, taking advantage of the
expression capability (see the querying/sql.md doc).
- Improve consistency of constant reduction and of casting by
using Druid expressions for this instead of Calcite's RexExecutor.
* Fix casting bug, and other code review comments.
* Fix docs.
There result would be {"error"=>"Unknown exception",
"errorMessage"=>nil, "errorClass"=>"java.lang.NullPointerException",
"host"=>nil} when the json lack of 『granularity』.
* coordinator lookups mgmt improvements
* revert replaces removal, deprecate it instead
* convert and use older specs stored in db
* more tests and updates
* review comments
* add behavior for 0.10.0 to 0.9.2 downgrade
* incorporating more review comments
* remove explicit lock and use LifecycleLock in LookupReferencesManager. use LifecycleLock in LookupCoordinatorManager as well
* wip on LookupCoordinatorManager
* lifecycle lock
* refactor thread creation into utility method
* more review comments addressed
* support smooth roll back of lookup snapshots from 0.10.0 to 0.9.2
* correctly use LifecycleLock in LookupCoordinatorManager and remove synchronization from start/stop
* run lookup mgmt on leader coordinator only
* wip: changes to do multiple start() and stop() on LookupCoordinatorManager
* lifecycleLock fix usage in LookupReferencesManagerTest
* add LifecycleLock back
* fix license hdr
* some fixes
* make LookupReferencesManager.getAllLookupsState() consistent while still being lockless
* address review comments
* addressing leventov's comments
* address charle's comments
* add IOE.java
* for safety in LookupReferencesManager mainThread check for lifecycle started state on each loop in addition to interrupt
* move thread creation utility method to Execs
* fix names
* add tests for LookupCoordinatorManager.lookupManagementLoop()
* add further tests for figuring out toBeLoaded and toBeDropped on LookupCoordinatorManager
* address leventov comments
* remove LookupsStateWithMap and parameterize LookupsState
* address review comments
* address more review comments
* misc fixes
Updating the description of useCache
Updating query-context doc based on Gian's comment
Updating query-context doc based on Gian's comment
Updating query-context doc based on Gian's comment
Updating query-context doc based on Gian's comment
* Make timeout behavior consistent to document
* Refactoring BlockingPool and add more methods to QueryContexts
* remove unused imports
* Addressed comments
* Address comments
* remove unused method
* Make default query timeout configurable
* Fix test failure
* Change timeout from period to millis
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.
Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".
Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
be more accurate.
* Remove unused import.
* Restore buffer size.