* optimize FileWriteOutBytes to avoid high sys cpu
* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException
* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size
* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size"
This reverts commit 965f7421
* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException"
This reverts commit 149e08c0
* optimize FileWriteOutBytes to avoid high sys cpu -- avoid IOEception never thrown check
* Fix size counting to handle IOE in FileWriteOutBytes + tests
* remove unused throws IOException in WriteOutBytes.size()
* Remove redundant throws IOExcpetion clauses
* Parameterize IndexMergeBenchmark
Co-authored-by: huanghui.bigrey <huanghui.bigrey@bytedance.com>
Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>
- Reorder both the datasource and query-execution page orderings to
table, lookup, union, inline, query, join. (Roughly increasing order
of conceptual "fanciness".)
- Add more crosslinks from datasource page to query-execution page:
one per datasource type.
* add kafka admin and kafka writer
* refactor kinesis IT
* fix typo refactor
* parallel
* parallel
* parallel
* parallel works now
* add kafka it
* add doc to readme
* fix tests
* fix failing test
* test
* test
* test
* test
* address comments
* addressed comments
* fixes for inline subqueries when multi-value dimension is present
* fix test
* allow missing capabilities for vectorized group by queries to be treated as single dims since it means that column doesnt exist
* add comment
* Refresh query docs.
Larger changes:
- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.
Smaller changes:
- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.
* Updates.
* Fix numbering.
* Fix glitches.
* Add new words to spellcheck file.
* Assorted changes.
* Further adjustments.
* Add missing punctuation.
* Add API to trigger a compaction by the coordinator for integration tests
* Add missing integration tests for the compaction by the coordinator
* address comments
web-console/e2e-tests/tutorial-batch.spec.ts would occasionally timeout
between the transition from the data loader "configure schema" and
"partition" steps due to missing waits when toggling the rollup setting.
Also, fix shellcheck warnings for script/druid.
* Test file format extensions for inputSource (orc, parquet)
* Test file format extensions for inputSource (orc, parquet)
* fix path
* resolve merge conflict
* fix typo
* fix issue with group by limit pushdown for extractionFn, expressions, joins, etc
* remove unused
* fix test
* revert unintended change
* more tests
* consider capabilities for StringGroupByColumnSelectorStrategy
* fix test
* fix and more test
* revert because im scared
* Fix off-by-one in IndexedTableJoinMatcher.getCardinality.
It would report a cardinality that is one lower than the actual cardinality.
The missing value is the phantom null that can be generated by outer joins.
* Fix tests.
ApproximateHistogram - seems unlikely
SegmentAnalyzer - unclear if this is an actual issue
GenericIndexedWriter - unclear if this is an actual issue
IncrementalIndexRow and OnheapIncrementalIndex are non-issues becaus it's very
unlikely for the number of dims to be large enough to hit the overflow
condition
* Remove no-op assert statement
The assert statement in ClientQuerySegmentWalker will always be true because
of the preceeding while loop which has the same condition.
This change removes dead code to fix an error reported by LGTM
* Suppress lgtm
* cleanup whitespace
* IntelliJ inspections cleanup
* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type
* fix warnings in test code
* more test fixes
* remove string concatenation inspection error
* fix extra curly brace
* cleanup AzureTestUtils
* fix charsets for RangerAdminClient
* review comments
This change fixes a potential integer overflow in BufferArrayGrouper that
was flagged by LGTM. It also adds a check that the vectorized arrays are
initialized before aggregateVector is called.
The changes in HashTableUtils should not have any effect since the numbers
being multiplied are small, but the change will remove the warnings from
being flagged in LGTM.
* SQL: More straightforward handling of join planning.
Two changes that simplify how joins are planned:
1) Stop using JoinProjectTransposeRule as a way of guiding subquery
removal. Instead, add logic to DruidJoinRule that identifies removable
subqueries and removes them at the point of creating a DruidJoinQueryRel.
This approach reduces the size of the planning space and allows the
planner to complete quickly.
2) Remove rules that reorder joins. Not because of an impact on the
planning time (it seems minimal), but because the decisions that the
planner was making in the new tests were sometimes worse than the
user-provided order. I think we'll need to go with the user-provided
order for now, and revisit reordering when we can add more smarts to
the cost estimator.
A third change updates numeric ExprEval classes to store their
value as a boxed type that corresponds to what it is supposed to be.
This is useful because it affects the behavior of "asString", and
is included in this patch because it is needed for the new test
"testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This
test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an
actual double would.
Fixes#9646.
* Fix comments.
* Fix tests.
Load data and query (i.e., automate
https://druid.apache.org/docs/latest/tutorials/tutorial-batch.html) to
have some basic checks ensuring the web console is wired up to druid
correctly.
The new end-to-end tests (tutorial-batch.spec.ts) are added to
`web-console/e2e-tests`. Within that directory:
- `components` represent the various tabs of the web console. Currently,
abstractions for `load data`, `ingestion`, `datasources`, and `query`
are implemented.
- `components/load-data/data-connector` contains abstractions for the
different data source options available to the data loader's `Connect`
step. Currently, only the `Local file` data source connector is
implemented.
- `components/load-data/config` contains abstractions for the different
configuration options available for each step of the data loader flow.
Currently, the `Configure Schema`, `Partition`, and `Publish` steps
have initial implementation of their configuration options.
- `util` contains various helper methods for the tests and does not
contain abstractions of the web console.
Changes to add the new tests to CI:
- `.travis.yml`: New "web console end-to-end tests" job
- `web-console/jest.*.js`: Refactor jest configurations to have
different flavors for unit tests and for end-to-end tests. In
particular, the latter adds a jest setup configuration to wait for the
web console to be ready (`web-console/e2e-tests/util/setup.ts`).
- `web-console/package.json`: Refactor run scripts to add new script for
running end-to-end tests.
- `web-console/script/druid`: Utility scripts for building, starting,
and stopping druid.
Other changes:
- `pom.xml`: Refactor various settings disable java static checks and to
disable java tests into two new maven profiles. Since the same
settings are used in several places (e.g., .travis.yml, Dockerfiles,
etc.), having them in maven profiles makes it more maintainable.
- `web-console/src/console-application.tsx`: Fix typo ("the the").
* Document possible vulnerabilities for the druid-ranger-security
In certain configurations the ranger plugin can expose vulnerabilities due
to some of its dependencies having CVEs.
* Spelling checker is a bit tight