Commit Graph

87 Commits

Author SHA1 Message Date
Martin Grigorov 9e6978609a
Add TravisCI job that builds and tests on ARM64 CPU architecture (#10562)
2020-11-16 21:08:43 +05:30
Maytas Monsereenusorn 1b9a8c4687
Fix compaction integration test CI timeout (#10517)
* fix flaky IT Compaction test

* fix flaky IT Compaction test

* test

* test

* test

* test

* Fix compaction integration test CI timeout

* address comments

* test

* test

* Add print logs

* add error msg

* add taskId to logging
2020-10-21 22:38:11 -07:00
Jihoon Son 8657b23ab2
Integration tests and docs for auto compaction with different partitioning (#10354)
* Working

* add test

* doc

* fix test

* split other integration test

* exclude other-index from other tests

* doc anchor fix

* adjust task slots and number of merge tasks

* spell check

* reduce maxNumConcurrentSubTasks to 1

* maxNumConcurrentSubtasks for range partitioning

* reduce memory for historical

* change group name
2020-09-15 11:28:09 -07:00
Chi Cao Minh 5751d0edc1
Skip coverage check for tag builds (#10397)
The code coverage diff calculation assumes the TRAVIS_BRANCH environment
variable is the name of a branch; however, for tag builds it is the name
of the tag so the diff calculation fails. Since builds triggered by tags
do not have a code diff, the coverage check should be skipped to avoid
the error and to save some CI resources.
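
The skip rule described above can be sketched as follows. This is a minimal illustration, not the change made in the commit: it assumes the tag build is detected through Travis CI's `TRAVIS_TAG` variable (set to the tag name on tag builds, empty otherwise), and the function name `should_run_coverage_check` is hypothetical.

```python
import os


def should_run_coverage_check(env):
    """Return True when a coverage diff against a branch makes sense."""
    # Assumption: tag builds are detected via TRAVIS_TAG, which Travis CI
    # sets to the tag name for tag builds and leaves empty otherwise.
    if env.get("TRAVIS_TAG", ""):
        return False
    # Without a branch name there is nothing to diff against.
    return bool(env.get("TRAVIS_BRANCH", ""))


if __name__ == "__main__":
    print(should_run_coverage_check(dict(os.environ)))
```

On a tag build both variables hold the tag name, so checking `TRAVIS_TAG` first avoids treating the tag as a branch.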
2020-09-14 19:46:33 -07:00
Jihoon Son 6fdce36e41
Add integration tests for query retry on missing segments (#10171)
* Add integration tests for query retry on missing segments

* add missing dependencies; fix travis conf

* address comments

* Integration tests extension

* remove unused dependency

* remove druid_main

* fix java agent port
2020-07-22 22:30:35 -07:00
Maytas Monsereenusorn dd7a32ad48
Fix ITSqlInputSourceTest (#10194)
* Fix ITSqlInputSourceTest.java

* Fix ITSqlInputSourceTest.java

* Fix ITSqlInputSourceTest.java

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2020-07-21 09:52:13 -07:00
Maytas Monsereenusorn 0cabc53bd5
Add integration tests for Appends (#10186)
* append test

* add append IT

* fix checkstyle

* fix checkstyle

* Remove parallel

* fix checkstyle

* fix

* fix

* address comments

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2020-07-20 13:43:13 -07:00
Maytas Monsereenusorn 4e8570b71b
Add integration tests for all InputFormat (#10088)
* Add integration tests for Avro OCF InputFormat

* Add integration tests for Avro OCF InputFormat

* add tests

* fix bug

* fix bug

* fix failing tests

* add comments

* address comments

* address comments

* address comments

* fix test data

* reduce resource needed for IT

* remove bug fix

* fix checkstyle

* add bug fix
2020-07-08 12:50:29 -07:00
frank chen 60c6bd5b4c
support Aliyun OSS service as deep storage (#9898)
* init commit, all tests passed

* fix format

Signed-off-by: frank chen <frank.chen021@outlook.com>

* data stored successfully

* modify config path

* add doc

* add aliyun-oss extension to project

* remove descriptor deletion code to avoid warning message output by aliyun client

* fix warnings reported by lgtm-com

* fix ci warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix errors reported by IntelliJ inspection check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc spelling check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix dependency warnings reported by ci

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix warnings reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add package configuration to support showing extension info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add IT test cases and fix bugs

Signed-off-by: frank chen <frank.chen021@outlook.com>

* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add license info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* exclude execution of IT testcases of OSS extension from CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* put the extensions under contrib group and add to distribution

* fix names in test cases

* add unit test to cover OssInputSource

* fix names in test cases

* fix dependency problem reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-01 22:20:53 -07:00
Suneet Saldanha b7d771f633
More prominent instructions on code coverage failure (#10060)
* More prominent instructions on code coverage failure

* Update .travis.yml
2020-06-25 19:48:30 -07:00
Chi Cao Minh 3d513b0bec
Adjust code coverage check (#9969)
Since there is not currently a good way to have fine-grained code coverage
check exclusions, lower the coverage thresholds to make the check more
lenient for now. Also, display the code coverage report in the Travis CI
logs to make it easier to understand how to improve coverage.
2020-06-02 15:34:58 -07:00
Chi Cao Minh b8c2266aa0
Disable function code coverage check (#9933)
As observed in https://github.com/apache/druid/pull/9905 and
https://github.com/apache/druid/pull/9915, the function code coverage
check flags false positive issues, so it should be disabled for now.
2020-05-26 20:13:08 -07:00
Chi Cao Minh 427239f451
Enforce code coverage (#9863)
* Enforce code coverage

Add an automated way of checking if new code has adequate unit tests,
since merging code coverage reports and checking coverage thresholds via
coveralls or codecov is unreliable.

The following minimum unit test code coverage is now enforced:
- 80% functions
- 65% branch
- 65% line

Branch and line coverage thresholds are slightly lower for now as they
are harder to achieve.

After the code coverage check looks reliable, the thresholds can be
increased later if needed.
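
A threshold check of this kind might look like the sketch below. The three minimums come from the message above; the input shape (covered and total counts per metric) and the name `coverage_failures` are assumptions made for illustration, not the script added by the commit.

```python
# Minimum coverage ratios listed in the commit message above.
THRESHOLDS = {"functions": 0.80, "branch": 0.65, "line": 0.65}


def coverage_failures(covered, total, thresholds=THRESHOLDS):
    """Return (metric, ratio, minimum) for every metric below its minimum."""
    failures = []
    for metric, minimum in thresholds.items():
        count = total.get(metric, 0)
        # Assumption: a metric with nothing to cover passes trivially.
        ratio = 1.0 if count == 0 else covered.get(metric, 0) / count
        if ratio < minimum:
            failures.append((metric, ratio, minimum))
    return failures
```

A CI step could fail the build whenever the returned list is non-empty and print each entry as a hint about what to improve.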

* Add comments
2020-05-20 09:31:37 -07:00
Jonathan Wei 61295bd002
More Hadoop integration tests (#9714)
* More Hadoop integration tests

* Add missing s3 instructions

* Address PR comments

* Address PR comments

* PR comments

* Fix typo
2020-04-30 14:33:01 -07:00
Jihoon Son 39722bd064
Integration tests for stream ingestion with various data formats (#9783)
* Integration tests for stream ingestion with various data formats

* fix npe

* better logging; fix tsv

* fix tsv

* exclude kinesis from travis

* some readme
2020-04-29 13:18:01 -07:00
Maytas Monsereenusorn 16f5ae4405
Add integration tests for kafka ingestion (#9724)
* add kafka admin and kafka writer

* refactor kinesis IT

* fix typo refactor

* parallel

* parallel

* parallel

* parallel works now

* add kafka it

* add doc to readme

* fix tests

* fix failing test

* test

* test

* test

* test

* address comments

* addressed comments
2020-04-22 10:43:34 -07:00
Jihoon Son 6a52bdc605
Skip license check for dependency reduced pom files (#9687)
2020-04-11 18:11:53 -07:00
Chi Cao Minh 84c1c2505d
Web console basic end-to-end-test (#9595)
Load data and query (i.e., automate
https://druid.apache.org/docs/latest/tutorials/tutorial-batch.html) to
have some basic checks ensuring the web console is wired up to druid
correctly.

The new end-to-end tests (tutorial-batch.spec.ts) are added to
`web-console/e2e-tests`. Within that directory:
- `components` represent the various tabs of the web console. Currently,
  abstractions for `load data`, `ingestion`, `datasources`, and `query`
  are implemented.
- `components/load-data/data-connector` contains abstractions for the
  different data source options available to the data loader's `Connect`
  step. Currently, only the `Local file` data source connector is
  implemented.
- `components/load-data/config` contains abstractions for the different
  configuration options available for each step of the data loader flow.
  Currently, the `Configure Schema`, `Partition`, and `Publish` steps
  have initial implementation of their configuration options.
- `util` contains various helper methods for the tests and does not
  contain abstractions of the web console.

Changes to add the new tests to CI:
- `.travis.yml`: New "web console end-to-end tests" job
- `web-console/jest.*.js`: Refactor jest configurations to have
  different flavors for unit tests and for end-to-end tests. In
  particular, the latter adds a jest setup configuration to wait for the
  web console to be ready (`web-console/e2e-tests/util/setup.ts`).
- `web-console/package.json`: Refactor run scripts to add new script for
  running end-to-end tests.
- `web-console/script/druid`: Utility scripts for building, starting,
  and stopping druid.

Other changes:
- `pom.xml`: Refactor various settings to disable java static checks and to
  disable java tests into two new maven profiles. Since the same
  settings are used in several places (e.g., .travis.yml, Dockerfiles,
  etc.), having them in maven profiles makes it more maintainable.
- `web-console/src/console-application.tsx`: Fix typo ("the the").
2020-04-09 12:38:09 -07:00
Maytas Monsereenusorn 1852bf33ea
Add Integration Test for functionality of kinesis ingestion (#9576)
* kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* fix kinesis timeout

* Kinesis IT

* Kinesis IT

* fix checkstyle

* Kinesis IT

* address comments

* fix checkstyle
2020-04-03 09:45:22 -07:00
Maytas Monsereenusorn 3f521943fc
S3 ingestion spec should not use the default credentials provider chain when environment value password provider is misconfigured. (#9552)
* fix s3 optional cred

* S3 ingestion spec uses the default credentials provider chain when environment value password provider is misconfigured.

* fix failing test
2020-03-24 15:09:02 -07:00
Maytas Monsereenusorn 5f127a1829
Add integration tests for HDFS (#9542)
* HDFS IT

* HDFS IT

* HDFS IT

* fix checkstyle
2020-03-20 15:46:08 -07:00
Maytas Monsereenusorn 4c620b8f1c
Adding s3, gcs, azure integration tests (#9501)
* exclude pulling s3 segments for tests that don't need it

* fix script

* fix script

* fix script

* add s3 test

* refactor sample data script

* add tests

* add tests

* add license header

* fix failing tests

* change bucket and path to config

* update integration test readme

* fix typo
2020-03-17 03:08:44 -07:00
Maytas Monsereenusorn 9231f2acb3
Integration test compile with Java 8 and run with Java 8 and 11 (#9491)
* test integration compile with 8 and run with 11

* Integration test compile with Java 8 and run with Java 8 and 11
2020-03-11 09:22:27 -07:00
Chi Cao Minh e7eb45e648
Run IntelliJ inspections on Travis (#9179)
* Run IntelliJ inspections on Travis

Running IntelliJ inspections currently takes about 90 minutes, but they
can be run in about 30 minutes on Travis.

* Restore assert statements
2020-02-19 11:34:19 +03:00
Maytas Monsereenusorn 31528bcdaf
Integration tests for JDK 11 (#9249)
* Integration tests for JDK 11

* fix vm option

* fix supervisord

* fix pom

* add integration tests for java 11

* add logs

* update docs

* Update dockerfile to ack AdoptOpenJdk for Java 11 install commands
2020-02-12 16:36:31 -08:00
Chi Cao Minh a5c49cc4bd
Change security vulnerability scan to cron job (#9340)
* Change security vulnerability scan to cron job

Previously, when new CVEs were reported, the security vulnerability scan
would unfortunately block PRs that did not modify any dependencies. To
prevent this issue, the security scan is now run as a Travis cron job
that runs on master and notifies the druid dev list if it fails. The
security scan has also been added to the "apache-release" maven profile,
to ensure that it passes before a release.

Also adjusted some Travis CI job failure help messages to not be folded
in the Travis CI job logs.

* Dedup plugin configuration definition
2020-02-11 13:43:08 -08:00
Chi Cao Minh bab78fc80e
Parallel indexing single dim partitions (#8925)
* Parallel indexing single dim partitions

Implements single dimension range partitioning for native parallel batch
indexing as described in #8769. This initial version requires the
druid-datasketches extension to be loaded.

The algorithm has 5 phases that are orchestrated by the supervisor in
`ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`.
These phases and the main classes involved are described below:

1) In parallel, determine the distribution of dimension values for each
   input source split.

   `PartialDimensionDistributionTask` uses `StringSketch` to generate
   the approximate distribution of dimension values for each input
   source split. If the rows are ungrouped,
   `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter`
   uses a Bloom filter to skip rows that would be grouped. The final
   distribution is sent back to the supervisor via
   `DimensionDistributionReport`.

2) The range partitions are determined.

   In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the
   supervisor uses `StringSketchMerger` to merge the individual
   `StringSketch`es created in the preceding phase. The merged sketch is
   then used to create the range partitions.

3) In parallel, generate partial range-partitioned segments.

   `PartialRangeSegmentGenerateTask` uses the range partitions
   determined in the preceding phase and
   `RangePartitionCachingLocalSegmentAllocator` to generate
   `SingleDimensionShardSpec`s.  The partition information is sent back
   to the supervisor via `GeneratedGenericPartitionsReport`.

4) The partial range segments are grouped.

   In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`,
   the supervisor creates the `PartialGenericSegmentMergeIOConfig`s
   necessary for the next phase.

5) In parallel, merge partial range-partitioned segments.

   `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to
   retrieve the partial range-partitioned segments generated earlier and
   then merges and publishes them.
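
The first three phases can be illustrated with a simplified sketch. It is not the Druid implementation: exact sorted lists stand in for the approximate `StringSketch`, dimension values are assumed to be non-null strings, and all function names are hypothetical.

```python
from bisect import bisect_right


def split_distribution(rows, dimension):
    """Phase 1 stand-in: collect the dimension values seen in one input split."""
    return sorted(row[dimension] for row in rows)


def determine_range_boundaries(distributions, num_partitions):
    """Phase 2 stand-in: merge per-split distributions and cut at even quantiles."""
    merged = sorted(value for dist in distributions for value in dist)
    if not merged or num_partitions < 2:
        return []
    boundaries = []
    for i in range(1, num_partitions):
        candidate = merged[i * len(merged) // num_partitions]
        # Skip repeated cut points so boundaries stay strictly increasing.
        if not boundaries or candidate > boundaries[-1]:
            boundaries.append(candidate)
    return boundaries


def partition_index(value, boundaries):
    """Phase 3 stand-in: map a dimension value to its range partition.

    Partition k holds values v with boundaries[k-1] <= v < boundaries[k];
    the first and last partitions are open-ended.
    """
    return bisect_right(boundaries, value)
```

Because the boundaries are computed once from the merged distribution, every subtask in the generate phase can route rows to the same ranges without coordinating with the others.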

* Fix dependencies & forbidden apis

* Fixes for integration test

* Address review comments

* Fix docs, strict compile, sketch check, rollup check

* Fix first shard spec, partition serde, single subtask

* Fix first partition check in test

* Misc rewording/refactoring to address code review

* Fix doc link

* Split batch index integration test

* Do not run parallel-batch-index twice

* Adjust last partition

* Split ITParallelIndexTest to reduce runtime

* Rename test class

* Allow null values in range partitions

* Indicate which phase failed

* Improve asserts in tests
2019-12-09 23:05:49 -08:00
Chi Cao Minh af74acaa85
Address security vulnerabilities CVSS >= 7 (#8980)
* Address security vulnerabilities CVSS >= 7

Update dependencies to address security vulnerabilities with CVSS scores
of 7 or higher. A new Travis CI job is added to prevent new
high/critical security vulnerabilities from being added.

Updated dependencies:
- api-util 1.0.0 -> 1.0.3
- jackson 2.9.10 -> 2.10.1
- kafka 2.1.0 -> 2.1.1
- libthrift 0.10.0 -> 0.13.0
- protobuf 3.2.0 -> 3.11.0

The following high/critical security vulnerabilities are currently
suppressed (so that the new Travis CI job can be added now) and are left
as future work to fix:
- hibernate-validator:5.2.5
- jackson-mapper-asl:1.9.13
- libthrift:0.6.1
- netty:3.10.6
- nimbus-jose-jwt:4.41.1

* Rename EDL1 license file

* Fix inspection errors
2019-12-05 14:34:35 -08:00
Chi Cao Minh 7dcbaca658
Spellcheck docs (#8548)
* Spellcheck docs

Fix spelling mistakes in docs and add CI job for running spellcheck on
docs.

* Add missing license header
2019-09-17 12:47:30 -07:00
Chi Cao Minh 5f61374cb3
Fix dependency analyze warnings (#8230)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.

* Address review comments

* Adjust scope for org.glassfish.jaxb:jaxb-runtime

* Fix dependencies for hdfs-storage

* Consolidate netty4 versions
2019-09-09 14:37:21 -07:00
Clint Wylie 984958122b
packaging script adjustments (#8436)
* set encoding for license and notice scripts, split generate-license.py into generate-binary-license.py and check-licenses.py, check-licenses when -Papache-release is used

* missing docs

* doc fix

* more doc fix

* remove comments

* good catch travis +1

* fix lgtm alerts
2019-08-29 23:27:43 -07:00
Xavier Léauté 4b69ce0f09
enable unit tests with JDK11 (#8400)
* enable unit tests with JDK11

This enables unit tests with openjdk11, splitting up the build into
stages to have it fail faster

The integration test docker image still uses openjdk8, so there is
little reason to run those tests with JDK11 yet

* remove stages
2019-08-28 10:29:13 -07:00
Chi Cao Minh 31e6280b75
Use Codecov (#8388)
* Use Codecov

Upload coverage reports to Codecov. For now, having Codecov comment on
PRs or enforcing a minimum coverage threshold are both disabled until
the Codecov coverage reports look reliable:
https://codecov.io/gh/apache/incubator-druid

* Split bash and curl into separate lines
2019-08-28 08:49:30 -07:00
Chi Cao Minh d8b81f4fd9
Speed up javascript Travis CI jobs (#8361)
* Speed up javascript Travis CI jobs

Skip mvn install for javascript CI jobs since it is not needed.

* Specify base filepath for source files

* Remove coveralls
2019-08-27 12:03:48 -07:00
Chi Cao Minh 2383d9e522
Disable coveralls (#8382)
The coveralls code coverage reports inaccurate coverage for our parallel
builds. Disable it until it can be fixed or a better alternative can be
found.
2019-08-23 08:05:37 -07:00
Chi Cao Minh 64f3c6588b
Add coveralls parallel builds webhook (#8374)
Since we use parallel builds for unit tests, follow the instructions for
properly merging the results on coveralls:
https://docs.coveralls.io/parallel-build-webhook
2019-08-22 12:50:00 -07:00
Clint Wylie 010f70b371
autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306)
* migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time

* +x

* move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml

* add missing hdfs-storage dependencies

* revert to old syntax, fixes

* formatting

* update notices for recently updated dependencies
2019-08-21 12:46:27 -07:00
Gian Merlino d007477742
Docusaurus build framework + ingestion doc refresh. (#8311)
* Docusaurus build framework + ingestion doc refresh.

* stick to npm instead of yarn

* fix typos

* restore some _bin

* Adjustments.

* detect and fix redirect anchors

* update anchor lint

* Web-console: remove specific column filters (#8343)

* add clear filter

* update tool kit

* remove useless check

* auto run

* add %

* Fix resource leak (#8337)

* Fix resource leak

* Patch comments

* Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234)

* Fixes from PR review.

* Fix more anchors.

* Preamble nix.

* Fix more anchors, headers

* clean up placeholder page

* add to website lint to travis config

* better broken link checking

* travis fix

* Fixed more broken links

* better redirects

* unfancy catch

* fix LGTM error

* link fixes

* fix md issues

* Addl fixes
2019-08-20 21:48:59 -07:00
Chi Cao Minh 6fa22f6939
Enable code coverage (#8303)
* Enable code coverage

Code coverage was disabled via
https://github.com/apache/incubator-druid/pull/3122 due to an issue with
cobertura in Travis CI. Switch the code coverage tool from cobertura to
jacoco to avoid the issue and re-enable coveralls for Travis CI.

* Exclude non-production code

* Exclude benchmark generated code

* Exclude DruidTestRunnerFactory
2019-08-20 15:36:19 -07:00
Chi Cao Minh ccd87d2667
Speedup package check job (#8282)
The package check job sometimes hits the 50 minute Travis CI job time
limit. Move license checking tasks in "package check" job to "license
check" job to rebalance the job runtime (the "license check" job
currently takes about 1 minute). Moving the logic from build.sh to
.travis.yml also gives more visibility into how long each step takes
(i.e., generate-license-dependency-reports.py vs generate-license.py).
2019-08-12 13:25:08 -07:00
Chi Cao Minh b359c5b3d9
Fix SIGAR dependency connection timeout (#8258)
After enabling parallel builds for "mvn install", the sigar dependency
would sometimes resolve to the incorrect artifact repo for some of the
maven modules. This issue seems to be fixed by moving the definition of
the sigar dependency's artifact repo to the root POM.

Also, depending on network speeds, "mvn -q install" may take longer than
the default 10 minute timeout to print any output. Use travis_wait to
extend the timeout to 15 minutes.
2019-08-08 20:13:18 -05:00
Chi Cao Minh 05b44e3467
Speedup Travis CI jobs (#8240)
Reorganize Travis CI jobs into smaller, faster (and more) jobs. Add
various maven options to skip unnecessary work and refactor Travis CI
job definitions to follow DRY.

Detailed changes:

.travis.yml
- Refactor build logic to get rid of copy-and-paste logic
- Skip static checks and enable parallelism for maven install
- Split static analysis into different jobs to ease triage
- Use "name" attribute instead of NAME environment variable
- Split "indexing" and "web console" out of "other modules test"
- Split 2 integration test jobs into multiple smaller jobs

build.sh
- Enable parallelism
- Disable more static checks

travis_script_integration.sh
travis_script_integration_part2.sh
integration-tests/README.md
- Use TestNG groups instead of shell scripts and move definition of jobs
  into Travis CI yaml

integration-tests/pom.xml
- Show elapsed time of individual tests to aid in future rebalancing of
  Travis CI integration test jobs run time

TestNGGroup.java
- Use TestNG groups to make it easy to have multiple Travis CI
  integration test jobs. TestNG groups also make it easier to have an
  "other" integration test group and make it less likely a test will
  accidentally not be included in a CI job.

IT*Test.java
AbstractITBatchIndexTest.java
AbstractKafkaIndexerTest.java
- Add TestNG group
- Fix various IntelliJ inspection warnings
- Reduce scope of helper methods since the TestNG group annotation on
  the class makes TestNG consider all public methods as test methods

pom.xml
- Allow enforce plugin to be run from command-line
- Bump resources plugin version so that "[debug] execute contextualize"
  output is correctly suppressed by "mvn -q"
- Bump exec plugin version so that skip property is renamed from "skip"
  to "exec.skip"

web-console/pom.xml
- Add property to allow disabling javascript-related work. This property
  is overridden in Travis CI to speed up the jobs.
2019-08-07 09:52:42 -07:00
Chi Cao Minh ab71a2e1e4
Revert "Fix dependency analyze warnings (#8128)" (#8189)
This reverts commit 5dd0d8e873.
2019-07-29 11:42:16 -07:00
Chi Cao Minh 5dd0d8e873
Fix dependency analyze warnings (#8128)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.

* Fix licenses and dependencies

* Fix licenses and dependencies again

* Fix integration test dependency

* Address review comments

* Fix unit test dependencies

* Fix integration test dependency

* Fix integration test dependency again

* Fix integration test dependency third time

* Fix integration test dependency fourth time

* Fix compile error

* Fix assert package
2019-07-26 10:49:03 -07:00
Gian Merlino ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
Jihoon Son 0a3538b569
Fix license check in travis and make it optional (#8049)
* Fix license check in travis and make it optional

* debug

* fix build

* too loud maven

* move MAVEN_OPTS to top and add comments

* adjust script

* remove mvn option from python script
2019-07-09 19:35:29 -07:00
Jihoon Son 12f12676e3
Binary license management system (#7998)
* Binary license management system

* add missing file

* add comment

* Address comments

* print missing licenses

* print druid module name

* Add missing licenses and update versions

* fix library versions and add missing ones. also fix pom.xml

* testing multi thread

* Parallel report generation

* fix build error

* install pyyaml and use old api

* install python3

* fix travis script

* python3.6

* pip

* setuptools

* python3-setuptools

* address comment

* error on not found reports or registered licenses

* removed licenses

* debug

* travis debug

* add missing licenses

* travis debug

* debug

* remove debug code

* test build script

* travis debug

* still debug

* add missing python lib

* debug

* debug

* fix travis

* fix travis

* debug travis

* flush print

* print something more to keep travis alive

* adjust print

* single threaded

* single threaded

* debug

* debug

* remove debug

* remove deprecated-2017Q4 from travis conf

* remove comments and duplicate sudo
2019-07-08 12:24:51 -07:00
Khwunchai Jaengsawang fb56f8d53c
Rename io.druid to org.druid in tdigestsketch extension (#7996)
* Rename io.druid to org.druid in tdigestsketch extension

* Fix typo

Signed-off-by: Khwunchai Jaengsawang <khwunchai.j@ku.th>

* Add packaging check for community extensions

Signed-off-by: Khwunchai Jaengsawang <khwunchai.j@ku.th>
2019-07-01 12:46:35 -07:00
Fokko Driesprong 48f20fe754
Add Spotbugs (#7894)
* Add Spotbugs

Exclude all the issues for now, so we can add them one by one.

(cherry picked from commit ceda4754dc8c703d1e0de85b48cd5f5409cfd5b7)

* Add additional rules to the list

* More rules

* More rules

* Add comments to the xml

* Move the spotbugs-exclude.xml to codestyle/
2019-06-20 21:06:52 +03:00
Jihoon Son d00a9676b7
Set aws.region for unit tests automatically (#7868)
* Set aws.region for unit tests automatically

* Update README.template
2019-06-14 15:34:21 -07:00