druid/.travis.yml
Chi Cao Minh bab78fc80e Parallel indexing single dim partitions (#8925)
* Parallel indexing single dim partitions

Implements single dimension range partitioning for native parallel batch
indexing as described in #8769. This initial version requires the
druid-datasketches extension to be loaded.

The algorithm has 5 phases that are orchestrated by the supervisor in
`ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`.
These phases and the main classes involved are described below:

1) In parallel, determine the distribution of dimension values for each
   input source split.

   `PartialDimensionDistributionTask` uses `StringSketch` to generate
   the approximate distribution of dimension values for each input
   source split. If the rows are ungrouped,
   `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter`
   uses a Bloom filter to skip rows that would be grouped. The final
   distribution is sent back to the supervisor via
   `DimensionDistributionReport`.

2) The range partitions are determined.

   In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the
   supervisor uses `StringSketchMerger` to merge the individual
   `StringSketch`es created in the preceding phase. The merged sketch is
   then used to create the range partitions (a simplified illustration of
   this boundary computation follows the phase list).

3) In parallel, generate partial range-partitioned segments.

   `PartialRangeSegmentGenerateTask` uses the range partitions
   determined in the preceding phase and
   `RangePartitionCachingLocalSegmentAllocator` to generate
   `SingleDimensionShardSpec`s.  The partition information is sent back
   to the supervisor via `GeneratedGenericPartitionsReport`.

4) The partial range segments are grouped.

   In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`,
   the supervisor creates the `PartialGenericSegmentMergeIOConfig`s
   necessary for the next phase.

5) In parallel, merge partial range-partitioned segments.

   `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to
   retrieve the partial range-partitioned segments generated earlier and
   then merges and publishes them.
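
To make the data flow concrete, below is a minimal, self-contained sketch of
the core idea behind phases 1-3: build a per-split distribution of
partition-dimension values, merge the distributions on the supervisor, derive
range boundaries that target a roughly even row count per partition, and route
rows by boundary lookup. It is illustrative only: the class and its methods are
hypothetical, and it uses an exact `TreeMap` where the real implementation uses
the approximate, DataSketches-backed `StringSketch`/`StringSketchMerger` and
emits `SingleDimensionShardSpec`s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DimensionValueDistribution
{
  // partition dimension value -> number of rows observed with that value
  private final TreeMap<String, Long> counts = new TreeMap<>();

  // Phase 1: each per-split task records the dimension values it sees.
  void add(String dimensionValue)
  {
    counts.merge(dimensionValue, 1L, Long::sum);
  }

  // Phase 2: the supervisor merges the per-split distributions into one.
  void merge(DimensionValueDistribution other)
  {
    other.counts.forEach((value, count) -> counts.merge(value, count, Long::sum));
  }

  // Derive range partition boundaries so that each partition holds roughly
  // targetRowsPerSegment rows. Each boundary is the first value of a new
  // partition; the first partition is unbounded below, the last unbounded above.
  List<String> computeBoundaries(long targetRowsPerSegment)
  {
    List<String> boundaries = new ArrayList<>();
    long rowsInCurrentPartition = 0;
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      if (rowsInCurrentPartition >= targetRowsPerSegment) {
        boundaries.add(entry.getKey());
        rowsInCurrentPartition = 0;
      }
      rowsInCurrentPartition += entry.getValue();
    }
    return boundaries;
  }

  // Phase 3: route a row to the partition whose [start, end) value range
  // contains it, analogous to what a SingleDimensionShardSpec expresses.
  static int partitionFor(String dimensionValue, List<String> boundaries)
  {
    int partition = 0;
    for (String start : boundaries) {
      if (dimensionValue.compareTo(start) < 0) {
        break;
      }
      partition++;
    }
    return partition;
  }
}
```

Using an approximate sketch instead of the exact map above keeps the phase-1
reports small and the phase-2 merge cheap, at the cost of boundaries that only
approximately equalize partition sizes.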

* Fix dependencies & forbidden apis

* Fixes for integration test

* Address review comments

* Fix docs, strict compile, sketch check, rollup check

* Fix first shard spec, partition serde, single subtask

* Fix first partition check in test

* Misc rewording/refactoring to address code review

* Fix doc link

* Split batch index integration test

* Do not run parallel-batch-index twice

* Adjust last partition

* Split ITParallelIndexTest to reduce runtime

* Rename test class

* Allow null values in range partitions

* Indicate which phase failed

* Improve asserts in tests
2019-12-09 23:05:49 -08:00

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
language: java
sudo: true
dist: xenial
jdk:
  - openjdk8
cache:
  directories:
    - $HOME/.m2
env:
  global:
    - DOCKER_IP=127.0.0.1 # for integration tests
    - MVN="mvn -B"
    - > # Various options to make execution of maven goals faster (e.g., mvn install)
      MAVEN_SKIP="
      -Danimal.sniffer.skip=true
      -Dcheckstyle.skip=true
      -Ddruid.console.skip=true
      -Denforcer.skip=true
      -Dforbiddenapis.skip=true
      -Dmaven.javadoc.skip=true
      -Dpmd.skip=true
      -Dspotbugs.skip=true
      "
    - MAVEN_SKIP_TESTS="-DskipTests -Djacoco.skip=true"
# Add various options to make 'mvn install' fast and skip javascript compile (-Ddruid.console.skip=true) since it is not
# needed. Depending on network speeds, "mvn -q install" may take longer than the default 10 minute timeout to print any
# output. To compensate, use travis_wait to extend the timeout.
install: MAVEN_OPTS='-Xmx3000m' travis_wait 15 ${MVN} clean install -q -ff ${MAVEN_SKIP} ${MAVEN_SKIP_TESTS} -T1C
jobs:
  include:
    - name: "animal sniffer checks"
      script: ${MVN} animal-sniffer:check --fail-at-end
    - name: "checkstyle"
      script: ${MVN} checkstyle:checkstyle --fail-at-end
    - name: "enforcer checks"
      script: ${MVN} enforcer:enforce --fail-at-end
    - name: "forbidden api checks"
      script: ${MVN} forbiddenapis:check forbiddenapis:testCheck --fail-at-end
    - name: "pmd checks"
      script: ${MVN} pmd:check --fail-at-end # TODO: consider adding pmd:cpd-check
    - name: "spotbugs checks"
      script: ${MVN} spotbugs:check --fail-at-end -pl '!benchmarks'
    - name: "license checks"
      install: skip
      before_script: &setup_generate_license
        - sudo apt-get update && sudo apt-get install python3 python3-pip python3-setuptools -y
        - pip3 install wheel # install wheel first explicitly
        - pip3 install pyyaml
      script:
        - >
          ${MVN} apache-rat:check -Prat --fail-at-end
          -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn
          -Drat.consoleOutput=true
        # Generate dependency reports and check that they are valid. When running on Travis CI, 2 cores are available
        # (https://docs.travis-ci.com/user/reference/overview/#virtualisation-environment-vs-operating-system).
        - mkdir -p target
        - distribution/bin/generate-license-dependency-reports.py . target --clean-maven-artifact-transfer --parallel 2
        - distribution/bin/check-licenses.py licenses.yaml target/license-reports
    - &compile_strict
      name: "(openjdk8) strict compilation"
      install: skip
      # Strict compilation requires more than 2 GB
      script: >
        MAVEN_OPTS='-Xmx3000m' ${MVN} clean -Pstrict compile test-compile --fail-at-end
        -pl '!benchmarks' ${MAVEN_SKIP} ${MAVEN_SKIP_TESTS}
- name: "analyze dependencies"
script: MAVEN_OPTS='-Xmx3000m' ${MVN} ${MAVEN_SKIP} dependency:analyze -DoutputXML=true -DignoreNonCompile=true -DfailOnWarning=true
after_failure: |-
echo "FAILURE EXPLANATION:
The dependency analysis has found a dependency that is either:
1) Used and undeclared: These are available as a transitive dependency but should be explicitly
added to the POM to ensure the dependency version. The XML to add the dependencies to the POM is
shown above.
2) Unused and declared: These are not needed and removing them from the POM will speed up the build
and reduce the artifact size. The dependencies to remove are shown above.
If there are false positive dependency analysis warnings, they can be suppressed:
https://maven.apache.org/plugins/maven-dependency-plugin/analyze-mojo.html#usedDependencies
https://maven.apache.org/plugins/maven-dependency-plugin/examples/exclude-dependencies-from-dependency-analysis.html
For more information, refer to:
https://maven.apache.org/plugins/maven-dependency-plugin/analyze-mojo.html
"
- name: "security vulnerabilities"
install: skip
script: ${MVN} dependency-check:check
after_failure: |-
echo "FAILURE EXPLANATION:
The OWASP dependency check has found security vulnerabilities. Please use a newer version
of the dependency that does not have vulenerabilities. If the analysis has false positives,
they can be suppressed by adding entries to owasp-dependency-check-suppressions.xml (for more
information, see https://jeremylong.github.io/DependencyCheck/general/suppression.html).
"
    - &package
      name: "(openjdk8) packaging check"
      install: skip
      before_script: *setup_generate_license
      script: >
        MAVEN_OPTS='-Xmx3000m' ${MVN} clean install -Pdist -Pbundle-contrib-exts --fail-at-end
        -pl '!benchmarks' ${MAVEN_SKIP} ${MAVEN_SKIP_TESTS} -Ddruid.console.skip=false -T1C
    - <<: *package
      name: "(openjdk11) packaging check"
      jdk: openjdk11
    - &test_processing_module
      name: "(openjdk8) processing module test"
      env: &processing_env
        - MAVEN_PROJECTS='processing'
      before_script: &setup_java_test
        - unset _JAVA_OPTIONS
      script: &run_java_test
        # Set MAVEN_OPTS for Surefire launcher. Skip remoteresources to avoid intermittent connection timeouts when
        # resolving the SIGAR dependency.
        - >
          MAVEN_OPTS='-Xmx800m' ${MVN} test -pl ${MAVEN_PROJECTS}
          ${MAVEN_SKIP} -Dremoteresources.skip=true
        - sh -c "dmesg | egrep -i '(oom|out of memory|kill process|killed).*' -C 1 || exit 0"
        - free -m
      after_success: &upload_java_unit_test_coverage
        - ${MVN} -pl ${MAVEN_PROJECTS} jacoco:report
        # retry in case of network error
        - travis_retry curl -o codecov.sh -s https://codecov.io/bash
        - travis_retry bash codecov.sh -X gcov
    - <<: *test_processing_module
      name: "(openjdk11) processing module test"
      jdk: openjdk11
    - &test_processing_module_sqlcompat
      name: "(openjdk8) processing module test (SQL Compatibility)"
      env: *processing_env
      before_script: *setup_java_test
      script: &run_java_sql_compat_test
        # Set MAVEN_OPTS for Surefire launcher. Skip remoteresources to avoid intermittent connection timeouts when
        # resolving the SIGAR dependency.
        - >
          MAVEN_OPTS='-Xmx800m' ${MVN} test -pl ${MAVEN_PROJECTS} -Ddruid.generic.useDefaultValueForNull=false
          ${MAVEN_SKIP} -Dremoteresources.skip=true
        - sh -c "dmesg | egrep -i '(oom|out of memory|kill process|killed).*' -C 1 || exit 0"
        - free -m
      after_success: *upload_java_unit_test_coverage
    - <<: *test_processing_module_sqlcompat
      name: "(openjdk11) processing module test (SQL Compatibility)"
      jdk: openjdk11
    - &test_indexing_module
      name: "(openjdk8) indexing modules test"
      env: &indexing_env
        - MAVEN_PROJECTS='indexing-hadoop,indexing-service,extensions-core/kafka-indexing-service,extensions-core/kinesis-indexing-service'
      before_script: *setup_java_test
      script: *run_java_test
      after_success: *upload_java_unit_test_coverage
    - <<: *test_indexing_module
      name: "(openjdk11) indexing modules test"
      jdk: openjdk11
    - &test_indexing_module_sqlcompat
      name: "(openjdk8) indexing modules test (SQL Compatibility)"
      env: *indexing_env
      before_script: *setup_java_test
      script: *run_java_sql_compat_test
      after_success: *upload_java_unit_test_coverage
    - <<: *test_indexing_module_sqlcompat
      name: "(openjdk11) indexing modules test (SQL Compatibility)"
      jdk: openjdk11
    - &test_server_module
      name: "(openjdk8) server module test"
      env: &server_env
        - MAVEN_PROJECTS='server'
      before_script: *setup_java_test
      script: *run_java_test
      after_success: *upload_java_unit_test_coverage
    - <<: *test_server_module
      name: "(openjdk11) server module test"
      jdk: openjdk11
    - &test_server_module_sqlcompat
      name: "(openjdk8) server module test (SQL Compatibility)"
      env: *server_env
      before_script: *setup_java_test
      script: *run_java_sql_compat_test
      after_success: *upload_java_unit_test_coverage
    - <<: *test_server_module_sqlcompat
      name: "(openjdk11) server module test (SQL Compatibility)"
      jdk: openjdk11
    - &test_modules
      name: "(openjdk8) other modules test"
      env: &other_env
        - MAVEN_PROJECTS='!processing,!indexing-hadoop,!indexing-service,!extensions-core/kafka-indexing-service,!extensions-core/kinesis-indexing-service,!server,!web-console'
      before_script: *setup_java_test
      script: *run_java_test
      after_success: *upload_java_unit_test_coverage
    - <<: *test_modules
      name: "(openjdk11) other modules test"
      jdk: openjdk11
    - &test_modules_sqlcompat
      name: "(openjdk8) other modules test (SQL Compatibility)"
      env: *other_env
      before_script: *setup_java_test
      script: *run_java_sql_compat_test
      after_success: *upload_java_unit_test_coverage
    - <<: *test_modules_sqlcompat
      name: "(openjdk11) other modules test (SQL Compatibility)"
      jdk: openjdk11
    - &test_webconsole
      name: "web console"
      install: skip
      script:
        - ${MVN} test -pl 'web-console'
      after_success:
        - (cd web-console && travis_retry npm run codecov) # retry in case of network error
    - name: "docs"
      install: (cd website && npm install)
      script: (cd website && npm run lint && npm run spellcheck)
      after_failure: |-
        echo "FAILURE EXPLANATION:
        If there are spell check errors:
        1) Suppressing False Positives: Edit website/.spelling to add suppressions. Instructions
           are at the top of the file and explain how to suppress false positives either globally or
           within a particular file.
        2) Running Spell Check Locally: cd website && npm install && npm run spellcheck
        For more information, refer to: https://www.npmjs.com/package/markdown-spellcheck
        "
    - &integration_batch_index
      name: "batch index integration test"
      services: &integration_test_services
        - docker
      env: TESTNG_GROUPS='-Dgroups=batch-index'
      script: &run_integration_test
        - ${MVN} verify -pl integration-tests -P integration-tests ${TESTNG_GROUPS} ${MAVEN_SKIP}
      after_failure: &integration_test_diags
        - for v in ~/shared/logs/*.log ; do
            echo $v logtail ======================== ; tail -100 $v ;
          done
        - for v in broker middlemanager overlord router coordinator historical ; do
            echo $v dmesg ======================== ;
            docker exec -it druid-$v sh -c 'dmesg | tail -3' ;
          done
    - &integration_perfect_rollup_parallel_batch_index
      name: "perfect rollup parallel batch index integration test"
      services: *integration_test_services
      env: TESTNG_GROUPS='-Dgroups=perfect-rollup-parallel-batch-index'
      script: *run_integration_test
      after_failure: *integration_test_diags
    - &integration_kafka_index
      name: "kafka index integration test"
      services: *integration_test_services
      env: TESTNG_GROUPS='-Dgroups=kafka-index'
      script: *run_integration_test
      after_failure: *integration_test_diags
    - &integration_query
      name: "query integration test"
      services: *integration_test_services
      env: TESTNG_GROUPS='-Dgroups=query'
      script: *run_integration_test
      after_failure: *integration_test_diags
    - &integration_realtime_index
      name: "realtime index integration test"
      services: *integration_test_services
      env: TESTNG_GROUPS='-Dgroups=realtime-index'
      script: *run_integration_test
      after_failure: *integration_test_diags
    - &integration_tests
      name: "other integration test"
      services: *integration_test_services
      env: TESTNG_GROUPS='-DexcludedGroups=batch-index,perfect-rollup-parallel-batch-index,kafka-index,query,realtime-index'
      script: *run_integration_test
      after_failure: *integration_test_diags