Apache Druid: a high performance real-time analytics database.
Gian Merlino a8c7132482 Logic adjustments to SeekableStreamIndexTaskRunner. (#7267)
* Logic adjustments to SeekableStreamIndexTaskRunner.

A mix of simplifications and bug fixes. They are intermingled because
some of the bugs were made difficult to fix, and also more likely to
happen in the first place, by how the code was structured. I tried to
keep restructuring to a minimum. The changes are:

- Remove "initialOffsetsSnapshot", which was used to determine when to
  skip start offsets. Replace it with "lastReadOffsets", which I hope
  is more intuitive. (There is a connection: start offsets must be
  skipped if and only if they have already been read, either by a
  previous task or by a previous sequence in the same task, post-restoring.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always
  be the opposite of isEndOffsetExclusive. The reason is that starts are
  exclusive exactly when the prior ends are inclusive: they must match
  up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is
  no reason to, since we expect to read continuous message streams
  throughout the task. And calling it makes offset-tracking logic
  trickier, so better to avoid the need for trickiness. I believe the
  call being here was causing a bug in Kinesis ingestion where a
  message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad
  because they prevent keeping currOffsets and lastReadOffsets up to
  date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into
  "verifyRecordInRange". It no longer has side effects. It does a sanity
  check on the message offset and also makes sure that it is not past
  the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with
  "isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I
  believe this fixes an off-by-one error with Kinesis where the last
  record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final
  sequence, rather than potentially adjusting any unpublished sequence.
  Adjusting sequences other than the last one is a mistake since it
  will extend their endOffsets beyond what they actually read. (I'm not
  sure if this was an issue in practice, since I'm not sure if real
  world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's
  shorter and more clear, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.
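
As an illustration of the two checks named above, here is a minimal Java sketch. It is not the actual SeekableStreamIndexTaskRunner code: the class OffsetSketch, the markRead helper, the int partition ids, the long offsets, and the method signatures are all assumptions made for the example. Only the names isRecordAlreadyRead, isMoreToReadBeforeReadingRecord, lastReadOffsets, and isEndOffsetExclusive come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the offset checks; types and signatures are assumed.
final class OffsetSketch {
    // True for streams whose end offsets are exclusive, false when inclusive.
    private final boolean isEndOffsetExclusive;

    // Partition id -> last offset that has actually been read.
    private final Map<Integer, Long> lastReadOffsets = new HashMap<>();

    OffsetSketch(boolean isEndOffsetExclusive) {
        this.isEndOffsetExclusive = isEndOffsetExclusive;
    }

    // Hypothetical helper: record that an offset was read from a partition.
    void markRead(int partition, long offset) {
        lastReadOffsets.put(partition, offset);
    }

    // A record is skipped only if it is at or before the last offset read.
    boolean isRecordAlreadyRead(int partition, long recordOffset) {
        Long lastRead = lastReadOffsets.get(partition);
        return lastRead != null && recordOffset <= lastRead;
    }

    // With an exclusive end the record must be strictly before the end;
    // with an inclusive end the record at the end offset is still readable.
    boolean isMoreToReadBeforeReadingRecord(long recordOffset, long endOffset) {
        return isEndOffsetExclusive
            ? recordOffset < endOffset
            : recordOffset <= endOffset;
    }
}
```

With an exclusive end of 3, offsets 0 through 2 pass the second check and 3 does not. With an inclusive end of 3, offset 3 passes as well, which is the case where an off-by-one error would drop the last record.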

Kafka test changes:

- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at
  the very end of the task lifecycle still work properly.

Kinesis test changes:

- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that
  given Kinesis semantics, the right behavior when start offset equals
  end offset (and there aren't exclusive partitions set) is to read that
  single offset. This is because they are both meant to be treated as
  inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more
  message read. I believe the old test was wrong; it expected the task
  not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a
  checkpoint starting from 1 rather than 2. I believe the old test was
  wrong here too; it was expecting the task to start reading from the
  checkpointed offset, but it actually should have started reading from
  one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect
  11 messages read instead of 12. It's starting at message 0 and reading
  up to 10, which should be 11 messages.
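
The expected counts in these tests follow from simple range arithmetic. A hedged sketch, assuming offsets are consecutive longs (real Kinesis sequence numbers are not) and using the hypothetical helpers messagesInRange and firstOffsetAfterCheckpoint, neither of which exists in the codebase:

```java
// Hypothetical arithmetic helpers; they assume consecutive long offsets.
final class RangeCountSketch {
    private RangeCountSketch() {}

    // Number of messages between start (inclusive) and end, where the end
    // is either exclusive or inclusive.
    static long messagesInRange(long start, long end, boolean endExclusive) {
        long count = endExclusive ? end - start : end - start + 1;
        return Math.max(count, 0L);
    }

    // When end offsets are inclusive, a checkpointed offset has already been
    // read, so reading resumes one past it.
    static long firstOffsetAfterCheckpoint(long checkpointedOffset) {
        return checkpointedOffset + 1;
    }
}
```

Under these assumptions, an inclusive range from 0 to 10 holds 11 messages, and a singleton inclusive range (start equal to end) holds exactly one.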

* Changes from code review.
2019-03-15 00:22:42 -07:00
.github Adjust issue templates (#7188) 2019-03-05 16:06:40 -08:00
.idea fix intellij UnusedInspectionsScope.xml (#7158) 2019-03-04 14:56:41 -08:00
aws-common Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
benchmarks Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
ci Fix and add sys IT tests to travis script (#7208) 2019-03-08 16:40:59 -08:00
codestyle Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
core Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
distribution Exclude non-source files from src assembly (#7235) 2019-03-11 19:23:10 -07:00
docs Add missing doc link for operations/http-compression.html; Fix magic numbers in test cases using JettyServerInitUtils.wrapWithDefaultGzipHandler (#7110) 2019-03-13 14:09:19 -07:00
examples Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
extendedset Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
extensions-contrib Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
extensions-core Logic adjustments to SeekableStreamIndexTaskRunner. (#7267) 2019-03-15 00:22:42 -07:00
hll Densify swapped hll buffer (#6865) 2019-03-06 14:50:04 -08:00
indexing-hadoop Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
indexing-service Logic adjustments to SeekableStreamIndexTaskRunner. (#7267) 2019-03-15 00:22:42 -07:00
integration-tests Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
licenses Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
processing Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
publications add missing license headers, in particular to MD files; clean up RAT … (#6563) 2018-11-13 09:38:37 -08:00
server Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
services Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
sql Prohibit Throwables.propagate() (#7121) 2019-03-14 18:28:33 -03:00
web-console allow killing waiting and pending tasks (#7247) 2019-03-14 13:17:19 -07:00
.dockerignore Add docker container for druid (#6896) 2019-02-08 12:12:28 +00:00
.gitignore Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
.travis.yml Adding a Unified web console. (#6923) 2019-01-31 17:26:41 -08:00
CONTRIBUTING.md add missing license headers, in particular to MD files; clean up RAT … (#6563) 2018-11-13 09:38:37 -08:00
DISCLAIMER add missing license headers, in particular to MD files; clean up RAT … (#6563) 2018-11-13 09:38:37 -08:00
INTELLIJ_SETUP.md change propertyBase in ServerViewModule (#6774) 2019-01-02 16:44:02 +08:00
LABELS.md Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
LICENSE Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
LICENSE.BINARY Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
NOTICE Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
NOTICE.BINARY Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
README.md Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
druid_intellij_formatting.xml Add IntelliJ codestyle setting for "blank lines before package". (#6766) 2018-12-20 10:13:03 -08:00
eclipse.importorder Add checkstyle rules about imports and empty lines between members (#6543) 2018-11-20 12:42:15 +01:00
eclipse_formatting.xml Update license headers. (#5976) 2018-07-11 09:55:18 -07:00
intellij-sdk-config.jpg Prohibit and remove unused declarations in the processing module (#4930) 2017-11-09 09:27:27 -08:00
pom.xml Update LICENSE and NOTICE files (#7026) 2019-03-04 18:45:22 -08:00
upload.sh Adding licenses and enable apache-rat-plugin. (#6215) 2018-09-18 08:39:26 -07:00

README.md


Apache Druid (incubating)

Apache Druid (incubating) is a high-performance analytics data store for event-driven data.

Disclaimer: Apache Druid is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

License

Apache License, Version 2.0

More Information

More information about Druid can be found on http://www.druid.io.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs/content in this repository and submit a pull request.

Getting Started

You can get started with Druid with our quickstart.

Reporting Issues

If you find any bugs, please file a GitHub issue.

Community

The Druid community is in the process of migrating to Apache by way of the Apache Incubator. Eventually, as we proceed along this path, our site will move from http://druid.io/ to https://druid.apache.org/.

Community support is available on the druid-user mailing list (druid-user@googlegroups.com), which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

We also have a couple of people hanging out on IRC in #druid-dev on irc.freenode.net.

Building From Source

For instructions on building Druid from source, see docs/content/development/build.md

Contributing

Please follow the guidelines listed here.