druid

Apache Druid: a high-performance real-time analytics database.

Gian Merlino c204d68376 Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834 )	2019-11-07 17:46:59 -08:00

There is a class of bugs due to the fact that BaseObjectColumnValueSelector has both "getObject" and "isNull" methods, but in most selector implementations and most call sites, it is clear that the intent of "isNull" is only to apply to the primitive getters, not the object getter. This makes sense, because the purpose of isNull is to enable detection of nulls in otherwise-primitive columns. Imagine a string column with a numeric selector built on top of it. You would want it to return isNull = true, so numeric aggregators don't treat it as all zeroes.

Sometimes this design leads people to accidentally guard non-primitive get methods with "selector.isNull" checks, which is improper.

This patch has three goals:

1) Fix null-handling bugs that already exist in this class.
2) Make interface and doc changes that reduce the probability of future bugs.
3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast aggregators while fixing null-handling bugs. I thought about splitting this into its own patch, but it ended up being tough to split from the null-handling fixes.

For (1), the fixes are:

- Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject calls on isNull, by no longer extending NullableAggregatorFactory. Now uses -1 as a sentinel value for null, to differentiate nulls and empty strings.
- Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use eval.asBoolean() to avoid calling getLong on the selector after already calling getObject.
- Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls on isNull. Also, refactored slightly to avoid the overhead of calling getObject followed by another getter (see BloomFilterAggregatorFactory for part of this).

For (2), the main changes are:

- Remove the "isNull" method from BaseObjectColumnValueSelector.
- Clarify "isNull" doc on BaseNullableColumnValueSelector.
- Rename NullableAggregatorFactory -> NullableNumericAggregatorFactory to emphasize that it only works on aggregators that take numbers as input.
- Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner.
- Similar naming changes to helper methods for groupBy, ValueMatchers, etc.

For (3), the other fixes for StringFirst and StringLastAggregatorFactory are:

- Fixed buffer overrun in the buffer aggregators when some characters in the string encode into more than one byte (the old code used "substring" to apply a byte limit, which is bad). I did this by introducing a new StringUtils.toUtf8WithLimit method.
- Fixed weird IncrementalIndex logic that led to reading nulls for the timestamp.
- Adjusted weird StringFirst/Last logic that worked around the weird IncrementalIndex behavior.
- Refactored to share code between the four aggregators.
- Improved test coverage.
- Made the base stringFirst, stringLast aggregators adaptive, and streamlined the xFold versions into aliases. The adaptiveness is similar to how other aggregators like hyperUnique work.
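
Two points from the commit message above, the -1 length used to tell null apart from the empty string and the byte-limited UTF-8 encoding, can be illustrated with a small standalone sketch. Everything in it is hypothetical: the class `StringSlotSketch`, its method names, and the buffer layout (a 4-byte length followed by the bytes) are assumptions made for illustration, and this is not the code of Druid's StringUtils.toUtf8WithLimit or of the actual aggregators.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and buffer layout are assumptions, not Druid's code.
public class StringSlotSketch {
    // Assumed sentinel: a stored length of -1 means null, 0 means the empty string.
    static final int NULL_LENGTH = -1;

    // The problematic pattern: substring() limits chars, not bytes,
    // so the encoded result can be longer than maxBytes.
    public static byte[] naiveLimit(String s, int maxBytes) {
        return s.substring(0, Math.min(s.length(), maxBytes)).getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the longest prefix of s whose UTF-8 form fits in maxBytes,
    // never cutting a multi-byte character in half.
    public static byte[] utf8WithLimit(String s, int maxBytes) {
        int bytes = 0;
        int end = 0;
        while (end < s.length()) {
            int cp = s.codePointAt(end);
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            end += Character.charCount(cp);
        }
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8);
    }

    // Writes a nullable string at the given position: 4-byte length, then the bytes.
    public static void write(ByteBuffer buf, int position, String value, int maxBytes) {
        if (value == null) {
            buf.putInt(position, NULL_LENGTH);
            return;
        }
        byte[] bytes = utf8WithLimit(value, maxBytes);
        buf.putInt(position, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buf.put(position + Integer.BYTES + i, bytes[i]);
        }
    }

    // Reads the value back; the sentinel length yields null rather than "".
    public static String read(ByteBuffer buf, int position) {
        int length = buf.getInt(position);
        if (length == NULL_LENGTH) {
            return null;
        }
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buf.get(position + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "h\u00e9llo"; // the second character needs two bytes in UTF-8
        System.out.println("naive bytes:   " + naiveLimit(value, 2).length);
        System.out.println("limited bytes: " + utf8WithLimit(value, 2).length);

        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + 2);
        write(buf, 0, null, 2);
        System.out.println("null round trip:  " + read(buf, 0));
        write(buf, 0, "", 2);
        System.out.println("empty round trip: '" + read(buf, 0) + "'");
    }
}
```

With a 2-byte limit, the naive version produces 3 bytes for this input, which would overrun a slot sized for 2, while the byte-aware version stops after the first character. Storing the length explicitly is what lets a reader distinguish null from an empty string without consulting a separate isNull check.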
.github	add checkbox for licenses.yaml in PR template, mention it in CONTRIBUTING.md (#8367 )	2019-08-22 14:14:24 -07:00
.idea	Implementing dropwizard emitter for druid (#7363 )	2019-10-01 14:59:30 -07:00
benchmarks	parallel broker merges on fork join pool (#8578 )	2019-11-07 11:58:46 -08:00
cloud	Add credentials for ECS (#8651 )	2019-10-12 09:12:14 -07:00
codestyle	Fix dependency analyze warnings (#8230 )	2019-09-09 14:37:21 -07:00
core	Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834 )	2019-11-07 17:46:59 -08:00
dev	Add an item to concurrency checklist about assertions in parall… (#8701 )	2019-10-29 11:38:04 +03:00
distribution	update how to release doc (#8590 )	2019-10-02 08:51:25 -07:00
docs	parallel broker merges on fork join pool (#8578 )	2019-11-07 11:58:46 -08:00
examples	Fix verify script. (#8798 )	2019-10-30 23:30:01 -07:00
extendedset	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
extensions-contrib	parallel broker merges on fork join pool (#8578 )	2019-11-07 11:58:46 -08:00
extensions-core	Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834 )	2019-11-07 17:46:59 -08:00
hll	Fix dependency analyze warnings (#8230 )	2019-09-09 14:37:21 -07:00
indexing-hadoop	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 )	2019-11-06 11:07:04 -08:00
indexing-service	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 )	2019-11-06 11:07:04 -08:00
integration-tests	remove select query (#8739 )	2019-10-30 19:29:56 -07:00
licenses	add jaxb-runtime to fix exception with newer versions of java (#8409 )	2019-08-27 14:25:05 -06:00
processing	Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834 )	2019-11-07 17:46:59 -08:00
publications	[ImgBot] Optimize images (#7873 )	2019-06-24 21:27:48 -07:00
server	parallel broker merges on fork join pool (#8578 )	2019-11-07 11:58:46 -08:00
services	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 )	2019-11-06 11:07:04 -08:00
sql	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 )	2019-11-06 11:07:04 -08:00
web-console	Web console: Interval input component (#8777 )	2019-11-07 13:07:17 -08:00
website	parallel broker merges on fork join pool (#8578 )	2019-11-07 11:58:46 -08:00
.codecov.yml	Use Codecov (#8388 )	2019-08-28 08:49:30 -07:00
.dockerignore	Add docker container for druid (#6896 )	2019-02-08 12:12:28 +00:00
.gitignore	autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306 )	2019-08-21 12:46:27 -07:00
.travis.yml	Spellcheck docs (#8548 )	2019-09-17 12:47:30 -07:00
CONTRIBUTING.md	Fix incorrect build from source path in README.md and druid repo url. (#8531 )	2019-09-12 19:48:01 -07:00
DISCLAIMER	add missing license headers, in particular to MD files; clean up RAT … (#6563 )	2018-11-13 09:38:37 -08:00
LABELS	Add plain text README.txt, use relative link from README.md to build.md (#7611 )	2019-05-09 21:29:26 -07:00
LICENSE	Add missing license pointer for Porter Stemmer (#7941 )	2019-06-24 12:21:40 -07:00
NOTICE	add copyright info back to NOTICE and NOTICE.BINARY (#8298 )	2019-08-14 19:42:47 -05:00
README.md	Update README.md (#8829 )	2019-11-06 08:59:00 -08:00
README.template	switch links from druid.io to druid.apache.org (#7914 )	2019-06-18 09:06:27 -07:00
licenses.yaml	Upgrade joda-time to 2.10.5 (#8821 )	2019-11-06 14:30:22 -08:00
pom.xml	Upgrade joda-time to 2.10.5 (#8821 )	2019-11-06 14:30:22 -08:00
upload.sh	Adding licenses and enable apache-rat-plugin. (#6215 )	2018-09-18 08:39:26 -07:00

README.md

Apache Druid (incubating)

Apache Druid (incubating) is a high-performance real-time analytics database.

Druid is a next-gen open source alternative to analytical databases such as Vertica, Greenplum, and Exadata, and data warehouses such as Snowflake, BigQuery, and Redshift.

Getting started

To get started with Druid, follow our quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
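
As a rough illustration of the HTTP API mentioned above, the sketch below posts a SQL statement as JSON to Druid's SQL endpoint (`/druid/v2/sql`). The class name `DruidSqlHttpSketch`, its helper methods, and the example URL in the usage note are assumptions made for illustration; the JSON escaping is deliberately minimal, and a real client would more likely use a JSON library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; class and method names are not part of Druid.
public class DruidSqlHttpSketch {
    // Builds the request body {"query":"..."} with minimal JSON string escaping.
    public static String toJsonBody(String sql) {
        StringBuilder sb = new StringBuilder("{\"query\":\"");
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    // Sends the body with an HTTP POST and returns the raw response text.
    public static String post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String body = toJsonBody("SELECT 1");
        System.out.println(body);
        // Only contact a server when an endpoint URL is passed as the first argument.
        if (args.length > 0) {
            System.out.println(post(args[0], body));
        }
    }
}
```

Run without arguments, the program only prints the request body. Passing an endpoint such as `http://localhost:8888/druid/v2/sql` (an assumed local quickstart address; adjust host and port for your deployment) sends the query and prints the response.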

Load data

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and servers from one convenient location. Each view is powered by SQL system tables, so you can see the underlying query behind it.

Issue queries

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real time on the #druid channel in the Apache Slack team. Please use this invitation link to join the ASF Slack; once you have joined, go to the #druid channel.

Building from source

Please note that JDK 8 is required to build Druid.

For instructions on building Druid from source, see docs/development/build.md

Contributing

Please follow the community guidelines for contributing.

License

Apache License, Version 2.0

Disclaimer: Apache Druid is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.