druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	337f3870d8	Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. (#4007 ) * Fix TimeFormatExtractionFn getCacheKey when tz, locale are not provided. * Remove unused import. * Use defaults in cache key.	2017-03-04 17:41:59 -08:00
Gian Merlino	af5a4cce3c	SQL: Clarify approximate distinct count behavior. (#4000 )	2017-03-03 13:42:30 -08:00
Himanshu	e7e3c2dc5a	support singleThreaded flag for groupBy-v2 as well (#3992 )	2017-03-03 23:43:06 +05:30
Gian Merlino	4a56d7d8a0	SQL: Ability to generate exact distinct count queries. (#3999 )	2017-03-03 23:40:36 +05:30
Gian Merlino	3e8dbd59f8	Fix groupBy docs to reflect that 'v2' is default. (#3993 )	2017-03-02 15:13:39 -08:00
Gian Merlino	e63eefd7ff	Revert "SQL: Make row extractions extensible and add one for lookups. (#3989 )" The PR was merged to master accidentally. This reverts commit `23927a3c96`.	2017-03-01 17:06:12 -08:00
Jonathan Wei	5fb1638534	Add default configuration for select query 'fromNext' parameter (#3986 ) * Add default configuration for select query 'fromNext' parameter * PR comments * Fix PagingSpec config injection * Injection fix for test	2017-03-01 17:05:35 -08:00
Gian Merlino	23927a3c96	SQL: Make row extractions extensible and add one for lookups. (#3989 ) * SQL: Make row extractions extensible and add one for lookups. * Fix QuantileSqlAggregatorTest.	2017-03-01 17:03:43 -08:00
Aseem Bansal	b8ba237f78	Update toc.md (#3704 )	2017-03-01 14:33:39 -08:00
Fokko Driesprong	add17fa7db	Remove the metadataUpdateSpec from specfile (#3973 ) Get rid of the metadataUpdateSpec section in the json example to ingest parquet into druid. When this element is present, it will fail start an indexing job.	2017-03-01 14:24:36 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877 ) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage format.Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
Jonathan Wei	a08660a9ca	Support ingestion of long/float dimensions (#3966 ) * Support ingestion for long/float dimensions * Allow non-arrays for key components in indexing type strategy interfaces * Add numeric index merge test, fixes * Docs for numeric dims at ingestion * Remove unused import * Adjust docs, add aggregate on numeric dims tests * remove unused imports * Throw exception for bitmap method on numerics * Move typed selector creation to DimensionIndexer interface * unused imports * Fix * Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter * Remove spaces	2017-02-28 19:04:41 -08:00
kaijianding	ef6a19c81b	buildV9Directly in MergeTask and AppendTask (#3976 ) * buildV9Directly in MergeTask and AppendTask * add doc	2017-02-28 10:04:32 -08:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Aseem Bansal	1098ba7a7f	Update toc.md (#3703 )	2017-02-23 09:39:06 -08:00
Jihoon Son	ebd100cbb0	Set default query granularity for null value (#3965 )	2017-02-22 17:38:43 -08:00
Jihoon Son	7200dce112	Atomic merge buffer acquisition for groupBys (#3939 ) * Atomic merge buffer acquisition for groupBys * documentation * documentation * address comments * address comments * fix test failure * Addressed comments - Add InsufficientResourcesException - Renamed GroupByQueryBrokerResource to GroupByQueryResource * addressed comments * Add takeBatch() to BlockingPool	2017-02-22 14:49:37 -06:00
Gian Merlino	e7d01b67b6	Move SQL configs to sql.md. (#3959 ) This puts all the SQL stuff in one place. It also makes life easier by pointing out that configs be made in either common.runtime.properties or the broker runtime.properties.	2017-02-22 08:37:24 -08:00
Jonathan Wei	bc33b68b51	Use GroupBy V2 as default (#3953 ) * Use GroupBy V2 as default * Remove unused line * Change assert to exception propagation	2017-02-18 07:40:40 -08:00
michaelschiff	e5fb0e1ff5	New property for each metric that tells the StatsDEmitter to convert metric values from range 0-1 to 0-100. This (#3936 ) prevents rates and percentages expressed as Doubles (0.xx) from being rounded down to 0.	2017-02-16 13:55:56 -08:00
Gian Merlino	ca6053d045	SQL: Resolve column type conflicts in favor of newer segments. (#3930 ) * SQL: Resolve column type conflicts in favor of newer segments. Helps with schema evolution from e.g. long -> float, which is supported on the query side. * Take columns from highest timestamp instead of max segment id. * Fixes and docs.	2017-02-15 17:48:49 -08:00
Gian Merlino	16ef513c7d	SQL: Add context and contextual functions to planner. (#3919 ) * SQL: Add context and contextual functions to planner. Added support for context parameters specified as JDBC connection properties or a JSON object for SQL-over-JSON-over-HTTP. Also added features that depend on context functionality: - Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions. - Added support for time zones other than UTC via a "timeZone" context. - Pass down query context to Druid queries too. Also some bug fixes: - Fix DATE handling, it was largely done incorrectly before. - Fix CAST(__time TO DATE) which should do a floor-to-day. - Fix non-equality comparisons to FLOOR(__time TO X). - Fix maxQueryCount property. * Pass down context to nested queries too.	2017-02-15 14:09:14 -08:00
Jihoon Son	a459db68b6	Fine grained buffer management for groupby (#3863 ) * Fine-grained buffer management for group by queries * Remove maxQueryCount from GroupByRules * Fix code style * Merge master * Fix compilation failure * Address comments * Address comments - Revert Sequence - Add isInitialized() to Grouper - Initialize the grouper in RowBasedGrouperHelper.Accumulator - Simple refactoring RowBasedGrouperHelper.Accumulator - Add tests for checking the number of used merge buffers - Improve docs * Revert unnecessary changes * change to visible to testing * fix misspelling	2017-02-14 12:55:54 -08:00
Gian Merlino	78b0d134ae	Require Java 8 and include some Java 8 dependencies. (#3914 ) * Require Java 8 and include some Java 8 dependencies. - Upgrade Jetty to 9.3.16.v20170120. - Upgrade DataSketches to 0.8.4. - Bundle caffeine-cache by default. - Still target Java 7 when compiling base Druid classes. * Update cluster, quickstart docs. * Remove oraclejdk7 from travis.yml.	2017-02-14 12:51:51 -08:00
DaimonPl	a2875a4d91	pre-computed HLL support for hyperUnique aggregator (#3909 )	2017-02-13 15:26:20 -08:00
Himanshu	9dfcf0763a	disable javascript execution by default (#3818 )	2017-02-13 15:11:18 -08:00
Himanshu	8cf7ad1e3a	druid.coordinator.asOverlord.enabled flag at coordinator to make it an overlord too (#3711 )	2017-02-13 15:03:59 -08:00
Jonathan Wei	ca2b04f0fd	Add long/float ColumnSelectorStrategy implementations (#3838 ) * Add long/float ColumnSelectorStrategy implementations * Address PR comments * Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests * PR comments * Use BaseSingleValueDimensionSelector for long/float wrapping * remove unused import * Address PR comments * PR comments * PR comments * More PR comments * Fix failing calcite histogram subquery tests * ScanQuery test and comment about isInputRaw * Add outputType to extractionDimensionSpec, tweak SQL tests * Fix limit spec optimization for numerics * Add cardinality sanity checks to TopN * Fix import from merge * Add tests for filtered dimension spec outputType * Address PR comments * Allow filtered dimspecs on numerics * More comments	2017-02-08 20:39:29 -08:00
Erik Dubbelboer	2aa2fa57b5	Simple doc fix (#3907 )	2017-02-06 15:52:17 +05:30
Darío	8f4394ca49	Update segments.md (#3904 )	2017-02-03 10:31:14 -06:00
Nishant Bangarwa	a457cded28	Druid Extension to enable Authentication using Kerberos. (#3853 ) * Add extension for supporting kerberos security - This PR adds an extension for supporting druid authentication via Kerberos. - Working on the docs. * Add docs * review comments * more review comments * Block all paths by default * more review comments - use proper Oid * Allow extensions to override httpclient for integration tests * Add kerberos lock to prevent multithreaded issues. * review comment - remove enabled flag and fix router injection * Add Cookie Handling and more detailed docs * review comment - rename DruidKerberosConfig -> AuthKerberosConfig * review comments * fix travis failure on jdk7	2017-02-02 14:55:21 -06:00
Jonathan Wei	182261f713	Allow configurable temp directory for query processing (#3893 )	2017-02-02 10:22:28 -08:00
Gian Merlino	151ff6d064	flattenSpec: Document that "expr" is ignored for type "root". (#3884 )	2017-01-31 10:27:20 -08:00
Gian Merlino	ac84a3e011	SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868 ) * SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE. * Fix bug with re-use of filtered approximate histogram aggregators. Also add APPROX_QUANTILE tests for filtering and running on complex columns. Includes some slight refactoring to allow tests to make DruidTables that include complex columns. * Remove unused import	2017-01-25 18:39:26 -08:00
Niketh Sabbineni	2b8d3c102b	Remove throttling on drop segments (#3736 ) * Remove throttling on drop * Throttle loadqueuepeon segment change requests to ZK * Make initial delay configurable, add docs, shutdown gracefully * Make loadqueuepeon repeat delay configurable	2017-01-20 10:02:19 -08:00
Gian Merlino	d51f5e058d	SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852 ) * SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. Switched from CalciteConnection to Planner, bringing benefits: - CalciteConnection's JDBC interface no longer sits between the SQL server (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use Druid Sequence objects directly, reducing overhead in the query return path. - Implemented our own Planner-based Avatica Meta, letting us control connection timeouts and connection / statement limits. The previous CalciteConnection-based implementation didn't have any limits or timeouts. - The Planner interface lets us override the operator table, opening up SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT in core, and a QUANTILE aggregator in the druid-histogram extension. Also: - Added INFORMATION_SCHEMA metadata schema. - Added tests for Unicode literals and escapes. * Verify statement is actually open before closing it. * More detailed INFORMATION_SCHEMA docs.	2017-01-19 16:32:20 -08:00
kaijianding	33ae9dd485	streaming version of select query (#3307 ) * streaming version of select query * use columns instead of dimensions and metrics;prepare for valueVector;remove granularity * respect query limit within historical * use constant * fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner * add some test for scan query * add scan query document * fix merge conflicts * add compactedList resultFormat, this format is better for json ser/der * respect query timeout * respect query limit on broker * use static consts and remove unused code	2017-01-19 16:09:53 -06:00
David Lim	ff52581bd3	IndexTask improvements (#3611 ) * index task improvements * code review changes * add null check	2017-01-18 14:24:37 -08:00
Fokko Driesprong	31bea380eb	Updated Apache Zookeeper to the latest stable version (#3841 )	2017-01-12 13:39:29 -08:00
Gian Merlino	e86859b228	SQL support for nested groupBys. (#3806 ) * SQL support for nested groupBys. Allows, for example, doing exact count distinct by writing: SELECT COUNT() FROM (SELECT DISTINCT col FROM druid.foo) Contrast with approximate count distinct, which is: SELECT COUNT(DISTINCT col) FROM druid.foo Add deeply-nested groupBy docs, tests, and maxQueryCount config. * Extract magic constants into statics. * Rework rules to put preconditions in the "matches" method.	2017-01-11 18:32:53 -08:00
Gian Merlino	76620615a1	Properly respect the enableAvatica and enableJsonOverHttp options. (#3834 )	2017-01-11 14:43:34 -06:00
Jihoon Son	c099977a5b	Add an option to SearchQuery to choose a search query execution strategy (#3792 ) * Add an option to SearchQuery to choose a search query execution strategy. Supported strategies are 1) Index-only query execution 2) Cursor-based scan 3) Auto: choose an efficient strategy for a given query * Add SearchStrategy and SearchQueryExecutor * Address comments * Rename strategies and set UseIndexesStrategy as the default strategy * Add a cost-based planner for auto strategy * Add document * Fix code style * apply code style * apply comments	2017-01-10 18:04:20 -08:00
Vinh Tran	dddeae813a	Update caching.md typo (#3824 ) * Update caching.md Typo of Command vs Comma * Update index.md Fixing `Command` typo	2017-01-06 12:14:07 -08:00
Yuusaku Taniguchi	02519d5b64	Exhibitor Support (#3664 ) * allow JsonConfigTesterBase to treat the fields of collections * [Feature] Exhibitor Support (#3664) This patch provides the integration of Druid & Netflix Exhibitor. Druid currently use Apache Curator as ZooKeeper client. Curator can be integrated with Exhibitor to achieve a live/updating list of the ZooKeeper ensemble. This patch enables Druid to use this features.	2017-01-02 09:15:36 -08:00
Himanshu	4ca3b7f1e4	overlord helpers framework and tasklog auto cleanup (#3677 ) * overlord helpers framework and tasklog auto cleanup * review comment changes * further review comments addressed	2016-12-21 15:18:55 -08:00
Nishant	f576a0ff14	Contrib Extension for Ambari Metrics Emitter (#3767 ) * Contrib Extension for Ambari Metrics Emitter extension to enable druid to send metrics to ambari metrics server (https://cwiki.apache.org/confluence/display/AMBARI/Metrics) review comments switch to public repo * review comments * add docs * fix pom version * Add link for doc page in extensions.md * remove unused imports * review comments review comments remove unused dependency review comment	2016-12-19 11:12:47 -08:00
Nishant	35160e5595	Add metrics for Query Count statistics (#3470 ) * Add metrics for Query Count statistics This PR adds a new metrics monitor “QueryCountStatsMonitor” which emits three new metrics - 1) query/success/count - number of successful queries 2) query/failed/count - number of failed queries 3) query/interrupted/count - number of interrupted/timedout queries fix bindings * make fields final * fix imports * AsyncQueryForwardingServlet implement QueryStatsProvider * remove unused import	2016-12-19 09:47:58 -08:00
David Lim	8eee259629	add documentation on segments generated (#3785 )	2016-12-19 09:41:47 -08:00
Dongkyu Hwangbo	da007ca3c2	Replace caravel with superset (#3780 )	2016-12-16 20:47:52 -08:00
Gian Merlino	dd63f54325	Built-in SQL. (#3682 )	2016-12-16 17:15:59 -08:00
Jonathan Wei	2bfcc8a592	First and Last Aggregator (#3566 ) * add first and last aggregator * add test and fix * moving around * separate aggregator valueType * address PR comment * add finalize inner query and adjust v1 inner indexing * better test and fixes * java-util import fixes * PR comments * Add first/last aggs to ITWikipediaQueryTest	2016-12-16 15:26:40 -08:00
Nishant	8cfcb95fbc	Add Filtered and Composing request loggers (#3469 ) * Add Filtered and Composing request loggers Add Filtered and Composite Request loggers - enables users to filter request logs for slow queries. fix test * review comments * review comment * remove unused import	2016-12-16 11:18:32 -08:00
Himanshu	ed322a4beb	remove size from default analysisTypes list for segmentMetadata query (#3773 )	2016-12-13 18:01:21 -08:00
Ninglin Du	469ab21091	[Feature] Thrift support for realtime and batch ingestion (#3418 ) * Thrift ingestion plugin 1. thrift binary is platform dependent, use scrooge to generate java files to avoid style check failure 2. stream and hadoop ingesion are both supported, input format can be sequence file and lzo thrift block file. 3. base64 and protocol aware change header * fix conlicts in pom	2016-12-13 10:05:15 -08:00
kaijianding	4be3eb0ce7	report message gap, source gap and sink count in RealtimePlumber (#3744 ) * report message gap, source gap and sink count in RealtimePlumber * report message gap, sink count in Appenderator * add ingest/events/sourceGap in metrics.md * remove source gap	2016-12-13 11:23:02 -06:00
Erik Dubbelboer	bb9e35e1af	Add Greatest and Least post aggregations (#3567 )	2016-12-07 17:58:23 -08:00
Gian Merlino	943982b7b0	Configurable HTTP compression. (#3759 ) * Configurable HTTP compression. * Call real-time nodes real-time processes in docs.	2016-12-07 17:40:39 -08:00
Himanshu	06d0ef9c6c	allow and load extensions with absolute paths in druid.extensions.loadList (#3747 )	2016-12-06 17:40:23 -08:00
Himanshu	45da7e48f1	groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672 ) * sortByDimsFirst flag for groupBy query * Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType> * fix review comments * fix review comments regarding removing code duplication of dim/time comparison * move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde * remove unnecessary system.out.println * make access static var NATURAL_NULLS_FIRST directly * further review comments addressing	2016-12-06 09:48:56 -08:00
Niketh Sabbineni	d904c79081	Normalized Cost Balancer (#3632 ) * Normalized Cost Balancer * Adding documentation and renaming to use diskNormalizedCostBalancer * Remove balancer from the strings * Update docs and include random cost balancer * Fix checkstyle issues	2016-12-05 17:18:20 -08:00
Gian Merlino	353fee79dd	Add "asMillis" option to "timeFormat" extractionFn. (#3733 ) This is useful for chaining extractionFns that all want to treat time as millis, such as having a javascript extractionFn after a timeFormat.	2016-12-02 13:45:16 -08:00
Gian Merlino	102375d9bb	Add "strlen" extractionFn. (#3731 )	2016-12-02 12:08:51 -08:00
Gian Merlino	4c5d10f8a3	Add DimFilterHavingSpec. (#3727 ) * Add DimFilterHavingSpec. * Add test for DimFilterHavingSpec with extractionFns.	2016-12-02 10:04:30 -08:00
Gian Merlino	e4465e63bd	Fix ordering of sections on dimensionspecs.md. (#3722 ) The Filtered and List DimensionSpecs were mixed in with the extraction functions.	2016-11-29 16:28:36 -08:00
Niketh Sabbineni	2640d170c3	Blacklist workers if they fail for too many times (#3643 ) * Blacklist workers if they fail for too many times * Adding documentation * Changing to timeout to period and updating docs * 1. Add configurable maxPercentageBlacklistWorkers 2. Rename variable * Change maxPercentageBlacklistWorkers to double * Remove thread.sleep	2016-11-29 12:38:56 +05:30
Erik Dubbelboer	9f7050e221	Fix some grammar and spelling mistakes (#3717 )	2016-11-28 11:49:30 -08:00
Himanshu	7d37f675ba	fix the documented property name for specifying avro reader schema (#3708 )	2016-11-22 15:02:41 -08:00
Parag Jain	7ee6bb7410	option to reset offest automatically in case of OffsetOutOfRangeException (#3678 ) * option to reset offset automatically in case of OffsetOutOfRangeException if the next offset is less than the earliest available offset for that partition * review comments * refactoring * refactor * review comments	2016-11-21 16:29:46 -06:00
Jonathan Wei	7c63bee7f5	Add mapreduce.job.classloader.system.classes property to 'Other Hadoop Versions' docs (#3706 )	2016-11-18 16:16:50 -08:00
Erik Dubbelboer	7d36f540e8	WIP: Add Google Storage support (#2458 ) Also excludes the correct artifacts from #2741	2016-11-16 14:06:45 +05:30
Keuntae Park	094f5b851b	Support Min/Max for Timestamp (#3299 ) * Min/Max aggregator for Timestamp * remove unused imports and method * rebase and zip the test data * add docs	2016-11-14 23:00:21 -08:00
Joan Viladrosa	2df98bcaa6	Fixed Missing commas in json example of Lookup (#3680 )	2016-11-15 14:56:18 +09:00
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688 )	2016-11-14 09:29:32 -08:00
praveev	52a74cf84f	Use timestamp in millis as Map key instead of DateTime object (#3674 ) * Use Long timestamp as key instead of DateTime. DateTime representation is screwed up when you store with an obj and read with a different DateTime obj. For example: The code below fails when you use DateTime as key ``` DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles"))); HashMap<DateTime, String> map = new HashMap<>(); map.put(odt, "abc"); DateTime dt = new DateTime(odt.getMillis()); System.out.println(map.get(dt)); ``` * Respect timezone when creating the file. * Update docs with timezone caveat in granularity spec * Remove unused imports	2016-11-11 10:20:20 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Mark	575aeb843a	Metadata Storage extension for Microsoft SqlServer (sqlserver-metadata-storage) (#3421 )	2016-11-08 14:56:52 -08:00
Nicolas Colomer	37ecffb648	Add support for Confluent Schema Registry in the avro extension (#3529 )	2016-11-08 16:10:45 -06:00
cheddar	c49a9d5693	Call out semver expectations for modules (#3659 ) * Call out semver expectations for modules * Update modules.md * Link to versioning	2016-11-04 12:52:05 -07:00
Gian Merlino	2c504b6258	Add "like" filter. (#3642 ) * Add "like" filter. * Addressed some PR comments. * Slight simplifications to LikeFilter. * Additional simplifications. * Fix comment in LikeFilter. * Clarify comment in LikeFilter. * Simplify LikeMatcher a bit. * No use going through the optimized path if prefix is empty. * Add more tests.	2016-11-04 23:25:03 +05:30
Gian Merlino	4203580290	URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512 ) * URIExtractionNamespace: Treat null values in lookup maps as missing entries. This is useful when many logical lookups are derived from the same base JSON file, and some lookups' values may be unknown sometimes. * Add test, logging message, and address other comments. * Update docs.	2016-11-03 13:53:04 -07:00
Navis Ryu	e10def32f2	Support string type in math expression (#2836 ) * Support string type in math expression addressed comments addressed comments Addressed comments * Updated math function document * Addressed comments	2016-11-02 21:10:48 -06:00
Himanshu	641469fc38	manage overshadowing efficiently at coordinator (#3584 ) * manage overshadowing efficiently at coordinator * take readlock in VersionedIntervalTimeline.isOvershadowed()	2016-10-24 22:49:08 +05:30
jaehong choi	6f21778364	Support finding segments in AWS S3. (#3399 ) * support finding segments from a AWS S3 storage. * add more Uts * address comments and add a document for the feature. * update docs indentation * update docs indentation * address comments. 1. add a Ut for json ser/deser for the config object. 2. more informant error message in a Ut. * address comments. 1. use @Min to validate the configuration object 2. change updateDescriptor to a string as it does not take an argument otherwise * fix a Ut failure - delete a Ut for testing default max length.	2016-10-10 17:27:09 -07:00
Akash Dwivedi	3a83e0513e	Doc update(batch-ingestion) to include useExplicitVersion. (#3557 )	2016-10-07 14:48:00 -07:00
praveev	43cdc675c7	Add support for timezone in segment granularity (#3528 ) * Add support for timezone in segment granularity * CR feedback. Handle null timezone during equals check. * Include timezone in docs. Add timezone for ArbitraryGranularitySpec.	2016-10-03 08:15:42 -07:00
David Lim	9226d4af3c	configurable shutdownTimeout for Kakfa supervisor (#3497 ) * configurable shutdownTimeout * cr change	2016-09-23 13:26:45 -06:00
David Lim	ca9114b41b	add supervisor reset API (#3484 ) * add supervisor reset API * CR doc changes and kill running tasks / clear offsets from supervisor	2016-09-22 17:51:06 -07:00
Gian Merlino	f8d71fc602	groupBy: Fix maxMergingDictionarySize config. (#3488 )	2016-09-22 10:02:33 -07:00
Navis Ryu	49c0fe0e8b	Show candidate hosts for the given query (#2282 ) * Show candidate hosts for the given query * Added test cases & minor changes to address comments * Changed path-param to query-pram for intervals/numCandidates	2016-09-22 08:32:38 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473 ) Fixes #3241.	2016-09-21 13:40:04 -06:00
David Lim	96fcca18ea	update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452 )	2016-09-20 22:51:15 +05:30
Slim	3175e17a3b	Cached lookup module. first cut implementing JDBC cache (#2819 )	2016-09-16 13:45:54 -07:00
Gian Merlino	7a2a4bc6de	JavaScript: Disable now affects worker selection and router strategy too. (#3458 )	2016-09-13 16:37:42 -07:00
Gian Merlino	e0e28866ee	JavaScript docs: Fix links and typos, add to TOC. (#3457 )	2016-09-13 15:26:44 -07:00
Himanshu	a069257d37	avro-extension -- feature to specify multiple avro reader schemas inline (#3368 ) * rename SimpleAvroBytesDecoder to InlineSchemaAvroBytesDecoder * feature to specify multiple schemas inline in avro module	2016-09-13 14:54:31 -07:00
Gian Merlino	76a24054e3	JavaScript docs, including docs for globals. (#3454 )	2016-09-13 13:46:55 -07:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Gian Merlino	1344e3c3af	Clearer filter docs. (#3448 )	2016-09-09 13:47:13 -07:00
Gian Merlino	1e3f94237e	groupBy v2: Configurable load factor. (#3437 ) Also change defaults: - bufferGrouperMaxLoadFactor from 0.75 to 0.7. - maxMergingDictionarySize to 100MB from 25MB, should be more appropriate for most heaps.	2016-09-07 14:14:59 -05:00
David Lim	3a97fd4d6c	doc fix (#3430 )	2016-09-06 13:13:30 -06:00
Gian Merlino	e9050c2b4c	TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411 )	2016-08-31 20:58:53 +05:30
Stéphane Derosiaux	48dce88aab	Add flag binaryAsString for parquet ingestion (#3381 )	2016-08-30 17:30:50 -07:00
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Jonathan Wei	4e91330a17	Use DimensionSpec in CardinalityAggregatorFactory (#3406 ) * Use DimensionSpec in CardinalityAggregatorFactory * Address PR comments * Fix requiredFields()	2016-08-30 15:54:02 -07:00
Ashish	6b40bf8b32	doc: added note to README, about necessary hdfs config after insert-segment-to-db (#3402 )	2016-08-28 16:39:33 -07:00
Chanh Le	d624037698	Pull-deps: correct the library directory in the document (#3361 ) * Pull-deps: correct the library directory in the document * Pull-deps: correct the library directory in the document in the last example command	2016-08-16 17:18:15 -07:00
Fangjin Yang	edb0eca3a9	fix docs (#3370 )	2016-08-16 16:25:50 -07:00
Fangjin Yang	6beb8ac342	fix some docs and add new content (#3369 )	2016-08-16 15:00:18 -07:00
rajk-tetration	362b9266f8	Adding filters for TimeBoundary on backend (#3168 ) * Adding filters for TimeBoundary on backend Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com> * updating TimeBoundaryQuery constructor in QueryHostFinderTest * add filter helpers * update filterSegments + test * Conditional filterSegment depending on whether a filter exists * Style changes * Trigger rebuild * Adding documentation for timeboundaryquery filtering * added filter serialization to timeboundaryquery cache * code style changes	2016-08-15 10:25:24 -07:00
Himanshu	03cfcf002b	fix the race described in #3174 (#3205 )	2016-08-10 11:29:50 -07:00
Himanshu	46da682231	avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249 )	2016-08-10 10:49:26 -07:00
Nishant	8035c73409	Implement EnvironmentVariablePasswordProvider (#3329 ) * Implement EnvironmentVariablePasswordProvider * Review Comment : rename passwordKey to passwordVariable * add docs * improve doc layout * review comment: rename property for variable	2016-08-10 05:33:51 +08:00
Gian Merlino	8899affe48	Introduce standardized "Resource limit exceeded" error. (#3338 ) Fixes #3336.	2016-08-09 10:50:56 -07:00
Gian Merlino	21bce96c4c	More useful query errors. (#3335 ) Follow-up to #1773, which meant to add more useful query errors but did not actually do so. Since that patch, any error other than interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`. With this patch, the error fields are: - error, one of the specific strings "Query interrupted", "Query timeout", "Query cancelled", or "Unknown exception" (same behavior as before). - errorMessage, the message of the topmost non-QueryInterruptedException in the causality chain. - errorClass, the class of the topmost non-QueryInterruptedException in the causality chain. - host, the host that failed the query.	2016-08-09 16:14:52 +08:00
Navis Ryu	39351fb8d2	Mask properties from logging (#3332 ) * Mask properties from logging * mask "password" by default	2016-08-08 21:36:10 +05:30
Himanshu	ed5b92d612	document how to check MM enabled/disabled (#3331 )	2016-08-06 05:56:51 +08:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315 ) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
Dave Li	bc20658239	groupBy nested query using v2 strategy (#3269 ) * changed v2 nested query strategy * add test for #3239 * update for new ValueMatcher interface and add benchmarks * enable time filtering * address PR comments * add failing test for outer filter aggregator * add helper class for sharing code * update nested groupby doc * move temporary storage instantiation * address PR comment * address PR comment 2	2016-08-01 18:30:39 -07:00
Fangjin Yang	d51ec398d4	fix parquet docs (#3304 )	2016-08-01 07:54:48 -07:00
Jonathan Wei	a6105cbb86	Add numeric StringComparator (#3270 ) * Add numeric StringComparator * Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query * Address PR comments, add multithreaded BoundDimFilter test * Add comment on strlen tie handling * Add timeseries interval filter benchmark * Adjust docs * Use jackson for StringComparator, address PR comments * Add new TopNMetricSpec and SearchSortSpec with tests (WIP) * More TopNMetricSpec and SearchSortSpec tests * Fix NewSearchSortSpec serde * Update docs for new DimensionTopNMetricSpec * Delete NumericDimensionTopNMetricSpec * Delete old SearchSortSpec * Rename NewSearchSortSpec to SearchSortSpec * Add TopN numeric comparator benchmark, address PR comments * Refactor OrderByColumnSpec * Add null checks to NumericComparator and String->BigDecimal conversion function * Add more OrderByColumnSpec serde tests	2016-07-29 15:44:16 -07:00
Charles Allen	d04af6aee4	Add `slf4j` requst logger (#3146 ) * Add `slf4j` requst logger * Address comments * Fix conflicts with master * Fix removed map value	2016-07-29 15:15:41 -07:00
Gian Merlino	e5397ed316	Link up Hadoop class loading docs better. (#3302 )	2016-07-29 10:19:54 -07:00
Gian Merlino	2553997200	Associate groupBy v2 resources with the Sequence lifecycle. (#3296 ) This fixes a potential issue where groupBy resources could be allocated to create a Sequence, but then the Sequence is never used, and thus the resources are never freed. Also simplifies how groupBy handles config overrides (this made the new unit test easier to write).	2016-07-27 18:44:19 -07:00
Charles Allen	546e4f79b0	Add size of pending deletes to historical metrics (#3295 ) * Add size of pending deletes to historical metrics	2016-07-27 11:30:47 -07:00
Charles Allen	b1e3fe77f5	More logging around how the coordinator balancer is happening (#3279 ) * More logging around how the coordinator balancer is happening * Address comments * Address code review comments and add actual logging	2016-07-27 13:24:32 +05:30
David Lim	9a068e1ba6	fix broken link and use of pipes in table (#3290 )	2016-07-26 15:46:51 -07:00
Keuntae Park	95a58097e2	Hadoop InputRowParser for Orc file (#3019 ) * InputRowParser to decode OrcStruct from OrcNewInputFormat * add unit test for orc hadoop indexing * update docs and fix test code bug * doc updated * resove maven dependency conflict * remove unused imports * fix returning array type from Object[] to correct primitive array type * fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list * rebase and updated based on comments * updated based on comments * on reflecting review comments * fix bug in typeStringFromParseSpec() and add unit test * add license header	2016-07-26 09:42:56 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Jonathan Wei	a42ccb6d19	Support filtering on long columns (including __time) (#3180 ) * Support filtering on __time column * Rename DruidPredicate * Add docs for ValueMatcherFactory, add comment on getColumnCapabilities * Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate * Address PR comments (support filter on all long columns) * Use predicate factory instead of composite predicate * Address PR comments * Lazily initialize long handling in selector/in filter * Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe * Add multithreaded selector/in filter test * Fix non-final lock object in SelectorDimFilter	2016-07-20 17:08:49 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882 ) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Gian Merlino	dd4ec751d0	Update docs for working with Hadoop dependencies. (#3252 ) - Attempt to make things clearer in general - Point out that HDFS deep storage and MR jobs don't use the same loading mechanism - Recommend using mapreduce.job.classloader = true when possible	2016-07-18 07:47:58 -05:00
Himanshu	3f82108d15	optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250 )	2016-07-17 18:47:52 -07:00
Gian Merlino	90f5d8cd17	Fix path in cluster.md. (#3253 )	2016-07-17 08:30:20 -07:00
Gian Merlino	6a03a0cfec	Fix ingest/persist/backPressure docs. (#3243 )	2016-07-13 21:56:28 -07:00
Gian Merlino	3ab4a4efbc	Fix formatting in granularities doc. (#3229 )	2016-07-08 09:29:58 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Charles Allen	3f1681c16c	Caffeine cache extension (#3028 ) * Initial commit of caffeine cache * Address code comments * Move and fixup README.md a bit * Improve caffeine readme information * Cleanup caffeine pom * Address review comments * Bump caffeine to 2.3.1 * Bump druid version to 0.9.2-SNAPSHOT * Make test not fail randomly. See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation * Fix distribution and documentation * Add caffeine to extensions.md * Fix links in extensions.md * Lexicographic	2016-07-06 15:42:54 -07:00
Gian Merlino	b8a4f4ea7b	DumpSegment: Add --dump bitmaps option. (#3221 ) Also make --dump metadata respect --column.	2016-07-06 12:42:50 -07:00
Gian Merlino	fdc7e88a7d	Allow queries with no aggregators. (#3216 ) This is actually reasonable for a groupBy or lexicographic topNs that is being used to do a "COUNT DISTINCT" kind of query. No aggregators are needed for that query, and including a dummy aggregator wastes 8 bytes per row. It's kind of silly for timeseries, but why not.	2016-07-06 20:38:54 +05:30
Fangjin Yang	8eeae2e844	remove bad docs on setting up clusters (#3188 )	2016-07-01 15:41:40 -05:00
Parag Jain	99844dfeb5	remove need for tmp extensions dir (#3211 ) correct lib path relative to base distribution dir	2016-07-01 12:55:57 -07:00
Charles Allen	8b7d9750ee	Update extension docs for global lookup module (#3206 )	2016-06-29 12:51:52 -07:00
David Lim	b24425a280	update docs with new behavior (#3200 )	2016-06-28 16:17:04 -07:00
jaehong choi	efbcbf5315	Support alphanumeric sort in search query (#2593 ) * support alphanumeric sort in search query * address a comment about handling equals() and hashCode() * address comments * add Ut for string comparators * address a comment about space indentations.	2016-06-28 15:06:18 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
michaelschiff	66d8ad36d7	adds new coordinator metrics 'segment/unavailable/count' and (#3176 ) 'segment/underReplicated/count' (#3173)	2016-06-23 14:53:15 -07:00
Gian Merlino	da660bb592	DumpSegment tool. (#3182 ) Fixes #2723.	2016-06-23 14:37:50 -07:00
Dave Li	12be1c0a4b	Add bucket extraction function (#3033 ) * add bucket extraction function * add doc and header * updated doc and test	2016-06-17 09:24:27 -07:00

1 2 3 4 5 ...

1413 Commits