druid

Author	SHA1	Message	Date
Clint Wylie	d21babc5b8	remix nested columns (#14014) changes: * introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation * introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>. * revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their pre-#13803 behavior (v4) for backwards compatibility * fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)	2023-04-04 17:51:59 -07:00
Clint Wylie	d5b1b5bc8e	nested columns + arrays = array columns! (#13803) array columns! changes: * add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns * nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns * neat test refactor stuff * add v4 segment test * add array element indexes * add tests for unnest and array columns * fix unnest column value selector cursor handling of null and empty arrays	2023-03-27 12:42:35 -07:00
Clint Wylie	1d8fff4096	sampler + type detection = bff (#13711) * sampler + type detection = bff * split logical and physical dimensions, tidy up	2023-02-28 04:14:30 -08:00
Clint Wylie	fb26a1093d	discover nested columns when using nested column indexer for schemaless ingestion (#13672) * discover nested columns when using nested column indexer for schemaless * move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec	2023-01-18 12:57:28 -08:00
Clint Wylie	d9e5245ff0	allow string dimension indexer to handle byte[] as base64 strings (#13573) This PR expands `StringDimensionIndexer` to handle conversion of `byte[]` to base64 encoded strings, rather than the current behavior of calling java `toString`. This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to directly convert stuff to java types, resulting in `bytes` typed values being converted as `byte[]` instead of the base64 string which the previous JSON based conversion created. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as complex types serde), when fed to a `StringDimensionIndexer` it resulted in an ugly java `toString`, because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of calling `row.getDimension(..)`, which does do this coercion (and is why many tests on binary types appeared to be doing the expected thing). I added some protobuf `bytes` tests, but they don't really hit the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly, and call `getDimension` to validate stuff. The parser based version still uses the old conversion mechanisms, so when not using a flattener it incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now; if we either update the parser to use the new flattener or just remove parsers, we can remove this test stuff.	2022-12-16 14:50:17 +05:30
Clint Wylie	7002ecd303	add protobuf flattener, direct to plain java conversion for faster flattening (#13519) * add protobuf flattener, direct to plain java conversion for faster flattening, nested column tests	2022-12-09 12:24:21 -08:00
Jonathan Wei	9b8e69c99a	Add inline descriptor Protobuf bytes decoder (#13192) * Add inline descriptor Protobuf bytes decoder * PR comments * Update tests, check for IllegalArgumentException * Fix license, add equals test * Update extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/InlineDescriptorProtobufBytesDecoder.java Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-11 13:37:28 -05:00
Laksh Singla	3f709db173	Make ParseExceptions more informative (#12259) This PR aims to make the ParseExceptions in Druid more informative by attaching metadata to the ParseException that describes where the exception came from, for example the path of the file generating the issue, or the line number (where it can be easily fetched, as in CsvReader). The following changes are addressed in this PR: A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next(). IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number"). TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader, also includes the line number which caused the error. This will also help in triaging issues when InputSourceReader generates a ParseException, because it can point to the specific InputEntity which caused the exception (while trying to read it).	2022-02-28 22:31:15 +05:30
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Karan Kumar	b86f2d4c2e	Performance fixes in proto readers (#12267)	2022-02-24 23:21:48 +05:30
Jonathan Wei	229f82a6f0	Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961) * Add parse error list API for stream supervisors, simplify parse exception message * Add input string to parse exception * Use structured ParseExceptionReport * Fix tests * Add test * PR comments, add ParseExceptionReport equals verifier * Fix test	2021-12-09 15:42:55 -06:00
Yi Yuan	aa7cb50f24	Add DynamicConfigProvider for Schema Registry (#11362) * add_DynamicConfigProvider_for_schema_registry * bug fixed * add document * fix document * fix spot bug * fix document * inject ObjectMapper * add DynamicConfigProviderUtils * add UT * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-03 13:24:52 -07:00
Xavier Léauté	712f2a5d00	upgrade error-prone to 2.7.1 and support checks with Java 11+ (#11363) * upgrade error-prone to 2.7.1 and support checks with Java 11+ - upgrade error-prone to 2.7.1 - support running error-prone with Java 11 and above using -Xplugin instead of custom compiler - add compiler arguments to ignore warnings/errors in Java 15/16 - introduce strictCompile property to enable strict profiles since we now need multiple strict profiles for Java 8 - properly exclude all generated source files from error-prone - fix druid-processing overriding annotation processors from parent pom - fix druid-core disabling most non-default checks - align plugin and annotation errorprone versions - fix / suppress additional issues found by error-prone: * fix bug in SeekableStreamSupervisor initializing ArrayList size with the taskGroupId * fix missing @Override annotations - remove outdated compiler plugin in benchmarks - remove deleted ParameterPackage error-prone rule - re-enable checks on benchmark module as well * fix IntelliJ inspections * disable LongFloatConversion due to bug in error-prone with JDK 8 * add comment about InsecureCrypto	2021-06-16 12:55:34 -07:00
Abhishek Agarwal	44d629319d	handle timestamps of complex types when parsing protobuf messages (#11293) * handle timestamps correctly when parsing protobuf * Add timestamp handling to ProtobufReader * disable checkstyle for generated sourcecode * Fix test * try this * refactor tests	2021-06-07 15:19:39 +05:30
Yi Yuan	0e0c1a1aaf	add protobuf inputformat (#11018) * add protobuf inputformat * repair pom * alter intermediateRow to type of DynamicMessage * add document * refine test * fix document * add protoBytesDecoder * refine document and add ser test * add hash * add schema registry ser test Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-04-12 22:03:13 -07:00
Yi Yuan	36e86a2880	Add protobuf schema registry (#10839) * add_protobuf_schema_registry * change license * delete some annotation * modify tests * delete extra exception * add licenses * add descriptor and protoMessageType in ProtobufInputRowParser to adapt to the old version * separate kafka-protobuf-provider * modify protobuf.md * refine protobuf.md * add config and header * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-03-09 15:15:51 -08:00
xhl0726	1596b3eacd	Optimize protobuf parsing for flatten data (#9999) * optimize for protobuf parsing * fix import error and maven dependency * add unit test in protobufInputrowParserTest for flatten data * solve code duplication (remove the log and main()) * rename 'flatten' to 'flat' to make it clearer Co-authored-by: xionghuilin <xionghuilin@bytedance.com>	2020-06-24 18:01:31 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColumns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepNullColumns false + true with no null columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Chi Cao Minh	1166bbcb75	Remove static imports from tests (#8036) Make static imports forbidden in tests and remove all occurrences to be consistent with the non-test code. Also, various changes to files affected by above: - Reformat to adhere to druid style guide - Fix various IntelliJ warnings - Fix various SonarLint warnings (e.g., the expected/actual args to Assert.assertEquals() were flipped)	2019-07-06 09:33:12 -07:00
Roman Leventov	782863ed0f	Fix some problems reported by PVS-Studio (#7738) * Fix some problems reported by PVS-Studio * Address comments	2019-05-29 11:20:45 -07:00
Fokko Driesprong	2aa9613bed	Bump Checkstyle to 8.20 (#7651) * Bump Checkstyle to 8.20 Moderate severity vulnerability that affects: com.puppycrawl.tools:checkstyle Checkstyle prior to 8.18 loads external DTDs by default, which can potentially lead to denial of service attacks or the leaking of confidential information. Affected versions: < 8.18 * Oops, missed one * Oops, missed a few	2019-05-14 11:53:37 -07:00
陈春斌	624f328ea1	lazy create descriptor in ProtobufInputRowParser (#6678)	2018-11-28 21:59:29 -08:00
Mingming Qiu	93b0d58571	optimize input row parsers (#6590) * optimize input row parsers * address comments	2018-11-16 11:48:32 +08:00
Roman Leventov	8f3fe9cd02	Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies (#6607) * Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies * Fix bug * Replace checkstyle regexp with IntelliJ inspection	2018-11-15 13:21:34 -08:00
David Lim	afb239b17a	add missing license headers, in particular to MD files; clean up RAT … (#6563) * add missing license headers, in particular to MD files; clean up RAT exclusions * revert inadvertent doc changes * docs * cr changes * fix modified druid-production.svg	2018-11-13 09:38:37 -08:00
Roman Leventov	54351a5c75	Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490) * Fix various bugs; Enable more IntelliJ inspections and update error-prone * Fix NPE * Fix inspections * Remove unused imports	2018-11-06 14:38:08 -08:00
Gian Merlino	431d3d8497	Rename io.druid to org.apache.druid. (#6266) * Rename io.druid to org.apache.druid. * Fix META-INF files and remove some benchmark results. * MonitorsConfig update for metrics package migration. * Reorder some dimensions in inner queries for some reason. * Fix protobuf tests.	2018-08-30 09:56:26 -07:00
Benedict Jin	331a0afb98	Remove redundant type parameters and enforce some other style and inspection rules (#5980) * Various changes about druid-services module * Patch improvements from reviewer * Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile * Fix ArraysAsListWithZeroOrOneArgument * Fix conflict * Fix ToArrayCallWithZeroLengthArrayArgument * Fix AliEqualsAvoidNull * Remove blank line * Remove unused import clauses * Fix code style in TopNQueryRunnerTest * Fix conflict * Don't use Collections.singletonList when converting the type of array type * Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase * Roll back the latest commit * Add java.io.File#toURL() into druid-forbidden-apis * Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord * Add a new regexp element into stylecode xml file * Fix style error for new regexp * Set the level of ArraysAsListWithZeroOrOneArgument as WARNING * Fix style error for new regexp * Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile * Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR * Add toArray(new Object[0]) regexp into checkstyle config file & fix them * Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it * Add a comment for string equals regexp in checkstyle config * Fix code format * Add RedundantTypeArguments as ERROR level inspection * Fix cannot resolve symbol datasource	2018-07-27 16:56:49 -05:00
Gian Merlino	04ea3c9f8c	Update license headers. (#5976) * Update license headers. For compliance with http://www.apache.org/legal/src-headers.html. * More license adjustments. * Fix mistakenly edited package line.	2018-07-11 09:55:18 -07:00
Roman Leventov	693e3575f9	Remove unused code and exception declarations (#5461) * Remove unused code and exception declarations * Address comments * Remove redundant Exception declarations * Make FirehoseFactoryV2.connect() to throw IOException again	2018-03-16 22:11:12 +01:00
Parag Jain	7c01f77b04	Parse Batch support (#5081) * add parseBatch and deprecate parse method in InputRowParser add addAll method, skip max rows in memory check for it remove parse method from implementations transform transformers add string multiplier input row parser fix withParseSpec fix kafka batch indexing fix isPersistRequired comments * add unit test * make persist async * review comments	2017-12-04 16:06:16 -06:00
Roman Leventov	3541b7544b	Prohibit and remove unused declarations in the processing module (#4930) * Prohibit and remove unused declarations in the processing module * Fix tests * Fix integration tests * Suppress unused * Try to remove SuppressWarnings unused in VirtualColumn * Remove reset 'false positives' * Annotate CliCommandCreator as ExtensionPoint * Unused import warning instead of error in IntelliJ * Fixes * Add comment * Fix AzureBlob * Fix CloudFilesBlob * Address comments * Add Project SDK section to INTELLIJ_SETUP.md * Fix image	2017-11-09 09:27:27 -08:00
Roman Leventov	dc7cb117a1	Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism (#4886) * Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism * Fix MapVirtualColumn.makeColumnValueSelector() * Minor fixes * Fix IndexGeneratorCombinerTest * DimensionSelector to return zeros when treated as numeric ColumnValueSelector * Fix IncrementalIndexTest * Fix IncrementalIndex.makeColumnSelectorFactory() * Optimize MapBasedRow.getMetric() * Fix VarianceAggregatorTest * Simplify IncrementalIndex.makeColumnSelectorFactory() * Address comments * More comments * Test	2017-10-13 21:44:17 -05:00
Jihoon Son	675c6c00dd	Add checkstyle and intellij rule to prohibit unnecessary qualifiers in interfaces (#4958) * add checkstyle and intellij rule * fix tc fail	2017-10-13 07:56:19 -07:00
Jihoon Son	56fb11ce0b	Lazy initialization for JavaScript functions (#4871) * Lazy initialization of JavaScript functions * Fix test failure * Fix thread-safety and postpone js conf check * Fix test fail * Fix test * Fix KafkaIndexTaskTest * Move config check	2017-10-10 21:52:42 -07:00
Gian Merlino	bf8fd4c203	Add flattenSpec support to the Avro parser. (#4832) * Add flattenSpec support to the Avro parser. Also: - Refactor the JSONPathParser a bit so it can share flattening code with Avro (see ObjectFlatteners). - Remove the JSONParser. It was only used in two places: by UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated the former to JSONPathParser and made the latter a standalone. - Move GenericRecordAsMap to the Parquet extension, since the Avro extension no longer uses it. * Fix indentation. * Fix equals/hashCode.	2017-09-26 09:26:06 -07:00
Roman Leventov	cbd1902db8	Add forbidden-apis plugin; prohibit using system time zone (#4611) * Forbidden APIs WIP * Remove some tests * Restore io.druid.math.expr.Function * Integration tests fix * Add comments * Fix in SimpleWorkerProvisioningStrategy * Formatting * Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest * Address comments * Fix GroupByMultiSegmentTest	2017-08-21 13:02:42 -07:00
Roman Leventov	aa7e4ae5e4	Enforce correct spacing with Checkstyle (#4651)	2017-08-05 10:18:25 -07:00
Roman Leventov	c0beb78ffd	Enforce brace formatting with Checkstyle (#4564)	2017-07-21 10:26:59 -05:00
Roman Leventov	9ae457f7ad	Avoid using the default system Locale and printing to System.out in production code (#4409) * Avoid usages of Default system Locale and printing to System.out or System.err in production code * Fix Charset in DruidKerberosUtil * Remove redundant string format in GenericIndexed * Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format() * Fix testSafeFormat() * More fixes of redundant StringUtils.format() inside ISE * Rename unimportantSafeFormat() to nonStrictFormat()	2017-06-29 14:06:19 -07:00
Kenji Noguchi	3400f601db	Protobuf extension (#4039) * move ProtoBufInputRowParser from processing module to protobuf extensions * Ported PR #3509 * add DynamicMessage * fix local test stuff that slipped in * add license header * removed redundant type name * removed commented code * fix code style * rename ProtoBuf -> Protobuf * pom.xml: shade protobuf classes, handle .desc resource file as binary file * clean up error messages * pick first message type from descriptor if not specified * fix protoMessageType null check. add test case * move protobuf-extension from contrib to core * document: add new configuration keys, and descriptions * update document. add examples * move protobuf-extension from contrib to core (2nd try) * touch * include protobuf extensions in the distribution * fix whitespace * include protobuf example in the distribution * example: create new pb obj every time * document: use properly quoted json * fix whitespace * bump parent version to 0.10.1-SNAPSHOT * ignore Override check * touch	2017-05-30 13:11:58 -07:00

41 Commits