Commit Graph

108 Commits

Author SHA1 Message Date
Mark Payne e06afbdd22
NIFI-8609: Optimized AvroTypeUtil Record creation and conversion
Added unit test that is ignored so that it can be manually run for testing performance before/after changes to AvroTypeUtil. Updated AvroTypeUtil to be more efficient by not using Record.getValue() and instead iterating over the Map of values directly. getValue() is less efficient here because we know the RecordField's we are iterating over exist in the schema since they are retrieved from there directly; as a result, any null values still have be looked up by aliaases, but that step can be skipped in this situation. Also avoided looking for fields that exist in Avro Schema and not in RecordSchema just to set default values on GenericRecord - there's no need to set them if they are default values.

This closes #5080

Signed-off-by: David Handermann <exceptionfactory@apache.org>
2021-06-15 09:05:38 -05:00
Tamas Palfy 5108d7cdd0 NIFI-8439 Update parquet-avro to allow reading parquet INT96 timestamps as byte arrays (instead of throwing an exception).
Also allow to write them as such (byte-arrays) - again, instead of throwing an exception.

NIFI-8439 Fixed unit tests.
NIFI-8439 Allow writing parquet INT96 timestamps if they were read by the same parquet-avro library.

This closes #5006.

Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
2021-05-25 13:18:46 +02:00
exceptionfactory 6776765a92
NIFI-8538 Upgraded Apache Commons IO to 2.8.0
- Upgraded direct dependencies from 2.6 to 2.8.0
- Added dependency management configuration to use 2.8.0 for some modules
- Updated scripted Groovy tests to avoid copying unnecessary files

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #5073
2021-05-14 09:31:48 -04:00
Mark Payne 2895bac2c0
NIFI-8512: When converting to/from Avro UNION type, we can be more efficient when the UNION consists of a Null type and one other type by determinine the non-null type and just using that. Also eliminated a call to List.stream() and related .collect() call by using an existing method that performs the logic without the very expensive call to stream()
Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #5051
2021-05-06 16:06:23 -04:00
Tamas Palfy a50957161c NIFI-8365 Fix JSON AbstractJsonRowRecordReader to handle deep CHOICE-typed records properly: change the logic that selects the first compatible schema which can have missing fields compared to the real value and search for a more strict match first and fallback to the existing logic only if not one found.
- AbstractJsonRowRecordReader - Handle (meaning log a warning and not fail completely) multi-array CHOICE type when data has extra fields (not defined by the schema) and can't determine correct type.
- AvroTypeUtil - Allow multiple different record types in avro union type. Minor refactors. Added documentation fro EqualsWrapper.
2021-04-19 12:56:09 -04:00
Koji Kawamura 68d38dd0a6
NIFI-6752 Add ASN.1 RecordReader
NIFI-6752 Refactored type and value conversion logic. Added support for more types. Added more tests.
Removed 'parent' from 'Recursive'. (Caused issues. The recursive nature is still there as it has a child with the same type).
Updated jasn1 1.11.2 to asn1bean 1.12.0. If an asn field name is a Java reserved keyword, the field gets a trailing "_" but the getter remains normal. In JASN1Utils adjusted logic when looking for the getter.
Added support for inherited types. OctetStrings are converted to Strings instead of byte arrays.
Service takes care of the compilation of the ASN files. Test sources are generated and removed from source control.

NIFI-6752 Removed obsolete TODOs.

NIFI-6752 Updated nifi-asn1-nar version to 1.13.0-SNAPSHOT. Fixed checkstyle violations (unused imports).

NIFI-6752 ASN.1 reader - ASN.1 bundle requires 'include-asn1' profile to be active to be part of assembly.

NIFI-6752 ASN.1 reader - Updated ASN1.xml template.

NIFI-6752 ASN.1 reader - Updated versions.

NIFI-6752 ASN.1 reader - Update example generator. Updated ASN1.xml template. Updated (fixed) nifi-asn1-nar version in pom.xml.

NIFI-6752 ASN.1 reader - Added missing license for ASN1.xml.

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #4577
2021-02-25 12:58:05 -05:00
Joe Witt 88fab00e29
NIFI-7873 merging release branch to latest and updating to 1.14.0-SNAPSHOT 2021-02-15 12:09:32 -07:00
Joe Witt 4afb2ba743
NIFI-7873-RC4 prepare for next development iteration 2021-02-15 12:09:31 -07:00
Joe Witt 487280bee9
NIFI-7873-RC4 prepare release nifi-1.13.0-RC4 2021-02-15 12:09:30 -07:00
Peter Turcsanyi 67d06003b7 NIFI-8023: Convert java.sql.Date between UTC/local time zone normalized forms before/after database operations
This closes #4781

Signed-off-by: David Handermann <exceptionfactory@gmail.com>
2021-01-26 14:39:02 -06:00
Denes Arvay 83948db989
NIFI-7996 Conversion with ConvertRecord to avro results in invalid date
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #4666.
2020-11-16 17:40:40 +01:00
Pierre Villard 14ec02f21d
NIFI-7981 - add support for enum type in avro schema
This closes #4648

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
2020-11-05 18:19:55 -05:00
Bryan Bende d773521ee0
NIFI-1121 Fix Schema Name and Schema Branch properties
This closes #4512.

Signed-off-by: Bryan Bende <bbende@apache.org>
2020-11-03 15:38:41 -05:00
Mark Payne 4b9014b959
NIFI-1121: Updated backend to perform appropriate validation. Added tests. Updated documentation writer. Updated dev guide to explain how PropertyDescriptor.Builder#dependsOn affects validation. Updated JavaDocs for PropertyDescriptor.Builder#dependsOn
Signed-off-by: Bryan Bende <bbende@apache.org>
2020-11-03 15:37:42 -05:00
Mark Payne f7f336a4b0
NIFI-1121: Added API changes for having one Property depend on another
Signed-off-by: Bryan Bende <bbende@apache.org>
2020-11-03 15:37:08 -05:00
Denes Arvay d05d0c6240 NIFI-7925: ValidateRecord reports false positive for avro arrays with null elements 2020-10-21 09:17:29 -04:00
Denes Arvay f73a019f36 NIFI-7843 Recursive avro schemas fail to write with RecordWriter
NIFI-7843 Recursive avro schemas fail to write with RecordWriter
Add new test case to TestSimpleRecordSchema to test the scenario
when schema name and schema namespace match.

This closes #4550.

Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
2020-09-28 19:52:41 +02:00
Joe Witt 8baa5c9940
NIFI-7692 updating for next dev release 1.13.0 2020-08-18 14:48:02 -07:00
Joe Witt fb57bcbc11
NIFI-7692-RC1 prepare for next development iteration 2020-08-13 09:20:39 -07:00
Joe Witt 303d6c59ba
NIFI-7692-RC1 prepare release nifi-1.12.0-RC1 2020-08-13 09:20:36 -07:00
Bence Simon 5c2bfcf7d3 NIFI-7369 Adding decimal support for record handling in order to avoid missing precision when reading in records
Signed-off-by: Mark Payne <markap14@hotmail.com>
2020-06-02 15:13:14 -04:00
Bence Simon 66b175f405
NIFI-7390 Covering Avro type conversion in case of map withing Record
Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #4256
2020-05-08 13:11:19 -04:00
Bryan Bende 2feeb57159
NIFI-7221 Support v2 and v3 protocol version for Hortonworks Schema Registry
- Update nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/serialization/SchemaRegistryRecordSetWriter.java
- Addressing review feedback

This closes #4120.
2020-03-20 10:56:47 -04:00
Joe Witt f694e6464f NIFI-7187 adding missing version strings from accumulo bundle pom
- Removed Cat X JSON.org dep inclusion which seems to not be necessary
- Updated a ton of easier/safer looking deps
- Updated tika due to CVE

This closes #4086

Signed-off-by: Mark Payne <markap14@hotmail.com>
2020-03-20 10:07:56 -04:00
Matthew Burgess 798a8eeb50
NIFI-7249: Force String keys in maps in DataTypeUtils.inferDataType()
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #4139.
2020-03-13 17:20:45 +01:00
Joe Witt 3de77ebacc
NIFI-7021-RC3 prepare for next development iteration 2020-01-19 14:14:40 -05:00
Joe Witt 633408bce7
NIFI-7021-RC3 prepare release nifi-1.11.0-RC3 2020-01-19 14:14:38 -05:00
Matthew Burgess f4ef45678e
NIFI-6963: Fix ValidateRecord handling of missing required array fields
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #3948.
2020-01-08 02:02:04 +07:00
Joe Witt f8c3d877cf
NIFI-6733 updating to next release version for master branch 2019-11-04 13:31:39 -05:00
Joe Witt 418179f5b2
NIFI-6733-RC3 prepare for next development iteration 2019-10-28 15:13:13 -07:00
Joe Witt b217ae20ad
NIFI-6733-RC3 prepare release nifi-1.10.0-RC3 2019-10-28 15:12:57 -07:00
Tamas Palfy 34112519c2 NIFI-6640 - UNION/CHOICE types not handled correctly
3 important changes:
1. FieldTypeInference had a bug when dealing with multiple datatypes for
 the same field where some (but not all) were in a wider-than-the-other
 relationship.
 Before: Some datatypes could be lost. String was wider than any other.
 After: Consistent behaviour. String is NOT wider than any other.
2. Choosing a datatype for a value from a ChoiceDataType:
 Before it chose the first compatible datatype as the basis of conversion.
 After change it tries to find the most suitable datatype.
3. Conversion of a value of avro union type:
 Before it chose the first compatible datatype as the basis of conversion.
 After change it tries to find the most suitable datatype.

Change: In the RecordFieldType enum moved TIMESTAMP ahead of DATE.

This closes #3724.

Signed-off-by: Mark Payne <markap14@hotmail.com>
2019-09-19 11:02:30 -04:00
Bryan Bende 76c2c3fee2
NIFI-6089 Add Parquet record reader and writer
NIFI-5755 Allow PutParquet processor to specify avro write configuration
Review feedback
Additional review feedback

This closes #3679

Signed-off-by: Mike Thomsen <mthomsen@apache.org>
2019-09-04 08:52:17 -04:00
Bryan Bende cc88dd428f
NIFI-6158 Fix conversion of Avro fixed type with logicalType decimal
This closes #3665.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
2019-08-26 10:57:28 +09:00
archon 3bfef3356e
NIFI-6442: ExecuteSQL/ExecuteSQLRecord convert to Avro date type incorrectly when set 'Use Avro Logical Types' to true
This closes #3584.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
2019-07-19 14:47:07 +09:00
Andy LoPresto e6c843f465
NIFI-6323 Changed URLs for repositories, project description, and mailing lists to use HTTPS.
NIFI-6323 Changed URLs for splunk.artifactoryonline.com to use HTTPS (certificate validity warning in browsers, but command-line connection using openssl s_client is successful).
NIFI-6323 Changed URLs for XMLNS schema locations to use HTTPS (the XMLNS and schema identifier remain http:// because they are not designed to be resolvable).
NIFI-6323 Fixed Maven XML schema descriptor URLs.

This closes #3497
2019-05-29 14:36:40 -04:00
Tomer Ghelber ba642485cc NIFI-5933 Stopped using sun's IOUtils. Using apahce's instead.
NIFI-5933 Removed unused file with sun.* import
NIFI-5933 Removed unsafe of sun.*.
NIFI-5933 Added openjdk8 to test

This closes #3234

Signed-off-by: Mike Thomsen <mikerthomsen@gmail.com>
2019-05-07 21:58:16 -04:00
Dustin Rodrigues 161e4b5763 NIFI-6231 - fix source code permissions to be non-executable
This closes #3449.

Signed-off-by: Koji Kawamura <ijokarumawak@apache.org>
2019-04-22 10:13:07 +09:00
Matthew Burgess 32bd7ed8b4 NIFI-6062: Add support for BLOB, CLOB, NCLOB in record handling
This closes #3329

Signed-off-by: Mike Thomsen <mikerthomsen@gmail.com>
2019-02-28 08:59:37 -05:00
joewitt 0e204f3576
NIFI-6029-RC2 prepare for next development iteration 2019-02-16 21:50:35 -05:00
joewitt 45bb53d2aa
NIFI-6029-RC2 prepare release nifi-1.9.0-RC2 2019-02-16 21:50:15 -05:00
Mark Payne 82f44155f6
NIFI-6044: This closes #3314. Retain the input data's order in the CSV Reader's inferred schema
Signed-off-by: joewitt <joewitt@apache.org>
2019-02-16 20:57:42 -05:00
Alex Savitsky e7ae97797e NIFI-5943 support conversions from List to Avro ARRAY and from Map to Avro RECORD
NIFI-5943 Added another unit test to verify list + map conversion to list of records. (Mike Thomsen)

This closes #3267

Signed-off-by: Mike Thomsen <mikerthomsen@gmail.com>
2019-02-14 08:31:42 -05:00
Mark Payne 36c0a99e91 NIFI-5938: Added ability to infer record schema on read from JsonTreeReader, JsonPathReader, XML Reader, and CSV Reader.
- Updates to make UpdateRecord and RecordPath automatically update Record schema when performing update and perform the updates on the first record in UpdateRecord before obtaining Writer Schema. This allows the Writer to  to inherit the Schema of the updated Record instead of the Schema of the Record as it was when it was read.
 - Updated JoltTransformRecord so that schema is inferred on the first transformed object before passing the schema to the Record Writer, so that if writer inherits schema from record, the schema that is inherited is the trans transformed schema
 - Updated LookupRecord to allow for Record fields to be arbitrarily added
 - Implemented ContentClaimInputStream
 - Added controller service for caching schemas
 - UpdatedQueryRecord to cache schemas automatically up to some number of schemas, which will significantly inprove throughput in many cases, especially with inferred schemas.

NIFI-5938: Updated AvroTypeUtil so that if creating an Avro Schema using a field name that is not valid for Avro, it creates a Schema that uses a different, valid field name and adds an alias for the given field name so that the fields still are looked up appropriately. Fixed a bug in finding the appropriate Avro field when aliases are used. Updated ContentClaimInputStream so that if mark() is called followed by multiple calls to reset(), that each reset() call is successful instead of failing after the first one (the JavaDoc for InputStream appears to indicate that the InputStream is free to do either and in fact the InputStream is even free to allow reset() to reset to the beginning of file if mark() is not even called, if it chooses to do so instead of requiring a call to mark()).

NIFI-5938: Added another unit test for AvroTypeUtil

NIFI-5938: If using inferred schema in CSV Reader, do not consider first record as a header line. Also addressed a bug in StandardConfigurationContext that was exposed by CSVReader, in which calling getProperty(PropertyDescriptor) did not properly lookup the canonical representation of the Property Descriptor from the component before attempting to get a default value

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #3253
2019-02-11 12:56:50 -05:00
gkkorir 023f0c41ce NIFI-5662 - Support for generic fixed when using decimal logical type
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #3175.
2018-11-17 16:36:04 +01:00
Arek Burdach 765df67817 NIFI-5757 Using Caffeine instead of slow synchronization on LinkedHashMap for caches - mainly avro schema caches
This closes #3111.

Signed-off-by: Mark Payne <markap14@hotmail.com>
2018-11-09 14:50:24 -05:00
Ed B 2812fe60a2 NIFI-5728 - XML Writer to populate record tag name properly
Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #3098.
2018-11-02 09:46:45 +01:00
Jeff Storck c0182294ed NIFI-5720-RC3 prepare for next development iteration 2018-10-22 22:16:43 -04:00
Jeff Storck 98aabf2c50 NIFI-5720-RC3 prepare release nifi-1.8.0-RC3 2018-10-22 22:16:23 -04:00
Mark Payne c425bd2880 NIFI-5533: Be more efficient with heap utilization
- Updated FlowFile Repo / Write Ahead Log so that any update that writes more than 1 MB of data is written to a file inside the FlowFile Repo rather than being buffered in memory
 - Update SplitText so that it does not hold FlowFiles that are not the latest version in heap. Doing them from being garbage collected, so while the Process Session is holding the latest version of the FlowFile, SplitText is holding an older version, and this results in two copies of the same FlowFile object

NIFI-5533: Checkpoint

NIFI-5533: Bug Fixes

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #2974
2018-10-09 09:18:02 -04:00