Added unit test that is ignored so that it can be manually run for testing performance before/after changes to AvroTypeUtil. Updated AvroTypeUtil to be more efficient by not using Record.getValue() and instead iterating over the Map of values directly. getValue() is less efficient here because we know the RecordField's we are iterating over exist in the schema since they are retrieved from there directly; as a result, any null values still have be looked up by aliaases, but that step can be skipped in this situation. Also avoided looking for fields that exist in Avro Schema and not in RecordSchema just to set default values on GenericRecord - there's no need to set them if they are default values.
This closes#5080
Signed-off-by: David Handermann <exceptionfactory@apache.org>
Also allow to write them as such (byte-arrays) - again, instead of throwing an exception.
NIFI-8439 Fixed unit tests.
NIFI-8439 Allow writing parquet INT96 timestamps if they were read by the same parquet-avro library.
This closes#5006.
Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
- Upgraded direct dependencies from 2.6 to 2.8.0
- Added dependency management configuration to use 2.8.0 for some modules
- Updated scripted Groovy tests to avoid copying unnecessary files
Signed-off-by: Matthew Burgess <mattyb149@apache.org>
This closes#5073
- AbstractJsonRowRecordReader - Handle (meaning log a warning and not fail completely) multi-array CHOICE type when data has extra fields (not defined by the schema) and can't determine correct type.
- AvroTypeUtil - Allow multiple different record types in avro union type. Minor refactors. Added documentation fro EqualsWrapper.
NIFI-6752 Refactored type and value conversion logic. Added support for more types. Added more tests.
Removed 'parent' from 'Recursive'. (Caused issues. The recursive nature is still there as it has a child with the same type).
Updated jasn1 1.11.2 to asn1bean 1.12.0. If an asn field name is a Java reserved keyword, the field gets a trailing "_" but the getter remains normal. In JASN1Utils adjusted logic when looking for the getter.
Added support for inherited types. OctetStrings are converted to Strings instead of byte arrays.
Service takes care of the compilation of the ASN files. Test sources are generated and removed from source control.
NIFI-6752 Removed obsolete TODOs.
NIFI-6752 Updated nifi-asn1-nar version to 1.13.0-SNAPSHOT. Fixed checkstyle violations (unused imports).
NIFI-6752 ASN.1 reader - ASN.1 bundle requires 'include-asn1' profile to be active to be part of assembly.
NIFI-6752 ASN.1 reader - Updated ASN1.xml template.
NIFI-6752 ASN.1 reader - Updated versions.
NIFI-6752 ASN.1 reader - Update example generator. Updated ASN1.xml template. Updated (fixed) nifi-asn1-nar version in pom.xml.
NIFI-6752 ASN.1 reader - Added missing license for ASN1.xml.
Signed-off-by: Matthew Burgess <mattyb149@apache.org>
This closes#4577
NIFI-7843 Recursive avro schemas fail to write with RecordWriter
Add new test case to TestSimpleRecordSchema to test the scenario
when schema name and schema namespace match.
This closes#4550.
Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>
- Removed Cat X JSON.org dep inclusion which seems to not be necessary
- Updated a ton of easier/safer looking deps
- Updated tika due to CVE
This closes#4086
Signed-off-by: Mark Payne <markap14@hotmail.com>
3 important changes:
1. FieldTypeInference had a bug when dealing with multiple datatypes for
the same field where some (but not all) were in a wider-than-the-other
relationship.
Before: Some datatypes could be lost. String was wider than any other.
After: Consistent behaviour. String is NOT wider than any other.
2. Choosing a datatype for a value from a ChoiceDataType:
Before it chose the first compatible datatype as the basis of conversion.
After change it tries to find the most suitable datatype.
3. Conversion of a value of avro union type:
Before it chose the first compatible datatype as the basis of conversion.
After change it tries to find the most suitable datatype.
Change: In the RecordFieldType enum moved TIMESTAMP ahead of DATE.
This closes#3724.
Signed-off-by: Mark Payne <markap14@hotmail.com>
NIFI-6323 Changed URLs for splunk.artifactoryonline.com to use HTTPS (certificate validity warning in browsers, but command-line connection using openssl s_client is successful).
NIFI-6323 Changed URLs for XMLNS schema locations to use HTTPS (the XMLNS and schema identifier remain http:// because they are not designed to be resolvable).
NIFI-6323 Fixed Maven XML schema descriptor URLs.
This closes#3497
NIFI-5933 Removed unused file with sun.* import
NIFI-5933 Removed unsafe of sun.*.
NIFI-5933 Added openjdk8 to test
This closes#3234
Signed-off-by: Mike Thomsen <mikerthomsen@gmail.com>
NIFI-5943 Added another unit test to verify list + map conversion to list of records. (Mike Thomsen)
This closes#3267
Signed-off-by: Mike Thomsen <mikerthomsen@gmail.com>
- Updates to make UpdateRecord and RecordPath automatically update Record schema when performing update and perform the updates on the first record in UpdateRecord before obtaining Writer Schema. This allows the Writer to to inherit the Schema of the updated Record instead of the Schema of the Record as it was when it was read.
- Updated JoltTransformRecord so that schema is inferred on the first transformed object before passing the schema to the Record Writer, so that if writer inherits schema from record, the schema that is inherited is the trans transformed schema
- Updated LookupRecord to allow for Record fields to be arbitrarily added
- Implemented ContentClaimInputStream
- Added controller service for caching schemas
- UpdatedQueryRecord to cache schemas automatically up to some number of schemas, which will significantly inprove throughput in many cases, especially with inferred schemas.
NIFI-5938: Updated AvroTypeUtil so that if creating an Avro Schema using a field name that is not valid for Avro, it creates a Schema that uses a different, valid field name and adds an alias for the given field name so that the fields still are looked up appropriately. Fixed a bug in finding the appropriate Avro field when aliases are used. Updated ContentClaimInputStream so that if mark() is called followed by multiple calls to reset(), that each reset() call is successful instead of failing after the first one (the JavaDoc for InputStream appears to indicate that the InputStream is free to do either and in fact the InputStream is even free to allow reset() to reset to the beginning of file if mark() is not even called, if it chooses to do so instead of requiring a call to mark()).
NIFI-5938: Added another unit test for AvroTypeUtil
NIFI-5938: If using inferred schema in CSV Reader, do not consider first record as a header line. Also addressed a bug in StandardConfigurationContext that was exposed by CSVReader, in which calling getProperty(PropertyDescriptor) did not properly lookup the canonical representation of the Property Descriptor from the component before attempting to get a default value
Signed-off-by: Matthew Burgess <mattyb149@apache.org>
This closes#3253
- Updated FlowFile Repo / Write Ahead Log so that any update that writes more than 1 MB of data is written to a file inside the FlowFile Repo rather than being buffered in memory
- Update SplitText so that it does not hold FlowFiles that are not the latest version in heap. Doing them from being garbage collected, so while the Process Session is holding the latest version of the FlowFile, SplitText is holding an older version, and this results in two copies of the same FlowFile object
NIFI-5533: Checkpoint
NIFI-5533: Bug Fixes
Signed-off-by: Matthew Burgess <mattyb149@apache.org>
This closes#2974