Commit Graph

7 Commits

Author SHA1 Message Date
Mark Payne fd068fe978
NIFI-7557: uses a canonical representation of strings when recovering data from FlowFile Repository in order to avoid using huge amounts of heap when not necessary
- Fixed some problems with unit/integration tests

This closes #4507.

Signed-off-by: Bryan Bende <bbende@apache.org>
2020-09-03 10:21:50 -04:00
Mark Payne c425bd2880 NIFI-5533: Be more efficient with heap utilization
- Updated FlowFile Repo / Write Ahead Log so that any update that writes more than 1 MB of data is written to a file inside the FlowFile Repo rather than being buffered in memory
 - Update SplitText so that it does not hold FlowFiles that are not the latest version in heap. Doing them from being garbage collected, so while the Process Session is holding the latest version of the FlowFile, SplitText is holding an older version, and this results in two copies of the same FlowFile object

NIFI-5533: Checkpoint

NIFI-5533: Bug Fixes

Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #2974
2018-10-09 09:18:02 -04:00
Mark Payne 9f95a10df9 NIFI-4794: Updated event writers to avoid creating a lot of byte[] by reusing buffers. Also removed synchronization on EventWriter when rolling over the writer and just moved the writing of the header to happen before making the writer available to any other threads. This reduces thread contention during rollover.
Signed-off-by: Matthew Burgess <mattyb149@apache.org>

This closes #2437
2018-02-19 09:31:11 -05:00
Mark Payne 96ed405d70 NIFI-3356: Initial implementation of writeahead provenance repository
- The idea behind NIFI-3356 was to improve the efficiency and throughput of the Provenance Repository, as it is often the bottleneck. While testing the newly designed repository,
  a handful of other, fairly minor, changes were made to improve efficiency as well, as these came to light when testing the new repository:

- Use a BufferedOutputStream within StandardProcessSession (via a ClaimCache abstraction) in order to avoid continually writing to FileOutputStream when writing many small FlowFiles
- Updated threading model of MinimalLockingWriteAheadLog - now performs serialization outside of lock and writes to a 'synchronized' OutputStream
- Change minimum scheduling period for components from 30 microseconds to 1 nanosecond. ScheduledExecutor is very inconsistent with timing of task scheduling. With the bored.yield.duration
  now present, this value doesn't need to be set to 30 microseconds. This was originally done to avoid processors that had no work from dominating the CPU. However, now that we will yield
  when processors have no work, this results in slowing down processors that are able to perform work.
- Allow nifi.properties to specify multiple directories for FlowFile Repository
- If backpressure is engaged while running a batch of sessions, then stop batch processing earlier. This helps FlowFiles to move through the system much more smoothly instead of the
  herky-jerky queuing that we previously saw at very high rates of FlowFiles.
- Added NiFi PID to log message when starting nifi. This was simply an update to the log message that provides helpful information.

NIFI-3356: Fixed bug in ContentClaimWriteCache that resulted in data corruption and fixed bug in RepositoryConfiguration that threw exception if cache warm duration was set to empty string

NIFI-3356: Fixed NPE

NIFI-3356: Added debug-level performance monitoring

NIFI-3356: Updates to unit tests that failed after rebasing against master

NIFI-3356: Incorporated PR review feedback

NIFI-3356: Fixed bug where we would delete index directories that are still in use; also added additional debug logging and a simple util class that can be used to textualize provenance event files - useful in debugging

This closes #1493
2017-02-22 12:40:06 -05:00
Joe Skora 41ad032151 NIFI-3055 StandardRecordWriter Can Throw UTFDataFormatException (1.x)
* Remove function based on JDK source.
* Add new function to find bytes based on RFC3629.
* Add field name to log entry when field is truncated.

Signed-off-by: Mike Moser <mosermw@apache.org>
This closes #1475
2017-02-13 20:15:59 +00:00
Joe Skora 376af83a3d NIFI-3055 StandardRecordWriter Can Throw UTFDataFormatException
* Updated StandardRecordWriter, even though it is now deprecated to consider the encoding behavior of java.io.DataOutputStream.writeUTF() and truncate string values such that the UTF representation will not be longer than that DataOutputStream's 64K UTF format limit.
* Updated the new SchemaRecordWriter class to similarly truncate long Strings that will be written as UTF.
* Add tests to confirm handling of large UTF strings and various edge conditions of UTF string handling.

Signed-off-by: Mike Moser <mosermw@apache.org>

This closes #1469.
2017-02-03 20:52:32 +00:00
Mark Payne 1be0871473 NIFI-2854: Refactor repositories and swap files to use schema-based serialization so that nifi can be rolled back to a previous version after an upgrade.
NIFI-2854: Incorporated PR review feedback

NIFI-2854: Implemented feedback from PR Review

NIFI-2854: Ensure that all resources are closed on CompressableRecordReader.close() even if an IOException is thrown when closing one of them

This closes #1202
2016-11-18 14:53:13 -05:00