* NIFI-11443 Routed Python Framework Logging to SLF4J
- Changed Python logging to use standard output stream
- Adjusted Python logging format for simplified processing
- Updated PythonProcess to pipe standard error and standard output streams to reader
- Added Log Reader command with Virtual Thread for each Python Process
- Removed Python log properties from NiFi Properties configuration
Fixed issue in NiFiPythonGateway that stems from the fact that the thread adding an object to the JavaObjectBindings was not necessarily the thread removing them. The algorithm that was in place assumed that the same thread would be used, in order to ensure that an object could be unbound before being accessed. The new algorithm binds each new object to all active method invocations and only unbinds the objects after all method invocations complete, regardless of thread. Additionally, found that many method calls could create new proxies on the Python side, just for getter methods whose values don't change. This is very expensive, so introduced a new @Idempotent annotation that can be added to interface methods such that we can cache the value and avoid the expensive overhead.
This closes#8456
Signed-off-by: David Handermann <exceptionfactory@apache.org>
- Moved StandardValidators to nifi-api
- Moved URL creation method from UriUtils to URLValidator
- Separated FormatUtils into FormatUtils and DurationFormat classes
- Added DurationFormat to nifi-api
This closes#8442
Signed-off-by: David Handermann <exceptionfactory@apache.org>
Refactored so that when a Python Process dies, NiFi will detect this, respawn the process, recreate the Processors that exist in the process, re-initialize them, and restart them. In testing, found the PythonControllerInteractionIT had bugs that were causing Python Processors to be re-initialized many times; this resulted in threading issues that caused processors to be invalid, indicating that property descriptors didn't exist, etc. Addressed these concerns in the same commit, since they were necessary to properly run tests
Ensure that ClassLoader is consistently established for python processor proxies; ensure that if we re-initialize python processor the old initialization thread is stopped
This closes#8363
Signed-off-by: David Handermann <exceptionfactory@apache.org>
* NIFI-12739 Import ProcessPoolExecutor to fix bug in python 3.9+ that causes
Exceptions to be raised incorrectly in multi-threaded applications
(https://bugs.python.org/issue42647)
* Removed extraneous whitespace.
Fixed a threading bug where Java makes a call to Python, Python create a Java object to return (ArrayList), Python returns the object and then cleans it up, notifying Java to remove it from 'bound objects', and then Java process the response and returns a null object because it's no longer bound
This closes#8356
Signed-off-by: David Handermann <exceptionfactory@apache.org>
Fixed bug that caused custom Relationships not to work on Python Processors. Added unit test to verify. Also addressed issue in PythonControllerInteractionIT where it did not wait for Processors to become valid (originally this wasn't necessary but when we refactored Processors to initialize in the background this was overlooked).
This closes#8316
Signed-off-by: David Handermann <exceptionfactory@apache.org>
valid because it had cached property descriptors before the processor was fully initialized, so updated code to ensure that we do not cache values before initialization is completed.
This closes#8315
Signed-off-by: David Handermann <exceptionfactory@apache.org>
- Added some Use Case docs for Python processors and updated Runtime Manifests to include Python based processors as well as Use Case/MultiProcessorUseCase documentation elements. Refactored/cleaned up some of the Python code and added unit tests.
- Added python-unit-tests profile and enabled on Ubuntu and macOS GitHub workflows
This closes#8253
Signed-off-by: David Handermann <exceptionfactory@apache.org>
- Updated StandardPythonBridge.onProcessorRemoved to avoid throwing exceptions when called with a Processor Type and Version that is not registered
- Updated system-tests workflow to run on changes in the nifi-py4j-bundle
* NIFI-12310 Validated Python Component Type for Processes
- Updated StandardPythonBridge to validate requested Component Type and Version against available Python Processors
Created new python processors for text embeddings, inserting into Chroma, querying Chroma, querying ChatGPT, inserting into and querying Pinecone. Fixed some bugs in the Python framework. Added Python extensions to assembly. Also added ability to load dependencies from a requirements.txt as that was important for making the different vectorstore implementations play more nicely together.
Excluded nifi-python-extensions-bundle from GitHub build because it requires Maven to use unpack-resources goal, which will not work in GitHub because it uses mvn compile instead of mvn install
- ParseDocument
- ChunkDocument
- PromptChatGPT
- PutChroma
- PutPinecone
- QueryChroma
- QueryPinecone
NIFI-12195 Added support for requirements.txt to define Python dependencies
This closes#7894
Signed-off-by: David Handermann <exceptionfactory@apache.org>
The py4j.url property allows the URL for downloading py4j to be specified by a Maven command-line option
This closes#7946
Signed-off-by: David Handermann <exceptionfactory@apache.org>
Added documentation to indicate how to debug Python side of nifi framework, as well as debugging Python processors themselves using VSCode's Remote debugger.
This also provides the ability to launch the Controller process in such a way that it will listen to incoming remote debug connections.
This closes#7469
Signed-off-by: David Handermann <exceptionfactory@apache.org>
- Updated GitHub workflow so that system tests include Python 3.9
- Updated GitHub actions to build necessary modules for system tests
This closes#7003
Co-authored-by: David Handermann <exceptionfactory@apache.org>
Signed-off-by: David Handermann <exceptionfactory@apache.org>