mirror of https://github.com/apache/nifi.git
fd92999daf
Currently, the NiFi Kafka consumer processors have the following issue. While downstream connections are full, ConsumeKafka is not scheduled to run onTrigger, so it stops executing poll, which is what tells the Kafka server that this client is alive. After a while in that situation, the Kafka server rebalances the client. When the downstream connections return to normal, ConsumeKafka is scheduled again, but the client is no longer part of a consumer group. When this happens, the Kafka client succeeds in polling messages once the ConsumeKafka processor resumes, but fails to commit the offset. The received messages are already committed into the NiFi flow, but since the consumer offset is not updated, they will be consumed again and duplicated. To address this issue: - For ConsumeKafka_0_10, use the latest client library. The issue has been addressed by KIP-62. The latest Kafka consumer poll checks whether the client instance is still valid and rejoins the group if it is not, before consuming messages. - For ConsumeKafka (0.9), added manual retention logic using pause/resume. Kafka client 0.9 doesn't have a background heartbeat thread, so a similar mechanism is added manually. It uses the Kafka pause/resume consumer API to tell the Kafka server that the client has stopped consuming messages but is still alive. Another internal thread periodically performs a paused poll, based on the time passed since the last onTrigger (poll) was executed. This closes #1527. Signed-off-by: Bryan Bende <bbende@apache.org> |
||
---|---|---|
.. | ||
src | ||
pom.xml |
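
The pause/resume keep-alive described in the commit message for ConsumeKafka (0.9) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `PausableConsumer` is a hypothetical stand-in for the small part of the Kafka consumer API involved (the real 0.9 client's `pause` and `resume` take the assigned partitions as arguments), and `KeepAlive`, `maxIdleMillis`, and the explicit `nowMillis` parameters are names assumed here so the timing logic can be followed without a broker.

```java
/** Hypothetical stand-in for the subset of the Kafka consumer API used in this sketch. */
interface PausableConsumer {
    void pause();                    // stop fetching records, keep group membership
    void resume();                   // start fetching records again
    void poll(long timeoutMillis);   // polling is what signals liveness to the server
}

/** Keeps a consumer in its group while the processor is not being triggered. */
class KeepAlive {
    private final PausableConsumer consumer;
    private final long maxIdleMillis;   // assumed threshold, below the session timeout
    private long lastPollMillis;
    private boolean paused = false;

    KeepAlive(PausableConsumer consumer, long maxIdleMillis, long nowMillis) {
        this.consumer = consumer;
        this.maxIdleMillis = maxIdleMillis;
        this.lastPollMillis = nowMillis;
    }

    /** Called when the processor is triggered: resume if needed, then do a normal poll. */
    synchronized void onTrigger(long nowMillis) {
        if (paused) {
            consumer.resume();
            paused = false;
        }
        consumer.poll(100);
        lastPollMillis = nowMillis;
    }

    /**
     * Called periodically by a background thread. If no poll has happened for
     * maxIdleMillis, pause the consumer and poll so that no records are fetched
     * but the server still sees the client as alive.
     * Returns true if a paused poll was performed.
     */
    synchronized boolean keepAlive(long nowMillis) {
        if (nowMillis - lastPollMillis < maxIdleMillis) {
            return false;
        }
        if (!paused) {
            consumer.pause();
            paused = true;
        }
        consumer.poll(0);
        lastPollMillis = nowMillis;
        return true;
    }
}
```

In a processor, something like a `ScheduledExecutorService` could call `keepAlive(System.currentTimeMillis())` at a fixed interval. Both methods are synchronized because the Kafka consumer is not safe for concurrent use from multiple threads.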