Description
Issue:
We observe random consumer timeouts. We have contacted Confluent about this, and they do not see anything wrong on the broker side. CPU/MEM usage of the consumer is not high and stays limited during those events. The only correlation we observe is that it happens more often when we start a lot of consumers (50) per pod. Increasing the timeout is not really an option, as it is already 5 minutes. It does not correlate with high load.
Sample logs
warn | Jul 24 20:35:17.924 | i-094c46e288d48e5ce | aidp-api | Marking the coordinator dead (node coordinator-5) for group aidp-chat_completion-request-consumer: [Error 7] RequestTimedOutError: Request timed out after 305000.0 ms.
error | Jul 24 20:35:17.924 | i-094c46e288d48e5ce | aidp-api | Error sending JoinGroupRequest_v4 to node coordinator-5 [[Error 7] RequestTimedOutError: Request timed out after 305000.0 ms]
error | Jul 24 20:35:17.924 | i-094c46e288d48e5ce | aidp-api | [IPv4 ('34.211.165.150', 9092)]>: Closing connection. [Error 7] RequestTimedOutError: Request timed out after 305000.0 ms
warn | Jul 24 20:35:17.924 | i-094c46e288d48e5ce | aidp-api | [IPv4 ('34.211.165.150', 9092)]> timed out after 305000.0 ms. Closing connection.
- Stack
- kafka-python: 2.2.9
- python 3.12
- against Confluent Cloud
- multiple consumers in separate threads - not sharing consumer instances, every thread has a separate consumer
- 200 partitions on the topic
- up to 50 listener threads
- poll interface with timeout
- consumer config:
Creating Kafka consumer for topic: pwell_dev_us-west-2_aidp-chat_completion-result with config: {'client.id': 'aidp-sdk', 'bootstrap.servers': '', 'sasl.plain.username': '', 'sasl.plain.password': '', 'enable.auto.commit': False, 'partition.assignment.strategy': [<class 'kafka.coordinator.assignors.sticky.sticky_assignor.StickyPartitionAssignor'>], 'enable.incremental.fetch.sessions': False, 'api.version.auto.timeout.ms': 60000} and extra config: {'group.id': 'aidp-chat_completion-result-consumer', 'auto.offset.reset': 'earliest', 'enable.auto.commit': False, 'key.deserializer': <function CallbackContainer.<lambda> at 0x7f5367a41c60>, 'value.deserializer': <function CallbackContainer.<lambda> at 0x7f5367a41f80>, 'max.poll.records': 150, 'max.poll.interval.ms': 30000000, 'max.partition.fetch.bytes': 100}
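For reference, the threading setup described in the stack list is roughly the following. This is a minimal sketch, not our actual code: `run_listeners`, `make_consumer`, and `handle` are hypothetical names, and `make_consumer` is assumed to return a fresh object with `poll(timeout_ms=...)` and `close()` methods, as `kafka.KafkaConsumer` has.

```python
import threading


def run_listeners(make_consumer, handle, n_threads, stop_event, poll_timeout_ms=1000):
    """Start n_threads listener threads; each thread owns its own consumer.

    make_consumer and handle are hypothetical callables supplied by the caller.
    """

    def listen():
        # One consumer per thread; instances are never shared between threads.
        consumer = make_consumer()
        try:
            while not stop_event.is_set():
                # poll() is assumed to return a dict of {partition: [records]}.
                batches = consumer.poll(timeout_ms=poll_timeout_ms)
                for records in batches.values():
                    for record in records:
                        handle(record)
        finally:
            # Always close the consumer when the thread exits.
            consumer.close()

    threads = [
        threading.Thread(target=listen, name=f"listener-{i}", daemon=True)
        for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    return threads
```

In our case `make_consumer` would build a consumer from the config shown above, and `n_threads` goes up to 50 per pod.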
It is hard to provide more details. If I could enable some debugging, I could try; please provide instructions for it. When this happens, we need to restart the pods to make things work again.
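One thing I could try in the meantime, assuming kafka-python logs through the standard `logging` module under logger names starting with `kafka`, is raising that logger to DEBUG. The sketch below is only a guess at a useful setup; `enable_kafka_debug_logging` is a hypothetical helper name, and the thread name is included in the format because we run many consumers per process.

```python
import logging
import sys


def enable_kafka_debug_logging(stream=sys.stderr):
    """Send DEBUG-level records from the 'kafka' logger tree to the given stream."""
    handler = logging.StreamHandler(stream)
    # threadName helps tell apart the per-thread consumers in one pod.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(threadName)s %(name)s: %(message)s")
    )
    kafka_logger = logging.getLogger("kafka")  # assumed parent of the library's loggers
    kafka_logger.setLevel(logging.DEBUG)
    kafka_logger.addHandler(handler)
    return kafka_logger
```

Calling `enable_kafka_debug_logging()` once at startup, before the consumers are created, should be enough. The output is likely to be very verbose with 50 consumers, so it may need to be limited to a single pod.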