Splunk_Hec_Logs Sink Error: "Service call failed. No retries or retries exhausted. / Events Dropped" error messages #20338
Thanks for filing this @csongpaxos. Normally there is another error logged before this one.
Hi @jszwedko, no. We have tried the "debug" and "trace" log levels, enabled internal metrics / internal logs, and tweaked the buffering/batching settings, but haven't been able to surface any additional errors pointing at the "real" failure. We're now stuck on how to proceed or debug further without anything to work with. Have you seen this before with the Splunk sink specifically when log volume is high?
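For reference, here is a hedged sketch of how log levels can be scoped: `VECTOR_LOG` accepts `env_logger`-style per-module filters, so trace logging can be limited to the sink internals instead of the whole process. The module path below is an assumption and may differ between Vector versions:

```shell
# Assumed module path; verify against your Vector build.
# Scope trace logging to the Splunk HEC sink while keeping everything else at info.
VECTOR_LOG="info,vector::sinks::splunk_hec=trace" \
  vector --config /etc/vector/vector.yaml
```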
I have the same error on an nginx log → vector → vector → clickhouse pipeline.
Here is the error message:
Here is the vector1 and vector2 config:
Fixed by adding the following:
@jszwedko: here's a link to the error in the trace logs.
Adding that setting to my Splunk sink does not appear to make a difference; I'm still seeing the same "service call failed / retries exhausted" error with no additional errors logged.
Adding more info in case it's helpful: I was seeing this error with a setup almost identical to the OP's (k8s pods → Datadog Agent → Vector → Splunk HEC). Some events were flowing into Splunk, but I wasn't able to find any pattern to the errors. While experimenting with the settings, the error disappeared when I disabled acknowledgements on the sink:

```yaml
splunk_eks:
  type: splunk_hec_logs
  endpoint: "${SPLUNK_CLOUD_HTTP_ENDPOINT}"
  default_token: "${SPLUNK_CLOUD_TOKEN}"
  acknowledgements:
    enabled: false
    indexer_acknowledgements_enabled: false
```

Perhaps the failing service call is related to the acknowledgement step? As best I can tell, our volume of events is the same whether ACKs are on or off, so either the events were always getting there (and the error is on the ACK), or they were never getting there. As another data point, we're only seeing this error on 1 of our 7 clusters. Vector is set up identically everywhere, with the only difference being the
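If disabling acknowledgements entirely feels too blunt, an alternative might be to loosen the indexer-acknowledgement polling instead. This sketch is based on my reading of the `splunk_hec_logs` acknowledgement options (`query_interval`, `retry_limit`, `max_pending_acks`); the values are illustrative, not recommendations:

```yaml
splunk_eks:
  type: splunk_hec_logs
  endpoint: "${SPLUNK_CLOUD_HTTP_ENDPOINT}"   # same placeholders as above
  default_token: "${SPLUNK_CLOUD_TOKEN}"
  acknowledgements:
    indexer_acknowledgements_enabled: true
    query_interval: 30     # seconds between ack-status polls (default is lower)
    retry_limit: 60        # number of polls before giving up on an ack
```

If the events still arrive with these settings relaxed, that would support the theory that the error is on the ACK path rather than on event delivery.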
Problem
We are trying to send our `kubernetes_logs` into Splunk via the `splunk_hec_logs` sink. Some of these logs are sent correctly and arrive in Splunk. However, the pod logs contain error messages about a failing service call ("no retries or retries exhausted") and events being dropped when sending to the Splunk HEC endpoint. Error message:
There are no other error messages in the pod logs prior to this one that could give us a clue as to why the service call keeps failing. On the Splunk side there are no errors for this HEC token either, so no clues there. We have hit a wall with debugging, since there are no additional logs in any of the pods even with the log level set to DEBUG/TRACE and RUST_BACKTRACE set to full.
We would like to know how else to troubleshoot this, and whether it's a known problem with the Splunk sink. Other things we've tried include increasing the batch/buffer sizes, the retry timeout, and the ack timeout, but none of these settings appear to resolve the problem.
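To make "batch / buffer size, retry timeout, ack timeout" concrete, here is a hedged sketch of the kind of settings we experimented with. Option names follow the Vector sink docs as we understand them; the endpoint/token names are placeholders and the values are illustrative, not our production config:

```yaml
sinks:
  splunk:
    type: splunk_hec_logs
    endpoint: "${SPLUNK_HEC_ENDPOINT}"     # placeholder
    default_token: "${SPLUNK_HEC_TOKEN}"   # placeholder
    batch:
      max_bytes: 10485760    # 10 MiB per request
      timeout_secs: 5
    buffer:
      type: disk
      max_size: 1073741824   # 1 GiB on-disk buffer
    request:
      retry_attempts: 10
      retry_max_duration_secs: 30
      timeout_secs: 60
```

None of these combinations changed the error behavior for us.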
Configuration
Version
0.37.1-debian
Debug Output
Example Data
No response
Additional Context
Vector is running in our EKS cluster on AWS. The s3 sink works fine with no errors, but the splunk sink shows periodic errors with no additional detail. I've asked in the Vector Discord channel and a developer mentioned "I would have expected to see another error before the retries exhausted error"; however, we are not seeing anything to work with while debugging.
References
No response