conceptual issue with input.exec (interval vs timeout due to execution time) #2629
If the output takes 0.1s, then the binary runs for 0.1s. The timeout could be any number larger than this. I would set the timeout to 5s if you are targeting a 10s interval.
@danielnelson The tool takes 10 seconds to gather data. After gathering, it only takes 0.1s to process the data and dump it out. Also, see #2087, which would allow you to run a process that stays running indefinitely and emits whenever it wants to. Sounds exactly like what is being asked for.
The conceptual thing I'm doing wrong is that I want to do some pre-processing instead of simply dumping raw data like system counters into the TICK stack. This pre-processing takes time (0.1s) and is based on data I first have to gather (10s interval). So, @danielnelson, the processing takes 0.1s, but the execution time of my binary (I will call it the adaptor from now on) is, to repeat myself, 10.1s. The question for the telegraf developers, imho, is whether such pre-processing tasks are allowed or should not happen at all, since the raw data can be post-processed via InfluxDB. If telegraf wants to support such adaptors with pre-processing, I see two approaches:
On the topic of whether you should preprocess data, Telegraf doesn't really have an opinion. It is your data and you know it best. That said, if you can do the queries in InfluxDB, it might be more flexible for querying; we usually favor this when writing input plugins. We have some support for processors and aggregators, but very few are currently implemented, and they run against all collected points. Another great place to perform this type of action is Kapacitor. You can position Kapacitor either before or after InfluxDB, and it can perform advanced processing. Otherwise, maybe you could split your executable into two stages and pass information from the collector stage to the processor stage via a file or other persistent storage. I don't think we want to implement overlapping executions. #2087 sounds good, but of course it's not implemented.
I will have a look at Kapacitor. Anyway, my binary should not be limited to the TICK stack, which is why I personally think pre-processing inside an adaptor executed with input.exec should be supported. A two-stage execution would require persisted state shared between runs (e.g. on a file system), something I wanted to avoid to keep the logic simple and the required resources low. @danielnelson, could you please elaborate a bit more on the statement "I don't think we want to implement overlapping executions."? Why do you think overlapping executions are not useful?
While I can't speak for @danielnelson (and I'm not a project member, so take my opinion with a grain of salt), I would agree with the idea that we shouldn't support overlapping runs. I'll try to explain my thoughts, but it's kinda hard to articulate. No plugin currently supports it, and just conceptually I think it would be a bad idea. The only use case I can think of is something like this, where the plugin has finished measuring but takes some time to process the data. But telegraf can't know that. You'd essentially be saying it's OK for telegraf to gather metrics for the same period twice (even though that's not what this specific example is doing). I think in this specific example, a long-running execution is better. It's entirely possible that, due to simple CPU scheduling jitter, each time the external app is launched it starts gathering data slightly before or slightly after the 10s mark. This would result in a tiny overlap or a tiny gap. To properly solve this, the plugin would need to stay running and handle some sort of atomic cutoff internally: some way of ensuring that one monitoring period starts at the exact moment the previous one ends. In your case, the work for the previous period can keep running after the cutoff, doing the data processing and output, while monitoring for the new period has already started.
Yeah, I think overlapping executions sound complicated to explain and implement, and not something that many people would use. The long-running subprocess idea makes more sense to me, and reminds me of FCGI a bit. My way of thinking about Telegraf is that it is not meant to be an advanced data processor. We try to hit the basics that most people will need, but for more advanced operations we recommend Kapacitor, which excels at processing and can run a long-lived process[1].
[1] https://github.com/influxdata/kapacitor/tree/master/udf/agent/
@cha87de I hope one of the ideas we discussed will work for you, let me know if you have any more questions.
Yes, thank you for your feedback!
I'm trying to add metrics to the TICK stack. To avoid compile dependencies, I'm using the input.exec plugin. The idea is to call an external binary every 10s; this binary makes observations, then performs some calculations and prints its output in the influx data format.
Here comes my conceptual problem: if I want to avoid gaps in the monitoring, I have to run this binary every 10s, and each run observes for 10s and then produces the output. Producing the output takes 0.1s, so the binary runs for 10.1s and therefore needs a timeout of 10.1s.
Unfortunately, telegraf doesn't allow me to set the timeout larger than the interval, so I probably have a conceptual mistake in my approach to using the input.exec plugin. I understand that telegraf wants to avoid one plugin running twice for the same time frame, but I'm not sure how to avoid that.
So, my question is: what is the intended way in telegraf to integrate an external monitoring tool that needs time to a) observe and b) compile its output in the correct format, without leaving monitoring gaps?