Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Tcp Forwarder Output plugin. #1526

Closed
wants to merge 1 commit into from

Conversation

tuier
Copy link
Contributor

@tuier tuier commented Jul 20, 2016

This can be use to forward metric to a centralized endpoint (telegraf or not)

Sample use case:

Considering a Pool of server with telegraf and tcp_forwarder
enabled, who will send data to some other sever with tcp_listener.

This allow to have a better security, as credential for "real"
output is not in every server of the Pool, and will allow to rotate
credential much easier.


This plugin will send all metrics through TCP in the chosen format, this can be
use by example with tcp listener input plugin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please put a configuration example here

@sparrc
Copy link
Contributor

sparrc commented Jul 25, 2016

you should put a sync.Mutex on the plugin struct, lock it while gathering and closing so that they can't be called at the same time.

@tuier
Copy link
Contributor Author

tuier commented Aug 5, 2016

@sparrc any more work needed on this PR ?

@sparrc
Copy link
Contributor

sparrc commented Aug 5, 2016

I still need to do a final review and test it. There is currently a large backlog of PRs and this may take a while, so please be patient.

sync.Mutex

Server string
Timeout string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the type of Timeout to internal.Duration, then you don't need to parse it, the toml parsing will deal with verifying that it's a valid timestamp.

in the config file it will still look like timeout = "5s"

@theist
Copy link
Contributor

theist commented Sep 4, 2016

Hi I'm testing this plugin searching for ways to "proxy" telegraf in firewalled servers (I'm testing it along other solutions like use of haproxy and the http service input plugin done in #1407)

I'm using this config:

[[outputs.tcp_forwarder]]
  server = "localhost:8186"
  timeout = "5s"
  reconnect = false
  data_format = "influx"

I placed recconect to false because it feels a bit of overhead according the help text ## force reconnection before every push

When I setup a pair of tcp_input / tcp_output two things happen

First of all I cannot start the "tcp_output_ed" if I don't start the "tcp_input_ed" telegraf ends without gather any metrics. Perhaps will we great that telegraf start collecting data and add it to the buffer even if there's no tcp endpoint listening on the other side. But I can workarround that and it is not my main concern

My problem is that if I drop the "telegraf proxy" the proxied with the tcp_output starts to accumulate data in buffer as expected, but when the proxy with the tcp_input restarts again it seems that the conection is never retried and the proxied keeps logging

ERROR: write tcp 127.0.0.1:59778->127.0.0.1:8186: use of closed network connection
2016/09/04 12:53:40 Error writing to output [tcp_forwarder]: Could not write to tcp endpoint

Is the connection ever retried? Sorry I tried to look at the code but golang seems Greek to me

UPDATE: Tried to switch to reconnect = true but then telegraf does not even start

$ ./telegraf_custom -debug -config telegraf_tcp_output.cfg 
2016/09/04 13:40:47 Attempting connection to output: tcp_forwarder

and dies

@sparrc sparrc mentioned this pull request Sep 5, 2016
@tuier
Copy link
Contributor Author

tuier commented Sep 5, 2016

For the first issues: when starting the plugin a connect function is called to make sure it can connect this is a default behavior of telegraf

@sparrc Should we change that to the suggestion ? gathering metric before we can reach a endpoint, by not passing the first connection error ?

@theist Can you share the whole configuration file for both side ?

t.Lock()
defer t.Unlock()

var bp []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than making a slice of strings and then joining it, you should just create the string of metrics directly as you loop over the metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea here was to leave a possibility to create/add a Join function in the serialiser interface (see line 143)

@sparrc
Copy link
Contributor

sparrc commented Sep 6, 2016

@sparrc Should we change that to the suggestion ? gathering metric before we can reach a endpoint, by not passing the first connection error ?

I don't quite understand this question, can you rephrase?

return err
}
// Prepare data
t.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you locking this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure that nothing will change the connection while trying to Write.

@theist
Copy link
Contributor

theist commented Sep 6, 2016

@tuier Hi sorry for the noise :) the behaviour I expect does not necessarily represent the views of the telegraf dev team, I'm only a sysadm doing some research for my tick monitoring stack and a tech blog post

I placed my findings and the config for the two telegrafs in this gist

The telegraf I'm using is a local telegraf compiled from master and with this PR merged, in a clean official golang docker container for linux 64.

All tests are made on a 4.4.0-34-generic #53-Ubuntu SMP x86_64 GNU/Linux, on localhost launching the same telegraf on foreground with different config files on different terminals

Hope it helps

@sparrc
Copy link
Contributor

sparrc commented Sep 7, 2016

@tuier Can you explain the current behavior of the output plugin? What does it do if it can't connect? I generally agree with @theist that this plugin should be able to handle connection failures/restarts, whether that be in the middle of the run or at startup.

I'm a little confused by why the plugin would need to "gather" metrics, since this is an output plugin. If the connection fails then it should return an error, and telegraf's accumulator will handle buffering the points until they can successfully be written.

@tuier
Copy link
Contributor Author

tuier commented Sep 13, 2016

@sparrc When starting up the plugin the Connect function is call, this will return an error.
But that shouldn't stop telegraf.

I will be looking in @theist issues to see if i can find where it's coming from

@tuier
Copy link
Contributor Author

tuier commented Sep 18, 2016

@theist your issues should be fixed now.

@sparrc to be more precise, When telegraf start, it will call the Connect function from an output plugin. If the plugin return an error it will try a second time (after 15 second) and if it return an error again, telegraf will "fatal" and because of that stop. (https://github.com/influxdata/telegraf/blob/master/agent/agent.go#L61-L70 ,https://github.com/influxdata/telegraf/blob/master/cmd/telegraf/telegraf.go#L240-L243 )
In the Case that the TCP endpoint is not up When calling that Connect function, this plugin will return an error.

Because of this two behavior, telegraf will stop if the TCP endpoint is not set before running this plugin.

@sparrc would you prefer to ignore that error in the first Connect function ? in that case why would that function return a error, as the "fix" is to ignore that error.
Should we change the way telegraf output failure at startup and doing something like Error(print in log) but not Fatal ?

@st0ckface
Copy link

Is there any traction on this?

@tuier
Copy link
Contributor Author

tuier commented Oct 25, 2016

Every change requested has been made, we are just waiting for more review or a green light !

@sparrc
Copy link
Contributor

sparrc commented Nov 3, 2016

@tuier can you clean up this PR? something must have gone wrong with a merge conflict, there shouldn't be 117 commits included in this PR

I would like to try to have this merged for 1.2

@sparrc sparrc added this to the 1.2.0 milestone Nov 3, 2016
This can be use to forward metric to a centralised endpoint (telegraf or not)

Sample use case:
    Considering a Pool of server with telegraf and tcp_forwarder
    enabled, who will send data to some other sever with tcp_listener.
    This allow to have a better security, as credential for "real"
    output is not in every server of the Pool, and will allow to rotate
    credential much easier.
@sparrc
Copy link
Contributor

sparrc commented Dec 5, 2016

thank you very much for this PR @tuier, but I'm closing this out in favor of #2094

@sparrc sparrc closed this Dec 5, 2016
sparrc pushed a commit that referenced this pull request Feb 2, 2017
@sparrc sparrc removed this from the 1.2.0 milestone Feb 2, 2017
mlindes pushed a commit to Comcast/telegraf that referenced this pull request Feb 6, 2017
maxunt pushed a commit that referenced this pull request Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants