
JSON output data format, support nanosecond timestamps. #2124

Closed
tjmcs opened this issue Dec 4, 2016 · 6 comments · Fixed by #2587

Comments


tjmcs commented Dec 4, 2016

Bug report

In the current released version of telegraf and the latest version on the master branch, the precision value declared for the agent is not honored by all serializers. Specifically, when we select the file output and request the influx data_format, with the agent's precision set to 1ns (nanosecond precision), we see values that look like this in the output:

system,host=192-168-1-7.tpgi.com.au load1=1.93,load5=2.11,load15=2.14,n_users=7i,n_cpus=8i 1480843620054960352
system,host=192-168-1-7.tpgi.com.au uptime=2339551i,uptime_format="27 days,  1:52" 1480843620054996864

On the other hand, when we change the output's data_format to json, the values that we see look like this:

{"fields":{"load1":2.16,"load15":2.14,"load5":2.13,"n_cpus":8,"n_users":7},"name":"system","tags":{"host":"192-168-1-7.tpgi.com.au"},"timestamp":1480843720}
{"fields":{"uptime":2339651,"uptime_format":"27 days,  1:54"},"name":"system","tags":{"host":"192-168-1-7.tpgi.com.au"},"timestamp":1480843720}

The timestamp values in the influx data_format example above are actual nanosecond timestamps, while the timestamps in the json data_format example have been truncated to the nearest second.

It's also clear where this occurs. While the agent's precision does set the precision of the timestamp for the metrics that are collected, when that data is converted to JSON in this line of code the timestamp is converted to a time in seconds. The influx data_format is generated in a completely different fashion from the json data_format: by calling a simple m.Serialize() method in the current commit on the master branch of the repository, or []string{metric.String()} in the v1.1.1 release of telegraf.
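A minimal sketch of the difference between the two paths (the types and functions below are illustrative stand-ins, not the actual telegraf internals): an influx-style serializer writes the metric's full nanosecond timestamp via UnixNano(), while a JSON serializer that calls Unix() on the metric's time keeps only whole seconds.

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Illustrative stand-in for a telegraf metric with a
// nanosecond-precision timestamp (not the real telegraf type).
type metric struct {
	Name   string
	Fields map[string]interface{}
	Time   time.Time
}

// Influx-style serialization keeps the full nanosecond timestamp.
func serializeInflux(m metric) string {
	return fmt.Sprintf("%s load1=%v %d", m.Name, m.Fields["load1"], m.Time.UnixNano())
}

// JSON-style serialization, as described above, reduces the
// timestamp to whole seconds.
func serializeJSON(m metric) string {
	out := map[string]interface{}{
		"name":      m.Name,
		"fields":    m.Fields,
		"timestamp": m.Time.Unix(), // effectively UnixNano() / 1e9
	}
	b, _ := json.Marshal(out)
	return string(b)
}

func main() {
	m := metric{
		Name:   "system",
		Fields: map[string]interface{}{"load1": 1.93},
		Time:   time.Unix(0, 1480843620054960352),
	}
	fmt.Println(serializeInflux(m)) // ... 1480843620054960352
	fmt.Println(serializeJSON(m))   // ... "timestamp":1480843620
}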

Relevant telegraf.conf:

This is the telegraf.conf file that was used for the influx data_format example:

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)


# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default, precision will be set to the same timestamp order as the
  ## collection interval, with the maximum being 1s.
  ## Precision will NOT be used for service inputs, such as logparser and statsd.
  ## Valid values are "ns", "us" (or "µs"), "ms", "s".
  precision = "1ns"

  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = false
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = ""

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false


###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Send telegraf metrics to file(s)
[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout", "/tmp/metrics.out"]

  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  data_format = "influx"



###############################################################################
#                            PROCESSOR PLUGINS                                #
###############################################################################

# # Print all metrics that pass through this filter.
# [[processors.printer]]



###############################################################################
#                            AGGREGATOR PLUGINS                               #
###############################################################################

# # Keep the aggregate min/max of each metric passing through.
# [[aggregators.minmax]]
#   ## General Aggregator Arguments:
#   ## The period on which to flush & clear the aggregator.
#   period = "30s"
#   ## If true, the original metric will be dropped by the
#   ## aggregator and will not get sent to the output plugins.
#   drop_original = false



###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration


And this is the telegraf.conf file used for the json data_format example:

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)


# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## By default, precision will be set to the same timestamp order as the
  ## collection interval, with the maximum being 1s.
  ## Precision will NOT be used for service inputs, such as logparser and statsd.
  ## Valid values are "ns", "us" (or "µs"), "ms", "s".
  precision = "1ns"

  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = false
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = ""

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false


###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Send telegraf metrics to file(s)
[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout", "/tmp/metrics.out"]

  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  data_format = "json"



###############################################################################
#                            PROCESSOR PLUGINS                                #
###############################################################################

# # Print all metrics that pass through this filter.
# [[processors.printer]]



###############################################################################
#                            AGGREGATOR PLUGINS                               #
###############################################################################

# # Keep the aggregate min/max of each metric passing through.
# [[aggregators.minmax]]
#   ## General Aggregator Arguments:
#   ## The period on which to flush & clear the aggregator.
#   period = "30s"
#   ## If true, the original metric will be dropped by the
#   ## aggregator and will not get sent to the output plugins.
#   drop_original = false



###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration


Both were generated using the telegraf -input-filter system -output-filter file config command and then hand-editing the resulting configuration file to set the precision to 1ns, so that the metrics measured by the agent would have timestamps to the nearest nanosecond; the only difference between the two files is the data_format for the file output.

System info:

Tested using Telegraf version 1.1.1 and a version of Telegraf built from the latest commit on the master branch (504f4e6), under both CentOS 7.2 and OS X 10.12.7.

Steps to reproduce:

  1. create a configuration file with a file output filter using the telegraf -input-filter system -output-filter file config command
  2. hand edit that file to set the precision to 1ns
  3. create a copy of that file and change the data_format to json in the copy
  4. run the command telegraf --config <filename>, where <filename> is the configuration file that you want to test (once with each of the two files)

Expected behavior:

I expected the timestamps for both the influx and json data_format cases to have the same precision: the precision declared in the configuration of the telegraf agent.

Actual behavior:

The precision of the metric timestamps depends not only on the precision declared in the configuration of the telegraf agent, but also on the data_format declared for the output. Specifically:

  • In the influx data_format case, the nanosecond precision declared for the agent was honored, and the timestamps in the output generated by the agent were to the nearest nanosecond.
  • In the json data_format case, the precision declared for the agent was not honored; instead, the metric timestamps (which were to the nearest nanosecond) were converted to values in seconds by dividing each metric's nanosecond timestamp by 1000000000 (see the short illustration below).
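As a quick illustration of that truncation, using the nanosecond timestamp from the influx example above (a sketch; the integer division is what produces the whole-second value):

package main

import "fmt"

func main() {
	// Nanosecond timestamp taken from the influx data_format example above.
	const ns int64 = 1480843620054960352

	// Integer division by 1e9 drops the sub-second portion, which is
	// the whole-second value the json data_format output contains.
	fmt.Println(ns / 1000000000) // 1480843620
}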
sparrc added this to the Future Milestone milestone Dec 5, 2016

sparrc commented Dec 5, 2016

The agent precision refers more to the precision at the metric collection point. InfluxDB is specifically a data protocol which supports nanosecond precision.

So this is just a request for the JSON output format to support nanosecond timestamps.

sparrc changed the title from "The agent precision should be honored for all Serializers" to "JSON output data format, support nanosecond timestamps." Dec 5, 2016

tjmcs commented Dec 5, 2016

If you'd rather call this a feature request than a bug fix, I'm open to that; I'm just having difficulty with the fact that the timestamp in the influx data_format is returned in the units that were used when the metric was collected (nanoseconds, microseconds, milliseconds, or seconds), while the timestamp returned in the json data_format is always in seconds, regardless of the precision that was used when the metric was collected.

It just seemed to me that the timestamp should be consistent regardless of the data_format that is defined: either both should always return seconds, or the timestamp returned for both should be the timestamp that was generated when the metric was collected, not a truncated timestamp for one and a full-fidelity timestamp for the other.


sparrc commented Dec 5, 2016

I disagree; I think that the output format defines the precision. Influx is defined to be nanoseconds, graphite is seconds, and JSON in this case is seconds as well.


tjmcs commented Dec 5, 2016

OK, I see that now. If I change the precision to 1us, 1ms, or 1s for the influx data format, it continues to return a time in nanoseconds, regardless of the precision of the data collected.

Is that documented somewhere (that the time returned for the influx data format is always in nanoseconds and the time returned for the graphite and json data formats is always in seconds), other than in the statements used in the code to display the event's timestamp, and I just missed it?

Given that behavior, I agree that this is a feature request. That said, I don't know how the change would be supported, since it would be a breaking change for people who are currently collecting the json and graphite data formats and expecting the timestamps to be in seconds. To make the change, I guess we'd have to add some sort of output_precision parameter to the configuration file and have it default to the current behavior for those data formats if the parameter isn't specified.
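A rough sketch of how such a parameter might work in the JSON serializer (the names and types here are hypothetical, not the actual telegraf implementation): the serializer divides the metric's nanosecond timestamp by a configurable unit, and defaults that unit to one second so the current behavior is preserved when the parameter isn't specified.

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// jsonSerializer is a hypothetical serializer whose timestamp
// precision is configurable; the unit defaults to one second so
// existing json consumers see no change in behavior.
type jsonSerializer struct {
	timestampUnits time.Duration
}

func newJSONSerializer(units time.Duration) *jsonSerializer {
	if units <= 0 {
		units = time.Second // default: current (seconds) behavior
	}
	return &jsonSerializer{timestampUnits: units}
}

func (s *jsonSerializer) serialize(name string, fields map[string]interface{}, t time.Time) (string, error) {
	out := map[string]interface{}{
		"name":   name,
		"fields": fields,
		// Divide the nanosecond timestamp by the configured unit:
		// time.Second yields seconds, time.Nanosecond yields the
		// full nanosecond timestamp.
		"timestamp": t.UnixNano() / int64(s.timestampUnits),
	}
	b, err := json.Marshal(out)
	return string(b), err
}

func main() {
	t := time.Unix(0, 1480843620054960352)
	fields := map[string]interface{}{"load1": 1.93}

	seconds, _ := newJSONSerializer(time.Second).serialize("system", fields, t)
	nanos, _ := newJSONSerializer(time.Nanosecond).serialize("system", fields, t)
	fmt.Println(seconds) // ... "timestamp":1480843620
	fmt.Println(nanos)   // ... "timestamp":1480843620054960352
}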


sparrc commented Dec 5, 2016

Yes, it would need to be an argument for the json output format; graphite doesn't support anything other than seconds.


tjmcs commented Dec 5, 2016

That actually makes things much simpler than I thought they would be; let me see if I can throw together a pull request and submit something on this in short order. I was afraid I'd have to pull the precision from the agent configuration somehow; having it as a configuration parameter in the json data_format output simplifies things greatly.

Look for a CLA signature from me and a pull request sometime this week :)
