[1.6] "no fields" error when doing post to influxdb #4039

Closed
albundy83 opened this issue Apr 18, 2018 · 13 comments · Fixed by #4048
Labels
bug unexpected problem or unintended behavior regression something that used to work, but is now broken
Milestone
1.6.1

Comments

@albundy83
Contributor

Bug report

After upgrading to the 1.6.0 release, Cassandra metrics are no longer sent.
I get the following error message:

2018-04-18T14:14:10Z E! [outputs.influxdb]: when writing to [http:/my_influxdb:8086]: Post http://my_influxdb:8086/write?consistency=any&db=telegraf: no fields

Relevant telegraf.conf:

We use Jolokia and the cassandra input plugin to collect JMX information, as described in:

https://github.com/influxdata/telegraf/tree/master/plugins/inputs/cassandra

The configuration file is quite simple; the only part we changed is the cassandra input
(this is just an extract):

[[inputs.cassandra]]
  context = "/jolokia/read"
  servers = ["localhost:8778"]
  metrics = [
    "/java.lang:type=Memory/HeapMemoryUsage/used",
    "/java.lang:type=Memory/NonHeapMemoryUsage/used",
    "/java.lang:type=GarbageCollector,name=ConcurrentMarkSweep/CollectionTime",
    "/java.lang:type=GarbageCollector,name=ConcurrentMarkSweep/CollectionCount",
    "/java.lang:type=GarbageCollector,name=ParNew/CollectionTime",
    "/java.lang:type=GarbageCollector,name=ParNew/CollectionCount",
    "/org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=Capacity",
    "/org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=Entries",
    "/org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=HitRate",
    "/org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=Hits",
    "/org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=Requests",
    "/org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=Size",
    "/org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity",
    "/org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries",
    "/org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate"
 ]

System info:

Telegraf 1.6.0
Tested on two Ubuntu 14.04 hosts with the following kernels:
4.4.0-112-generic
3.13.0-137-generic

Steps to reproduce:

Upgrade to Telegraf 1.6.0
Check that no Cassandra metrics are written
See the error messages:

2018-04-18T14:14:10Z E! [outputs.influxdb]: when writing to [http:/my_influxdb:8086]: Post http://my_influxdb:8086/write?consistency=any&db=telegraf: no fields

Roll back to Telegraf 1.5.3
Metrics are back

Debug mode doesn't give me any more details.

@albundy83
Contributor Author

On the InfluxDB side, I see the following errors:

Apr 18 16:56:33 my_influxdb influxd[7503]: [httpd] 10.234.151.183 - telegraf [18/Apr/2018:16:56:33 +0200] "POST /write?consistency=any&db=telegraf HTTP/1.1" 400 27 "-" "telegraf" af58b217-4318-11e8-a196-000000000000 9645
Apr 18 16:56:36 my_influxdb influxd[7503]: [httpd] 10.234.151.183 - telegraf [18/Apr/2018:16:56:36 +0200] "POST /write?consistency=any&db=telegraf HTTP/1.1" 400 27 "-" "telegraf" b1461478-4318-11e8-a1af-000000000000 585

@russorat russorat added bug unexpected problem or unintended behavior regression something that used to work, but is now broken labels Apr 18, 2018
@danielnelson
Contributor

Thank you for the report. Could you run telegraf --input-filter cassandra --test with 1.5.3 and add the output?

@albundy83
Contributor Author

albundy83 commented Apr 19, 2018

Here is the output with Telegraf 1.5.3:
https://gist.github.com/albundy83/b0fb9484ba666f68112013811cdc0a21

@danielnelson
Contributor

Lots of "interesting" metrics in there. I see a lot of map conversions that I know are removed in 1.6. I assume you are not interested in storing measurements with fields like this:

Hits_RateUnit="map[declaringClass:map[interfaces:[] name:java.util.concurrent.TimeUnit]]"
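For illustration, here is a minimal Go sketch of dropping field values that line protocol cannot represent, such as the nested map behind Hits_RateUnit. It assumes fields are held in a map[string]interface{}; the dropUnsupportedFields helper is hypothetical and is not Telegraf's implementation.

```go
package main

import "fmt"

// dropUnsupportedFields is a hypothetical helper (not Telegraf's code).
// It keeps only values whose types line protocol can represent and
// discards everything else, such as the nested maps Jolokia returns for
// attributes like RateUnit.
func dropUnsupportedFields(fields map[string]interface{}) map[string]interface{} {
	kept := make(map[string]interface{}, len(fields))
	for k, v := range fields {
		switch v.(type) {
		case float64, int64, uint64, string, bool:
			kept[k] = v
		default:
			// maps, slices, nil and other composite values are dropped
		}
	}
	return kept
}

func main() {
	fields := map[string]interface{}{
		"Hits_Count":    int64(0),
		"Hits_MeanRate": float64(0),
		"Hits_RateUnit": map[string]interface{}{
			"declaringClass": map[string]interface{}{
				"interfaces": []interface{}{},
				"name":       "java.util.concurrent.TimeUnit",
			},
		},
	}
	kept := dropUnsupportedFields(fields)
	_, hasRateUnit := kept["Hits_RateUnit"]
	fmt.Println(len(kept), hasRateUnit) // prints "2 false"
}
```

If every field of a metric has an unsupported type, this kind of filtering leaves an empty field set.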

I'm still having a hard time tracking down the error when sending to InfluxDB, though. Could you also run the same command with 1.6.0? Also, you mentioned that the cassandra config was an extract: do you have more metrics specified, and if so, can you add the full config for the cassandra plugin?

@danielnelson danielnelson added this to the 1.6.1 milestone Apr 19, 2018
@albundy83
Contributor Author

Here is the same output with Telegraf 1.6.0:
https://gist.github.com/albundy83/a09c99780c7fb514d866f565643211c3

And here is the config we use for Cassandra:
https://gist.github.com/albundy83/f3d86f130d9aa38f92c2c78c2c557502

@albundy83
Contributor Author

And yes, the fields containing Java classes etc. are gone in 1.6. Here is a comparison:
In 1.5.3:

cassandraCache,scope=CounterCache,mname=Hits,cassandra_host=localhost,rack=rack1,env=prod,anneau=cluster01,host=my_hostname.com,dc=dc1 Hits_Count=0,Hits_FifteenMinuteRate=0,Hits_FiveMinuteRate=0,Hits_MeanRate=0,Hits_RateUnit="map[declaringClass:map[interfaces:[] name:java.util.concurrent.TimeUnit]]",Hits_OneMinuteRate=0,Hits_EventType="hits" 1524120274000000000

In 1.6.0:

cassandraCache,anneau=cluster01,cassandra_host=localhost,dc=dc1,env=prod,host=my_hostname.com,mname=Hits,rack=rack1,scope=CounterCache Hits_FifteenMinuteRate=0,Hits_FiveMinuteRate=0,Hits_MeanRate=0,Hits_OneMinuteRate=0,Hits_EventType="hits",Hits_Count=0 152412190800000000

@danielnelson
Contributor

Thank you. With the files you provided I was able to reproduce the bug, so putting together a fix shouldn't be a problem.

Just to clarify, are you okay with these fields containing Java classes being gone? I believe their inclusion was a bug.

@albundy83
Contributor Author

Yes, sure, we can't do anything with them :-)

@albundy83
Contributor Author

If you have time, could you give me some information about the cause of the issue?

@danielnelson
Contributor

The error occurs because the metric ends up with no fields once these junk fields are removed. When we serialize the metric to line protocol, the serializer returns this error, and we fail the entire batch because of it. What complicates this case is that we serialize while sending the HTTP request to InfluxDB, instead of doing it all in memory before making the request. This makes the error look like an HTTP issue, when it actually comes from the serializer.

@danielnelson
Contributor

All should be well in 1.6.1. However, after taking a fresh look at the cassandra plugin, I believe we should deprecate it in Telegraf 1.7 in favor of the jolokia2 plugin. The jolokia2 plugin is a general-purpose Jolokia input, is much more flexible to configure, and is much, much faster. In your case it should be about 280 times faster, since it collects everything in a single bulk request.

The configuration is slightly more verbose, and the output will not be exactly the same, so you would need to update dashboards/alerts/etc. Here is how it would look:

[[inputs.jolokia2_agent]]
  urls = ["http://cassandra.example.org:8778/jolokia"]
  [[inputs.jolokia2_agent.metric]]
    name = "cassandra_counter_cache"
    mbean = "org.apache.cassandra.metrics:type=Cache,scope=CounterCache,name=*"
  [[inputs.jolokia2_agent.metric]]
    name = "cassandra_key_cache"
    mbean = "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=*"

@danielnelson
Contributor

Issue to track cassandra plugin deprecation: #4049

Deprecated plugins will be removed in 2.0 (which is not yet scheduled).

@albundy83
Contributor Author

Thanks a lot for your explanation. I will work on rewriting our config to move to jolokia2.

3 participants