CPU steal value incorrect #206

Closed
jonc650 opened this issue Sep 17, 2015 · 3 comments
Labels: bug (unexpected problem or unintended behavior)

Comments

jonc650 commented Sep 17, 2015

It seems that the cpu_steal value is being assigned the value of cpu_guest with telegraf 0.1.8 on CentOS 7. This is a physical host, not a VM.

Linux ops 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The output below shows that cpu_steal carries the value from the guest column (the ninth numeric field) of /proc/stat, while the actual steal column (the eighth) is 0:

$ /opt/telegraf/telegraf -config telegraf.conf -test | grep cpu1 | grep steal
[cpu="cpu1"] cpu_steal value=339072
$ cat /proc/stat | grep cpu1
cpu1 352463 24 60865 25395928 4226 0 617 0 339072 0
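
To check the column mapping by hand, here is a minimal Python sketch that labels the fields of the cpu1 line above. The field order follows the proc(5) man page; the helper name `parse_cpu_line` is hypothetical and is not part of telegraf or gopsutil.

```python
# Field order of a "cpuN" line in /proc/stat, per proc(5).
CPU_FIELDS = (
    "user", "nice", "system", "idle", "iowait",
    "irq", "softirq", "steal", "guest", "guest_nice",
)


def parse_cpu_line(line):
    """Return (cpu_label, {field: ticks}) for one /proc/stat cpu line.

    Hypothetical helper, for illustration only.
    """
    label, *values = line.split()
    return label, dict(zip(CPU_FIELDS, map(int, values)))


# The cpu1 line quoted in this report.
label, ticks = parse_cpu_line(
    "cpu1 352463 24 60865 25395928 4226 0 617 0 339072 0"
)
print(label, "steal =", ticks["steal"], "guest =", ticks["guest"])
```

For the line above this reports steal as 0 and guest as 339072, which matches the suspicion that the two columns were swapped somewhere in the collection path.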

The steal value stored in InfluxDB is also higher than expected, while the guest percentage is always 0.

select * from cpu_percentageSteal limit 4
name: cpu_percentageSteal

time                  cpu        host  value
2015-09-17T13:32:10Z  cpu3       ops   16.680567139295693
2015-09-17T13:32:10Z  cpu-total  ops   48.75336783131492
2015-09-17T13:32:10Z  cpu2       ops   16.675931072799887
2015-09-17T13:32:10Z  cpu0       ops   0

This also makes the idle percentage incorrect. The following shows the difference in measurements compared with collectd on the same host.

[Screenshot: screen shot 2015-09-17 at 16 43 15]
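
The effect on the idle percentage can be sketched as follows. The tick counts are invented for illustration and are not taken from this host. The delta-over-total formula is a common way to derive CPU percentages; it is an assumption here, not telegraf's confirmed implementation. The sketch also relies on the kernel counting guest time inside user time, so guest is left out of the total.

```python
# Fields that make up total CPU time. guest and guest_nice are left out
# because the kernel already counts them inside user and nice.
TOTAL_FIELDS = ("user", "nice", "system", "idle",
                "iowait", "irq", "softirq", "steal")


def percentages(prev, curr):
    """Percent of elapsed ticks spent in each field between two samples."""
    delta = {f: curr[f] - prev[f] for f in TOTAL_FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {f: 0.0 for f in TOTAL_FIELDS}
    return {f: 100.0 * d / total for f, d in delta.items()}


def with_bug(sample):
    """Mimic the reported mix-up: steal takes the guest value."""
    return {**sample, "steal": sample["guest"]}


# Invented tick counts for a host that runs guests but is not one itself.
zero = dict.fromkeys(TOTAL_FIELDS + ("guest", "guest_nice"), 0)
prev = dict(zero)
curr = {**zero, "user": 500, "system": 100, "idle": 400, "guest": 400}

correct = percentages(prev, curr)
buggy = percentages(with_bug(prev), with_bug(curr))
print("correct: steal=%.1f idle=%.1f" % (correct["steal"], correct["idle"]))
print("buggy:   steal=%.1f idle=%.1f" % (buggy["steal"], buggy["idle"]))
```

With these made-up numbers the correct steal is 0% and idle is 40%, while the swapped version reports a non-zero steal and a lower idle, because the guest ticks end up counted twice in the total.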

sparrc (Contributor) commented Sep 17, 2015

Thank you for the detailed report; this was very helpful. I've suspected that some of the CPU data we collect is inaccurate but didn't have time to reproduce it. I'm going to investigate and see whether another CPU-gathering library would work better for us. On Linux it may also be easiest to read /proc/stat directly to get CPU data; I'll investigate that as well.
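
A direct read of /proc/stat could look roughly like the sketch below. It is written in Python purely for illustration (telegraf itself is written in Go), and `parse_proc_stat` and `SAMPLE` are hypothetical names. Older kernels emit fewer columns on each cpu line, so missing trailing fields are padded with 0.

```python
import os

# Field order of a cpu line in /proc/stat, per proc(5).
CPU_FIELDS = (
    "user", "nice", "system", "idle", "iowait",
    "irq", "softirq", "steal", "guest", "guest_nice",
)


def parse_proc_stat(text):
    """Map each cpu label to its tick counters, padding missing columns with 0."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Only the aggregate "cpu" line and per-core "cpuN" lines are wanted.
        if not parts or not parts[0].startswith("cpu"):
            continue
        values = [int(v) for v in parts[1:len(CPU_FIELDS) + 1]]
        values += [0] * (len(CPU_FIELDS) - len(values))
        stats[parts[0]] = dict(zip(CPU_FIELDS, values))
    return stats


# Fallback input (the cpu1 line from this issue) for systems without /proc/stat.
SAMPLE = "cpu1 352463 24 60865 25395928 4226 0 617 0 339072 0\nintr 1 2 3\n"

if os.path.exists("/proc/stat"):
    with open("/proc/stat") as fh:
        text = fh.read()
else:
    text = SAMPLE

for label, ticks in parse_proc_stat(text).items():
    print(label, "steal =", ticks["steal"], "guest =", ticks["guest"])
```

Keying the counters by field name rather than by position keeps a column mix-up like the one reported here easy to spot.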

sparrc added the bug label Sep 17, 2015
sparrc mentioned this issue Sep 17, 2015
sparrc (Contributor) commented Sep 17, 2015

It looks like this was definitely an issue in the gopsutil library, and one that has long been fixed there.

I've been meaning to refactor that code for a while so that the dependency is properly vendored with godep, in which case we would only have needed a godep update to pick up the fixed code.

I've done that work in #209 and also renamed some of the CPU metrics, since cpu_percentageIdle was a bit inconsistent with the rest of the metric names in telegraf. I will make a new build once I get that merged in.

sparrc (Contributor) commented Sep 18, 2015

Closing this issue. The fix is available on HEAD and will be in 0.1.9, which is due for release early next week.

sparrc closed this as completed Sep 18, 2015