Smart can't get some disks status report #5740

Closed
vvershkov opened this issue Apr 17, 2019 · 11 comments · Fixed by #5765

@vvershkov

vvershkov commented Apr 17, 2019

Feature Request

The smart input plugin can't read some disks.

Proposal:

Use smartctl -H for disk status

Current behavior:

No information about Hitachi disks is reported at all.

Desired behavior:

At a minimum, I want the overall SMART health status.

The smartctl output looks like this:

# smartctl -a /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-46-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721212AL5204
Revision:             C3Q1
Compliance:           SPC-4
User Capacity:        12,000,138,625,024 bytes [12.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2705ad68c
Serial number:        8HHLYPDH
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Apr 17 19:03:32 2019 MSK
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     26 C
Drive Trip Temperature:        85 C

Manufactured in week 35 of year 2018
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  6
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  37
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 2887258210304

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0        222      13598.283           0
write:         0        0         0         0        522          9.783           0
verify:        0        0         0         0       1375          0.000           0

Non-medium error count:        0

No self-tests have been logged

And with -H I get the standard output:

# smartctl -H /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-46-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
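
The two health lines that appear in this thread differ by transport: SAS/SCSI drives print "SMART Health Status: OK", while ATA/SATA drives print "SMART overall-health self-assessment test result: PASSED". The following is a minimal Go sketch of a parser that accepts either wording. The names `healthRe` and `parseHealth` are hypothetical and are not taken from the Telegraf source or from the fix in #5765.

```go
package main

import (
	"fmt"
	"regexp"
)

// healthRe (hypothetical name) matches both health lines seen in this thread:
// ATA/SATA: "SMART overall-health self-assessment test result: PASSED"
// SAS/SCSI: "SMART Health Status: OK"
var healthRe = regexp.MustCompile(
	`(?m)^SMART (?:overall-health self-assessment test result|Health Status):\s+(\S+)`)

// parseHealth reports whether the drive is healthy and whether a health
// line was present in the smartctl output at all.
func parseHealth(out string) (healthy bool, found bool) {
	m := healthRe.FindStringSubmatch(out)
	if m == nil {
		return false, false
	}
	return m[1] == "PASSED" || m[1] == "OK", true
}

func main() {
	sas := "=== START OF READ SMART DATA SECTION ===\nSMART Health Status: OK\n"
	ata := "=== START OF READ SMART DATA SECTION ===\n" +
		"SMART overall-health self-assessment test result: PASSED\n"
	for _, out := range []string{sas, ata} {
		healthy, found := parseHealth(out)
		fmt.Printf("health_ok=%v found=%v\n", healthy, found)
	}
}
```

Returning a separate `found` flag lets a caller distinguish "no health line in the output" from "health line present but not OK", which matters when a field such as health_ok should be omitted rather than reported as false.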
@glinton
Contributor

glinton commented Apr 17, 2019

What telegraf version are you using?

@chrishoage

chrishoage commented Apr 17, 2019

I am also having a problem with a disk not appearing in the Telegraf output:

› sudo smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/sdg -d scsi # /dev/sdg, SCSI device
/dev/sdh -d scsi # /dev/sdh, SCSI device
/dev/sdi -d scsi # /dev/sdi, SCSI device
/dev/sdj -d scsi # /dev/sdj, SCSI device
/dev/sdk -d scsi # /dev/sdk, SCSI device
/dev/sdl -d scsi # /dev/sdl, SCSI device
/dev/sdm -d scsi # /dev/sdm, SCSI device
/dev/sdn -d scsi # /dev/sdn, SCSI device
› sudo smartctl --info --attributes --health -n standby --format=brief /dev/sdg -d scsi
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-145-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LB provisioning type: unreported, LBPME=0, LBPRZ=0
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca250ec3c9c
Serial number:        PK1334PEK49SBS
Device type:          disk
Local Time is:        Wed Apr 17 12:21:23 2019 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature:     34 C
› sudo telegraf --test --input-filter smart
2019-04-17T19:21:38Z I! Starting Telegraf 1.10.2
2019-04-17T19:21:38Z I! Using config file: /etc/telegraf/telegraf.conf
> smart_device,capacity=525112713216,device=sdj,enabled=Enabled,host=cortex,model=Crucial_CT525MX300SSD1,serial_no=16431465A85A,wwn=500a07511465a85a exit_status=0i,health_ok=true,read_error_rate=2i,temp_c=36i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=525112713216,device=sdk,enabled=Enabled,host=cortex,model=Crucial_CT525MX300SSD1,serial_no=1651150FA577,wwn=500a0751150fa577 exit_status=0i,health_ok=true,read_error_rate=0i,temp_c=36i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=4000787030016,device=sde,enabled=Enabled,host=cortex,model=WDC\ WD40EFRX-68WT0N0,serial_no=WD-WCC4EM0WN624,wwn=50014ee2b51b9d7f exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=30i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=4000787030016,device=sdn,enabled=Enabled,host=cortex,model=WDC\ WD40EFRX-68WT0N0,serial_no=WD-WCC4EECRN58H,wwn=50014ee20a98bd99 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=37i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=4000787030016,device=sdl,enabled=Enabled,host=cortex,model=WDC\ WD40EFRX-68WT0N0,serial_no=WD-WCC4E4FKJ5DV,wwn=50014ee25fc65114 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=29i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=4000787030016,device=sdb,enabled=Enabled,host=cortex,model=WDC\ WD40EFRX-68WT0N0,serial_no=WD-WCC4EK8ZSK37,wwn=50014ee2b51c8ebd exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=31i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=4000787030016,device=sdm,enabled=Enabled,host=cortex,model=WDC\ WD40EFRX-68WT0N0,serial_no=WD-WCC4E4FKJH1X,wwn=50014ee20a70d5a0 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=29i,udma_crc_errors=0i 1555528899000000000
> smart_device,capacity=4000787030016,device=sdf,enabled=Enabled,host=cortex,model=HGST\ HDN724040ALE640,serial_no=PK2334PEJM9B3T,wwn=5000cca250e4f530 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=34i,udma_crc_errors=0i 1555528900000000000
> smart_device,capacity=4000787030016,device=sdc,enabled=Enabled,host=cortex,model=HGST\ HDN724040ALE640,serial_no=PK2334PEK4AXTT,wwn=5000cca250ec4105 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=34i,udma_crc_errors=0i 1555528900000000000
> smart_device,capacity=4000787030016,device=sda,enabled=Enabled,host=cortex,model=HGST\ HDN724040ALE640,serial_no=PK1334PEKDXVTS,wwn=5000cca250f02751 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=32i,udma_crc_errors=0i 1555528900000000000
> smart_device,capacity=4000787030016,device=sdh,enabled=Enabled,host=cortex,model=HGST\ HDN724040ALE640,serial_no=PK1334PEJLL6NS,wwn=5000cca250e4a210 exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=31i,udma_crc_errors=0i 1555528900000000000
> smart_device,capacity=4000787030016,device=sdd,enabled=Enabled,host=cortex,model=HGST\ HDN724040ALE640,serial_no=PK1334PEKDNZ0S,wwn=5000cca250f009ad exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=35i,udma_crc_errors=0i 1555528900000000000
> smart_device,capacity=32017047552,device=sdi,enabled=Enabled,host=cortex,model=SATA\ SSD,serial_no=AF3407621C2400203590 exit_status=0i,health_ok=true,read_error_rate=0i,temp_c=30i 1555528903000000000

Note that /dev/sdg is listed by smartctl --scan and reports data with sudo smartctl --info --attributes --health -n standby --format=brief /dev/sdg -d scsi, but it does not appear in the output of sudo telegraf --test --input-filter smart using Telegraf 1.10.2 (git: HEAD 3303f5c3).
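
For readers following along, each line of the `smartctl --scan` output above has the shape `<device> <args...> # <comment>`. The following is a rough Go sketch of splitting such lines into a device path and its extra arguments, dropping the trailing comment and blank lines. The `scanEntry` type and `parseScan` function are hypothetical names used for illustration only; this is not the plugin's actual scan code.

```go
package main

import (
	"fmt"
	"strings"
)

// scanEntry (hypothetical type) holds one device reported by `smartctl --scan`.
type scanEntry struct {
	Device string   // e.g. "/dev/sdg"
	Args   []string // e.g. ["-d", "scsi"]
}

// parseScan splits scan output into entries, ignoring everything after '#'
// and skipping lines that are empty once the comment is removed.
func parseScan(out string) []scanEntry {
	var entries []scanEntry
	for _, line := range strings.Split(out, "\n") {
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		entries = append(entries, scanEntry{Device: fields[0], Args: fields[1:]})
	}
	return entries
}

func main() {
	out := "/dev/sdf -d scsi # /dev/sdf, SCSI device\n" +
		"/dev/sdg -d scsi # /dev/sdg, SCSI device\n\n"
	for _, e := range parseScan(out) {
		fmt.Printf("%s %q\n", e.Device, e.Args)
	}
}
```

A device that survives this step but is still missing from the metrics points to a later stage, such as parsing of the per-device smartctl output, rather than to device discovery.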

@danielnelson danielnelson added area/smart bug unexpected problem or unintended behavior labels Apr 17, 2019
@ddimick

ddimick commented Apr 17, 2019

In my environment, I'm seeing this behavior specifically with SAS drives. SATA drives on the same HBA are fine.

@glinton
Contributor

glinton commented Apr 17, 2019

Can you try this Linux amd64 build and run it with --debug? I'd love to find where the failure is. Thanks!

@ddimick

ddimick commented Apr 17, 2019

Telegraf unknown (git: bugfix/5740 85b8a490)

2019-04-17T22:01:28Z I! Starting Telegraf
2019-04-17T22:01:28Z I! Using config file: /etc/telegraf/telegraf.conf
2019-04-17T22:01:28Z D! [inputs.smart] adding device: []string{"/dev/sda", "-d", "scsi", "#", "/dev/sda,", "SCSI", "device"}
2019-04-17T22:01:28Z D! [inputs.smart] adding device: []string{"/dev/sdb", "-d", "scsi", "#", "/dev/sdb,", "SCSI", "device"}
2019-04-17T22:01:28Z D! [inputs.smart] adding device: []string{"/dev/sdc", "-d", "scsi", "#", "/dev/sdc,", "SCSI", "device"}
2019-04-17T22:01:28Z D! [inputs.smart] skipping device: []string{""}
2019-04-17T22:01:28Z D! [inputs.smart] devices: []string{"/dev/sda", "/dev/sdb", "/dev/sdc"}
2019-04-17T22:01:28Z D! [inputs.smart] gatherDisk '/dev/sdb' output: "smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-12-pve] (local build)\nCopyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org\n\n=== START OF INFORMATION SECTION ===\nVendor:               HITACHI\nProduct:              HUC103030CSS600\nRevision:             J350\nCompliance:           SPC-4\nUser Capacity:        300,000,000,000 bytes [300 GB]\nLogical block size:   512 bytes\nRotation Rate:        10020 rpm\nForm Factor:          2.5 inches\nLogical Unit id:      0x5000cca00a4f91bc\nSerial number:        PDWDSKNE\nDevice type:          disk\nTransport protocol:   SAS (SPL-3)\nLocal Time is:        Wed Apr 17 15:01:28 2019 PDT\nSMART support is:     Available - device has SMART capability.\nSMART support is:     Enabled\nTemperature Warning:  Disabled or Not Supported\n\n=== START OF READ SMART DATA SECTION ===\nSMART Health Status: OK\n\nCurrent Drive Temperature:     35 C\nDrive Trip Temperature:        85 C\n\nManufactured in week 52 of year 2009\nSpecified cycle count over device lifetime:  50000\nAccumulated start-stop cycles:  47\nElements in grown defect list: 0\n\nVendor (Seagate) cache information\n  Blocks sent to initiator = 7601969522802688\n\n"
> smart_device,capacity=300000000000,device=sdb,enabled=Enabled,host=pve-1 exit_status=0i 1555538488000000000
2019-04-17T22:01:28Z D! [inputs.smart] gatherDisk '/dev/sda' output: "smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-12-pve] (local build)\nCopyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org\n\n=== START OF INFORMATION SECTION ===\nVendor:               HITACHI\nProduct:              HUC103030CSS600\nRevision:             J350\nCompliance:           SPC-4\nUser Capacity:        300,000,000,000 bytes [300 GB]\nLogical block size:   512 bytes\nRotation Rate:        10020 rpm\nForm Factor:          2.5 inches\nLogical Unit id:      0x5000cca00a4bdbc8\nSerial number:        PDWAR9GE\nDevice type:          disk\nTransport protocol:   SAS (SPL-3)\nLocal Time is:        Wed Apr 17 15:01:28 2019 PDT\nSMART support is:     Available - device has SMART capability.\nSMART support is:     Enabled\nTemperature Warning:  Disabled or Not Supported\n\n=== START OF READ SMART DATA SECTION ===\nSMART Health Status: OK\n\nCurrent Drive Temperature:     36 C\nDrive Trip Temperature:        85 C\n\nManufactured in week 52 of year 2009\nSpecified cycle count over device lifetime:  50000\nAccumulated start-stop cycles:  47\nElements in grown defect list: 0\n\nVendor (Seagate) cache information\n  Blocks sent to initiator = 7270983270400000\n\n"
> smart_device,capacity=300000000000,device=sda,enabled=Enabled,host=pve-1 exit_status=0i 1555538488000000000
2019-04-17T22:01:28Z D! [inputs.smart] gatherDisk '/dev/sdc' output: "smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-12-pve] (local build)\nCopyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org\n\n=== START OF INFORMATION SECTION ===\nModel Family:     Samsung based SSDs\nDevice Model:     Samsung SSD 850 PRO 256GB\nSerial Number:    S39KNX0J718036J\nLU WWN Device Id: 5 002538 d4218a3df\nFirmware Version: EXM04B6Q\nUser Capacity:    256,060,514,304 bytes [256 GB]\nSector Size:      512 bytes logical/physical\nRotation Rate:    Solid State Device\nForm Factor:      2.5 inches\nDevice is:        In smartctl database [for details use: -P show]\nATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c\nSATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)\nLocal Time is:    Wed Apr 17 15:01:28 2019 PDT\nSMART support is: Available - device has SMART capability.\nSMART support is: Enabled\nPower mode is:    ACTIVE or IDLE\n\n=== START OF READ SMART DATA SECTION ===\nSMART overall-health self-assessment test result: PASSED\n\nSMART Attributes Data Structure revision number: 1\nVendor Specific SMART Attributes with Thresholds:\nID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE\n  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0\n  9 Power_On_Hours          -O--CK   097   097   000    -    14738\n 12 Power_Cycle_Count       -O--CK   099   099   000    -    47\n177 Wear_Leveling_Count     PO--C-   086   086   000    -    893\n179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   100   100   010    -    0\n181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0\n182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0\n183 Runtime_Bad_Block       PO--C-   100   100   010    -    0\n187 Uncorrectable_Error_Cnt -O--CK   100   100   000    -    0\n190 Airflow_Temperature_Cel -O--CK   069   052   000    -    31\n195 ECC_Error_Rate          -O-RC-   200   200   000    -    0\n199 CRC_Error_Count         -OSRCK   100   100   000    -    0\n235 POR_Recovery_Count      -O--C-   099   099   000    -    35\n241 Total_LBAs_Written      -O--CK   099   099   000    -    36452846103\n                            ||||||_ K auto-keep\n                            |||||__ C event count\n                            ||||___ R error rate\n                            |||____ S speed/performance\n                            ||_____ O updated online\n                            |______ P prefailure warning\n\n"
> smart_attribute,device=sdc,fail=-,flags=PO--CK,host=pve-1,id=5,name=Reallocated_Sector_Ct,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=10i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=9,name=Power_On_Hours,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=14738i,threshold=0i,value=97i,worst=97i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=12,name=Power_Cycle_Count,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=47i,threshold=0i,value=99i,worst=99i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=PO--C-,host=pve-1,id=177,name=Wear_Leveling_Count,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=893i,threshold=0i,value=86i,worst=86i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=PO--C-,host=pve-1,id=179,name=Used_Rsvd_Blk_Cnt_Tot,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=10i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=181,name=Program_Fail_Cnt_Total,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=10i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=182,name=Erase_Fail_Count_Total,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=10i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=PO--C-,host=pve-1,id=183,name=Runtime_Bad_Block,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=10i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=187,name=Uncorrectable_Error_Cnt,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=0i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=190,name=Airflow_Temperature_Cel,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=31i,threshold=0i,value=69i,worst=52i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O-RC-,host=pve-1,id=195,name=ECC_Error_Rate,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=0i,value=200i,worst=200i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-OSRCK,host=pve-1,id=199,name=CRC_Error_Count,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=0i,threshold=0i,value=100i,worst=100i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--C-,host=pve-1,id=235,name=POR_Recovery_Count,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=35i,threshold=0i,value=99i,worst=99i 1555538488000000000
> smart_attribute,device=sdc,fail=-,flags=-O--CK,host=pve-1,id=241,name=Total_LBAs_Written,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,raw_value=36452846103i,threshold=0i,value=99i,worst=99i 1555538488000000000
> smart_device,capacity=256060514304,device=sdc,enabled=Enabled,host=pve-1,model=Samsung\ SSD\ 850\ PRO\ 256GB,serial_no=S39KNX0J718036J,wwn=5002538d4218a3df exit_status=0i,health_ok=true,udma_crc_errors=0i 1555538488000000000
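
In the debug output above, the SATA drive (sdc) produces model, serial_no and health_ok, while the two SAS drives produce only capacity, device and enabled. One visible difference is the layout of the information section: SAS/SCSI output uses "Vendor:", "Product:" and "Serial number:", whereas ATA output uses "Device Model:" and "Serial Number:". The following Go sketch shows one way to extract a model and serial from either layout. The names `deviceTags` and `first` are hypothetical, the idea of joining vendor and product into a model string is an assumption made for illustration, and none of this is claimed to be how the plugin or #5765 handles it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// ATA output prints "Device Model:"; SCSI output prints "Vendor:" and "Product:".
	modelRe   = regexp.MustCompile(`(?m)^Device Model:\s+(.*)$`)
	vendorRe  = regexp.MustCompile(`(?m)^Vendor:\s+(.*)$`)
	productRe = regexp.MustCompile(`(?m)^Product:\s+(.*)$`)
	// (?i) covers both "Serial Number:" (ATA) and "Serial number:" (SCSI).
	serialRe = regexp.MustCompile(`(?im)^Serial Number:\s+(.*)$`)
)

// first returns the trimmed first capture group of re in s, or "" if no match.
func first(re *regexp.Regexp, s string) string {
	if m := re.FindStringSubmatch(s); m != nil {
		return strings.TrimSpace(m[1])
	}
	return ""
}

// deviceTags extracts a model and serial from either output layout.
// For SCSI output the model is assumed to be "<Vendor> <Product>".
func deviceTags(out string) (model, serial string) {
	model = first(modelRe, out)
	if model == "" {
		model = strings.TrimSpace(first(vendorRe, out) + " " + first(productRe, out))
	}
	return model, first(serialRe, out)
}

func main() {
	sas := "Vendor:               HITACHI\nProduct:              HUC103030CSS600\n" +
		"Serial number:        PDWDSKNE\n"
	ata := "Device Model:     Samsung SSD 850 PRO 256GB\nSerial Number:    S39KNX0J718036J\n"
	for _, out := range []string{sas, ata} {
		model, serial := deviceTags(out)
		fmt.Printf("model=%q serial_no=%q\n", model, serial)
	}
}
```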

@glinton
Contributor

glinton commented Apr 17, 2019

@chrishoage Can you paste the output of

sudo smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/sdg

@vvershkov Can you paste the output of the same, but with /dev/sdc instead of /dev/sdg?

Thanks @ddimick. I assume a and b are your SAS drives?

@chrishoage

› sudo smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/sdg
[sudo] password for chris:
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-145-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Deskstar NAS
Device Model:     HGST HDN724040ALE640
Serial Number:    PK1334PEK49SBS
LU WWN Device Id: 5 000cca 250ec3c9c
Firmware Version: MJAOA5E0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 15:14:27 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Power mode is:    ACTIVE or IDLE

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   135   135   054    -    84
  3 Spin_Up_Time            POS---   125   125   024    -    621 (Average 619)
  4 Start_Stop_Count        -O--C-   100   100   000    -    33
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   119   119   020    -    35
  9 Power_On_Hours          -O--C-   098   098   000    -    19371
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    33
192 Power-Off_Retract_Count -O--CK   100   100   000    -    764
193 Load_Cycle_Count        -O--C-   100   100   000    -    764
194 Temperature_Celsius     -O----   176   176   000    -    34 (Min/Max 21/53)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

@ddimick

ddimick commented Apr 17, 2019

I assume a and b are your SAS drives?

Yes, that's correct.

@glinton
Contributor

glinton commented Apr 17, 2019

Thanks. @chrishoage, can you also paste the output of the same command, but for a disk that is being collected (anything other than /dev/sdg)?

@chrishoage

› sudo smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/sdh
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-145-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Deskstar NAS
Device Model:     HGST HDN724040ALE640
Serial Number:    PK1334PEJLL6NS
LU WWN Device Id: 5 000cca 250e4a210
Firmware Version: MJAOA5E0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 16:27:58 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Power mode is:    ACTIVE or IDLE

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   136   136   054    -    83
  3 Spin_Up_Time            POS---   125   125   024    -    621 (Average 617)
  4 Start_Stop_Count        -O--C-   100   100   000    -    28
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   124   124   020    -    33
  9 Power_On_Hours          -O--C-   098   098   000    -    19322
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    28
192 Power-Off_Retract_Count -O--CK   100   100   000    -    30
193 Load_Cycle_Count        -O--C-   100   100   000    -    30
194 Temperature_Celsius     -O----   187   187   000    -    32 (Min/Max 23/55)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

@vvershkov
Author

vvershkov commented Apr 18, 2019

Hm, I didn't think about it, but yes, those are SAS drives.

My Telegraf version is 1.10.0-1, but I can update it to 1.10.3 (I am using Ubuntu 18.04 and the InfluxData repo).

smartctl output:

# smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/sdg
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-46-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721212AL5204
Revision:             C3Q1
Compliance:           SPC-4
User Capacity:        12,000,138,625,024 bytes [12.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca27076bfe8
Serial number:        8HJ39K3H
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Thu Apr 18 13:25:03 2019 MSK
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     34 C
Drive Trip Temperature:        85 C

Manufactured in week 35 of year 2018
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  7
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  39
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 544135446528

(The output is the same for sdc; the host has 60 of these drives, from sdc to sdbj.)
sda and sdb are SATA drives, and I can get their status via Telegraf.

# smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-46-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi/HGST Travelstar Z7K500
Device Model:     HGST HTE725050A7E630
Serial Number:    RCE50G20G81S9S
LU WWN Device Id: 5 000cca 90bc3a98b
Firmware Version: GS2OA3E0
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Apr 18 13:27:51 2019 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Power mode is:    ACTIVE or IDLE

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   062    -    0
  2 Throughput_Performance  P-S---   100   100   040    -    0
  3 Spin_Up_Time            POS---   100   100   033    -    1
  4 Start_Stop_Count        -O--C-   100   100   000    -    4
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   100   100   040    -    0
  9 Power_On_Hours          -O--C-   099   099   000    -    743
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    4
191 G-Sense_Error_Rate      -O-R--   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    2
193 Load_Cycle_Count        -O--C-   100   100   000    -    13
194 Temperature_Celsius     -O----   250   250   000    -    24 (Min/Max 15/29)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
223 Load_Retry_Count        -O-R--   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
