Browse Source


Johann Schmitz 5 years ago
1 changed files with 77 additions and 0 deletions
  1. +77

+ 77
- 0 View File

@ -0,0 +1,77 @@
# What is smart-check?
After several undetected silent hard drive failures i looked (once again) for a working Nagios/Icinga plugin. There are many, many plugins which claim to be able to detect early signs of failing hard drives. Most of them just don't.
The few which actually do more than just checking for the temperature or some bogus attributes fall short on some other aspects, so i decided to add just another plugin to world which tries to solve this problem.
# Basic usage
usage: smart-check [-h] [--disks-file DISKS_FILE] [-s]
[--smartctl-path SMARTCTL_PATH] [-a SMARTCTL_ARGS]
[-i INTERFACE] [-f FILE] [-x]
[--ignore-attributes [IGNORE_ATTRIBUTES [IGNORE_ATTRIBUTES ...]]]
[-v] [--debug]
positional arguments:
drive The device as passed to smartctl's positional argument
optional arguments:
-h, --help show this help message and exit
--disks-file DISKS_FILE
-s, --sudo Use sudo to execute smartctl
--smartctl-path SMARTCTL_PATH
Path to smartctl (default: /usr/sbin/smartctl)
Other arguments passed to smartctl (default: -n
The smartctl interface specification (passed to
smartctl's -d parameter
-f FILE, --file FILE Use S.M.A.R.T. report from file instead of calling
smartctl (Use - to read from stdin)
-x, --exclude-notices
Do not report NOTICE warnings (default: False)
Ignore this S.M.A.R.T. attributes (id or name)
-v, --verbose Verbose messages
--debug Print debug messages
The basic usage is
smart-check <device>
root@host# smart-check /dev/sda
If your devices are behind a raid controller, pass `smartctl`'s `--device` argument via the `-i/--interface` parameter:
root@host# smart-check -i areca,1 /dev/sg13
# Attributes and thresholds
`smart-check` checks for several well-known S.M.A.R.T. attributes to predict signs a hardware failure. Note: This code is taken from [gsmartcontrol](
It checks that the following attributes have a non-zero value for `VALUE`/`RAW_VALUE` and emits a NOTICE.
* `reallocated_sector_count` / `reallocated_sector_ct`
* `spin_up_retry_count`
* `soft_read_error_rate`
* `temperature_celsius` / `temperature_celsius_x10`
* `reallocation_event_count`
* `current_pending_sector` / `current_pending_sector_count` / `total_pending_sectors`
* `offline_uncorrectable` / `total_offline_uncorrectable`
* `ssd_life_left`
`smart-check` also contains a disk database (taken from [check_smart_attributes]( which can be used to specify more fine-grained thresholds for specific hard drive device models.
If you want to ignore certain attributes (e.g. because your hard drive reports bogus values) pass the `--ignore-attributes` parameter to `smart-check`, e.g.:
smart-check --ignore-attributes Temperature_Celcius /dev/sda