SMART error (ErrorCount) detected on host - but the NVMe disk is perfectly fine

Posted by ads' corner on Saturday, 2019-12-21
Posted in [Hardware][Linux]

Recently I got a new system with an NVMe disk (nice and fast). After I installed smartmontools, it started reporting that the disk's error counter was increasing. For a brand-new disk?

A quick search revealed that some vendors (ab)use the error counter to store messages. In my case I had updated the firmware of the device (using fwupdmgr), and the result of that update was stored in, of all places, the error counter. Let’s check the stats:

smartctl --all /dev/nvme0

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.3.0-24-generic] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Media and Data Integrity Errors:    0
Error Information Log Entries:      32
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

So the SMART data reports Error Information Log Entries = 32, yet the error log itself says No Errors Logged.
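To watch just these two counters without reading the whole report, a small filter like the following might help. It is only a sketch: the function name smart_error_counters is made up for this post, and it assumes the labels look exactly like the smartctl 7.0 output shown above (other versions may word them differently).

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): reads `smartctl --all`
# text on stdin and prints the two error-related counters on one line.
# Assumes the label wording of the smartctl 7.0 output shown above.
smart_error_counters() {
    awk -F': *' '
        /^Media and Data Integrity Errors:/ { media = $2 }
        /^Error Information Log Entries:/   { entries = $2 }
        END { printf "media_errors=%s log_entries=%s\n", media, entries }
    '
}

# Usage (needs root and a real device):
#   smartctl --all /dev/nvme0 | smart_error_counters
```

If media_errors stays at 0 while log_entries grows, that matches the situation described here: the counter is moving, but no media errors are being recorded.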

The nvme-cli package (that is its name on Ubuntu; your mileage may vary) provides the nvme tool, which can extract the logs from the disk:

nvme error-log /dev/nvme0

.................
 Entry[62]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................
 Entry[63]
.................
error_count  : 0
sqid         : 0
cmdid        : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................

And here we are: the disk firmware is using the ERROR log to report SUCCESS! Disk vendors these days …
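Scrolling through 64 entries by hand gets old quickly, so here is a rough sketch that counts how many entries carry a status other than SUCCESS. The function name count_error_log_entries is made up for this post, and the script assumes the plain-text layout printed above (one status_field line per entry, with the word SUCCESS for a clean entry); other nvme-cli versions may format this differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvme-cli): reads `nvme error-log` text
# on stdin and counts entries whose status_field line does not say SUCCESS.
# Assumes the plain-text output format shown above.
count_error_log_entries() {
    awk '
        /^[ \t]*status_field/ {
            total++
            if ($0 !~ /SUCCESS/) failed++
        }
        END { printf "entries=%d non_success=%d\n", total, failed }
    '
}

# Usage (needs root and a real device):
#   nvme error-log /dev/nvme0 | count_error_log_entries
```

A non_success value of 0 means every slot in the log reports a successfully completed command, as in the two entries shown above.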

