Recently I got a new system with an NVMe disk (nice and fast). After I installed smartmontools, it started reporting that the error counter for the disk was increasing. For a brand new disk?
A quick search revealed that some vendors (ab)use the error counter for storing messages. In my case I had updated the firmware of the device (using fwupdmgr), and the result of that update was stored in, of all places, the error counter. Let’s check the stats:
smartctl --all /dev/nvme0
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.3.0-24-generic] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 1TB
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Media and Data Integrity Errors: 0
Error Information Log Entries: 32
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
It says: Error Information Log Entries = 32, but also No Errors Logged.
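If you want to spot this odd combination automatically, here is a minimal Python sketch that scans the text printed by smartctl for a non-zero counter next to an empty log. The function name `summarize_nvme_errors` and the result keys are my own invention, and the parsing assumes the output looks like the excerpt above; other smartctl versions may format it differently.

```python
import re


def summarize_nvme_errors(smartctl_text):
    """Pull the error-log counter and the 'empty log' marker out of smartctl text."""
    match = re.search(r"Error Information Log Entries:\s*([\d,]+)", smartctl_text)
    # smartctl may print large numbers with thousands separators, so drop commas.
    counter = int(match.group(1).replace(",", "")) if match else None
    log_empty = "No Errors Logged" in smartctl_text
    return {
        "counter": counter,
        "log_empty": log_empty,
        # A non-zero counter together with an empty log is the case described above.
        "suspicious_counter": bool(counter) and log_empty,
    }


# Excerpt of the output shown above.
sample = """\
Media and Data Integrity Errors: 0
Error Information Log Entries: 32
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
"""

print(summarize_nvme_errors(sample))
```

On a real system the text could come from running `smartctl --all /dev/nvme0` (for example via `subprocess.run`), which usually needs root.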
The nvme-cli package (Ubuntu, your mileage might vary) provides the nvme tool, and it can extract the logs from the disk:
nvme error-log /dev/nvme0
.................
Entry[62]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
cs : 0
.................
Entry[63]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
cs : 0
.................
And here we are: the disk firmware is using the ERROR log to report SUCCESS! Disk vendors these days …
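Reading through 64 entries by hand gets old quickly, so here is a rough Python sketch that splits the text output of `nvme error-log` into entries and keeps only those that do not look like the SUCCESS records above. The helper names `parse_error_log` and `real_errors` are made up for this post, the parser assumes the `key : value` layout shown above (newer nvme-cli versions may print it differently), and the second entry in the sample is invented purely to have something to filter.

```python
import re


def parse_error_log(text):
    """Split `nvme error-log` text output into one dict per Entry[n] block."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"Entry\[\s*(\d+)\]", line)
        if header:
            current = {"entry": int(header.group(1))}
            entries.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only; the status text contains more colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return entries


def real_errors(entries):
    """Keep entries with a non-zero status or a non-zero error count."""
    return [
        e for e in entries
        if not e.get("status_field", "0(").startswith("0(")
        or e.get("error_count", "0") != "0"
    ]


sample = """\
.................
 Entry[62]
.................
error_count : 0
status_field : 0(SUCCESS: The command completed successfully)
.................
 Entry[63]
.................
error_count : 7
status_field : 1(hypothetical failure, invented for this example)
"""

entries = parse_error_log(sample)
print([e["entry"] for e in real_errors(entries)])
```

For my disk, where every entry looks like the two shown above, the filtered list would be empty.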