SMART error (ErrorCount) detected on host - but the NVMe disk is perfectly fine
Recently I got a new system with an NVMe disk (nice and fast). After I installed smartmontools, it started reporting that the disk's error counter was increasing. For a brand new disk?
A quick search revealed that some vendors (ab)use the error counter to store messages. In my case I had updated the firmware of the device (using fwupdmgr), and the result of that update was stored in, of all places, the error counter. Let's check the stats:
    smartctl --all /dev/nvme0
    smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.3.0-24-generic] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: Samsung SSD 970 EVO Plus 1TB

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02)
    Critical Warning: 0x00
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 32
    Warning Comp. Temperature Time: 0
    Critical Comp. Temperature Time: 0

    Error Information (NVMe Log 0x01, max 64 entries)
    No Errors Logged
It says "Error Information Log Entries: 32", but also "No Errors Logged".
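If you want to spot this contradiction without reading the whole report, a few lines of Python will do. This is only a rough sketch: the function name `summarize_smartctl` is my own invention, and it assumes the two lines are worded exactly as in the output above, which may differ between smartctl versions.

```python
import re

def summarize_smartctl(text):
    """Return (error_log_entries, no_errors_logged) from `smartctl --all` text.

    Hypothetical helper: assumes the wording seen in smartctl 7.0 output.
    """
    match = re.search(r"Error Information Log Entries:\s*([\d,]+)", text)
    # The counter may be printed with thousands separators, so strip commas.
    count = int(match.group(1).replace(",", "")) if match else None
    return count, "No Errors Logged" in text

# The relevant lines from the report shown in this post.
sample = """\
Media and Data Integrity Errors: 0
Error Information Log Entries: 32
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
"""

count, empty_log = summarize_smartctl(sample)
if count and empty_log:
    print(f"counter is {count}, but the error log itself is empty")
```

To run it against a real disk, pass in the text that "smartctl --all /dev/nvme0" prints instead of the sample string.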
The "nvme-cli" package (Ubuntu, your mileage might vary) provides the "nvme" tool, and it can extract the logs from the disk:
    nvme error-log /dev/nvme0
    .................
     Entry
    .................
    error_count : 0
    sqid : 0
    cmdid : 0
    status_field : 0(SUCCESS: The command completed successfully)
    parm_err_loc : 0
    lba : 0
    nsid : 0
    vs : 0
    cs : 0
    .................
     Entry
    .................
    error_count : 0
    sqid : 0
    cmdid : 0
    status_field : 0(SUCCESS: The command completed successfully)
    parm_err_loc : 0
    lba : 0
    nsid : 0
    vs : 0
    cs : 0
    .................
And here we are: the disk firmware is using the ERROR log to report SUCCESS! Disk vendors these days ...
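With up to 64 entries in the log, checking each "status_field" by eye gets tedious. Here is a small sketch that splits the text output into entries and keeps only those with a non-zero status. The helper names (`parse_error_log`, `status_code`, `real_errors`) are made up for this post, and the parsing assumes the "key : value" layout shown above; other nvme-cli versions may print it differently.

```python
import re

def parse_error_log(text):
    """Split `nvme error-log` text output into one dict per entry."""
    entries = []
    for line in text.splitlines():
        if line.strip().startswith("Entry"):
            entries.append({})           # a new entry begins
        elif ":" in line and entries:
            key, _, value = line.partition(":")   # split at the first colon only
            entries[-1][key.strip()] = value.strip()
    return entries

def status_code(entry):
    """Leading number of status_field (decimal or 0x-prefixed hex), or None."""
    match = re.match(r"(0x[0-9a-fA-F]+|\d+)", entry.get("status_field", ""))
    if not match:
        return None
    token = match.group(1)
    return int(token, 16) if token.lower().startswith("0x") else int(token)

def real_errors(entries):
    """Entries whose status is not 0 (a missing status counts as suspicious)."""
    return [e for e in entries if status_code(e) != 0]

# One entry copied from the output in this post.
sample = """\
.................
 Entry
.................
error_count : 0
status_field : 0(SUCCESS: The command completed successfully)
lba : 0
.................
"""

entries = parse_error_log(sample)
print(f"{len(entries)} entries, {len(real_errors(entries))} with a non-zero status")
```

If every entry comes back with status 0, as on my disk, the growing counter is most likely just the firmware logging its own success messages.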