If a disk failure happens, the disk is replaced with a similar one, which then needs to be partitioned and re-added to the RAID.
Newer systems all use the GUID Partition Table (GPT) and therefore allow almost unlimited disk sizes. Instructions for re-adding a disk using GPT differ a bit from the days when the MBR (up to 2 TB of disk space) was used, so I'm writing them down here for future use.
First, obviously, the failed disk must be replaced. If there is a disk error, the software RAID should already have marked the disk as failed. This can be verified by looking into /proc/mdstat:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda5
      1943342080 blocks [1/2] [U_]

unused devices: <none>
Only one out of two disks for /dev/md2 is available ([U_]). In this example, /dev/sda is healthy, and /dev/sdb has failed.
The serial number of the failed disk can be identified using the SMART control tools:
smartctl -i /dev/sdb

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA DT01ACA300
Serial Number:    84QA172SE
The serial number is helpful in order not to accidentally remove the wrong disk and then lose all data (the other RAID 1 disk is the only working disk right now). Alternatively, if the failed disk is no longer responding at all, this method can be used to identify the healthy disk, and then swap the other one.
Once the disk is replaced, the new disk needs to be partitioned. It's important that the new disk has exactly the same partition structure as the old one. For GPT disks, the sgdisk utility can copy the partition table from the healthy disk to the new one. On Debian and Ubuntu systems, the utility is in the gdisk package.
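If the utility is missing, it can be installed from the distribution repositories (assuming Debian/Ubuntu, as above):

```shell
# Install sgdisk (part of the gdisk package on Debian/Ubuntu)
apt-get install gdisk
```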
For the next step, be super careful, and verify twice that the disk names are in order.
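The copy step should look like the following sketch of the usual sgdisk invocation; with -R, the destination disk is given first, then the source:

```shell
# Replicate the partition table FROM the healthy /dev/sda TO the new /dev/sdb.
# Careful: with -R, the *destination* (/dev/sdb) comes first!
sgdisk -R /dev/sdb /dev/sda
```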
This will replicate (-R) the partition table from /dev/sda to /dev/sdb. The command is seen in different formats in tutorials on the Internet, sometimes also shown as:
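One equivalent variant, using the long-option spelling from the sgdisk man page:

```shell
# Same operation with the long option; the destination still comes first
sgdisk --replicate=/dev/sdb /dev/sda
```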
If you get the order wrong, the partition table of your healthy disk /dev/sda will be overwritten with the empty table from /dev/sdb. This will render your data inaccessible!
Copying the GPT will also copy the disk GUID, so both disks end up with the same one. It's required to assign a new GUID to the new disk:
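A sketch using sgdisk's -G (--randomize-guids) option:

```shell
# Assign new random GUIDs to the new disk and its partitions
sgdisk -G /dev/sdb
```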
Both disks should have the same partition table now; verify it:
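Printing both partition tables with -p and comparing them should do (device names as in the example above):

```shell
# Print both partition tables for a manual comparison
sgdisk -p /dev/sda
sgdisk -p /dev/sdb
```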
The lists must be equal.
Now it's time to re-add the partitions to the RAID. In the example above I only listed /dev/md2; usually there are a few more partitions. Repeat the following step for every partition:
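For the /dev/md2 example above, the usual mdadm invocation would be (adjust the md device and partition number for each array):

```shell
# Add the new disk's partition back to the RAID 1 array
mdadm /dev/md2 --add /dev/sdb5
```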
A look into /proc/mdstat should show that the rebuild of /dev/md2 has started, or has even already finished if it's only a small partition. Once the rebuild is finished, the RAID status should look like this:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb5 sda5
      1943342080 blocks [2/2] [UU]