How to replace a failed hard drive in mdadm (software RAID) - Linux (AlmaLinux/RHEL/Ubuntu/Debian)

When a disk in a software RAID array (mdadm) fails, the cat /proc/mdstat command shows output similar to the one below. [2/1] means the array expects two members but only one is active, and [U_] shows one member up (U) and one missing (_). The (F) flag marks the failed partition.

 

cat /proc/mdstat
md0 : active raid1 nvme0n1p2[0] nvme1n1p2[1](F)
      614336 blocks super 1.0 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 nvme1n1p3[1](F) nvme0n1p3[0]
      959237120 blocks super 1.2 [2/1] [U_]
      bitmap: 6/8 pages [24KB], 65536KB chunk

md127 : active raid1 nvme1n1p1[1](F) nvme0n1p1[0]
      16759808 blocks super 1.2 [2/1] [U_]

As the output above shows, there are three md devices, each a RAID-1 array.

In this example, the disks hold the operating system and have three partitions each: /boot, /, and swap.

  • md0 - /boot
  • md1 - /
  • md127 - swap
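The check described above (looking for an underscore in the [U_] status map) can be scripted. The following is a minimal sketch, not part of the original procedure: the function name degraded_arrays is made up for illustration, and it only reads /proc/mdstat (or a file passed as the first argument) without changing anything.

```shell
# Print the name of every md array whose status map (e.g. [U_]) contains "_".
# Reads /proc/mdstat by default; another file can be passed as $1 for testing.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        name != "" && /\[[U_]+\]/ {            # the line holding the status map
            match($0, /\[[U_]+\]/)
            if (index(substr($0, RSTART, RLENGTH), "_"))
                print name
            name = ""                          # done with this array
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays
```

An array that is currently rebuilding still shows an underscore, so it is listed until the rebuild completes.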

To identify which partitions to remove from the RAID arrays, run the following commands one at a time:

mdadm --detail /dev/md0 | grep faulty
mdadm --detail /dev/md1 | grep faulty
mdadm --detail /dev/md127 | grep faulty

You'll see a line similar to the following in the output of each command:

 1       259       7        1      faulty        /dev/nvme1n1p3

Alternatively, you can get this information from the cat /proc/mdstat output shown above, where (F) appears beside the faulty partition. For example, the failed partition of md0 is nvme1n1p2 (/dev/nvme1n1p2), the failed partition of md1 is nvme1n1p3 (/dev/nvme1n1p3), and so on.
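Reading the (F) markers by eye gets error-prone with several arrays, so here is a small sketch that extracts them. The function name faulty_members is hypothetical; it only parses /proc/mdstat (or a file given as the first argument) and prints one "array partition" pair per line.

```shell
# List every member flagged (F) as "/dev/<array> /dev/<partition>".
# Reads /proc/mdstat by default; another file can be passed as $1 for testing.
faulty_members() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)/) {
                    dev = $i
                    sub(/\[.*/, "", dev)       # strip the "[1](F)" suffix
                    print "/dev/" $1, "/dev/" dev
                }
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   faulty_members
```

For the example output in this article it would print /dev/md0 /dev/nvme1n1p2, /dev/md1 /dev/nvme1n1p3, and /dev/md127 /dev/nvme1n1p1.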

 

To get the serial number of the faulty disk, follow this article.
Note: if you can't read the serial of the faulty disk because it no longer responds, get the serial of the working disk instead, so you at least know which disk to keep in the system and which one to replace.

 


Step 1: Remove the Faulty Disk from the RAID Array

Run the following commands to remove partitions of the faulty disk from the RAID array:

mdadm --manage /dev/md0 --remove /dev/nvme1n1p2
mdadm --manage /dev/md1 --remove /dev/nvme1n1p3
mdadm --manage /dev/md127 --remove /dev/nvme1n1p1

Important: replace the partition names with the correct ones on your system.
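One caveat: mdadm refuses to remove a member that is still active, so a partition of the failing disk that is not yet flagged (F) has to be marked as failed first with --fail. The sketch below is a dry run under that assumption: the helper name print_remove_cmds is made up, it reads "array partition" pairs on standard input, and it only prints the commands so they can be reviewed before being run as root.

```shell
# Read "array member" pairs on stdin and print the matching mdadm commands.
# Nothing is executed here; review the output, then run it as root.
print_remove_cmds() {
    while read -r array member; do
        [ -n "$array" ] && [ -n "$member" ] || continue   # skip incomplete lines
        # --fail is only needed when the member is not already marked (F)
        printf 'mdadm --manage %s --fail %s\n' "$array" "$member"
        printf 'mdadm --manage %s --remove %s\n' "$array" "$member"
    done
}

# Usage:
#   printf '/dev/md0 /dev/nvme1n1p2\n' | print_remove_cmds
```

Printing instead of executing is a deliberate choice here, since removing the wrong partition from a degraded array would take the array offline.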

Now, shut down the system and replace the faulty disk.


Step 2: Prepare the new Disk

After physically replacing the faulty disk, copy the partition table from the existing (working) disk to the new one:

sgdisk -R /dev/NEW-DISK /dev/EXISTING-DISK

For example, in the output above, the faulty disk (now replaced with a new one) is /dev/nvme1n1 and the existing, working disk is /dev/nvme0n1, so the command is:

sgdisk -R /dev/nvme1n1 /dev/nvme0n1

The above command copies the partition table together with the disk and partition GUIDs. To avoid conflicts, randomize the GUIDs on the new disk using this command:

sgdisk -G /dev/NEW-DISK
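The argument order of sgdisk -R is easy to get backwards: the target (new) disk follows -R and the source (existing) disk comes last. Swapping them would overwrite the healthy disk's partition table. As a guard, the sketch below builds the commands from named arguments and prints them for review. The helper name print_clone_cmds is hypothetical, and the final sgdisk -p line is an addition that only prints the resulting table.

```shell
# Print the sgdisk commands for cloning the partition table onto a new disk.
# $1 = new (empty) disk, $2 = existing (working) disk. Nothing is executed.
print_clone_cmds() {
    new=$1
    existing=$2
    if [ -z "$new" ] || [ -z "$existing" ] || [ "$new" = "$existing" ]; then
        echo "usage: print_clone_cmds NEW-DISK EXISTING-DISK (two different disks)" >&2
        return 1
    fi
    printf 'sgdisk -R %s %s\n' "$new" "$existing"   # copy table: existing -> new
    printf 'sgdisk -G %s\n' "$new"                  # fresh GUIDs on the new disk only
    printf 'sgdisk -p %s\n' "$new"                  # show the result for review
}

# Usage:
#   print_clone_cmds /dev/nvme1n1 /dev/nvme0n1
```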



If the sgdisk command is not found, install the gdisk package. On RHEL/AlmaLinux, use the following command (on Debian/Ubuntu, the equivalent is apt install gdisk):

dnf install gdisk

 


Step 3: Re-add the New Disk Partitions

Run the following commands to add the new disk's partitions to the RAID arrays:

mdadm --manage /dev/md0 --add /dev/nvme1n1p2
mdadm --manage /dev/md1 --add /dev/nvme1n1p3
mdadm --manage /dev/md127 --add /dev/nvme1n1p1

Important: replace the partition names with the correct ones on your system.

The rebuild process will start immediately after you add the partitions. You can monitor the process using the command:

cat /proc/mdstat
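While an array rebuilds, /proc/mdstat gains a progress line containing "recovery = NN.N%". A minimal sketch for pulling out just that percentage per array follows. The function name rebuild_progress is made up, and the parsing assumes the usual mdstat layout, which may differ slightly between kernel versions.

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
# Reads /proc/mdstat by default; another file can be passed as $1 for testing.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                      # remember the current array
        match($0, /(recovery|resync) *= *[0-9.]+%/) {
            pct = substr($0, RSTART, RLENGTH)
            sub(/^[a-z]+ *= */, "", pct)                 # keep only the percentage
            print name, pct
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   rebuild_progress
```

For a continuously refreshing view of the full file, watch -n 5 cat /proc/mdstat does the job without any parsing.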

 


Optional: Speed Up the Rebuilding Process

To speed up the rebuild, you can raise the maximum rebuild speed. The value is in KB/s, so 400000 is roughly 400 MB/s; NVMe disks can usually sustain that or more:

echo 400000 > /proc/sys/dev/raid/speed_limit_max


This value resets to the default after a reboot.
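If you would rather restore the previous limit once the rebuild finishes instead of waiting for a reboot, the old value can be saved first. The sketch below assumes a made-up helper name, set_rebuild_limit, and a LIMIT_FILE variable that exists only so the function can be pointed at a different file when testing.

```shell
# File holding the maximum rebuild speed (KB/s); overridable for testing.
LIMIT_FILE=${LIMIT_FILE:-/proc/sys/dev/raid/speed_limit_max}

# Write a new limit ($1, in KB/s) and print the previous one.
set_rebuild_limit() {
    old=$(cat "$LIMIT_FILE") || return 1
    echo "$1" > "$LIMIT_FILE" || return 1
    echo "$old"
}

# Usage (as root):
#   old=$(set_rebuild_limit 400000)   # raise the cap during the rebuild
#   set_rebuild_limit "$old"          # put the previous value back afterwards
```

The same setting is also reachable through sysctl as dev.raid.speed_limit_max, for example sysctl -w dev.raid.speed_limit_max=400000.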


More information is available in this article: https://www.redhat.com/en/blog/raid-drive-mdadm
