Monday 6 January 2014

Replacing mirrored disks under Linux software RAID

So you've got a Linux server using software RAID with disk mirroring (RAID 1). When it's healthy, /proc/mdstat looks something like this:
# cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [linear] 
md3 : active raid1 sdb3[1] sda3[0]
      949995648 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[1] sda2[0]
      8514368 blocks [2/2] [UU]
      
md1 : active raid1 sdb1[1] sda1[0]
      8602688 blocks [2/2] [UU]
      
md4 : active raid1 sdb4[1] sda4[0]
      9638912 blocks [2/2] [UU]

A disk goes pop. The system is still running, but you need to replace the dead disk. Once you've physically swapped it, here's how to add the new disk back into the RAID configuration.

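So we see that /dev/sda has died: each array lists only its sdb partition, and [2/1] [_U] means one of the two mirror members is missing.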
# cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [linear] 
md3 : active raid1 sdb3[1]
      949995648 blocks [2/1] [_U]
      
md2 : active raid1 sdb2[1]
      8514368 blocks [2/1] [_U]
      
md1 : active raid1 sdb1[1]
      8602688 blocks [2/1] [_U]
      
md4 : active raid1 sdb4[1]
      9638912 blocks [2/1] [_U]
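
On a box with more arrays it can be easier to let a script pick out the degraded ones. Here is a minimal sketch, assuming the mdstat layout shown above. The function name degraded_arrays is made up for illustration and is not part of mdadm.

```shell
# List md arrays whose status string (e.g. [_U]) shows a missing member.
# Reads mdstat-formatted text from the file in $1 (default: /proc/mdstat).
degraded_arrays() {
    awk '
        # Header line such as "md3 : active raid1 sdb3[1]": remember the array name.
        /^md[0-9]+ :/ { name = $1 }
        # Status line: an underscore inside the [UU]-style field marks a missing member.
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Run against the degraded output above, it should print md3, md2, md1 and md4, one per line, in the order they appear in the file.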

First copy the partition table from /dev/sdb (the surviving disk) onto /dev/sda (the new, empty disk). Double-check the direction before you hit enter:
# sfdisk -d /dev/sdb > /tmp/partitiontable
# sfdisk /dev/sda < /tmp/partitiontable
# sfdisk -l /dev/sda ; sfdisk -l /dev/sdb   # the two listings should now be identical
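
Comparing two listings by eye is error-prone, so here is a rough sketch of doing it mechanically. It assumes the only difference between the two `sfdisk -d` dumps is the device path itself. That should hold right after the copy above, but treat it as an assumption. The helper names normalize_dump and same_layout are hypothetical.

```shell
# Read an "sfdisk -d" dump on stdin and replace the device path given in $1
# with the placeholder DISK, so that dumps taken from two disks can be compared.
normalize_dump() {
    sed "s|$1|DISK|g"
}

# Succeeds (exit 0) if both disks carry the same partition layout.
# Needs root and real disks, e.g.: same_layout /dev/sda /dev/sdb && echo identical
same_layout() {
    [ "$(sfdisk -d "$1" | normalize_dump "$1")" = "$(sfdisk -d "$2" | normalize_dump "$2")" ]
}
```

A non-zero exit status from same_layout means the dumps differ and the copy is worth checking before any partitions are added back.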

Now add the partitions on /dev/sda back into the RAID devices:
# mdadm -a /dev/md1 /dev/sda1
# mdadm -a /dev/md2 /dev/sda2
# mdadm -a /dev/md3 /dev/sda3
# mdadm -a /dev/md4 /dev/sda4
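
The four commands above follow a pattern, so they can be wrapped in a loop. This is only a sketch: it assumes the layout from this post, where partition N of the disk belongs to /dev/mdN for N = 1 to 4. The function name readd_partitions and the DRY_RUN variable are made up for illustration.

```shell
# Re-add each partition of the replacement disk to its array.
# Assumes /dev/<disk>N belongs to /dev/mdN for N = 1..4, as in this post.
# With DRY_RUN=1 the mdadm commands are printed instead of executed.
readd_partitions() {
    disk=$1
    for n in 1 2 3 4; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mdadm -a /dev/md${n} /dev/${disk}${n}"
        else
            mdadm -a "/dev/md${n}" "/dev/${disk}${n}"
        fi
    done
}
```

Running it with DRY_RUN=1 first (for example `DRY_RUN=1; readd_partitions sda`) shows exactly which commands would be issued, which is a cheap sanity check before touching the arrays.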

This will probably take some time, so watch its progress with:
# watch -n 5 cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [linear]
md3 : active raid1 sda3[2] sdb3[1]
      949995648 blocks [2/1] [_U]
      [>....................]  recovery =  2.3% (21904000/949995648) finish=241.8min speed=63952K/sec

md2 : active raid1 sda2[0] sdb2[1]
      8514368 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
      8602688 blocks [2/2] [UU]

md4 : active raid1 sda4[2] sdb4[1]
      9638912 blocks [2/1] [_U]
        resync=DELAYED
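
If you'd rather not sit watching the screen, the same information can be checked from a script. The sketch below assumes, as in the output above, that an array that is rebuilding or queued to rebuild shows the word "recovery" or "resync" in /proc/mdstat. The function name rebuild_in_progress is hypothetical.

```shell
# Succeeds (exit 0) while any array in the mdstat text is still rebuilding
# or waiting to rebuild; reads the file in $1 (default: /proc/mdstat).
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Example polling loop, commented out so that sourcing this has no side effects:
# while rebuild_in_progress; do sleep 60; done; echo "all arrays in sync"
```

Once the loop exits, a final `cat /proc/mdstat` should show [2/2] [UU] for every array, as in the healthy output at the top of this post.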