Recovering from a RAID ‘Degraded Array’ event

OK, so you have that nice and shiny server up and running with a RAID array, and all of a sudden you start getting ‘Degraded Array’ messages…

Assuming the cause is not hardware-related (e.g. one of the array's drives is actually failing), you can easily recover from this state.

Examine the current RAID status

# cat /proc/mdstat

A healthy array looks like this ([2/2] means both of the two expected members are active, and each U marks a member that is up):

md1 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

A degraded array looks like this (only one of the two members is active, and the _ marks the missing one):

md2 : active raid1 sda2[0]
78043648 blocks [2/1] [U_]
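On a box with several arrays it can be handy to pick out the degraded ones automatically. The following is only a sketch: the function name degraded_arrays is made up for this example, and it assumes the status field ([UU], [U_], ...) is the last field on the "blocks" line, as in the output shown above.

```shell
# List md arrays whose member-status field (e.g. [U_]) contains a "_",
# meaning at least one member is missing.
# Reads mdstat-format text from the file given as $1 (default: /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the array we are in
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays without arguments reads /proc/mdstat and prints one array name per line; no output means no array is degraded.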

Examine the failing entry

# mdadm -D /dev/md2

The output includes the device table:

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0       -1      removed

Only /dev/sda2 is in active sync.
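If you want to check this from a script rather than by eye, counting the "removed" rows in the device table is one way to do it. This is a minimal sketch: removed_slots is a hypothetical helper name, and it assumes the State column is printed last on the row, as in the table above.

```shell
# Count the slots that "mdadm -D" reports as removed.
# Reads the mdadm -D output on stdin and prints the count (0 if none).
removed_slots() {
    awk '$NF == "removed" { n++ } END { print n + 0 }'
}
```

Usage would be something like "mdadm -D /dev/md2 | removed_slots"; for the table above that prints 1.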

Aha, it seems that /dev/sdb2 is missing from the array. Let’s add it back:

# mdadm /dev/md2 -a /dev/sdb2

Reverify the RAID status

# cat /proc/mdstat

This now returns:

Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
      78043648 blocks [2/1] [U_]
      [=====>...............]  recovery = 28.3% (22156672/78043648) finish=36.9min speed=25179K/sec
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

OK, the system is now rebuilding the md2 array.
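To keep an eye on the rebuild without reading the whole file each time, the percentage can be pulled out of the recovery line. A rough sketch, assuming the progress is printed as "recovery = NN.N%" (or "resync = NN.N%") as in the output above; rebuild_progress is a name invented for this example.

```shell
# Print "<array> <percent>" for each array that is currently rebuilding.
# Reads mdstat-format text from the file given as $1 (default: /proc/mdstat).
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the array we are in
        /(recovery|resync) *=/ {
            # the progress figure is the only field ending in "%"
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

For the status shown above this prints "md2 28.3%". If you just want the raw view refreshed every few seconds, "watch -n 10 cat /proc/mdstat" does the job as well.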

This can also be verified with

# mdadm -D /dev/md2

that will return

/dev/md2:
        Version : 00.90.01
  Creation Time : Tue Apr 11 21:07:15 2006
     Raid Level : raid1
     Array Size : 78043648 (74.43 GiB 79.92 GB)
    Device Size : 78043648 (74.43 GiB 79.92 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Sat Nov  4 10:55:33 2006
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
 Rebuild Status : 30% complete
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0       -1      removed
       2       8       18        1      spare   /dev/sdb2
           UUID : e151ccd0:5bb92b79:d26db88d:0581e61d
         Events : 0.4130158

Let the rebuild finish; once /proc/mdstat shows [2/2] [UU] for md2 again, you are up and running…
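If a script needs to block until the rebuild is done, a simple polling loop is enough. This is a sketch under the same assumption as before, namely that an ongoing rebuild shows up as a "recovery =" or "resync =" line in /proc/mdstat; wait_for_rebuild is a hypothetical name.

```shell
# Block until no array reports an ongoing recovery or resync.
# $1: mdstat-format file to poll (default: /proc/mdstat)
# $2: seconds to sleep between polls (default: 30)
wait_for_rebuild() {
    while grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"; do
        sleep "${2:-30}"
    done
}
```

Newer mdadm releases also offer "mdadm --wait /dev/md2", which waits for the rebuild on that one array; whether the version installed on an older system supports it is something to check in its man page.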