Linux Software RAID


mdadm

Force rebuild of a failed RAID

Example for /dev/md10

The problem: Two failed disks in a RAID5

This looks ugly, but perhaps we are lucky and the disks are merely marked as failed.
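
Before forcing anything, it can be worth checking whether the disks really report hardware errors or just got kicked out; a minimal sketch, assuming smartmontools is installed and /dev/sdX is a placeholder for a suspect member:

# smartctl -H /dev/sdX          # overall SMART health verdict
# smartctl -l error /dev/sdX    # SMART error log
# dmesg | grep -i -e ata -e scsi    # recent kernel messages about the buses/disks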

cat /proc/mdstat

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
...
md10 : inactive sdap1[11] sdao1[5] sdah1[15](S) sdag1[4] sdy1[3] sdz1[14] sdr1[8] sdb1[13] sdq1[16](S) sdi1[1] sda1[12]
       5236577280 blocks super 1.2
...

The state is inactive, which is not what we want. Look at the details in the next step.

mdadm --detail

# mdadm --detail /dev/md10
 /dev/md10:
          Version : 1.2
    Creation Time : Wed Feb  6 13:44:52 2013
       Raid Level : raid5
    Used Dev Size : 476052288 (454.00 GiB 487.48 GB)
     Raid Devices : 11
    Total Devices : 11
      Persistence : Superblock is persistent

      Update Time : Wed Jun 15 17:46:57 2016
            State : active, FAILED, Not Started
   Active Devices : 9
 Working Devices : 11
   Failed Devices : 0
    Spare Devices : 2

           Layout : left-symmetric
       Chunk Size : 64K

             Name : md10
             UUID : 82f2b88d:276a1fd3:55a4928e:b2228edf
           Events : 17071

      Number   Major   Minor   RaidDevice State
        11      66      145        0      active sync   /dev/sdap1
         1       8      129        1      active sync   /dev/sdi1
         2       0        0        2      removed
         3      65      129        3      active sync   /dev/sdy1
         4      66        1        4      active sync   /dev/sdag1
         5      66      129        5      active sync   /dev/sdao1
        12       8        1        6      active sync   /dev/sda1
         7       0        0        7      removed
         8      65       17        8      active sync   /dev/sdr1
        13       8       17        9      active sync   /dev/sdb1
        14      65      145       10      active sync   /dev/sdz1

        15      66       17        -      spare   /dev/sdah1
        16      65        1        -      spare   /dev/sdq1
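
Whether a forced assembly is likely to succeed can also be judged from the component superblocks: if the event counters of the dropped members are close to those of the rest, not much was written after they fell out. A quick look, with the member names taken from the mdstat output above:

# mdadm --examine /dev/sd{ap,i,y,ag,ao,a,r,b,z,q,ah}1 | egrep '/dev/|Events|Device Role'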


Force the rescan and reassemble the RAID

For a SCSI rescan you can try this: Scan all SCSI buses for new devices
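
That usually boils down to triggering a rescan through sysfs; a minimal sketch (the list of hosts differs per machine):

# for host in /sys/class/scsi_host/host*; do echo '- - -' > $host/scan; done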

Then force the reassembly and start the array:

# mdadm --scan /dev/md10
# mdadm --assemble --force --scan
# mdadm --run /dev/md10
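
If the array still refuses to come back, a variant is to stop the inactive array first and name the members explicitly (device names as listed in the outputs above); a sketch, not guaranteed for every failure mode:

# mdadm --stop /dev/md10
# mdadm --assemble --force --run /dev/md10 /dev/sd{ap,i,y,ag,ao,a,r,b,z,q,ah}1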

Check the status

# mdadm --detail /dev/md10
 
 /dev/md10:
          Version : 1.2
    Creation Time : Wed Feb  6 13:44:52 2013
       Raid Level : raid5
       Array Size : 4760522880 (4539.99 GiB 4874.78 GB)
    Used Dev Size : 476052288 (454.00 GiB 487.48 GB)
     Raid Devices : 11
    Total Devices : 12
      Persistence : Superblock is persistent
 
      Update Time : Thu Jun 16 10:59:16 2016
            State : clean, degraded, recovering
   Active Devices : 10
 Working Devices : 12
   Failed Devices : 0
    Spare Devices : 2
 
           Layout : left-symmetric
       Chunk Size : 64K
 
   Rebuild Status : 5% complete
 
             Name : md10
             UUID : 82f2b88d:276a1fd3:55a4928e:b2228edf
           Events : 17074
 
      Number   Major   Minor   RaidDevice State
        11      66      145        0      active sync   /dev/sdap1
         1       8      129        1      active sync   /dev/sdi1
        16      65        1        2      spare rebuilding   /dev/sdq1
         3      65      129        3      active sync   /dev/sdy1
         4      66        1        4      active sync   /dev/sdag1
         5      66      129        5      active sync   /dev/sdao1
        12       8        1        6      active sync   /dev/sda1
         7       8      145        7      active sync   /dev/sdj1
         8      65       17        8      active sync   /dev/sdr1
        13       8       17        9      active sync   /dev/sdb1
        14      65      145       10      active sync   /dev/sdz1
 
        15      66       17        -      spare   /dev/sdah1

This is good:

State : clean, degraded, recovering

Better to wait for completion before the next reboot:

Rebuild Status : 5% complete

It should continue rebuilding after a reboot, but you never know...
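
To keep an eye on the rebuild, and optionally give it more bandwidth, something like this helps (the speed limits are per-device values in KiB/s and only examples):

# watch -n 30 cat /proc/mdstat
# echo 50000 > /proc/sys/dev/raid/speed_limit_min
# echo 200000 > /proc/sys/dev/raid/speed_limit_max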

Replace a disk in a mirror

Device /dev/cciss/c0d1 is a freshly replaced disk behind an HP array controller. The partition table is first copied over from the intact disk c0d0; then the stale members are failed and removed from each array and the new partitions are added, which starts the resync.

[root@app02 ~]# sfdisk -d /dev/cciss/c0d0 | sfdisk --no-reread --force /dev/cciss/c0d1
[root@app02 ~]# mdadm --manage /dev/md0 --fail /dev/cciss/c0d1p1
[root@app02 ~]# mdadm --manage /dev/md0 --remove /dev/cciss/c0d1p1
[root@app02 ~]# mdadm --manage /dev/md0 --add /dev/cciss/c0d1p1
[root@app02 ~]# mdadm --manage /dev/md1 --fail /dev/cciss/c0d1p2
[root@app02 ~]# mdadm --manage /dev/md1 --remove /dev/cciss/c0d1p2
[root@app02 ~]# mdadm --manage /dev/md1 --add /dev/cciss/c0d1p2
[root@app02 ~]# cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 cciss/c0d1p2[2] cciss/c0d0p2[0]
      36925312 blocks [2/1] [U_]
      	resync=DELAYED
      
md0 : active raid1 cciss/c0d1p1[2] cciss/c0d0p1[0]
      256003712 blocks [2/1] [U_]
      [>....................]  recovery =  0.0% (38144/256003712) finish=2680.2min speed=1589K/sec
      
unused devices: <none>
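
If these mirrors also carry the boot loader, it is usually reinstalled on the new disk after the sync, so the box can still boot when c0d0 is the next one to die; a sketch for GRUB legacy (adapt to whatever boot setup is actually in place):

# grub-install /dev/cciss/c0d1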