Linux Software RAID
mdadm
Force rebuild of a failed RAID
Example for /dev/md10
The problem: Two failed disks in a RAID5
It looks ugly, but maybe we are lucky and the disks are only marked as bad.
cat /proc/mdstat
<source lang=bash>
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
...
md10 : inactive sdap1[11] sdao1[5] sdah1[15](S) sdag1[4] sdy1[3] sdz1[14] sdr1[8] sdb1[13] sdq1[16](S) sdi1[1] sda1[12]
      5236577280 blocks super 1.2
...
</source>
The state is inactive, which is not what we want. Look at the details in the next step.
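Before touching anything it can also be worth a quick look at the kernel log to see why the disks were kicked out. A small sketch; the grep pattern is only an example:
<source lang=bash>
# look for the original I/O errors and the moment md kicked the disks out
dmesg | grep -iE 'md10|raid5|i/o error'
</source>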
mdadm --detail
<source lang=bash>
# mdadm --detail /dev/md10
/dev/md10:
        Version : 1.2
  Creation Time : Wed Feb 6 13:44:52 2013
     Raid Level : raid5
  Used Dev Size : 476052288 (454.00 GiB 487.48 GB)
   Raid Devices : 11
  Total Devices : 11
    Persistence : Superblock is persistent

    Update Time : Wed Jun 15 17:46:57 2016
          State : active, FAILED, Not Started
 Active Devices : 9
Working Devices : 11
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 64K

           Name : md10
           UUID : 82f2b88d:276a1fd3:55a4928e:b2228edf
         Events : 17071

    Number   Major   Minor   RaidDevice State
      11      66      145        0      active sync   /dev/sdap1
       1       8      129        1      active sync   /dev/sdi1
       2       0        0        2      removed
       3      65      129        3      active sync   /dev/sdy1
       4      66        1        4      active sync   /dev/sdag1
       5      66      129        5      active sync   /dev/sdao1
      12       8        1        6      active sync   /dev/sda1
       7       0        0        7      removed
       8      65       17        8      active sync   /dev/sdr1
      13       8       17        9      active sync   /dev/sdb1
      14      65      145       10      active sync   /dev/sdz1

      15      66       17        -      spare   /dev/sdah1
      16      65        1        -      spare   /dev/sdq1
</source>
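Two members are shown as removed and two re-detected disks sit there as spares. To see what each member itself recorded, you can compare the component superblocks; a small sketch using a few of the device names from the listing above:
<source lang=bash>
# compare event counters and roles across the member superblocks
for d in /dev/sdap1 /dev/sdi1 /dev/sdq1 /dev/sdah1; do
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Events|Device Role|Array State'
done
</source>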
Force the rescan and reassemble the RAID
For a SCSI rescan you can first scan all SCSI buses for new devices.
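A minimal sketch, assuming the disks sit behind the standard sysfs SCSI hosts; writing the wildcard triple asks the kernel to rescan every channel, target and LUN:
<source lang=bash>
# rescan every SCSI host for newly appeared devices
for host in /sys/class/scsi_host/host*; do
    echo "- - -" > "$host/scan"
done
</source>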
And then you have to stop the inactive array, force a reassembly and start it:
<source lang=bash>
# mdadm --stop /dev/md10
# mdadm --assemble --force --scan
# mdadm --run /dev/md10
</source>
Check the status
<source lang=bash>
# mdadm --detail /dev/md10
/dev/md10:
        Version : 1.2
  Creation Time : Wed Feb 6 13:44:52 2013
     Raid Level : raid5
     Array Size : 4760522880 (4539.99 GiB 4874.78 GB)
  Used Dev Size : 476052288 (454.00 GiB 487.48 GB)
   Raid Devices : 11
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Thu Jun 16 10:59:16 2016
          State : clean, degraded, recovering
 Active Devices : 10
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 5% complete

           Name : md10
           UUID : 82f2b88d:276a1fd3:55a4928e:b2228edf
         Events : 17074

    Number   Major   Minor   RaidDevice State
      11      66      145        0      active sync   /dev/sdap1
       1       8      129        1      active sync   /dev/sdi1
      16      65        1        2      spare rebuilding   /dev/sdq1
       3      65      129        3      active sync   /dev/sdy1
       4      66        1        4      active sync   /dev/sdag1
       5      66      129        5      active sync   /dev/sdao1
      12       8        1        6      active sync   /dev/sda1
       7       8      145        7      active sync   /dev/sdj1
       8      65       17        8      active sync   /dev/sdr1
      13       8       17        9      active sync   /dev/sdb1
      14      65      145       10      active sync   /dev/sdz1

      15      66       17        -      spare   /dev/sdah1
</source>
This is good:
State : clean, degraded, recovering
Better to wait for the rebuild to complete before the next reboot:
Rebuild Status : 5% complete
It should continue rebuilding after a reboot, but... you never know.
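To keep an eye on the rebuild, and to nudge it along if it crawls, a small sketch; the speed limits are the standard md sysctls and the values are only examples:
<source lang=bash>
# watch the rebuild progress
watch -n 60 cat /proc/mdstat

# optionally raise the rebuild speed limits (KB/s per device, example values)
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max
</source>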
Replace a disk in a mirror
Device /dev/cciss/c0d1 is the newly replaced disk in an HP Array Controller. Copy the partition table from the surviving disk /dev/cciss/c0d0, then fail, remove and re-add the partitions so both mirrors resync onto the new disk:
<source lang=bash>
[root@app02 ~]# sfdisk -d /dev/cciss/c0d0 | sfdisk --no-reread --force /dev/cciss/c0d1
[root@app02 ~]# mdadm --manage /dev/md0 --fail /dev/cciss/c0d1p1
[root@app02 ~]# mdadm --manage /dev/md0 --remove /dev/cciss/c0d1p1
[root@app02 ~]# mdadm --manage /dev/md0 --add /dev/cciss/c0d1p1
[root@app02 ~]# mdadm --manage /dev/md1 --fail /dev/cciss/c0d1p2
[root@app02 ~]# mdadm --manage /dev/md1 --remove /dev/cciss/c0d1p2
[root@app02 ~]# mdadm --manage /dev/md1 --add /dev/cciss/c0d1p2
[root@app02 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 cciss/c0d1p2[2] cciss/c0d0p2[0]
      36925312 blocks [2/1] [U_]
        resync=DELAYED

md0 : active raid1 cciss/c0d1p1[2] cciss/c0d0p1[0]
      256003712 blocks [2/1] [U_]
      [>....................]  recovery =  0.0% (38144/256003712) finish=2680.2min speed=1589K/sec

unused devices: <none>
</source>
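Once the resync has finished it is worth checking that both mirrors are clean again; a minimal check:
<source lang=bash>
# both arrays should report [2/2] [UU] when the resync is done
cat /proc/mdstat
mdadm --detail /dev/md0 | grep -E 'State|Failed'
mdadm --detail /dev/md1 | grep -E 'State|Failed'
</source>
If the mirror also holds the boot partition, remember to reinstall the bootloader on the new disk as well; how exactly depends on the bootloader in use.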