I recently set up a new server here at the office that includes three RAID1 arrays. I have two identical 160GB drives, each with three partitions. Each partition is in an array with its "twin" partition on the opposite drive. On boot, I'm getting a degraded array error.
/proc/mdstat contains:
root@securevm [~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1]
      102388 blocks super 1.0 [2/1] [_U]

md2 : active raid1 sdb3[1]
      139801532 blocks super 1.1 [2/1] [_U]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active raid1 sdb2[1]
      16382908 blocks super 1.1 [2/1] [_U]

unused devices: <none>
mdadm returns:
root@securevm [~]# mdadm -D /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Mon Jan 23 05:54:37 2012
Raid Level : raid1
Array Size : 102388 (100.01 MiB 104.85 MB)
Used Dev Size : 102388 (100.01 MiB 104.85 MB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Tue Jan 24 08:33:30 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : <snip>
UUID : 2bfa1531:acf02178:f07dd7bd:8b5f18cf
Events : 120
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
This is repeated for the other two arrays.
If I'm reading this correctly, it seems my arrays aren't seeing my first drive (/dev/sda), right?
Smartctl shows no issues with either drive, so I'm assuming the drives are good, but just not sync'd.
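Since smartctl can talk to both drives, the kernel presumably still sees them; for anyone checking the same thing, something like this confirms the drive and its md superblocks are visible (device names per my setup):

cat /proc/partitions        # sda, sda1, sda2, sda3 should all be listed
mdadm --examine /dev/sda1   # prints the md superblock on the partition, if there is one
smartctl -H /dev/sda        # quick SMART health summary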
How do I fix this?
Thanks in advance.
Posts
In a RAID 1 setup with two hard drives, that means one of the two drives has failed. You can try removing the drive and re-inserting it while hot to see if the array will recognize and rebuild it (if you suspect the drive itself is fine). Make sure you are not removing the active drive while you do this, or you will kill the machine and potentially lose data. The other method is to force a rebuild through a RAID management utility, if one exists for your server implementation.
I think the last time I dealt with this on a Linux machine, I rebooted it and used the built-in RAID management utility before I got into the operating system. When rebooting, watch for a prompt that tells you to hit a key combination like CTRL+C. The management utility should tell you the serial number of the drive that failed (or, failing that, the serial number of the drive you don't want to remove), and you should be able to force a rebuild.
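If it turns out to be Linux software RAID (mdadm) rather than a hardware controller, the same fail/remove/re-add cycle can be done from the shell. A rough sketch, assuming /dev/sda1 is the suspect member of /dev/md0 (double-check against /proc/mdstat first):

mdadm /dev/md0 --fail /dev/sda1     # mark the suspect member as failed (if it isn't already)
mdadm /dev/md0 --remove /dev/sda1   # pull it out of the array
mdadm /dev/md0 --add /dev/sda1      # add it back; the array starts rebuilding onto it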
I'm also not familiar with Linux RAID config, but this would seem to indicate that the server thinks device 0 is not connected. That could mean there's a physical connection issue (possibly a cable or backplane problem), or that the software isn't actively monitoring the connection status and needs to be triggered to rescan the bus for changes.
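If it's the latter, a bus rescan can usually be kicked off from sysfs instead of rebooting; something along these lines, where host0 is a guess and the right entry should be picked from /sys/class/scsi_host/:

echo "- - -" > /sys/class/scsi_host/host0/scan   # rescan all channels/targets/LUNs on that host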
@ceres, go ahead and lock this if you'd like.
Since my first drive (/dev/sda) is the one that is "faulty", I had to overwrite its superblock and then add it back to the array.
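The commands boiled down to something like this for the first partition (double-check the device names against your own /proc/mdstat before copying):

mdadm --zero-superblock /dev/sda1   # wipe the stale superblock on the dropped member
mdadm /dev/md0 --add /dev/sda1      # add it back; md0 starts resyncing onto it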
I had three arrays to fix, so I repeated the commands for sda2 and sda3.
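Going by the membership shown in /proc/mdstat above (sdb2 in md1, sdb3 in md2), that works out to roughly:

mdadm --zero-superblock /dev/sda2
mdadm /dev/md1 --add /dev/sda2
mdadm --zero-superblock /dev/sda3
mdadm /dev/md2 --add /dev/sda3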
Depending on the size of the array, it can take some time to rebuild. You can check the status by reading the /proc/mdstat file.
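While a resync is running, the array being rebuilt shows a recovery progress line; it looks roughly like this (the numbers below are illustrative, not a paste of my output):

md2 : active raid1 sda3[2] sdb3[1]
      139801532 blocks super 1.1 [2/1] [_U]
      [==>.................]  recovery = 12.3% (17250496/139801532) finish=25.4min speed=80182K/sec
      bitmap: 2/2 pages [8KB], 65536KB chunk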
And when complete:
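Representative output once the rebuild has finished (again, not an exact paste), with both members present and [2/2] [UU]:

Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      102388 blocks super 1.0 [2/2] [UU]

md2 : active raid1 sda3[2] sdb3[1]
      139801532 blocks super 1.1 [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[2] sdb2[1]
      16382908 blocks super 1.1 [2/2] [UU]

unused devices: <none>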
And finally:
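The tail end of mdadm -D /dev/md0 then looks something like this (name, UUID and timestamps trimmed):

         State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
 Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       2       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1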
The state is clean and there are two active and sync'd devices.
Problem solved.