
[solved] Degraded RAID1 Array

SeñorAmor !!! Registered User regular
edited January 2012 in Help / Advice Forum
I recently set up a new server here at the office that includes three RAID1 arrays. I have two identical 160GB drives, each with 3 partitions; each partition is in an array with its "twin" partition on the opposite drive. On boot, I'm getting a degraded array error.
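
For reference, each md device pairs a partition on one drive with its twin on the other; a layout like this is typically created with mdadm --create along these lines (illustrative only, not necessarily my exact invocations; the metadata versions match what shows up in the output below):
mdadm --create /dev/md0 --metadata=1.0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --metadata=1.1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md2 --metadata=1.1 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3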

/proc/mdstat contains
root@securevm [~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb1[1]
      102388 blocks super 1.0 [2/1] [_U]
      
md2 : active raid1 sdb3[1]
      139801532 blocks super 1.1 [2/1] [_U]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active raid1 sdb2[1]
      16382908 blocks super 1.1 [2/1] [_U]
      
unused devices: <none>

mdadm returns
root@securevm [~]# mdadm -D /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Mon Jan 23 05:54:37 2012
     Raid Level : raid1
     Array Size : 102388 (100.01 MiB 104.85 MB)
  Used Dev Size : 102388 (100.01 MiB 104.85 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jan 24 08:33:30 2012
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : <snip>
           UUID : 2bfa1531:acf02178:f07dd7bd:8b5f18cf
         Events : 120

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

This is repeated for the other two arrays.

If I'm reading this correctly, it seems my arrays aren't seeing my first drive (/dev/sda), right?

Smartctl shows no issues with either drive, so I'm assuming the drives are good, but just not sync'd.
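
The drive-health check was just smartctl, roughly as below (output omitted); mdadm --examine on a member partition is another way to sanity-check the RAID superblocks, though I didn't need to go that far:
smartctl -H /dev/sda        # overall SMART health of the drive the arrays dropped
smartctl -H /dev/sdb
mdadm --examine /dev/sda1   # dump the md superblock on the missing member (extra check, not required)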

How do I fix this?

Thanks in advance.


Posts

  • Draygo Registered User regular
    I am not familiar with RAID management on Linux machines, but "degraded" means that in the case of another hard drive failure you will lose data.

    In a RAID 1 implementation with 2 hard drives, that means one of the two hard drives has failed. You can try removing the hard drive and re-inserting it while hot to see if it will recognize and rebuild the hard drive (if you suspect the hard drive itself is fine). Make sure you are not removing the active drive while you do this, or you will kill the machine and potentially lose data. The other method is to force a rebuild through a RAID management utility, if one exists for your server implementation.

    I think the last time I dealt with this on a Linux machine, I rebooted it and used the built-in RAID management utility before I got into the operating system. When rebooting, watch for a prompt that tells you to hit a key combination like CTRL+C. The management utility should tell you the serial number of the drive that failed (failing that, the SN of the drive that you don't want to remove), and you should be able to force a rebuild.
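
    I believe the command-line equivalent with Linux software RAID is re-adding the dropped member through mdadm; I haven't done it myself, so treat this as a sketch only (device names taken from the mdstat output above):
    mdadm /dev/md0 --re-add /dev/sda1    # try to re-attach the missing member
    mdadm /dev/md0 --add /dev/sda1       # if --re-add refuses, a plain --add triggers a full rebuild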

  • spool32 Contrary Library Registered User, Transition Team regular
    RAID1 is mirroring - your mirror is fractured. You need to rebuild it, but I'm not familiar with how you'd do that with your particular setup.

  • Ruckus Registered User regular
    Number   Major   Minor   RaidDevice State
           0       0        0        0      removed
           1       8       17        1      active sync   /dev/sdb1
    

    Also not familiar with Linux RAID config, but this would seem to indicate that the server thinks device 0 is not connected. That could mean there's a physical connection issue (possibly a cable or backplane problem), or that the software isn't actively monitoring the connection status and needs to be triggered to rescan the bus for changes.
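
    On Linux, checking and rescanning would look roughly like this, I think (host0 is a guess; there is one scan file per SCSI host under /sys/class/scsi_host/):
    cat /proc/partitions                             # does the kernel see sda and its partitions at all?
    dmesg | grep -i sda                              # any link resets or I/O errors logged for that disk?
    echo "- - -" > /sys/class/scsi_host/host0/scan   # ask the kernel to rescan that SCSI host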

  • SeñorAmor !!! Registered User regular
    I ended up fixing this myself. One of my disks must have missed some data (during a reboot or something) and wasn't perfectly in sync. I removed and then re-added the "faulty" disk, which forced the array to rebuild itself. It seems the problem is solved.

    @ceres, go ahead and lock this if you'd like.

  • SeñorAmor !!! Registered User regular
    edited January 2012
    If anyone cares, here are the commands to fix this issue.

    Since my first drive (/dev/sda) is the one that was "faulty", I had to overwrite its superblock and then add it back to the array:
    mdadm --zero-superblock /dev/sda1
    mdadm --add /dev/md0 /dev/sda1
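
    (In my case the sda members were already showing as "removed" in mdadm -D, so there was nothing to detach first. If a flaky member were still listed in the array, I believe it would need to be failed and removed before zeroing it, along these lines:)
    mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1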
    

    I had 3 arrays to fix, so I repeated the commands for sda2 and sda3.
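
    Spelled out, that amounts to the following (md1 pairs the sda2/sdb2 partitions and md2 the sda3/sdb3 ones, per the mdstat output below):
    mdadm --zero-superblock /dev/sda2
    mdadm --add /dev/md1 /dev/sda2
    mdadm --zero-superblock /dev/sda3
    mdadm --add /dev/md2 /dev/sda3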

    Depending on the size of the array, it can take some time to rebuild. You can check the status by reading the /proc/mdstat file:
    root@securevm [~]# cat /proc/mdstat
    Personalities : [raid1] 
    md0 : active raid1 sda1[2] sdb1[1]
          102388 blocks super 1.0 [2/2] [UU]
          
    md2 : active raid1 sda3[2] sdb3[1]
          139801532 blocks super 1.1 [2/1] [_U]
          [>....................]  recovery =  1.8% (2523072/139801532) finish=27.2min speed=84102K/sec
          bitmap: 2/2 pages [8KB], 65536KB chunk
    
    md1 : active raid1 sda2[2] sdb2[1]
          16382908 blocks super 1.1 [2/2] [UU]
          
    unused devices: <none>
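
    If you don't feel like re-running cat by hand, something like this refreshes the view every few seconds (just a convenience):
    watch -n 5 cat /proc/mdstat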
    

    And when complete:
    root@securevm [~]# cat /proc/mdstat
    Personalities : [raid1] 
    md0 : active raid1 sda1[2] sdb1[1]
          102388 blocks super 1.0 [2/2] [UU]
          
    md2 : active raid1 sda3[2] sdb3[1]
          139801532 blocks super 1.1 [2/2] [UU]
          bitmap: 1/2 pages [4KB], 65536KB chunk
    
    md1 : active raid1 sda2[2] sdb2[1]
          16382908 blocks super 1.1 [2/2] [UU]
          
    unused devices: <none>
    

    And finally:
    root@securevm [~]# mdadm -D /dev/md1
    /dev/md1:
            Version : 1.1
      Creation Time : Mon Jan 23 05:54:37 2012
         Raid Level : raid1
         Array Size : 16382908 (15.62 GiB 16.78 GB)
      Used Dev Size : 16382908 (15.62 GiB 16.78 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
    
        Update Time : Tue Jan 24 09:38:11 2012
              State : clean 
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
    
               Name : <snip>
               UUID : a00794a5:880ec6f7:e77e7254:86101dba
             Events : 32
    
        Number   Major   Minor   RaidDevice State
           2       8        2        0      active sync   /dev/sda2
           1       8       18        1      active sync   /dev/sdb2
    

    The state is clean and there are two active and sync'd devices.

    Problem solved.

This discussion has been closed.