So, I'm doing research on effective therapy. We have a few therapists (8-10) who have clients who have agreed to be part of our study, and we're recording their sessions. I need to set up a machine on which to store/edit the recorded video. Right now, we've just got a bunch of hard drives, one for each therapist, but this scares me, because it just takes one drive failure to wipe out a few hundred hours of video.
Basically, this machine will serve as a video storage/editing vault--it won't need to be connected to the internet, but I'll be backing up recorded interview sessions to it weekly.
Data redundancy/protection from failure is the highest priority, but I'll also be doing some light editing on it (trimming sections, compiling multiple clips, no high-end rendering or anything like that).
I've been thinking that I'll just build a basic box with a hardware RAID card, but my knowledge of that is theoretical (meaning, I've only read some Wikipedia/Tom's Hardware stuff; I've never actually done it).
So, questions:
Which type of RAID will best serve me? (I'm leaning to RAID 5 or RAID 10 right now)
What should I be looking for in a good hardware RAID card to handle/rebuild the RAID?
Thanks in advance for help on this, folks.
Edit: damn typos.
RAID 10 will give you better performance, but you need to spend more money for the same capacity. For N disks of the same size in a RAID 10 array, where N is an even number of at least 4, you get N/2 disks' worth of capacity. In other words, you sacrifice fully half your total capacity for data redundancy. Overall performance is slightly faster in normal operation because there's no parity data overhead, and you can tolerate up to N/2 drive failures, as long as no two failed disks are in the same mirrored pair. For example, if you had four disks in a RAID 10, it would be possible to have two drives fail and not lose any data, as long as the two drives weren't mirroring each other. Another benefit to RAID 10 is that drive failures have minimal impact on performance compared to RAID 5. The controller can read data from the remaining mirror drive at the full speed that the drive supports.
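To make the "as long as the two drives weren't mirroring each other" condition concrete, here is a minimal Python sketch. It assumes the disks are paired up as (0, 1), (2, 3) and so on, which is a simplification for illustration and not how any particular controller numbers its drives; the function names are made up for this example.

```python
from itertools import combinations


def raid10_usable_disks(n_disks):
    """Usable capacity, in units of one disk, for RAID 10 built from mirrored pairs."""
    if n_disks < 4 or n_disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return n_disks // 2


def raid10_survives(n_disks, failed):
    """True if no mirrored pair has lost both of its members.

    Assumes disks 0 and 1 form the first pair, 2 and 3 the second, and so on.
    """
    failed = set(failed)
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(n_disks // 2))


def surviving_two_disk_failures(n_disks):
    """Count how many of the possible two-disk failures the array survives."""
    cases = list(combinations(range(n_disks), 2))
    survived = sum(raid10_survives(n_disks, case) for case in cases)
    return survived, len(cases)


print(raid10_usable_disks(4))          # capacity of a 4-disk array, in disks
print(surviving_two_disk_failures(4))  # (survivable cases, total cases)
```

With four disks, 4 of the 6 possible two-disk failures are survivable; the other 2 are the cases where both halves of one mirror die together.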
It sounds like cost/capacity would be more important than performance in your application, so I'd say RAID 5 would be the best solution available for what you want to do. You may also want to consider having more than one array and mirroring the data between the two arrays, for example maybe have a second JBOD array to mirror data onto. Or even a separate machine to mirror data onto. My point being, don't trust a single solution with your data. Putting all your data on a RAID 5 array is better than having it on a single disk, but you're still just one RAID controller firmware bug away from losing everything. I've actually seen this happen. At one of the places I used to work, we used Dell NAS units for network backups and thought "we're good, we've got everything backed up on a RAID 5 array". Except the Dell firmware had some bugs and the arrays corrupted themselves with alarming frequency. Thankfully we also had tape backups, but it was still shocking to see how unreliable the firmware could be on an enterprise-grade RAID controller integrated into a device whose only purpose is to provide networked storage.
I currently work as an Oracle DBA, and we use RAID 5 for the vast majority of our servers. Some of our high-end units also include a separate RAID 10 array where we put database files that see higher than average I/O, but for the most part RAID 5 is the go-to solution because it satisfies reliability needs while maximizing cost effectiveness.
That way you could just have an editing workstation, but have the video archived on DVD. Considering that most data for research is only usually required/supposed to be kept for like 7-10 years after conclusion of the study, the DVDs don't need to remain readable forever, and it's easier/cheaper than trying to set up a huge RAID array for video storage.
A followup question--how can I find reliability stats for different cards? I trust Newegg pretty implicitly, and they've yet to steer me wrong. This is the card I've got my eye on--PCI-e and 8 SATA II ports. I'm totally unfamiliar with Areca as a brand, but the overall feeling on Newegg is positive. I'll likely be running it under plain vanilla WinXP Pro, unless I am persuaded otherwise.
Manufacturer-wise, I'm a fan of Areca. They were essentially a nobody when my company took a gamble on their products for a test system, and they've all functioned super-excellently for us.
This is a good four port SATA Areca card, but they make a very wide variety of cards, SATA and SAS, anywhere between 4 and 24 drives, with various RAID level support and various administrative options (some of their cards even come with a dedicated Ethernet port for managing the controller; it hosts a web server that allows array construction/configuration/maintenance).
[edit] Ha ha ha, ya, I'd totally recommend that card, but it's a bit cheaper from my link.
[edit] Whoops, mine's the 4 port; the 8 port is more expensive at my link.
Ruckus, thanks for the endorsement. I'm pretty set on needing 8 ports, though, and that link is to a 4 port card. Appreciate the perspective, though--it sounds like Areca is solid.
Edit: Haha, ITT: Data and post redundancy.
It's pretty easy to figure out the number of ports you'll need.
Current hard drives max out at 2TB, so in a RAID 5, a 4 port card will give you up to 6TB of space, and an 8 port card will allow for up to 14TB.
[edit] Don't forget to take power consumption into consideration. Spinning up that many hard drives concurrently can put a strain on most PSUs.
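The RAID 5 capacity figures in this post follow from the rule that one disk's worth of space goes to parity, so usable space is (number of disks - 1) x disk size. A quick Python sketch of that arithmetic, assuming all disks are the same size and ignoring filesystem overhead (the function name is just for illustration):

```python
def raid5_usable_tb(n_disks, disk_tb):
    """Usable RAID 5 capacity in TB: one disk's worth of space is used for parity."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (n_disks - 1) * disk_tb


# Every port populated with a 2TB drive, as in the post above.
for ports in (4, 8):
    print(ports, "ports:", raid5_usable_tb(ports, 2), "TB usable")
```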
I'm thinking of going with the 8 port because it will end up being cheaper.
8 port card: $440 for card, 9 x $95 1TB drives (one spare) = $1295 for 7TB of space
4 port card: $340 for card, 5 x $210 2TB drives (one spare)= $1390 for 6TB of space
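For anyone who wants to rerun this comparison with other prices, here is a small Python sketch of the same arithmetic. It assumes a single RAID 5 array filling every port, plus cold spares that cost money but add no capacity; the prices are the ones quoted in this post and the function name is made up for the example.

```python
def build_cost(card_price, drive_price, drives_in_array, spares, drive_tb):
    """Return (total cost, usable TB, cost per usable TB) for one RAID 5 build."""
    total = card_price + drive_price * (drives_in_array + spares)
    usable_tb = (drives_in_array - 1) * drive_tb  # one disk's worth goes to parity
    return total, usable_tb, total / usable_tb


builds = {
    "8 port, 1TB drives": build_cost(440, 95, 8, 1, 1),
    "4 port, 2TB drives": build_cost(340, 210, 4, 1, 2),
}
for name, (total, usable_tb, per_tb) in builds.items():
    print(f"{name}: ${total} for {usable_tb}TB (${per_tb:.0f} per TB)")
```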
Edit: On second thought, 8 drives spinning is just asking for one of them to fail. Any experience with the brand 3Ware? This card of theirs is 4 ports for a bit cheaper, but I don't know about their reputation.
If you are seriously concerned about these issues you need a true backup solution, not just a RAID array. Spend some additional money and get something you can do nightly backups with so you don't lose it all when your RAID controller loses its mind or dies.
Care to give an example? Are you talking about an offsite solution? I'll be accessing the machine probably twice weekly to back up/compile video. When I'm not using it, it will be off (unplugged and locked in a storage cabinet)--so it's not going to be accessed constantly (like a server).
I'd recommend you leave this thing running all the time, preferably with a UPS battery backup to carry it over short power outages or brownouts. I avoid shutting my systems off unless they're going to be completely unattended for more than a week.
And on backups, I'd recommend a Network Attached Storage device like the D-Link DNS-323. It can hold a pair of SATA hard drives either striped, mirrored, or JBOD. Then just use ntbackup and set the NAS as the destination, scheduled to run whenever (another reason I never shut my systems off).
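As an illustration of the "second copy somewhere else" idea, here is a rough Python sketch. To be clear, this is not ntbackup and not anything the NAS provides; it is a plain copy-and-verify script, and it assumes the NAS share is reachable as an ordinary folder path. The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_and_verify(source_dir, dest_dir):
    """Copy every file under source_dir into dest_dir, then re-check each copy.

    Returns the relative paths whose copy does not match the original.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    mismatched = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        src_hash = sha256_of(src)
        # Skip the copy when an identical file is already on the destination.
        if not dst.exists() or sha256_of(dst) != src_hash:
            shutil.copy2(src, dst)
        # Read the copy back and compare, so silent corruption gets noticed.
        if sha256_of(dst) != src_hash:
            mismatched.append(str(rel))
    return mismatched
```

Calling `mirror_and_verify` with the array's video folder and the NAS folder returns an empty list when every copy checks out. Hashing each file twice is slow on hundreds of hours of video, but for a weekly run that trade-off may be acceptable.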
Ruckus, why would I want to leave it running all the time? To clarify, this is the backup. All of the video is stored on the psychologists' individual machines, then I'll be backing it up to this array as well.