Hey guys,
I'm currently doing CS research at a university (can you guess which one?) and I need to host between 4 and 6 TB of data. I know that this is peanuts for the storage and database communities, but I'm not a member of either of them!
The intention is to do some exploratory data mining with this data. I am writing a research proposal in which I can request a server (as in a tower PC; I don't have access to rack space). My instinct is to buy a memory-heavy Core 2 Duo box, put 4 x 1.5TB drives in it with RAID (which RAID level do I want here? So many cryptic numbers), bung Ubuntu on it and then a database program... I can get DB2 for free, but I might try PostgreSQL, as I know that last.fm run tens of TB on it and it seems to be doing OK for them.
What do people recommend for doing this? The easy way is the better way, as I'm still writing the research proposal. An extra hundred bucks is not really a problem.
For example, at work the file server I built is set up with eight 750GB drives in RAID6. The eighth drive is a spare (I like lots of redundancy), which gives me a total of 3.4TB of actual storage.
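For the "which RAID" question, the capacity arithmetic is simple: RAID0 stripes with no redundancy, RAID1 mirrors everything, RAID5 gives up one drive to parity and survives one failure, RAID6 gives up two and survives two, and RAID10 mirrors pairs and gives up half. The sketch below assumes identical drives and uses a hypothetical `usable_bytes` helper; it reproduces the 3.4TB figure above, since 7 active drives in RAID6 leave 5 drives of data, i.e. 3.75 TB in vendor (decimal) units, which many OS tools report as about 3.4 binary "TB".

```python
TB = 10**12   # decimal terabyte, as drive vendors count
TIB = 2**40   # binary tebibyte, which many OS tools label "TB"

# Smallest member count (after removing hot spares) each level accepts.
MIN_MEMBERS = {"raid0": 2, "raid1": 2, "raid5": 3, "raid6": 4, "raid10": 4}

def usable_bytes(level: str, drives: int, drive_bytes: int, hot_spares: int = 0) -> int:
    """Usable capacity of an array built from identical drives."""
    members = drives - hot_spares  # hot spares hold no data until a rebuild
    if level not in MIN_MEMBERS:
        raise ValueError(f"unknown RAID level: {level}")
    if members < MIN_MEMBERS[level]:
        raise ValueError(f"{level} needs at least {MIN_MEMBERS[level]} members")
    if level == "raid0":
        data = members            # striping only, no redundancy
    elif level == "raid1":
        data = 1                  # every member is a full mirror copy
    elif level == "raid5":
        data = members - 1        # one drive's worth of parity
    elif level == "raid6":
        data = members - 2        # two drives' worth of parity
    else:  # raid10
        if members % 2:
            raise ValueError("raid10 needs an even number of members")
        data = members // 2       # striped mirror pairs
    return data * drive_bytes

# The file server described above: eight 750 GB drives, RAID6, one spare.
usable = usable_bytes("raid6", drives=8, drive_bytes=750 * 10**9, hot_spares=1)
print(f"{usable / TB:.2f} TB decimal, {usable / TIB:.2f} TiB")
# prints: 3.75 TB decimal, 3.41 TiB
```

The gap between "3.75" and "3.4" is purely the decimal-versus-binary unit difference, not lost space.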
Also, be prepared for the cost: the hard drives will be the single largest expense. Get a good RAID card, too. Don't go with software RAID; it is unreliable. I recommend 3ware for good SATA RAID cards.
EDIT: The 750GB drives were the best and largest on the market at the time. When we build RAID server #3 (the current one is the second), we'll again go with the best and largest SATA drives we can get.
EDIT2: If you can convince the university to give you enough money to buy large SCSI drives, I shall worship you. Not even my work is willing to pay what it costs to buy very large SCSI drives in large quantities.
I've got that 1.5TB drive, but I'm not running it in RAID, so I couldn't tell you one way or the other.
Me, I'd buy one of these:
http://www.nexsan.com/sataboy.php
Those are pretty expensive, but any generic RAID disk tray will give you satisfactory results. Get a Fibre Channel card for any old PC and plug the tray into that.
Any dual core system with as much memory as you can afford should be good enough for the OS.
Fair point. That one was a little out of range at $15,000, but let's say I top out at $2500 all-in.
How many HDs can I fit in a tower case? I was under the impression I'd top out at 4, but if people think I could get 6 x 1TB drives in, what case would I put them in?
You're getting a full tower, right? You should be able to fit a shitload.
Most towers will fit a lot more than 4. I have a mid tower, and it would hold 8 if I used 5.25" to 3.5" mounting adapters. I would be a bit scared of the heat, though. I'm sure a full tower would hold plenty.
I'd probably shoot for an 8-port card and get 6-8 1.5TB drives. That will probably eat up a huge chunk of your $2500 budget and leave you a bit under $1000 for the rest of the computer. Aim for 8GB of memory or more; for serious I/O work, memory is more important than CPU.
The exception is if you plan on doing a lot of serious processing on your data. Then you might want to consider a quad-core CPU, depending on how well your software is threaded.
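How much "well threaded" matters can be estimated with Amdahl's law, a standard model brought in here for illustration rather than anything measured in this thread: if a fraction p of the runtime runs in parallel across n cores, the best-case speedup S(n) is given below. The values of p are made-up examples, not benchmarks of any data-mining tool.

```latex
\begin{aligned}
S(n) &= \frac{1}{(1 - p) + p/n} \\
p = 0.5:&\quad S(2) \approx 1.33, \quad S(4) = 1.6 \\
p = 0.9:&\quad S(2) \approx 1.82, \quad S(4) \approx 3.08
\end{aligned}
```

If most of the time is spent waiting on disk, p is small and the jump from two to four cores buys little.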
I think it's only the 750GB drives, isn't it?
OK, so:
- 1 x 8-port SATA RAID card = $480 (I looked at this one, but the price seems expensive!)
- 6 x 1.5 TB drives at $150 each = $900
- Dell server with six 3.5" internal drive bays = $1100
So all in I'm at $2480, excluding sales tax.
Does that seem right to you guys?
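A quick check of that quote, also showing how much of the 4 to 6 TB target each parity level leaves. It assumes the drives are 1.5 TB in vendor (decimal) units and uses a hypothetical `usable_tb` helper; RAID5 gives up one drive to parity and survives one failure, RAID6 gives up two and survives two.

```python
# Prices as quoted in the post; sales tax excluded.
PARTS = {
    "8-port SATA RAID card": 480,
    "6 x 1.5 TB drives at $150": 6 * 150,
    'Dell server, six 3.5" bays': 1100,
}
BUDGET = 2500
DRIVES, DRIVE_TB = 6, 1.5  # vendor (decimal) terabytes

def usable_tb(drives, drive_tb, parity_drives):
    """Capacity left after reserving parity drives (RAID5: 1, RAID6: 2)."""
    return (drives - parity_drives) * drive_tb

total = sum(PARTS.values())
print(f"total ${total}, ${BUDGET - total} under budget")
for level, parity in (("RAID5", 1), ("RAID6", 2)):
    tb = usable_tb(DRIVES, DRIVE_TB, parity)
    # The OS usually reports binary units, which reads about 9% smaller.
    print(f"{level}: {tb:.1f} TB usable (~{tb * 10**12 / 2**40:.1f} TiB), "
          f"${total / tb:.0f} per usable TB")
```

With these numbers the total is $2480, RAID6 leaves 6.0 TB (about 5.5 TiB as the OS will likely show it), and RAID5 leaves 7.5 TB.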
There are days I dream of 5TB SCSI/SAS drives just for personal use. 10k or 15k RPM, of course. I still salivate over that new X58 board with onboard SAS.
The 750s had the firmware that disabled the cache. That was fun. I put four of those damn things in my Mac and would get staggering delays any time I moved between windows or monitors. I was pissed, but I had totally neglected to research the drives. I finally flashed them (without Seagate's help, since they refuse to post the firmware on the web where we could actually find it, and they never responded to my emails), and they appear to have cache enabled and work under OS X again.
You'll probably want more memory than that. The rest of the system looks okay, though. I really wouldn't consider anything under 4GB for data crunching, especially when it's working on 6 terabytes of data.
Their KB site is down for me right now, probably being hammered thanks to Slashdot. There are ~15 different models of 7200.11, 3 or 4 models of ES.2 7200.11, and about 10 different models of Maxtor drives that are affected by the bug.
Is there a serial number range this affects, or something?
There is a utility you can run that reports the model, serial number, and firmware revision. If yours is in the list, you email them the info, and they apparently never get back to you (I've been waiting since day 1). You can find more about it here: http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
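On a Linux box, the same three fields can be read with smartmontools' `smartctl -i`, which prints `Device Model:`, `Serial Number:` and `Firmware Version:` lines for ATA drives. The sketch below parses that text and checks it against an affected list. `parse_smartctl_info`, `needs_update` and the sample values are hypothetical, and `AFFECTED` is a placeholder you would fill in from Seagate's KB article; it is not real data.

```python
import re

# Line prefixes printed by `smartctl -i` for ATA drives.
FIELD_PATTERNS = {
    "model": r"^Device Model:[ \t]*(.+)$",
    "serial": r"^Serial Number:[ \t]*(.+)$",
    "firmware": r"^Firmware Version:[ \t]*(.+)$",
}

def parse_smartctl_info(text):
    """Return whichever of model/serial/firmware appear in smartctl -i output."""
    info = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            info[key] = match.group(1).strip()
    return info

def needs_update(info, affected):
    """True if this model/firmware pair is in the user-supplied affected list.

    `affected` maps a model string to a set of firmware revisions.
    """
    model = info.get("model", "")
    return info.get("firmware") in affected.get(model, set())

# Placeholder data for illustration only; not Seagate's real list.
SAMPLE = """\
Device Model:     EXAMPLE-MODEL
Serial Number:    EXAMPLE-SERIAL
Firmware Version: FW01
"""
AFFECTED = {"EXAMPLE-MODEL": {"FW01"}}

info = parse_smartctl_info(SAMPLE)
print(info["model"], info["firmware"], needs_update(info, AFFECTED))
```

In practice you would feed it the output of `smartctl -i /dev/sdX` for each drive instead of `SAMPLE`.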
EDIT: Nice, they have put the firmware updates online.