Hard drive raw data read/write speeds.
Hi,
So I need some tech answers, sirs and madams.
So, a short summary of my issue. I've got hard drives running a database. This database does not cache data (for reliability), and what we've discovered is that the drives we're running it on, which average 160 MB/s read/write per the spec sheets, are actually managing about 200 kB/s on database writes.
I've been over the spec sheet for the drive and cannot for the life of me find a number that gives me the true raw read/write speed of these drives. While it's obvious the company is not going to hamper its ability to sell the drives by advertising low numbers, I need to know what to look for on a spec sheet so I can point to it and tell my boss which drive would meet the needs of a product that expects a 10 MB/s write speed.
The only way I was able to determine the current hard drives' speed was to use dd with the oflag=dsync option. How do I find this number (or at least calculate it from the given information) when shopping?
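For reference, the number that dd with oflag=dsync measures can be reproduced outside dd. Here is a rough Python sketch (scratch path, block size, and block count are arbitrary choices for the example) that times small synchronous writes the same way, with O_SYNC forcing each write to reach the media before the next one starts:

```python
# Rough analog of `dd if=/dev/zero of=FILE bs=4k oflag=dsync`:
# time small synchronous writes, where each write must hit the disk.
import os
import tempfile
import time

def sync_write_speed(path, block_size=4096, blocks=64):
    # O_SYNC makes every os.write() wait for the media, which is what
    # exposes the ~200 kB/s figure instead of the buffered number.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    buf = b"\0" * block_size
    start = time.perf_counter()
    try:
        for _ in range(blocks):
            os.write(fd, buf)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return block_size * blocks / elapsed  # bytes per second

with tempfile.NamedTemporaryFile() as tmp:
    print(f"sync write speed: {sync_write_speed(tmp.name) / 1024:.1f} kB/s")
```

On a rotating drive this will report a tiny fraction of the spec-sheet throughput, because every 4k write costs at least one platter rotation of latency.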
Thanks
Stercus, Stercus, Stercus, Morituri Sum
Data Sheet Interpretation
Manufacturer data sheets for a drive usually quote the best number for the drive's stated workload, queue depth, and interface technology, measured on an idealized system (best HBA, no other processing going on, etc.).
For example, take WD's datasheet for the Purple drive: http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-800012.pdf
The first thing that should strike you is that they're marketing this drive for what is a write-intensive workload (i.e., streaming video to be stored and read back at a later time). So it's highly likely that the firmware loaded onto this particular product has been tuned to perform better on large-block host writes at the cost of host-read and mixed-workload performance.
Buffer to Host - The 6 Gb/s figure tells you it is a SATA-3 interface, so you should make sure your host system has an HBA that is also at least SATA-3.
Host to/from drive - This number is the grab-bag number based on whatever mixed workload performed best. Is it 90% write / 10% read? You don't know; that's why you have to look for other clues in the data sheet, or you can try e-mailing/calling your drive's manufacturer and asking. If they answer, great! If they don't, you'll have to get some samples and carry out your own testing.
Cache Write-Through Shenanigans
That dramatic reduction in performance when switching from write-back to write-through caching doesn't suggest that the drive is the sole (or even main) problem. It suggests either a) hardware limitations in the HBA or south bridge when write-through caching is enabled, or b) an under-tuned system and kernel for your desired use case (some drives perform better with 520-byte sectors, some with 512).
With this in mind, I'd suggest using either iometer or vdbench to gain some additional insight.
You are trying to use rotating hard drives like a solid state drive, and they aren't designed to work like that.
Let the drive cache the writes and use a RAID 1, RAID 6, or RAID 10 setup to get your reliability.
I have worked in finance, supporting the traders, at both a mutual fund company and an exchange, and in neither case did they turn off drive caching to handle their reliability issues.
It really does sound like a database/software problem, not a hardware one.
If you are getting dirty reads from your database (i.e., you're reading old data when you're 100% sure you've written new data), it means you aren't performing row-, page-, or table-level locking during your database writes. That normally means your database code is missing begin/end transaction statements tying together a set of reads and writes that must be performed all at once. This is a common problem for non-database developers who don't know SQL well: Java/C/etc. insulates them from the database layer, so they never learn how databases deal with concurrency issues, and thus never learn about transactions.
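To illustrate the transaction point, here is a minimal sketch using Python's stdlib sqlite3 as a stand-in database (the table, columns, and balances are invented for the example); the same begin/commit grouping applies to any SQL database:

```python
# Grouping a read plus two dependent writes into one transaction, so the
# intermediate state is never visible as committed data to other readers.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

# The connection context manager commits on success and rolls back on
# exception, so the two UPDATEs either both land or neither does.
with conn:
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = 1").fetchone()
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
```

Without the transaction wrapper, a concurrent reader could observe the debit before the credit, which is exactly the dirty-read symptom described above.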
Next, losing data during reboots brings up several points at once.
First, you need to figure out why you need to reboot in the first place, since well-written programs should never need a reboot. For example, if you need to reboot due to a runaway memory leak in the code, you're better off fixing the memory leak.
Second, programs that write important data to the drive need to shut down gracefully, so that they flush their output buffers and sync the disk before they actually terminate. There are a lot of different ways to handle graceful shutdowns, but none of them are automatic; you have to code them yourself. For servers, it's common to add a shutdown command to the listener thread. For clients, you need to intercept the termination signal, or add a listener thread.
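As a sketch of the signal-interception approach (the log path and names here are illustrative, and a real handler would flush whatever your app actually buffers):

```python
# Intercept SIGTERM so the process flushes and syncs before it exits,
# instead of leaving buffered data behind on an abrupt termination.
import os
import signal
import sys

log = open("/tmp/app.log", "a")  # hypothetical output file for the example

def graceful_shutdown(signum, frame):
    log.flush()               # flush the process's own output buffers
    os.fsync(log.fileno())    # ask the OS to push the data to the disk
    log.close()
    sys.exit(0)

# A plain `kill <pid>` (SIGTERM) now triggers a clean flush-and-exit.
signal.signal(signal.SIGTERM, graceful_shutdown)
```

Note that SIGKILL cannot be intercepted, which is one more reason the reboot script below needs to send the polite signal first.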
Third, reboots cannot just happen; you've got to have a script to do them, one that sends those graceful-shutdown requests to all your important programs before you actually perform the reboot.
Fourth, if you really cannot prevent reboots, due to stuff like blackouts or brownouts, you need to buy an uninterruptible power supply.
Without all four parts in place, you'll always lose data during a reboot.
We are aware of all of this. The database application has the protections you mention. The only time we lose data is when we are running shell scripts outside the database to receive and send data via polling. As for the reboots, we have all that too; the reboots occur because the users of the equipment are moro.......computer illiterate. With 4000+ locations using this system, you are going to have power outages, and people not following the procedure, which causes open delays while the delayed backup runs; then they get impatient and think the message on the screen telling them not to reboot the computer will go away if they reboot the computer.
As for my original question, I understand some of it now. I was confusing something I was told last week. It's not the hard drive's buffers that the database and that dd command are bypassing; it's the OS buffer cache.
How often do you run stats or index?
What is the average read/write size?
Have your queries been optimized to use proper indexing, and is the appropriate data-level locking in place?
Is it local or over a network?
What database?
When it comes to databases, there are a lot of factors in play.
Your company's real problem is that they didn't assume the worst-case scenario when designing the software. There needs to be a store/forward/retry/restart design for all your client-based apps. That is, store each command that comes across the wire so it can be retried later, letting uncompleted ones restart after a reboot.
It actually doesn't seem like it'd be too hard to put that logic into your polling app. Just save every command received into a db table, execute the command, and update the db table with the result. Any rows in the db table without a completed result get rerun upon restart. (Note that I'd add two fail-safes: a command that clears the queue, and a queue limit, over which non-clear commands get a "too busy" reply.)
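A minimal sketch of that store/forward table, again using stdlib sqlite3 as a stand-in (the table layout, queue limit, CLEAR command, and run_command placeholder are all assumptions made up for the illustration):

```python
# Store/forward/retry/restart queue: every command is recorded before it
# runs, and anything without a completed result is rerun after a restart.
import sqlite3

QUEUE_LIMIT = 100  # assumed limit; tune for the real app

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE commands (
    id     INTEGER PRIMARY KEY,
    body   TEXT NOT NULL,
    result TEXT)""")              # NULL result == not yet completed

def run_command(body):
    return "ok"                   # placeholder for the real polling-app logic

def receive(body):
    if body == "CLEAR":           # fail-safe #1: a command that clears the queue
        conn.execute("DELETE FROM commands WHERE result IS NULL")
        return "cleared"
    (pending,) = conn.execute(
        "SELECT COUNT(*) FROM commands WHERE result IS NULL").fetchone()
    if pending >= QUEUE_LIMIT:    # fail-safe #2: refuse when the queue is full
        return "too busy"
    cur = conn.execute("INSERT INTO commands (body) VALUES (?)", (body,))
    result = run_command(body)    # execute, then record the outcome
    conn.execute("UPDATE commands SET result = ? WHERE id = ?",
                 (result, cur.lastrowid))
    return result

def restart():
    # After a reboot, rerun anything that never got a completed result.
    for cid, body in conn.execute(
            "SELECT id, body FROM commands WHERE result IS NULL").fetchall():
        conn.execute("UPDATE commands SET result = ? WHERE id = ?",
                     (run_command(body), cid))
```

The point of the INSERT-before-execute ordering is that a crash between the two steps leaves a NULL-result row behind, which is exactly what restart() picks up.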
Okay, I can't answer all of your questions...I'm still learning a lot of this, but I think we are getting off track here. Let me start over and describe the problem.
The business uses a heavily customized SUSE Linux OS and a Progress OpenEdge database. We were doing hardware refreshes, and the older-model servers were no longer available, so we did some verification of the new hardware, and the metrics we used were within tolerances. But once the new hardware started going into the field, we began getting intermittent reports of long pauses and delays during specific parts of the database application, only on this hardware, so I began investigating the problem.
Comparing the new-generation hardware to the previous generation showed it running much slower, but the problems only showed up in the database application under different kinds of tests than we ran for the hardware verification. For instance, I would load, then delete, 100k records and time both, then compare against older hardware. We narrowed the problem to the hard drive: when moved from the new server to a two-generations-old model, the problem followed the hard drive, and the two-generations-old hard drive performed great in the new hardware.
So we began investigating why with the vendor. We eventually discovered that the new hardware's hard drives were Advanced Format (4k), while the image we use to create replacement systems was based on 512-byte sectors. So we went through the process of converting the image to a 4k image, and while we saw an improvement in the speed of our tests, it still was not as fast as the two-generations-old hard drives. Meanwhile, we tested the image on another brand of 4k drive and it was much faster, showing numbers better than the two-generations-old hard drive, but the earlier conversion to 4k had put us in the ballpark, so the vendor wasn't going to replace the drives we had already bought with a different brand.
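On Linux, one way to spot an Advanced Format (512e) drive is to compare the logical and physical block sizes reported in sysfs. A small sketch (the device path in the comment is an example; the function takes the queue directory as a parameter so it can be pointed at any device):

```python
# Compare logical vs physical sector size from a sysfs queue directory,
# e.g. /sys/block/sda/queue (device name is an example).
from pathlib import Path

def sector_sizes(queue_dir):
    base = Path(queue_dir)
    logical = int((base / "logical_block_size").read_text())
    physical = int((base / "physical_block_size").read_text())
    return logical, physical

# A 512e Advanced Format drive reports logical=512, physical=4096; unaligned
# 512-byte writes then cost a read-modify-write cycle inside the drive,
# which fits the slowdown described above.
```

Checking this on incoming replacement hardware before imaging it would catch the 512-vs-4k mismatch early.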
In parallel with the drive vendor, we were also speaking with Progress about why this was occurring. Progress had us run some tests and claimed that our drives were writing very slowly, first bringing into the discussion the 'your 126 MB/s average read/write hard drive is actually writing at 150-200 kB/s, and you are supposed to be writing at 10 MB/s' line. The dd tests with the dsync option showed the same thing their DB tests showed, so we are currently gathering data about this, but it isn't a satisfactory answer yet, since our tests so far show that load/delete tests take 100% more time on a drive that is only, say, 10% slower per the dd write tests.
What I'm trying to understand is what my options are. Firmware for the drive? OS tuning? Drivers? We are also in the process of converting to OracleOS, so whatever we do will have a short life anyway; I'm just trying to understand the issue better.
Replacing all our HDs with SSDs is not a viable solution. We tested SSDs a few years ago, and we did not see enough of a performance improvement to make it cost-effective.
Our polling app is fairly reliable; it has built-in controls. We do have the occasional loss of a file, but given the number of systems and the number of failures, it's reliable. I was just mentioning that we do see the occasional issue with buffers getting wiped.