Hard drive raw data read/write speeds.
Hi,
So I need some tech answers, sirs and madams.
So, a short summary of my issue. I've got hard drives running a database. This database does not cache data (for reliability), and what we've discovered is that the drives we're running it on, which average 160 MB/s read/write per the spec sheets, are actually managing about 200 kB/s on database writes.
I've been over the spec sheet for the drive and cannot for the life of me find a number that gives me the true raw read/write speed of these drives. While it's obvious the company is not going to hamper its ability to sell the drives by advertising low numbers, I need to know what to look for on a spec sheet so I can point to it and tell my boss which drive would meet the needs of a product that expects a 10 MB/s write speed.
The only way I was able to determine the current hard drives' speed was to use dd with the oflag=dsync option. How do I find this number (or at least calculate it from the given information) when shopping?
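For reference, the number that dd with oflag=dsync measures can be reproduced outside dd. Here is a rough Python sketch (scratch path, block size, and block count are arbitrary choices for the example) that times small synchronous writes the same way, with O_SYNC forcing each write to reach the media before the next one starts:

```python
# Rough analog of `dd if=/dev/zero of=FILE bs=4k oflag=dsync`:
# time small synchronous writes, where each write must hit the disk.
import os
import tempfile
import time

def sync_write_speed(path, block_size=4096, blocks=64):
    # O_SYNC makes every os.write() wait for the media, which is what
    # exposes the ~200 kB/s figure instead of the buffered number.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    buf = b"\0" * block_size
    start = time.perf_counter()
    try:
        for _ in range(blocks):
            os.write(fd, buf)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return block_size * blocks / elapsed  # bytes per second

with tempfile.NamedTemporaryFile() as tmp:
    print(f"sync write speed: {sync_write_speed(tmp.name) / 1024:.1f} kB/s")
```

On a rotating drive this will report a tiny fraction of the spec-sheet throughput, because every 4k write costs at least one platter rotation of latency.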
Thanks
Stercus, Stercus, Stercus, Morituri Sum
Data Sheet Interpretation
Manufacturer data sheets for a drive usually quote the best number for the drive's stated workload, queue depth, and interface technology, measured on an idealized system (best HBA, no other processing going on, etc.).
For example, take WD's datasheet for the Purple drive: http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-800012.pdf
The first thing that should strike you is that they're marketing this drive for what is a write-intensive workload (i.e., streaming video to be stored and read back at a later time). So it's highly likely that the firmware loaded onto this particular product has been tuned to perform better on large-block host writes at the cost of host-read and mixed-workload performance.
Buffer to Host - The 6 Gb/s figure tells you it is a SATA-3 interface, so you should make sure your host system has an HBA that is also at least SATA-3.
Host to/from drive - This number is the grab-bag number based on whatever mixed workload performed best. Is it 90% write / 10% read? You don't know; that's why you have to look for other clues in the data sheet, or you can try e-mailing/calling your drive's manufacturer and asking. If they answer, great! If they don't, you'll have to get some samples and carry out your own testing.
Cache Write-Through Shenanigans
That dramatic reduction in performance when switching from write-back to write-through caching doesn't suggest that the drive is the sole (or even main) problem. It suggests either a) hardware limitations in the HBA or south bridge when write-through caching is enabled, or b) an under-tuned system and kernel for your desired use case (some drives perform better with 520-byte sectors, some with 512).
With this in mind, I'd suggest using either iometer or vdbench to gain some additional insight.
You are trying to use rotating hard drives like a solid state drive, and they aren't designed to work like that.
Let the drive cache the writes and use a RAID 1, RAID 6, or RAID 10 setup to get your reliability.
I have worked in finance, supporting the traders, at both a mutual fund company and an exchange, and in neither case did they turn off drive caching to handle their reliability issues.
It really does sound like a database/software problem, not a hardware one.
If you are getting dirty reads from your database (i.e., you're reading old data when you're 100% sure you've written new data), it means you aren't performing row-, page-, or table-level locking during your database writes. That normally means your database code is missing begin/end transaction statements tying together a set of reads and writes that must be performed all at once. This is a common problem for non-database developers who don't know SQL well: Java/C/etc. insulates them from the database layer, so they never learn how databases deal with concurrency issues, and thus never learn about transactions.
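To illustrate the transaction point, here is a minimal sketch using Python's stdlib sqlite3 as a stand-in database (the table, columns, and balances are invented for the example); the same begin/commit grouping applies to any SQL database:

```python
# Grouping a read plus two dependent writes into one transaction, so the
# intermediate state is never visible as committed data to other readers.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

# The connection context manager commits on success and rolls back on
# exception, so the two UPDATEs either both land or neither does.
with conn:
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = 1").fetchone()
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
```

Without the transaction wrapper, a concurrent reader could observe the debit before the credit, which is exactly the dirty-read symptom described above.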
Next, losing data during reboots brings up several points at once.
First, you need to figure out why you need to reboot in the first place, since well-written programs should never need a reboot. For example, if you need to reboot due to a runaway memory leak in the code, you're better off fixing the memory leak.
Second, programs that write important data to the drive need to shut down gracefully, so that they flush their output buffers and sync the disk before they actually terminate. There are a lot of different ways to handle graceful shutdowns, but none of them are automatic; you have to code them yourself. For servers, it's common to add a shutdown command to the listener thread. For clients, you need to intercept the termination signal, or add a listener thread.
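As a sketch of the signal-interception approach (the log path and names here are illustrative, and a real handler would flush whatever your app actually buffers):

```python
# Intercept SIGTERM so the process flushes and syncs before it exits,
# instead of leaving buffered data behind on an abrupt termination.
import os
import signal
import sys

log = open("/tmp/app.log", "a")  # hypothetical output file for the example

def graceful_shutdown(signum, frame):
    log.flush()               # flush the process's own output buffers
    os.fsync(log.fileno())    # ask the OS to push the data to the disk
    log.close()
    sys.exit(0)

# A plain `kill <pid>` (SIGTERM) now triggers a clean flush-and-exit.
signal.signal(signal.SIGTERM, graceful_shutdown)
```

Note that SIGKILL cannot be intercepted, which is one more reason the reboot script below needs to send the polite signal first.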
Third, reboots cannot just happen; you've got to have a script to do them, one that sends those graceful-shutdown requests to all your important programs before you actually perform the reboot.
Fourth, if you really cannot prevent reboots, due to stuff like blackouts or brownouts, you need to buy an uninterruptible power supply.
Without all four parts in place, you'll always lose data during a reboot.
We are aware of all of this. The database application has the protections you mention. The only time we lose data is when we are running shell scripts outside the database to receive and send data via polling. As for the reboots, we have all that too; the reboots occur because the users of the equipment are moro.......computer illiterate. With 4000+ locations using this system, you are going to have power outages, and people not following the procedure, which causes open delays while the delayed backup runs; then they get impatient and think the message on the screen telling them not to reboot the computer will go away if they reboot the computer.
As for my original question, I understand some of it now. I was confusing something I was told last week. It's not the hard drive's buffers that the database and that dd command are bypassing; it's the OS buffer cache.
How often do you run stats or index?
What is the average read/write size?
Have your queries been optimized to use proper indexing, and is the appropriate data-level locking in place?
Is it local or over a network?
What database?
When it comes to databases, there are a lot of factors in play.
Your company's real problem is that they didn't assume the worst-case scenario when designing the software. There needs to be a store/forward/retry/restart design for all your client-based apps. That is, store each command that comes across the wire so it can be retried later, letting uncompleted ones restart after a reboot.
It actually doesn't seem like it'd be too hard to put that logic into your polling app. Just save every command received into a db table, execute the command, and update the db table with the result. Any rows in the db table without a completed result get rerun upon restart. (Note that I'd add two fail-safes: a command that clears the queue, and a queue limit, over which non-clear commands get a "too busy" reply.)
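A minimal sketch of that store/forward table, again using stdlib sqlite3 as a stand-in (the table layout, queue limit, CLEAR command, and run_command placeholder are all assumptions made up for the illustration):

```python
# Store/forward/retry/restart queue: every command is recorded before it
# runs, and anything without a completed result is rerun after a restart.
import sqlite3

QUEUE_LIMIT = 100  # assumed limit; tune for the real app

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE commands (
    id     INTEGER PRIMARY KEY,
    body   TEXT NOT NULL,
    result TEXT)""")              # NULL result == not yet completed

def run_command(body):
    return "ok"                   # placeholder for the real polling-app logic

def receive(body):
    if body == "CLEAR":           # fail-safe #1: a command that clears the queue
        conn.execute("DELETE FROM commands WHERE result IS NULL")
        return "cleared"
    (pending,) = conn.execute(
        "SELECT COUNT(*) FROM commands WHERE result IS NULL").fetchone()
    if pending >= QUEUE_LIMIT:    # fail-safe #2: refuse when the queue is full
        return "too busy"
    cur = conn.execute("INSERT INTO commands (body) VALUES (?)", (body,))
    result = run_command(body)    # execute, then record the outcome
    conn.execute("UPDATE commands SET result = ? WHERE id = ?",
                 (result, cur.lastrowid))
    return result

def restart():
    # After a reboot, rerun anything that never got a completed result.
    for cid, body in conn.execute(
            "SELECT id, body FROM commands WHERE result IS NULL").fetchall():
        conn.execute("UPDATE commands SET result = ? WHERE id = ?",
                     (run_command(body), cid))
```

The point of the INSERT-before-execute ordering is that a crash between the two steps leaves a NULL-result row behind, which is exactly what restart() picks up.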
Okay, I can't answer all of your questions...I'm still learning a lot of this, but I think we are getting off track here. Let me start over and describe the problem.
The business uses a heavily customized SUSE Linux OS and a Progress OpenEdge database. We were doing hardware refreshes, and the older-model servers were no longer available, so we did some verification of the new hardware, and the metrics we used were within tolerances. But once the new hardware started going into the field, we began getting intermittent reports of long pauses and delays during specific parts of the database application, only on this hardware, so I began investigating the problem.
Comparing the new-generation hardware to the previous generation showed it running much slower, but the problems only showed up in the database application under different kinds of tests than we ran for the hardware verification. For instance, I would load, then delete, 100k records and time both, then compare against older hardware. We narrowed the problem to the hard drive: when moved from the new server to a two-generations-old model, the problem followed the hard drive, and the two-generations-old hard drive performed great in the new hardware.
So we began investigating why with the vendor. We eventually discovered that the new hardware's hard drives were Advanced Format (4k), while the image we use to create replacement systems was based on 512-byte sectors. So we went through the process of converting the image to a 4k image, and while we saw an improvement in the speed of our tests, it still was not as fast as the two-generations-old hard drives. Meanwhile, we tested the image on another brand of 4k drive and it was much faster, showing numbers better than the two-generations-old hard drive, but the earlier conversion to 4k had put us in the ballpark, so the vendor wasn't going to replace the drives we had already bought with a different brand.
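On Linux, one way to spot an Advanced Format (512e) drive is to compare the logical and physical block sizes reported in sysfs. A small sketch (the device path in the comment is an example; the function takes the queue directory as a parameter so it can be pointed at any device):

```python
# Compare logical vs physical sector size from a sysfs queue directory,
# e.g. /sys/block/sda/queue (device name is an example).
from pathlib import Path

def sector_sizes(queue_dir):
    base = Path(queue_dir)
    logical = int((base / "logical_block_size").read_text())
    physical = int((base / "physical_block_size").read_text())
    return logical, physical

# A 512e Advanced Format drive reports logical=512, physical=4096; unaligned
# 512-byte writes then cost a read-modify-write cycle inside the drive,
# which fits the slowdown described above.
```

Checking this on incoming replacement hardware before imaging it would catch the 512-vs-4k mismatch early.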
In parallel with the drive vendor, we were also speaking with Progress about why this was occurring. Progress had us run some tests and claimed that our drives were writing very slowly, first bringing into the discussion the 'your 126 MB/s average read/write hard drive is actually writing at 150-200 kB/s, and you are supposed to be writing at 10 MB/s' line. The dd tests with the dsync option showed the same thing their DB tests showed, so we are currently gathering data about this, but it isn't a satisfactory answer yet, since our tests so far show that load/delete tests take 100% more time on a drive that is only, say, 10% slower per the dd write tests.
What I'm trying to understand is what my options are. Firmware for the drive? OS tuning? Drivers? We are also in the process of converting to OracleOS, so whatever we do will have a short life anyway; I'm just trying to understand the issue better.
Replacing all our HDs with SSDs is not a viable solution. We tested SSDs a few years ago, and we did not see enough of a performance improvement to make it cost-effective.
Our polling app is fairly reliable; it has built-in controls. We do have the occasional loss of a file, but given the number of systems and the number of failures, it's reliable. I was just mentioning that we do see the occasional issue with buffers getting wiped.