So I've been really struggling with my boot SSD lately, and it's been a persistent issue for about a year now. This gets a little long, so I apologize for that in advance.
The short story: I have a 970 EVO 2 TB SSD that I use as my boot drive. It started showing an increasing number of "Media and Data Integrity Errors", so I RMA'd the drive to Samsung and they replaced it. Now the replacement is showing the same behavior and degrading slowly on me, and I'm not sure why.
The long story: I'm running 2 x 2TB 970 EVO drives on Windows 10, both at the latest firmware. They're on an ASUS ROG Maximus X Formula, both connected directly to the motherboard. My boot drive is one of them, and it's connected to the motherboard via the vertical connector that stands straight up and down. Initially this drive had no heatsink. The second drive is under the motherboard's heatsink shield. I use it as a secondary drive for my Steam Library.
Both are running on a PCIe x4 connection.
The secondary drive is fine. It's been in the system for a little under two years, and has had no problems.
The boot drive is another story.
After I built the system initially, it was relatively stable for about a year. Then one day when I was running a routine backup, the backup failed with a Cyclic Redundancy Check error. This prompted me to look deeper into the health of the drive. A standard chkdsk, sfc /scannow, etc., didn't reveal anything untoward at all.
However, when I dug into the S.M.A.R.T. values for the drive via CrystalDiskInfo, I found that "Media and Data Integrity Errors" had a value of 4. This is in contrast to my non-boot NVMe SSD, which reads 0 for that value.
I kept an eye on it for weeks, and that number slowly crept upward to 6. At that point I contacted Samsung, and replaced the drive via RMA. After replacement, I added a heatsink to the drive just to be safe.
After cloning the drive and replacing it (using Samsung Data Migration), all was fine until a few weeks ago. Suddenly the "Media and Data Integrity Errors" value crept up from 0 to 1. Then it crept up to 3. I fear it's going to continue increasing until the drive is unusable.
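For anyone who wants to log this counter over time instead of eyeballing it in a GUI, here's a rough sketch. It assumes the text comes from smartmontools (`smartctl -a` on an NVMe drive) and that the counter appears on a line labeled "Media and Data Integrity Errors:". The sample text below is a made-up excerpt using numbers from this post, not real output from my drive, and the function names are just for illustration.

```python
import re
from typing import List, Optional

def media_errors(smart_text: str) -> Optional[int]:
    """Pull the media error counter out of SMART text, or None if absent."""
    # Assumed label format; adjust the pattern if your tool prints it differently.
    m = re.search(r"Media and Data Integrity Errors:\s*([\d,]+)", smart_text)
    return int(m.group(1).replace(",", "")) if m else None

def is_growing(history: List[int]) -> bool:
    """True if the newest reading is higher than the oldest one."""
    return len(history) >= 2 and history[-1] > history[0]

# Illustrative excerpt only (values taken from the post above).
sample = """\
Temperature:                        39 Celsius
Media and Data Integrity Errors:    3
"""

readings = [0, 1, media_errors(sample)]
print(readings, "growing" if is_growing(readings) else "stable")
```

Saving one reading a day to a text file would at least show whether the jumps line up with backups, heavy load, or the "Bad Block" events.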
Checking the Windows Event Viewer, I see that there's a log that indicates the drive had a "Bad Block" just about the same time this happened. It seems to be happening again, despite having replaced the drive.
Is this something to worry about with regard to the drive degrading? Should I consider this drive failing at this point?
I'm not sure if I just got unlucky with two drives that were both bad, or if there's some other issue that might be causing it, like a bad M.2 slot on the motherboard. Or if this could even be software related.
The Plea: Does anyone have any insight as to what might be happening, if it's something to be concerned over, and/or what measures to take to assess the situation more deeply? Should I be replacing this SSD ASAP? Should I stop using this M.2 slot altogether? Is there any way to know why this keeps happening repeatedly?
Thanks very much in advance.
Now to my thoughts on what you are seeing.
I do suspect that you're right. I'm wondering if my sandboxing software might be hitting the drive in a way that's uncommon? Either that, or it might be something to do with a defect on the actual motherboard slot? I'm getting utterly baffled. I do have backups, though. And a fresh SSD to replace it with. I just don't want to replace it, and then fry a THIRD drive.
I've hardly moved much data around this drive, well within tolerances: 5 TB reads and 6 TB writes, compared to my other identical model drive, which has 43 TB reads and 6 TB writes. I feel like both of those should be well within the scope of Samsung's suggested lifetime.
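As a quick sanity check on that, here's the back-of-the-envelope math. The big assumption is the rated endurance: I'm plugging in 1,200 TBW for the 2 TB model, which is a figure to verify against Samsung's spec sheet and isn't something established in this thread.

```python
# ASSUMPTION: rated endurance in terabytes written (TBW) for the 2 TB model.
# Check the manufacturer's spec sheet before relying on this number.
RATED_TBW = 1200.0

def endurance_used(written_tb: float, rated_tbw: float = RATED_TBW) -> float:
    """Fraction of the rated write endurance consumed so far."""
    return written_tb / rated_tbw

# Both drives report roughly 6 TB written.
pct = endurance_used(6.0) * 100
print(f"{pct:.2f}% of rated endurance used")
```

Under that assumption, 6 TB of writes is a fraction of a percent of the rating, so plain write wear looks like an unlikely explanation on its own.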
I also do have a heatsink on the drive already, hoping it would combat these issues. Temps hold at around 39 C at idle, pushing 58 C at heavy synthetic load.
I appreciate the input! I'm just pulling my hair out over this. Wondering if it's not better to just buy a standard SATA SSD and swap it out for the boot drive.
How well supported is the drive in this vertical drive connector? If you walk by, does it move? Do the fans cause it to move in its socket? I would remove the drive and check for wear where it connects to the motherboard.
Looking at this image from Guru3D, the drive looks very poorly supported. I personally wouldn't trust that socket you have the drive in.
Single PCIe NVME adapter cards are relatively inexpensive. Maybe you could try the drive in one of those?
The drive is actually supported in the slot by a backplate that screws perpendicularly into the motherboard.
I'm mortified because of the dust (please forgive me), but here's a picture of what it looks like, on the backplate, with the heatsink attached.
Nevertheless, a PCIe NVMe adapter sounds like a VERY solid idea. My primary concern there is that my motherboard and CPU (8700k) don't really have the PCIe lanes to support both an x4 NVMe SSD and an x16 GPU simultaneously.
Looking at the manual, the bottom slot is PCIe 3.0 x4, not shared with the GPU slots, though it is shared with the x1 slots. If you don't have anything in those, it looks like you could run it at x4 with a setting in the BIOS which disables the x1 slots, otherwise the best you can do is a x2 drive there.
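For a rough sense of what x2 versus x4 means in practice, here's a small sketch of the theoretical PCIe 3.0 link bandwidth. It uses the standard 8 GT/s per lane and 128b/130b encoding; real-world throughput will be lower once protocol overhead is included, so treat these as ceilings rather than expected speeds.

```python
GT_PER_S = 8.0        # PCIe 3.0 raw signaling rate per lane (gigatransfers/s)
ENCODING = 128 / 130  # 128b/130b line encoding efficiency

def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S * ENCODING * lanes / 8  # divide by 8: bits -> bytes

for lanes in (2, 4):
    print(f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s")
```

So an x2 link tops out at roughly half the x4 ceiling, which would mostly matter for large sequential transfers.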
I do have a sound card running in the bottom-most x1 slot, sadly. I suppose the option would be to disable the soundcard or go with a slower x2 speed?
Yes, if I understood the manual correctly. I think you set it to Auto and it splits stuff up accordingly. If you need x4, you could always use an external DAC/Amp. You probably wouldn't notice a difference between x2 and x4 realistically.
EDIT: NM on this, I'm a dumdum.
You're probably correct that I wouldn't notice a difference, even if I hate to leave performance on the table. Nevertheless, I'll see if I can't acquire an adapter card, and see what I can do with regard to swapping over to that.
Should it be pretty easy to just change the SSD from the on-board M.2 slot to the PCIe adapter, and then just boot from there? I'm wondering if this would cause Windows to pitch a fit, as it sometimes does when it sees a change to the boot drive.
It shouldn't matter, but do backups just in case. Do you have some spinning rust you can dump an image of your boot drive onto?
Yes indeed I do! I just did a full system backup of all drives including the boot drive yesterday. Every time something like this crops up, it makes me paranoid enough to keep multiple redundant backups.
I'll give it a go once I get an adapter card. Seems like Asus makes their own specific card just for this purpose that should interface with the motherboard with minimal difficulty. It's pricey, but the bonus is that it's got a massive heatsink and an included fan.
Bad blocks can and will happen with SSDs, especially consumer-grade ones, and especially if you don’t have a heatsink on an NVMe PCIe 4.0 model. Cooking SSDs makes for a shorter lifetime of the NAND cells.
Without knowing how many blocks total are on the drive it’s difficult to tell when you get into the danger zone of having a read only drive.
Since it’s your main drive, page swapping will definitely have an impact as you’re not writing to the same physical location every time (only logical) so you will tend to wear out the drive if there’s lots of that happening.
Also, depending on how Samsung is handling their consumer drives they might also be background moving data to keep it “fresh” so that there’s lower latency on decode. Moving data around in the background wears out the drive too.
Just a nice reminder that NAND is suuuper bad and everyone should be backing up their data on stone tablets.