Random GPU shutoffs and crashes "nvlddmkm stopped responding and has successfully recovered"
Basically what the title says: I'm running a pair of EVGA GTX 470s in SLI, and periodically in games (including, but not limited to, Skyrim), the cards will shut off. On the bright side, they almost immediately recover--my monitor registers a lack of input for a second, before it comes back.
More of an annoyance than anything, but I'm wondering what might cause it. I'm not a PSU expert by any means, though I did get a new one to run these two new cards in the first place, and they ran fine for about nine months without any problems. I was hoping that updating the drivers might address the problem, but that doesn't seem to help either. Hopefully this isn't an early sign of a failing PSU or video cards (thankfully, I've got EVGA's lifetime warranty on these things).
Anyone have thoughts as to what might be causing this?
EDIT EDIT:
So, now I'm getting very frequent crashes when I shut down, restart, or start the PC.
On shutoff: PC crashes right before the system should shut down, just hangs.
On restart/start up: PC crashes immediately after "Starting Windows 7" screen finishes loading, just hangs.
I'm afraid I've got a few weeks or days before this becomes a universal, rather than frequent, experience. It's getting a lot worse, really fast. Thanks, EVGA (or Nvidia, if it turns out the widespread 470 chipset complaints I've run into are related).
Do you get the message that the video driver failed or anything like that?
Something like that would happen with my 8800 on my previous machine, went away when I replaced my RAM because my motherboard was killing my RAM at higher speeds.
No, no messages or warnings or anything--that's the weird part, because I've had GPUs go bad and shut down in the past, but they never recovered very soon afterwards, and they usually left some notice of it.
Once every few weeks, I used to get corruption/artifacting crashes, but I was on beta drivers that I've since replaced.
Very strange... my issue would happen under Vista, and every time it recovered it would tell me that nvlddmkm.sys had crashed and been recovered. If you're not getting that... very strange.
PSUs lose output power as they age. Not saying that is the cause, but it's a possibility. Does the Event Viewer show anything under the System category? (Start -> Run -> eventvwr.msc)
I'll have to try it without SLI--for the hell of it, I pulled out both cards, cleaned them out, and put them back in.
The PSU is about 1.5 years old, if I'm remembering (same age as the cards), so I guess that's also a possibility, though I wouldn't expect an immediate recovery in that case.
My ATI was doing this about three weeks ago, but it ended up being the 11.11 drivers just being total shit. Knocked up to the 12.01's and it stopped...try updating your drivers maybe?
It happened once after my most recent driver update--so I don't think that's it (but I think it helped with other issues, namely hard-crashes).
In the meantime, I switched the cards themselves, cleared out the dust, and returned the clocks to normal speed (they were hardly overclocked that much to start with, but hey, you never know). Nothing yet, on the bright side.
Follow up: okay, it's still happening--a lot in World of Tanks. Seems like a daily occurrence. Doesn't appear to be tied to any specific thing happening in game, and to be fair, the game is really badly programmed (memory leaks, CPU issues, etc.) already.
Going to stop playing WoT (I actually hate the game a lot, but I have premium time I don't want to go to waste), so I'll try a complete driver wipe as well. Hope it's worth the trouble of reprogramming all my 3D settings.
Thought I'd participate since I've had the same thing starting just a couple days ago with my GTX 560. Probably been an issue for longer but I've only had time for gaming (ironically) while away on weekends for work, so I've been playing things on my laptop.
Anyhow, I did my standard thing with GPU fail/recover errors and lowered memory/GPU clock a bit (by 100 each I think) to see if it fixed it, and it did. So I expect a thorough cleaning of the HSF will correct my issue. I've got cats and I overclock my gear as far as I can so I'm pretty used to this sort of thing cropping up as a problem once dust (and disgusting cat hair) accumulates.
edit: Just wanted to point out that I'm not a slob, it's just that I run a completely open case and dust/hair gets sucked in there easily, not helped by my asshole cats napping right next to all my heatsinks any time I'm not around to shoo them away.
Might want to give it a try. Just takes installing forceware then hopping into the 'performance' section of your nVidia control panel to adjust things.
The thing is, I've already done that--I took out both my cards and gave them a thorough cleaning. I'll try lowering the clocks further, but if I have to lower them past their default speeds, it's a case of the solution being as bad as the problem in some respects (EVGA Tuner will let me do that.)
Eh, it's okay. More and more I'm thinking this is a PSU issue (though again, it's just a suspicion). The idea of replacing my Ultra x4 1.2kW PSU--especially since that model doesn't seem to be offered anymore at their website--is a pain.
Huh. So I got around to doing my cleaning and set clocks back to normal and my problem came back.
Turned out, for some reason, 3d vision got turned on. Not sure why that made me have to cut my clocks or get GPU fail/recovery, but it did and I've confirmed that the problem comes back when I enable it.
Weird. Don't suppose yours got turned on by accident, synthesis?
It happens once a day, like clockwork. Literally. It seems to always happen at 12:30. Except I never have anything scheduled for 12:30, last I checked.
As of late, I've always been playing Skyrim that late in the evening, so I'm going to stay off Skyrim for the next few days. If I can isolate it to one buggy-as-hell Bethesda game, I'll call it a minor victory.
EDIT: Nope, happened in WoT as well. So, once a day, always at night. No idea why.
I was totally incorrect about the "it always happens at 12:30"--I ran a test, and it happened at 9:00 PM, three hours earlier.
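If anyone wants to test the "same time every day" theory less anecdotally, here's a minimal sketch that buckets reset times by hour of day. The timestamps are placeholders (the dates are made up; only the 12:30 and 9:00 PM times come from this thread), and `resets_by_hour` is just a name I picked for illustration.

```python
from collections import Counter
from datetime import datetime


def resets_by_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count how many resets fall in each hour of the day (0-23)."""
    hours = (datetime.strptime(ts, fmt).hour for ts in timestamps)
    return Counter(hours)


# Placeholder data: substitute the times you actually noted down.
sample = ["2012-03-01 00:30", "2012-03-02 00:30", "2012-03-03 21:00"]
counts = resets_by_hour(sample)

# If one hour dominates over many days, a scheduled task is worth a look;
# if the hours are spread out, time of day is probably a coincidence.
print(counts.most_common())
```

With only a handful of data points this proves nothing, but a spread of hours is a decent hint that the clock isn't the trigger.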
My PC is plugged into an APU-brand power supply--which, I suppose, could be the cause of the problem. I'm pretty sure "night" as a whole is due to the fact that it's the only time I've been playing several hours in a row due to my work schedule. It doesn't seem to be a heating problem, because within an hour or so, my GPU temperatures hit a ceiling with the fans running at full power and stay there for however long I'm playing.
On the bright side, I don't think my PSU is dying--the 12v rail is still at a pretty healthy 12.16v or so.
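For what it's worth, that reading can be sanity-checked against the ATX tolerance for the +12 V rail, which is plus or minus 5% (11.40 V to 12.60 V). A rough sketch, with a function name of my own choosing:

```python
def rail_within_tolerance(nominal_v, measured_v, tolerance=0.05):
    """Return True if the measured voltage is within +/- tolerance of nominal."""
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    return low <= measured_v <= high


# The 12.16 V reading quoted above, checked against the 5% window.
print(rail_within_tolerance(12.0, 12.16))
```

Keep in mind this only checks a single number. A software or BIOS reading taken at idle won't necessarily show what the rail does when both cards are under load, so a passing value doesn't completely clear the PSU.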
I thought maybe there is something putting a big drain on your house power and causing momentary voltage spikes or something. But if you're running it through a power supply then it shouldn't affect it even if your aircon or water heater or something is kicking in.
That's a good point, actually--I was thinking the reverse (the APU is a bit too small for my PC--that, or the reserve battery has gone bad, because it can't be counted on to keep it on for any length of time).
Nothing mentioned in the Event Viewer--though, to be fair, I wouldn't know what to look for either. I've only looked for things immediately after it happens, with no luck yet.
Have you had a chance to clock similar hours on your machine during the daytime? Maybe you could call your local power company to see if there have been any anomalies as of late?
I'm trying to think outside the box here because, even with some pretty in-depth Google-fu your issue is as unique as they come.
Well, it happened again. At 6 PM, hardly nighttime, while playing Skyrim. Not even that much time playing. Came back in a second.
This time, I checked the event log and actually found what I'm pretty sure it was--I quit out of the game immediately, and did get an error message in the system tray. The record in the event viewer is as follows:
"Display driver nvlddmkm stopped responding and has successfully recovered."
It is, in fact, a display issue, it seems--didn't think it was a PSU issue. Oddly, I regressed to slightly older beta drivers because, on Friday, I was getting catastrophic crashes and errors--boot ups leading to complete restarts after the Windows screen, artifact corruption, etc., and system restore sent me back a few versions (thankfully, I'm not having the issue again).
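To see how often that nvlddmkm event is actually being logged, one option is to copy or export the System log entries to text and filter them. A minimal sketch follows; the sample lines are hypothetical stand-ins (only the nvlddmkm message itself is from this thread), and the exact layout of an exported log will differ.

```python
def find_driver_resets(lines, driver="nvlddmkm"):
    """Return the lines that mention the given display driver (case-insensitive)."""
    needle = driver.lower()
    return [line.strip() for line in lines if needle in line.lower()]


# Hypothetical sample standing in for text copied out of Event Viewer.
sample = [
    "Information  (placeholder entry for illustration)",
    "Warning  Display driver nvlddmkm stopped responding and has successfully recovered.",
    "Error  (another placeholder entry)",
]

hits = find_driver_resets(sample)
print(len(hits))
```

Counting hits per day before and after a change (different driver, SLI off, lower clocks) gives a slightly firmer basis than memory for deciding whether the change helped.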
One more clue, I guess. On a side note, apparently, EVGA is demanding a receipt for a possible RMA--a receipt from a purchase two years ago. Funny thing they don't warn you about when you register the product on their website, I guess. I'm beginning to see why most of my friends have given up high-end PC gaming.
If you didn't want to keep your receipt you could have just uploaded it when you registered your card. EVGA needs a receipt so they know that you actually bought the card.
I really didn't see any such option two years ago when I first registered the cards on EVGA's website--I could have missed it though.
I didn't own a scanner two years ago (I still don't, since filling out a warranty usually does not imply scanning your receipt--what if you got it as a gift?), so I guess it's moot. I could have taken a photograph of the receipt, but even that sounds a little silly for a warranty. Guess "lifetime warranties" are different. Then again, it was probably unrealistic to assume EVGA was that much better than the other GPU manufacturers.
It doesn't seem that silly to me. A lot of electrical goods still have those fill-out-and-mail warranty cards in the owners manuals, where you're required to photocopy the receipt and attach it. And one thing my girlfriend has gotten me into doing is keeping all the receipts and manuals in a divider in our filing cabinet for safe keeping and easy finding.
Fair point. These came with the normal warranty cards, which I did fill out (rather, I followed the instructions to fill them out). Looking at them, they make no mention of having a copy of your receipt, but they're just instructive and not binding or anything.
I've moved in the last year, so I'm not surprised that those particular receipts are lost. I guess that's what I get for not treating PC hardware receipts like insurance and passport documents.
Thankfully, I do not have that issue with my TV--the receipt for that was tossed long ago, but the actual record with the retailer who sold it is all I've needed in the last three-plus years.
Granted, a meteorite could flatten the store, and then I'd be in trouble. But then the online account would hold, I think.
I don't own other really large appliances (come from an apartment culture, which means the largest appliance I own is a TV, and small toaster, rice cooker, and appliances that aren't eligible for return with or without a receipt in a short period). And I didn't have A/C until I moved into an apartment that came with one permanently affixed. Didn't know people still used portable ones outside of dormitories.
Then again, I doubt I have many receipts from before I moved to the United States either. Anyway, back on topic: crashes, GPU black-outs, and freezes.
The easiest way to sort this out might be to just use one card for a while, and if you don't have any issues, use the other card for a while. This will tell you whether it's something specific to one of the cards or whether it's something about both at once (maybe SLI, maybe power draw, maybe drivers, who knows) that is causing this.
I spent a week or so running without SLI, and got no errors whatsoever. $10 and a new SLI bridge later, and I can confirm it is not the SLI bridge (I had a bad SLI bridge a few years back that basically made my two 8800GTs not work for a ridiculous number of games).
Voltages look completely normal. It's possible that my 2nd GTX 470 is simply bad.
I did. The question, of course, is "until when?" If that is the problem, I can't really troubleshoot it that way.
Meanwhile, I'm sitting on a few hundred dollars (not to mention the price of the PSU itself) that is bugged in some way.
You're right inasmuch as it would be more convenient. Though Skyrim runs significantly better with the second card (particularly outside, unsurprisingly). I'll keep pestering EVGA in the meantime.
Indeed, I'm planning that--though right now, I'm lowering the clocks a bit (ugh) and moving PhysX support to the CPU, and seeing if that does anything (suggestions I read from the Nvidia forums).
I played about 6 hours of ME3 on Tuesday, no hiccups there.
Well, I was able to "isolate" the issue--running the second card at the same clock speed as the first one (607 versus 625) inevitably leads to my crashes. That's why turning SLI off caused no problems. Given that I used to be able to overclock both cards easily, and my PSU still reads healthy, it looks like the second one just went bad. Now to get on EVGA's case to replace it.
Does it happen when SLI is turned off?
Is your PSU plugged into a filtered powerboard, or straight into a plug?
And so, the puzzle continues.
It's the thickest divider in the cabinet, even bigger than tax records for two people for nearly a decade and a half...