
The Slow Failure of a Video Card

NipsNips In constant existential crisis.Registered User regular
Hey Tech Friends.

So my desktop rig, originally built in 2012, is starting to show its age. I recently had to replace the CPU cooler (I was using the stock one) after a series of crashes, which seemed to fix things right up.

Enter the next failure: I suspect my video card is struggling (XFX Double D Radeon HD 7950 DirectX 11 FX-795A-TDJC 3GB). Two weeks ago the rig crashed with graphics corruption during heavy gaming (playing Warframe), and it took multiple attempts to recover. In a weird twist of unconvincing troubleshooting, a system restore to a point several days before the crash completely fixed it. I remained suspicious (there were no system changes, software or hardware, between the restore point and the crash), but everything's been fine until today.

This morning it crashed again (again while playing Warframe). When it crashed, it first briefly threw something like this on the monitor, same as the previous time:
IMG_8348.jpg
Not my monitor or picture; the color is different, but the pattern is the same. After a second or less, the screen went black. However, this time it didn't crash the entire system. I was able to alt-tab out of the game, and the desktop rendered fine. I force-closed the program, and now here I am, typing.

The thing that's most frustrating is that I can't seem to pin down exactly why this crash is occurring. I suspected the GPU temps were getting out of hand, but a full burn-in with Furmark didn't cause anything. I played on the computer for almost fourteen hours straight yesterday, and didn't have a single problem. Then, after an hour and a half today, it crashes.

If anyone has any helpful troubleshooting ideas, I'll happily take them.

Posts

  • bowenbowen ayyyyyyyy Registered User regular
    edited January 6
    You see that kind of thing when the GPU bus or memory begins to fail.

    Edit: however, the fact that Windows is fine makes me suspect something more sinister, like a corrupt install of Warframe or hard drive failure.

    Warning: I am a programmer/sysop. IANAL/IANAD, seek actual advice from certified people in their respective fields if you are actually in need of it.
  • wunderbarwunderbar Registered User regular
    Have you tried new/different video drivers? That's the next step.

    Otherwise, yeah, the card is probably dying. My guess is that some of the video card memory has gone bad, which is why sometimes it'll work for hours and sometimes it won't. When it tries to load an asset into a memory module that's gone bad it crashes, and it might only be one module.
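
A toy model can make the "works for hours, then crashes" pattern concrete. The sketch below is only an illustration, not a diagnosis: it assumes every asset load lands at a uniformly random spot in video memory, and the function name `loads_until_crash` and the 1-in-5,000 figure are made up for the example.

```python
import random


def loads_until_crash(bad_fraction, rng, max_loads=100_000):
    """Count asset loads until one lands in the bad region (toy model).

    Assumption: each load hits the faulty region with probability
    `bad_fraction`, independently of every other load.
    Returns None if no load hits it within `max_loads`.
    """
    for n in range(1, max_loads + 1):
        if rng.random() < bad_fraction:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(7950)  # fixed seed so repeated runs match
    # Hypothetical figure: 1 load in 5,000 touches the bad region.
    sessions = [loads_until_crash(1 / 5000, rng) for _ in range(10)]
    print(sessions)
```

Under these assumptions the count varies widely from one session to the next, so a fourteen-hour clean run followed by a crash after ninety minutes would not be contradictory.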

    XBL: thewunderbar PSN: thewunderbar NNID: thewunderbar Steam: wunderbar87 Twitter: wunderbar
  • bowenbowen ayyyyyyyy Registered User regular
    After a quick Google, this is apparently super common on AMD cards because of drivers. Try downgrading/upgrading your video card drivers and see if that fixes anything.

  • NipsNips In constant existential crisis.Registered User regular
    My driver is on 15.20.1062-150715a-187142C, and the latest available looks to be 16.50.2001-161204A, so I'll give that a try. The only problem with this is that since the crash has been so unpredictable, I'm not convinced updated drivers will prove or disprove that as the cause.

    I still suspect that it's something more insidious, like a bad memory sector. Is there some sort of diagnostic tool to dig into the GPU's brains? I'm not afraid to get elbow deep into this thing, if it saves me from having to get a new video card.
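
To see how far apart those two driver builds are, the version strings can be compared directly. This is a minimal sketch under an assumption: the layout (dotted version, then a yymmdd build date with a letter suffix) is inferred from the two strings quoted above, not taken from AMD documentation, and `parse_driver_version` is a hypothetical helper.

```python
from datetime import date


def parse_driver_version(text):
    """Split a driver string such as '15.20.1062-150715a-187142C'.

    Assumed layout (a guess from the strings in this thread):
    dotted version, '-', yymmdd build date plus a letter suffix.
    """
    fields = text.split("-")
    version = tuple(int(part) for part in fields[0].split("."))
    stamp = fields[1][:6]  # first six characters, read as yymmdd
    built = date(2000 + int(stamp[:2]), int(stamp[2:4]), int(stamp[4:6]))
    return version, built


installed = parse_driver_version("15.20.1062-150715a-187142C")
latest = parse_driver_version("16.50.2001-161204A")

print(latest[0] > installed[0])         # tuples compare field by field
print((latest[1] - installed[1]).days)  # days between the two builds
```

If the date reading is right, the installed driver is well over a year older than the latest one.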

  • bowenbowen ayyyyyyyy Registered User regular
    https://www.raymond.cc/blog/having-problems-with-video-card-stress-test-its-memory/

    You can try the first one there.

    Typically, though, if you can alt-tab out of a program and your screen is still acting fine, the problem is isolated to the program and whatever it's doing.

    You've generally got the following three options from there:
    - Corrupted install or corrupted hard drive sectors: loading the corrupted data crashes the program
    - The program is accessing the video card in a weird way and the program itself is crashing
    - The video card driver is out of date and the program is pretending it's not and crashing itself from there

    The easiest to check is the driver. If that doesn't solve it, you'll want to uninstall the application (completely, don't save anything), run chkdsk and a defrag, then reinstall.

    If that doesn't fix the issue, it's likely a hardware issue somewhere. I'm skeptical because everything else seems to work fine before/during/after that program crashes, and nothing else seems to be having the issue (the full burn in worked fine, so it's not a temperature throttling thing either).

  • ButtcleftButtcleft Registered User regular
    Nips wrote: »
    My driver is on 15.20.1062-150715a-187142C, and the latest available looks to be 16.50.2001-161204A, so I'll give that a try. The only problem with this is that since the crash has been so unpredictable, I'm not convinced updated drivers will prove or disprove that as the cause.

    I still suspect that it's something more insidious, like a bad memory sector. Is there some sort of diagnostic tool to dig into the GPU's brains? I'm not afraid to get elbow deep into this thing, if it saves me from having to get a new video card.

    The hard drive can be ruled out by running chkdsk.

    Run that before anything else, because it'll cost nothing but time.

    After that's done, if it doesn't report bad sectors, verify the integrity of the game cache for Warframe (since it seems only Warframe is crashing).

    After that's done, if it's still being crashy, update the drivers.

    I personally put drivers last because, in my experience, AMD drivers are like Mr.Hands level pain in the ass and like to fuck shit up if your card is older than a year.
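
The order suggested so far in this thread can be written down as a small decision function. This is just a sketch of the sequence described above (chkdsk, then the cache check, then drivers, with hardware as the fallback); `next_step` and its parameters are hypothetical names, and it assumes the crash is still happening.

```python
def next_step(disk_has_bad_sectors, game_cache_verified, drivers_updated):
    """Return the next check, in the order suggested in this thread.

    Pass None for `disk_has_bad_sectors` if chkdsk has not been run yet.
    Assumes the game is still crashing after every step already taken.
    """
    if disk_has_bad_sectors is None:
        return "run chkdsk"
    if disk_has_bad_sectors:
        return "suspect the hard drive"
    if not game_cache_verified:
        return "verify the Warframe cache"
    if not drivers_updated:
        return "update the video drivers"
    return "suspect the video card hardware"


if __name__ == "__main__":
    # Disk is clean and the cache checked out, drivers still old.
    print(next_step(False, True, False))
```

The cheap checks that change nothing on the system come first, and the driver update, the one step that can introduce new problems, comes last.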

    that's it, I'm shutting this entire forum down, everyone thank buttcleft
  • NipsNips In constant existential crisis.Registered User regular
    edited January 6
    Thanks guys, I'll give all that a try.

    The only reason I'm still suspicious it's the card and not software, is that the crash a couple weeks back impacted everything. Game, Windows, all the way down to graphics glitches during reboot and startup. When it crashed today, I was surprised when it didn't crash the entire system.

  • wunderbarwunderbar Registered User regular
    That's personally why I think it could be drivers if it isn't a bad card.

    Driver issues can be random like that.

  • bowenbowen ayyyyyyyy Registered User regular
    AMD drivers are notoriously bad too.

    They're better now, but that's like saying "hey, poison doesn't always have to kill you."

  • dporowskidporowski Registered User regular
    I have had multiple AMD cards go bad in exactly this manner. (Admittedly on a Mac, but still.) It's fine except for "thing", then over time it just gets progressively worse, and then everything becomes awful. Sometimes a software tweak "fixes it" for a while, but eventually...

  • NipsNips In constant existential crisis.Registered User regular
    edited January 7
    Ok. Driver updated (new UI front end too; going from Catalyst to the Radeon program took some getting used to). I finally figured out how to run Video Memory Stress Test across the entirety of the RAM on the GPU, and it came back with zero errors. I then moved on to memtestCL, and though it's only able to access roughly 2 GB of the GPU RAM, it's finding errors in the "Random blocks" test with every iteration. That's not encouraging. :/

    Edit: And now I'm reading that Random Block errors might not be the card's fault, but the memtest's. Hrmmm.
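
For anyone wondering what a "random blocks" style test does in principle, here is a rough sketch: write random data to randomly chosen blocks, read it back, and count mismatches. It is not memtestCL's actual algorithm, and it runs on ordinary host memory rather than the GPU. `FlippedBitMemory` is a made-up stand-in for a card with one faulty cell.

```python
import random


def random_block_test(memory, block_size, rounds, seed=0):
    """Write random bytes to random blocks, read back, count mismatches."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(rounds):
        start = rng.randrange(0, len(memory) - block_size + 1)
        pattern = bytes(rng.randrange(256) for _ in range(block_size))
        memory[start:start + block_size] = pattern
        readback = memory[start:start + block_size]
        errors += sum(a != b for a, b in zip(pattern, readback))
    return errors


class FlippedBitMemory(bytearray):
    """Hypothetical faulty memory: one cell flips its lowest bit on every write."""

    BAD_INDEX = 100

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        flipped = super().__getitem__(self.BAD_INDEX) ^ 0x01
        super().__setitem__(self.BAD_INDEX, flipped)


if __name__ == "__main__":
    print(random_block_test(bytearray(4096), 256, 50))         # healthy memory
    print(random_block_test(FlippedBitMemory(4096), 256, 50))  # one bad cell
```

With small blocks, a single bad cell is only caught when a block happens to cover it, which is one reason two tools can disagree about the same memory.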

  • bowenbowen ayyyyyyyy Registered User regular
    memtestCL requires an NVIDIA card, I think?

  • BigityBigity Lubbock, TXRegistered User regular
    Is that a CRT?
