I built my new computer from various parts ordered on Newegg about a month or two ago. It worked beautifully, save for the fact that I hate Windows Vista. Then about a week or two ago, I started getting Blue Screens of Death while playing games. Looking at the crash dump and using a combination of Google and the (useless) Windows help, it looks like my crash falls under the category of an unidentified unhandled exception. Windows help was so kind as to suggest that it may be due to a problem in my hard drive, memory, PCI-e cards, PCI cards, motherboard, temperature, IRQ settings, processor, third-party software, etc. The crashes seem to come at random, although only when I'm playing a game. So far I've had crashes in WoW and NWN2, but none in Dawn of War or Company of Heroes.
-I've run a memory tester for several nights as I sleep and everything came up green.
-All drivers have been patched to the most up-to-date stable version that their companies provide. Doing this seemed to reduce the frequency of crashes, but I'm still crashing.
-Heat seems to be a non-issue; my machine claims to be running in the low 30s.
-The hard drive seems to be solid. Running chkdsk and another disk scanning method that came with Vista shows everything as fine.
-I doubt it is a software issue; if it were, the software should just crash, not BSOD my computer along with it.
So, I'm sort of at my wit's end on fixes to try. I've already removed anything non-essential, and pared down the services list to only what is essential to starting up and getting on the internet. Then I went into my device manager and took a look at IRQs. Now, IRQs are a deep black magic that I fear to tread upon, and I rightfully fear their power. I know very little about them, except I vaguely remember something about not sharing them. So, I'm coming to you guys to see if you can tell me whether this is wrong or perfectly normal:
If the picture doesn't show, or is difficult to read, here is the part that caught my attention:
(PCI) 16 Intel(R) ICH9 Family PCI Express Root Port 1 - 2940
(PCI) 16 Intel(R) ICH9 Family PCI Express Root Port 5 - 2948
(PCI) 16 Intel(R) ICH9 Family USB Universal Host Controller - 2937
(PCI) 16 NVIDIA GeForce 8800 GT
(PCI) 16 Standard Dual Channel PCI IDE Controller
It seems to me there shouldn't be 5 things sharing IRQ 16, another 4 on IRQ 18, and a pair each sharing IRQ 17, 19, and 23. I've already expressed my ignorance of this topic, so could someone let me know if this is indeed a bad thing, or if it's entirely common to have multiple devices share IRQs? If it is a bad thing, would it be safe for me to start reassigning items to the next free IRQ on the list? (E.g., IRQ 23 is used twice, but IRQ 24 is unused. Could I just move one of the 23s over to 24?) If I start to move around IRQs, what would be the best and safest way to do so?
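For anyone who wants to tally the sharing from a copied listing rather than by eye, here is a minimal Python sketch. It assumes each line looks like the ones quoted above, "(BUS) IRQ device name"; the function name group_by_irq and the pasted LISTING text are just for illustration, and it only counts devices per IRQ. It says nothing about whether the sharing is harmful.

```python
import re
from collections import defaultdict

# Listing pasted from the post above (assumed format: "(BUS) IRQ device name").
LISTING = """\
(PCI) 16 Intel(R) ICH9 Family PCI Express Root Port 1 - 2940
(PCI) 16 Intel(R) ICH9 Family PCI Express Root Port 5 - 2948
(PCI) 16 Intel(R) ICH9 Family USB Universal Host Controller - 2937
(PCI) 16 NVIDIA GeForce 8800 GT
(PCI) 16 Standard Dual Channel PCI IDE Controller
"""

LINE_RE = re.compile(r"^\((\w+)\)\s+(\d+)\s+(.+)$")


def group_by_irq(text):
    """Return {irq_number: [device names]} for every line that matches."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            groups[int(match.group(2))].append(match.group(3))
    return dict(groups)


# Report only the IRQs that more than one device is using.
for irq, devices in sorted(group_by_irq(LISTING).items()):
    if len(devices) > 1:
        print(f"IRQ {irq}: {len(devices)} devices")
        for name in devices:
            print(f"  {name}")
```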
Posts
From what you're describing, it sounds more like a video card / driver issue. Did you recently get a new video game or update your video card driver?
More on IRQs later...
The 8800 GT was relatively new when I got it, so I stayed with the drivers provided on the CD. When the computer started crashing, the first thing I tried was updating the drivers. It seemed to reduce the frequency of crashes, but that could have just been psychosomatic.
Every piece in the machine is less than three months old, so it's hard to tell if there was ever a stable build. I might have just gotten lucky at first.
Playing games taxes and heats up a lot of components in your system, and something that was marginal is going to have trouble.
The real difficulty here is identifying which component is your problem, and you're in a bad spot because you built the machine yourself. You likely don't have identical replacement parts sitting around to test.
If I had to guess, I'd say the most likely culprit is marginal hardware in your video board. I'd swap it out and see if you still get BSODs.
Which one? memtest86+ is pretty comprehensive, and free. I've had it smoke out errors that other tests didn't. Errors in tests 1 through 4 strongly indicate a hardware fault; errors in tests 5 through 8 tend to indicate a problem in the memory controller, or the FSB, or some interaction between components on the memory bus.
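To make that rule of thumb easy to apply to a list of failing test numbers, here is a small Python sketch. The mapping is only the heuristic stated above, not anything from the memtest86+ documentation, and the function name likely_cause is made up for illustration.

```python
def likely_cause(test_number):
    """Map a failing memtest86+ test number to the rule of thumb above.

    This encodes the poster's heuristic only; it is not an official
    memtest86+ diagnosis.
    """
    if 1 <= test_number <= 4:
        return "hardware fault"
    if 5 <= test_number <= 8:
        return "memory controller, FSB, or memory bus interaction"
    return "no rule of thumb for this test"


# Example: summarize a run where tests 5 and 7 reported errors.
for failed in (5, 7):
    print(f"Test {failed}: {likely_cause(failed)}")
```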
The reason I'd like the extra detail is that you may be having a problem I was having with my new rig a few weeks back. I have a P35-based motherboard with a Core 2 Duo processor and 2.2v DDR-1066 RAM. The RAM is also rated to run as DDR-800 using only 1.8v. As is mentioned at the top of the ever useful Twice and Future Computer Thread:
So, so true. I was getting blue screens all over the place while gaming. After altogether too much time spent troubleshooting, I discovered that Test 5 in memtest86+ was spewing a lot of errors. After reading robaal's comment, I went into the BIOS and manually specified memory speed (DDR800) and voltage (1.8), and I haven't had a single blue screen since. Prior to setting that manually, I noticed that sometimes the BIOS would detect the RAM as DDR1066, but would default to running it at 1.8v, which it is simply not rated to do. I tried manually setting DDR-1066 and 2.2v, but I had one or two more blue screens and went back to DDR800. Personally, I'm happy to take the 5% performance hit in exchange for a fully stable rig.
Anyway, you may be having a similar problem with RAM speed, voltage or timings. Try memtest86+, double check the memory speeds and voltages supported by your RAM and mobo, and try manually specifying those values in the BIOS. If your RAM is like mine and supports two different speed-voltage combos, try the lower speed-voltage pair (e.g. DDR800 @ 1.8v) and see if that makes a difference.
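As a rough illustration of the mismatch I ran into, here is a Python sketch that checks a BIOS speed/voltage setting against the pairs a module is rated for. The RATED_PAIRS table holds the two combos for my RAM as described above; the names and the 0.05v tolerance are assumptions for the example, so substitute the values from your own RAM's spec sheet.

```python
# Rated speed (MHz, DDR rating) -> rated voltage, for the RAM described above.
# Replace with the values printed on your own modules or spec sheet.
RATED_PAIRS = {800: 1.8, 1066: 2.2}

TOLERANCE = 0.05  # assumed allowance for rounding in the BIOS readout


def is_rated(speed, volts, rated=RATED_PAIRS):
    """True if the speed is a rated speed and the voltage matches its rating."""
    expected = rated.get(speed)
    return expected is not None and abs(expected - volts) <= TOLERANCE


# The auto-detected setting that caused trouble: DDR1066 at only 1.8v.
print(is_rated(1066, 1.8))  # False
# The manual setting that has been stable: DDR800 at 1.8v.
print(is_rated(800, 1.8))   # True
```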
I don't know if I should be disappointed or relieved. On the one hand, my RAM is working perfectly; on the other, it means I still don't know what my computer's ailment is. Thank you for the suggestion, vonPoonBurGer. memtest86+ is definitely what I'll go with in the future.
Right now I'm very much leaning towards believing there's a fault in the video card. Unfortunately, it's also the single most expensive part of the computer and I don't have a replacement at hand. Does anyone know of any methods to check to see how a video card is doing?
Not really, especially if it only dies when you're gaming. At that point, basically all the components in your system are being maxed out: CPU, RAM, buses, video board, even the power supply really. You can see if there are any nVidia diagnostics that you can run.
Do you have any more information on the failures? When it bluescreens, what is the error and (more importantly) what driver does it happen in? That information is usually on the bluescreen.
You could return your video card to the manufacturer as defective for warranty replacement. If you can convince them to do this without substantial proof (which you don't have) they will likely send you a reconditioned video board, which is some board that used to be defective and they have now declared isn't. If the problem WAS your video board, and they didn't send you another defective one, then you're golden. If it wasn't, then you've replaced your good shiny new video board for somebody's previously-defective used video board, and now maybe you have two problems.