So, I've been going slowly insane.
I wrote a program that is basically a fake OS. In it, I have a fake Process Control Block. One of the functions in the OS is "RUN", which adds a PCB struct to a list with a pointer to a textfile, and then when you input "GO", the script in that textfile is executed.
Now, that worked fine not two weeks ago.
Yesterday, I tried doing the new assignment's programming part, which involves slightly changing the PCB struct. Basically, adding a process ID field, an int.
So my change was to go from
typedef struct PCB_REC {
    char* filename;
    FILE* process;
    struct PCB_REC* next;
} PCB_REC;
to
typedef struct PCB_REC {
    int pid;
    char* filename;
    FILE* process;
    struct PCB_REC* next;
} PCB_REC;
This caused a seg fault every time I used "GO".
Now, I decided to go back to my original and use that to figure out what could be going wrong. That worked for a bit, but got me no closer to figuring out the reason. Then I defragged and restarted because my computer was running insanely slow. Now, the original code doesn't work either, in exactly the same manner.
So, basically: anyone know whether it could have something to do with just my environment? Am I missing something basic? Why would adding a field to a struct cause a seg fault?
This may also explain why things seem to act nondeterministically. I'd run the code through a debugger and keep a good watch on your pointer operations and see if something's going haywire.
My feeling as to why it was working before and not after you added the field is that there was some subtle error in your code before, but for whatever reason you got lucky and it wasn't breaking anything visibly. I'd say the top two suspects are that you're trying to dereference an uninitialized or NULL pointer, or that you made some other small change that broke something and forgot about it.
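To make the "uninitialized pointer" suspect concrete, here's a rough sketch of building a PCB so that every field starts in a known state. The helper names pcb_create and pcb_destroy are made up for illustration, and I'm assuming the file gets opened later, when "GO" runs; your actual code may well be organized differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct PCB_REC {
    int pid;
    char *filename;
    FILE *process;
    struct PCB_REC *next;
} PCB_REC;

/* Hypothetical helper: allocate a PCB and set every field explicitly,
   so no pointer is ever left holding garbage. */
PCB_REC *pcb_create(int pid, const char *filename) {
    PCB_REC *pcb = malloc(sizeof *pcb);
    if (pcb == NULL) {
        return NULL;
    }
    /* Keep a private copy of the name so the caller's buffer can change. */
    pcb->filename = malloc(strlen(filename) + 1);
    if (pcb->filename == NULL) {
        free(pcb);
        return NULL;
    }
    strcpy(pcb->filename, filename);
    pcb->pid = pid;
    pcb->process = NULL; /* assumption: opened later, when "GO" runs */
    pcb->next = NULL;    /* not linked into the list yet */
    return pcb;
}

/* Hypothetical helper: release a PCB made by pcb_create. */
void pcb_destroy(PCB_REC *pcb) {
    if (pcb == NULL) {
        return;
    }
    free(pcb->filename);
    free(pcb);
}
```

With something like this, a stray NULL shows up as a clean NULL check failure rather than a jump into random memory.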
My guess is that you're not doing a clean compile, and that there are still some components that are using the old definition of the struct instead of the new definition.
Given the buffer 00000001 7fffff34 7fffff30 7fffff58
Your new code sees:
pid = 1
filename = 7fffff34 (valid memory)
process = 7fffff30 (valid memory)
next = 7fffff58 (valid memory)
The old components see:
filename = 00000001 (invalid memory)
process = 7fffff34 (valid, but not correct)
next = 7fffff30 (valid, but not correct)
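If you want to see the mismatch for yourself, here's a small sketch that puts both definitions side by side and reports where each one expects its fields. OLD_PCB_REC and print_layouts are names I invented so the two versions can coexist in one file. The buffer above assumes 4-byte ints and pointers; on a 64-bit build the exact offsets will differ, but the shift is the same idea.

```c
#include <stddef.h>
#include <stdio.h>

/* What an object file compiled before the change still believes. */
typedef struct OLD_PCB_REC {
    char *filename;
    FILE *process;
    struct OLD_PCB_REC *next;
} OLD_PCB_REC;

/* The definition after adding pid. */
typedef struct PCB_REC {
    int pid;
    char *filename;
    FILE *process;
    struct PCB_REC *next;
} PCB_REC;

/* Where stale code would look for filename. */
size_t old_filename_offset(void) { return offsetof(OLD_PCB_REC, filename); }

/* Where freshly compiled code puts pid and filename. */
size_t new_pid_offset(void) { return offsetof(PCB_REC, pid); }
size_t new_filename_offset(void) { return offsetof(PCB_REC, filename); }

/* Print both layouts; the exact numbers depend on the platform. */
void print_layouts(void) {
    printf("old: filename at offset %zu\n", old_filename_offset());
    printf("new: pid at offset %zu, filename at offset %zu\n",
           new_pid_offset(), new_filename_offset());
}
```

Calling print_layouts() from main shows that the old code's filename lands on the bytes where the new code stores pid, which is exactly the "filename = 00000001" case. A full rebuild (delete every object file, then recompile) rules this out.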
I don't have the code in front of me, but here's my gut feeling on what's happening. Somewhere, you're working with a pointer to this struct (or within this struct) that isn't initialized correctly. Either that or you're indexing into your struct in some strange way that doesn't take this new field into account (less likely, but still possible.)
If it's the former, things worked for a little while because the memory you were corrupting wasn't tied to anything critical, or you had enough "padding" in your program to contain the corruption. Adding a new field into the struct changed this because the int field pushed things beyond your "safe zone" in the memory.
If it's the latter, you're grabbing the wrong section of memory when you index into your struct... for example, you're grabbing the PID as part of your filename, which could cause all kinds of problems.
The defragging thing is definitely puzzling.
I'd go through this thing with a fine-toothed comb. Start up your debugger and step through the program, line by line. Find where the program dies and work your way back from there. Watch all of your variables for weirdness. Weirdness includes (but is not limited to): changing one variable and overwriting part of another somewhere else in memory; assigning a new value to a pointer with a non-null value; accessing members of a NULLed or non-initialized pointer; garbled or nonsensical values in structures (especially strings.)
edit: beat'd
It...
IT WAS AN ==. I missed an = somewhere. How the hell I managed to test-run earlier I do not know.
Let this be a lesson: look for all of the really really simple stuff first.
Good lord I'm embarrassed now.
A neat trick I've heard to prevent that problem is, whenever you're doing an equality comparison and there's an rvalue involved (in other words, an expression or literal or something else that you can't assign a value to), put it on the left side. So if you'd normally write if(val == 1), do if(1 == val) instead. That way if you mess up and use = instead of == you get a compile time error instead of subtly broken code.
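Here's a quick sketch of that trick. The function names is_one and is_one_buggy are just for illustration. The buggy version uses doubled parentheses so it compiles quietly; with a single pair, GCC and Clang will usually warn about it under -Wall, which is another cheap way to catch this.

```c
#include <stdbool.h>

/* Constant on the left: mistyping this as "1 = val" would not compile,
   because you cannot assign to a literal. */
bool is_one(int val) {
    return 1 == val;
}

/* The typo the trick guards against: "=" where "==" was meant.
   This overwrites *val with 1 and then tests that 1, so the
   branch is always taken. */
bool is_one_buggy(int *val) {
    if ((*val = 1)) {
        return true;
    }
    return false;
}
```

So is_one(2) is false as you'd expect, while is_one_buggy returns true for any input and silently changes the caller's variable.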
Was your particular error that you assigned NULL to a pointer instead of comparing it?
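For what it's worth, here's roughly what that pointer version of the mistake looks like. This is only a guess at the shape of the bug, not the original code; NODE, check_buggy and head_pid are made-up names.

```c
#include <stddef.h>

typedef struct NODE {
    int pid;
    struct NODE *next;
} NODE;

/* Buggy guard: "=" sets head to NULL, the condition is then false,
   and execution falls through with the original pointer lost.
   Returns what the rest of the function would go on to dereference. */
const NODE *check_buggy(const NODE *head) {
    if ((head = NULL)) { /* meant: head == NULL */
        return NULL;
    }
    return head; /* always NULL here; head->pid would seg fault */
}

/* Intended guard, written constant-first so the typo cannot compile. */
int head_pid(const NODE *head) {
    if (NULL == head) {
        return -1; /* sentinel for "empty list" */
    }
    return head->pid;
}
```

The nasty part is that the buggy guard never takes the early return, so the crash shows up a few lines later and looks unrelated to the check.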