Echo
ski-bapba-dap Moderator mod

So I'm trying to wrap my mind around Bayes' theorem. No, this is not my homework. :P

Let's take the following data:

1 in 1000 people get a disease.

A mandatory test for it has 95% certainty -- it will correctly give a positive result 95% of the time and correctly give a negative result 95% of the time.

If you get a positive result, what are the odds that you actually have the disease?

Right. Stop me if you see me doing something wrong.

P(d) = probability to have the disease. One in a thousand = P(d) = 0.001.

P(n) = probability to not have the disease = 1 - P(d) = 0.999.

P(p|d) = probability to get a positive result if you have the disease. 95% certain test = 0.95.

P(p|n) = probability to get a positive result if you do NOT have the disease. 5% error margin = 0.05.

P(p) = the probability to get a positive result regardless of other facts.

The probability of having the disease and getting a positive result = 0.001 * 0.95 = 0.00095.

The probability to NOT have it and still get a positive result = 0.999 * 0.05 = 0.04995.

Add those up and we get P(p) = 0.0509.

Right. Bayes' theorem states:

P(A|B) = (P(B|A) * P(A)) / P(B)

I want the probability to have the disease if I get a positive result, or P(d|p). So, hooking the numbers into the theorem...

P(d|p) = (P(p|d) * P(d)) / P(p)

P(d|p) = (0.95 * 0.001) / 0.0509

P(d|p) = 0.0186640472

So, ~1.87% chance to have the disease if I get a positive result. Did I do this right?
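For anyone who wants to check the arithmetic above by machine, here is a minimal Python sketch of the same steps. The function name `posterior` and the parameter names `sensitivity` and `specificity` are labels chosen for this illustration; the post just calls both of them "95% certainty".

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive result) via Bayes' theorem.

    prior       -- P(d), chance of having the disease before testing
    sensitivity -- P(p|d), chance of a positive result if you have it
    specificity -- chance of a negative result if you do not have it,
                   so P(p|n) = 1 - specificity
    """
    true_pos = prior * sensitivity                # have it AND test positive
    false_pos = (1 - prior) * (1 - specificity)   # don't have it AND test positive
    p_positive = true_pos + false_pos             # P(p), total probability
    return true_pos / p_positive


p = posterior(prior=0.001, sensitivity=0.95, specificity=0.95)
print(f"P(d|p) = {p:.4f}")  # prints P(d|p) = 0.0187
```

The two intermediate products are the 0.00095 and 0.04995 from the post, and their sum is the 0.0509 used as P(p).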

## Posts

Smug Duckling

You want to know: given that you got a positive result, what's the probability you have the disease? That's P(D|P).

You need to know P(P|D): given that you DO have the disease, what's the probability of getting a positive result? We know this one is .95, by the definition of the test's accuracy.

The probability, offhand, of having the disease at all is .001=P(D)

There are two cases where you test positive: If you have the disease and if you don't

So there's a 1 in 1,000 chance that you get the disease at all, and if you do, there's ALMOST a 100 percent chance that you test positive. So the probability of getting the disease and a positive result is ALMOST the same as just getting the disease, which intuitively tells us your .001*.95=.00095 is right.

So there's a .999 chance you don't get the disease at all. If you do NOT have the disease, there's a 5 percent chance you still test positive. If you had a 100 percent chance of NOT having the disease, a flat .05 would be your chance of testing positive. But it's only .999 that you don't have the disease, so a number slightly less than 5 percent is expected. If you're slightly less likely to not have it, you're slightly less likely to get a false positive! So I believe you did that right.

Those are the two possible ways to get a positive result, so you can add them like you did to find the probability of getting a positive result.

No sense showing you me thinking out loud, but that's how I go through it in my head and the logic I use to justify my results. For example:

I'll take your word for it, but is it really that crazy? The .001 probability of getting the disease means you practically never have it when taking the test, so the opportunity for a true positive is kinda rarer than expected. If you're testing yourself for an incredibly rare disease using a very accurate test, and you get a positive, you need to ask yourself what's rarer: the accurate test failing, or you having the rare disease?
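One way to make that "what's rarer" comparison concrete is to count people instead of multiplying probabilities. The sketch below assumes a made-up population of 100,000 (the size is arbitrary and not from the thread) and uses the thread's 1-in-1000 rate and 95% test.

```python
# Hypothetical population; the size is an assumption chosen to give whole numbers.
population = 100_000

sick = population // 1000        # 1 in 1000 have the disease
healthy = population - sick      # everyone else

true_positives = sick * 95 // 100      # sick people the test catches
false_positives = healthy * 5 // 100   # healthy people the test flags anyway

# Of everyone holding a positive result, what share is actually sick?
share_sick = true_positives / (true_positives + false_positives)

print(f"{true_positives} true positives vs {false_positives} false positives")
print(f"share of positives who are sick: {share_sick:.4f}")
```

With these numbers, false positives outnumber true positives by roughly 50 to 1, which is why a positive result still leaves you at under 2%.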

So P(D|P)=P(P|D)*P(D)/P(P)=.95*.001/.0509=.0187, which is what you got.

So it looks right to me. To sanity check your final equation, ask questions like "what if the test were 100% accurate?" and "what if I had a 0% chance of having the disease?" If you don't get the answers you intuitively expect, figure out why, be it an error or you misunderstanding the situation.
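Those two sanity checks are easy to run in code. This is a sketch only; the function name `posterior` and its argument names are invented for the illustration.

```python
def posterior(prior, p_pos_given_d, p_pos_given_n):
    """P(D|P) from the prior P(D), P(P|D) and P(P|not D)."""
    p_pos = prior * p_pos_given_d + (1 - prior) * p_pos_given_n
    return prior * p_pos_given_d / p_pos


# "What if the test were 100% accurate?"
# It never misses and never raises a false alarm, so a positive means certainty.
perfect_test = posterior(prior=0.001, p_pos_given_d=1.0, p_pos_given_n=0.0)

# "What if I had a 0% chance of having the disease?"
# Every positive is then a false positive, so the answer should be zero.
impossible_disease = posterior(prior=0.0, p_pos_given_d=0.95, p_pos_given_n=0.05)

print(perfect_test)        # 1.0
print(impossible_disease)  # 0.0
```

Both limits behave the way intuition says they should, which is some evidence the formula was set up correctly.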

If you're curious, I've seen Bayesian updating used primarily in a financial situation, where you use EVM data to project uncertainties in schedule and budget, and every month you update those projections based on the new data

One more point: so is a 95 percent accurate test useless? Well, it is if you're testing for something relatively unlikely. What's a home pregnancy test, like 99.99 percent accurate? What if the disease were more likely? I would assume that the test's value (how likely a positive result is to be correct) increases as the disease becomes more likely. Just using .5 instead of .001 in your equation (and adjusting the other values), we see the probability of a correct positive becomes 95 percent.
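To see how the value of a positive result moves with how common the disease is, here is a small sweep over a few priors. The intermediate priors (.01 and .1) are picked for illustration and are not from the thread; the test is kept at the thread's 95% in both directions.

```python
def posterior(prior, accuracy=0.95):
    """P(D|P) for a test that is right `accuracy` of the time either way."""
    true_pos = prior * accuracy
    false_pos = (1 - prior) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)


# .001 and .5 come from the discussion above; .01 and .1 are illustrative.
priors = (0.001, 0.01, 0.1, 0.5)
results = {prior: posterior(prior) for prior in priors}

for prior, value in results.items():
    print(f"P(D) = {prior:<5}  ->  P(D|P) = {value:.3f}")
```

The posterior climbs steadily with the prior, and at P(D) = .5 it equals the test's accuracy of .95, matching the figure quoted above.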

BlochWave

... less accurate than your example) and use them the way you did in the example, you come up with a theoretical number that doesn't pertain to real life in any way, because in real life diagnostic tests aren't generally performed on a random sample from the population (which is an implicit assumption in your example).

Instead, you perform some kind of selection first and then do the testing, in which case you need to nest Bayesian equations to get the final probability. For example, diagnosis for coeliac disease is based on endoscopy, which as far as I know is neither fantastically specific nor sensitive. But in reality you generally go through three "tests", all of which are individually either less sensitive or less specific than the test in your example. First, you need to have symptoms consistent with the disease. Second, you're tested for antibodies in your blood. Third, you undergo endoscopy to obtain a tissue sample (this is sometimes done if the symptoms are highly indicative even if the blood test comes out negative). To get the final probability that a positive endoscopy result indicates coeliac disease, you take the result of the first Bayesian equation and feed it into the next as the prior, etc.
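The nesting described above can be sketched as repeated updating, where each stage's posterior becomes the next stage's prior. Every number below is hypothetical and chosen only to show the mechanics; none of them are real clinical figures for coeliac disease. The sketch also assumes the three stages are independent of each other given the person's true disease status, which real symptoms and tests may not be.

```python
def update(prior, sensitivity, specificity):
    """One Bayesian update on a positive finding."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL (name, sensitivity, specificity) values, for illustration only.
stages = [
    ("consistent symptoms", 0.80, 0.70),
    ("antibody blood test", 0.90, 0.90),
    ("endoscopy", 0.85, 0.95),
]

p = 0.01        # HYPOTHETICAL starting prevalence in the general population
history = [p]   # probability after each stage, starting with the raw prior

for name, sensitivity, specificity in stages:
    p = update(p, sensitivity, specificity)  # old posterior is the new prior
    history.append(p)
    print(f"after positive {name}: {p:.3f}")
```

Even though each made-up stage is individually weaker than a 95% test in at least one direction, stacking them pushes the probability far above what any single stage reaches from the raw population prior.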

Bliss 101