
# Understanding Bayes' theorem

edited May 2008
So I'm trying to wrap my mind around Bayes' theorem. No, this is not my homework. Let's take the following data:

1 in 1000 people get a disease.

A mandatory test for it has 95% certainty -- it will correctly give a positive result 95% of the time and correctly give a negative result 95% of the time.

If you get a positive result, what are the odds that you actually have the disease?

Right. Stop me if you see me doing something wrong.

P(d) = probability to have the disease. One in a thousand = P(d) = 0.001.
P(n) = probability to not have the disease = 1 - P(d) = 0.999.

P(p|d) = probability of getting a positive result, given that you have the disease. 95% certain test = 0.95.
P(p|n) = probability of getting a positive result, given that you do NOT have the disease. 5% error margin = 0.05.

P(p) = the probability to get a positive result regardless of other facts.

The probability of having the disease and getting a positive result = 0.001 * 0.95 = 0.00095.
The probability to NOT have it and still get a positive result = 0.999 * 0.05 = 0.04995.

Add those up and we get P(p) = 0.0509.

Right. I want the probability of having the disease given that I get a positive result, or P(d|p), and Bayes' theorem tells me how to get it. So, hooking the numbers into the theorem...

P(d|p) = (P(p|d) * P(d)) / P(p)

P(d|p) = (0.95 * 0.001) / 0.0509

P(d|p) = 0.0186640472

So, ~1.87% chance to have the disease if I get a positive result. Did I do this right?
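
For anyone who wants to check the arithmetic above by machine, here is a minimal Python sketch. The function name `posterior` and the parameter names `prior`, `sensitivity` and `specificity` are made up for illustration; the post treats the test's "95% certainty" as both the true-positive and true-negative rate, and the sketch assumes the same.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive result) via Bayes' theorem.

    prior       -- P(d), chance of having the disease before testing
    sensitivity -- P(p|d), chance of a positive result if diseased
    specificity -- chance of a negative result if healthy, so P(p|n) = 1 - specificity
    """
    true_pos = sensitivity * prior                  # P(p|d) * P(d)
    false_pos = (1.0 - specificity) * (1.0 - prior) # P(p|n) * P(n)
    return true_pos / (true_pos + false_pos)        # numerator / P(p)


# Numbers from the post: 1 in 1000 prevalence, 95% in both directions.
p = posterior(prior=0.001, sensitivity=0.95, specificity=0.95)
print(f"P(d|p) = {p:.4f}")  # P(d|p) = 0.0187
```

Keeping the two error rates as separate arguments is only a design choice here; real tests often have different true-positive and true-negative rates.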

Echo on

## Posts

• edited May 2008
Yep, that's right. What's even more amazing is that even if the test is 99.99% accurate, your chance of actually having the disease is still only like 90%. Crazy stuff.

Smug Duckling on

• edited May 2008
mmmm Bayesian updating
Let's make sure we say these right

You want to know: Given that you got a positive result, what's the probability you have the disease? P(D|P)

You need to know P(P|D): given that you DO have the disease, what's the probability of getting a positive result? We know this one is .95, given by the definition of the test's accuracy

The probability, offhand, of having the disease at all is .001=P(D)

There are two cases where you test positive: If you have the disease and if you don't

So there's a 1 in 1 thousand chance that you get the disease at all, and if you do, there's ALMOST a 100 percent chance that you test positive, so the probability of getting the disease and a positive result is ALMOST the same as just getting the disease, which intuitively tells us your .001*.95=.00095 is right

So there's a .999 chance you don't get the disease at all. If you do NOT have the disease there's a 5 percent chance you still test positive. If you had a 100 percent chance of NOT having the disease, a flat .05 would be your chance of testing positive. But it's only .999 that you don't have the disease, so a number slightly less than 5 percent is expected. If you're slightly less likely to not have it, you're slightly less likely to get a false positive! So I believe you did that right, and your .999*.05=.04995 checks out

Those are the two possible ways to get a positive result, so you can add them like you did to find the probability of getting a positive result.
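
One way to make the two cases concrete is to count people instead of multiplying probabilities. The sketch below assumes an imaginary population of 100,000 (a number picked only so everything comes out whole); the variable names are illustrative.

```python
# Hypothetical population size, chosen so every count is a whole number.
population = 100_000

sick = population // 1000          # 1 in 1000 have the disease -> 100
healthy = population - sick        # everyone else -> 99,900

true_positives = sick * 95 // 100      # 95% of the sick test positive -> 95
false_positives = healthy * 5 // 100   # 5% of the healthy test positive -> 4,995

all_positives = true_positives + false_positives  # 5,090 positive results in total

# Of everyone holding a positive result, what share is actually sick?
share = true_positives / all_positives
print(f"{true_positives} of {all_positives} positives are real: {share:.4f}")
```

The 95 real cases are swamped by 4,995 false alarms, which is the same 0.00095 vs. 0.04995 split from the first post scaled up by 100,000.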

No sense showing you me thinking out loud, but that's how I go through it in my head and the logic I use to justify my results. For example:

Smug Duckling wrote: »
Yep, that's right. What's even more amazing is that even if the test is 99.99% accurate, your chance of actually having the disease is still only like 90%. Crazy stuff.

I'll take your word for it, but is it really that crazy? The .001 probability of getting the disease means you practically never have it when taking the test, so the opportunity for a true positive is rarer than you'd expect. If you're testing yourself for an incredibly rare disease using a very accurate test, and you get a positive, you need to ask yourself what's rarer: the accurate test failing, or you having the rare disease?
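
The 99.99% figure can be checked rather than taken on faith. This is a rough sketch that keeps the 1-in-1000 prevalence and treats "accuracy" as both the true-positive and true-negative rate, as the thread does; `posterior` and `results` are illustrative names.

```python
def posterior(prior, accuracy):
    """P(disease | positive), with one accuracy figure used for both error rates."""
    true_pos = accuracy * prior
    false_pos = (1.0 - accuracy) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Same 1-in-1000 disease, increasingly accurate tests.
results = {acc: posterior(0.001, acc) for acc in (0.95, 0.99, 0.999, 0.9999)}
for acc, p in results.items():
    print(f"accuracy {acc:.2%} -> P(disease | positive) = {p:.3f}")
```

At 99.99% the result comes out at roughly 0.909, in line with the "like 90%" quoted above. At 99.9% the test's error rate equals the disease's prevalence, so a positive result is a coin flip.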

So P(D|P) = P(P|D) * P(D) / P(P) = .95 * .001 / .0509 = what you got, probably

So it looks right to me. To sanity check your final equation, ask questions like "what if the test were 100% accurate?" and "what if I had a 0% chance of having the disease?" and if you don't get the answers you intuitively expect, figure out why, be it an error or you misunderstanding the situation
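
Those two sanity checks are easy to run. A minimal sketch, again assuming a single accuracy figure for both error rates; the names `posterior`, `perfect_test` and `impossible_disease` are made up for this example.

```python
def posterior(prior, accuracy):
    """P(disease | positive), with one accuracy figure used for both error rates."""
    true_pos = accuracy * prior
    false_pos = (1.0 - accuracy) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# "What if the test were 100% accurate?" -> a positive should mean certainty.
perfect_test = posterior(prior=0.001, accuracy=1.0)

# "What if I had a 0% chance of having the disease?" -> every positive is false.
impossible_disease = posterior(prior=0.0, accuracy=0.95)

print(perfect_test, impossible_disease)  # 1.0 0.0
```

Combining both extremes (a perfect test and a prior of zero) raises `ZeroDivisionError` in this sketch, because in that situation a positive result can never happen and the question has no answer.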

If you're curious, I've seen Bayesian updating used primarily in a financial situation, where you use EVM data to project uncertainties in schedule and budget, and every month you update those projections based on the new data

One more point: so is a 95 percent accurate test useless? Well, it is if you're testing for something relatively unlikely. What's a home pregnancy test, like 99.99 percent accurate? What if the disease were more common? I would assume that the test's value (how likely a positive is to be a correct one) increases as the disease becomes more likely. Just using .5 instead of .001 in your equation (and adjusting the other values), we see the probability of a correct positive becomes 95 percent
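
To see how the value of a positive result grows with how common the condition is, here is a small sketch that holds the test at 95% and varies only the prior. The intermediate priors (1% and 10%) are arbitrary values picked for illustration, and `posterior` and `by_prior` are hypothetical names.

```python
def posterior(prior, accuracy=0.95):
    """P(condition | positive) for a test that is right 95% of the time either way."""
    true_pos = accuracy * prior
    false_pos = (1.0 - accuracy) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# 0.001 and 0.5 come from the thread; 0.01 and 0.1 are just filler points.
by_prior = {prior: posterior(prior) for prior in (0.001, 0.01, 0.1, 0.5)}
for prior, p in by_prior.items():
    print(f"prior {prior:>5} -> P(condition | positive) = {p:.3f}")
```

With a prior of .5 the answer comes out at 0.95, matching the figure in the post: when having it and not having it are equally likely, the posterior is just the test's accuracy.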

BlochWave on
• edited May 2008
The calculations seem correct to me. It's worth noting that drawing conclusions from Bayesian probabilities is tricky because the underlying assumptions aren't explicitly stated and aren't always apparent, and unless you understand the assumptions you can't really understand the result. E.g. if you take the numbers for a diagnostic test kit (some of which are considerably less accurate than your example) and use them the way you used in the example, you come up with a theoretical number that doesn't pertain to real life in any way, because in real life diagnostic tests aren't generally performed on a random sample from the population (which is an implicit assumption in your example).

Instead, you perform some kind of selection first and then do the testing, in which case you need to nest Bayesian equations to get the final probability. For example, diagnosis for coeliac disease is based on endoscopy, which as far as I know is neither fantastically specific nor sensitive. But in reality you generally go through three "tests", all of which are individually either less sensitive or less specific than the test in your example. First, you need to have symptoms consistent with the disease. Second, you're tested for antibodies in your blood. Third, you undergo endoscopy to obtain a tissue sample (this is sometimes done if the symptoms are highly indicative even if the blood test comes out as negative). To get the final probability of a positive endoscopy result indicating coeliac disease, you need to take the result of the first Bayesian equation and feed it into the next, etc.
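
The nesting can be sketched in a few lines: the posterior from one step becomes the prior for the next. No real figures for coeliac testing are given here, so the sketch below simply reuses the 95% test from the first post twice, and it assumes the two results are independent given the patient's true status (a strong assumption that real test sequences may not satisfy). The names `update` and `history` are illustrative.

```python
def update(prior, accuracy):
    """One Bayesian update on a positive result; returns the new probability."""
    true_pos = accuracy * prior
    false_pos = (1.0 - accuracy) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


p = 0.001      # starting point: 1 in 1000, as in the first post
history = []
for _ in range(2):          # two positive results in a row (assumed independent)
    p = update(p, 0.95)     # yesterday's posterior is today's prior
    history.append(p)

print([f"{x:.4f}" for x in history])  # ['0.0187', '0.2654']
```

Under these assumptions a second positive lifts the probability from under 2% to about 27%, which is why pre-selecting who gets tested changes the meaning of the final result so much.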

Bliss 101 on