Help me relearn statistics

AresProphet · August 2007

I somehow got a 5 on the AP stats test back in high school, and I can't remember a damn thing from the class. Mostly, I just need to know how to solve a specific type of problem (it's not in any way school-related, nor is it gambling-related despite the example), though a site that gives a general overview of stats math would be really nice. The problem is simplified below.

Say I'm playing a shell game, where someone puts a coin under one of three shells at random, shuffles them, then I get to guess which one has the coin under it. The probability of being right p = .33, that's easy. I can also figure out the probability of losing/winning a certain number of times in a row (p^n and [1-p]^n). What I can't remember is how to calculate what the odds are of getting a certain number of guesses right, with a fixed n. How can I calculate the probability of winning exactly 4 out of 15? Or more than 3 out of 10? Or less than five out of twenty?

I should remember this, and I don't, and it's embarrassing.

Gdiguy · August 2007

The most understandable way: the odds of getting success on 4 specific trials (say, trials 1, 2, 3, and 4) and failure on the others is simply p^4 * (1-p)^(15-4) (which is just the probability of trial 1's outcome * the probability of trial 2's outcome * ... and so on, so you'll have 4 p's and 11 (1-p)'s).

The overall chance of getting 4 on any combination of trials, then, is that probability multiplied by the number of 4-element subsets of 15, which is the n choose k binomial factor (n! / ( (n-k)! k!) ). The link below explains it a bit more, but basically you have 15*14*13*12 possible ordered arrangements if you choose without replacement (for the trials that were successful, the first could be any of them, the second is any but the one you already picked, etc.), but then you have to divide by 4! = 24, because in your example you don't care whether trial 1 or trial 2 was chosen for success first (i.e., success on trials 1, 3, 5, and 7 is the same as 7, 5, 3, 1).

So it's 15! / (11 ! * 4!) * p^4 * (1-p) ^ (15-4)

(http://en.wikipedia.org/wiki/Binomial_coefficient)
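If you'd rather check the arithmetic than do it by hand, here's a minimal Python sketch of the formula above for exactly 4 wins out of 15. The function name binom_pmf is just something I made up for illustration, and I'm assuming p is exactly 1/3 rather than the rounded .33.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p (n choose k * p^k * (1-p)^(n-k))."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 1 / 3  # assumption: the coin is equally likely to be under any of 3 shells

ways = comb(15, 4)               # number of 4-element subsets of 15 trials
exactly_4_of_15 = binom_pmf(4, 15, p)

print("ways to pick which 4 trials win:", ways)
print("P(exactly 4 of 15):", exactly_4_of_15)
```

math.comb needs Python 3.8 or newer; on older versions you'd build n choose k from factorials instead.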

Something like "more than 3 out of 10" is usually a pain in the ass - the only way I really know of calculating it exactly is just to sum up the probability of 4, 5, 6, ... etc (or 1 - sum from 0, 1, 2, 3 if that's easier to calculate, which is the equivalent problem)
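Summing by hand is tedious, but it's only a few lines of code. This is a rough sketch of the "add up the individual probabilities" approach for the other two questions, again assuming p = 1/3; binom_pmf and prob_at_most are names I picked, not anything standard.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_most(k, n, p):
    """Probability of k or fewer successes: sum the exact cases 0..k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

p = 1 / 3  # assumption: three shells, one coin, fair shuffle

# "More than 3 out of 10" is 1 minus the chance of 0, 1, 2, or 3 wins.
more_than_3_of_10 = 1 - prob_at_most(3, 10, p)

# "Less than five out of twenty" means 0 through 4 wins.
fewer_than_5_of_20 = prob_at_most(4, 20, p)

print("P(more than 3 of 10):", more_than_3_of_10)
print("P(fewer than 5 of 20):", fewer_than_5_of_20)
```

Watch the boundaries: "more than 3" excludes 3 itself, and "less than five" stops at 4.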

GoodOmens · August 2007

You might also want to check http://faculty.vassar.edu/lowry/binomialX.html, which does the calculations for you, if you're just interested in getting an answer quickly.

AresProphet · August 2007

GoodOmens wrote: »

You might also want to check http://faculty.vassar.edu/lowry/binomialX.html, which does the calculations for you, if you're just interested in getting an answer quickly.

That page is incredibly helpful, thanks. I didn't think the calculations would be so messy to do by hand. Shows what I remember....

There's one other thing I know stats can help with, but I can't remember this either. Say I have a set of data where I don't know the probability of something happening, and I want to get a decent guess of it. I'll use some actual data I've collected for this example, with an unknown p:

6 out of 29 (.207)
10 out of 42 (.238)
14 out of 50 (.280)
8 out of 27 (.296)
5 out of 15 (.333)

Total: 43 out of 163 (.263)

Let's say I want to assume that p = .25 for this data. What are the odds that my data could show a p of .263 and simply be random chance, although the real p = .25? What if I assume p =.26? .30?

senor_x · August 2007

You may try looking more into Normal Distributions. They're a little more complicated to set up, but all the calculations are tabularized and easy once you normalize everything. For your second post, it looks like you're getting into Random Variable territory. You can find some Confidence Intervals and perform some Hypothesis Testing for the means and stuff to get a sense of the "goodness" of the data. Considering that I've taken two college Statistics & Probability for Engineers courses and one refresher course for work, I should be able to provide more explicit guidance, but I'm a systems engineer now and haven't calculated anything in a non-academic setting in seven years.
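For what it's worth, here is a hedged sketch of what the normal-approximation route could look like for the pooled 43 out of 163. It assumes all five batches share the same underlying p (so pooling them is fair) and that the sample is large enough for the normal curve to be a reasonable stand-in for the binomial. The variable names are mine, and p0 = .25 is just the value from the question.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 43, 163   # pooled totals from the five batches
p0 = 0.25                # hypothesized true probability (assumption to test)

p_hat = successes / n    # observed proportion, about .26

# Hypothesis test: how many standard errors is p_hat from p0,
# if p0 really were the true probability?
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

# 95% confidence interval around the observed proportion.
z_crit = NormalDist().inv_cdf(0.975)           # roughly 1.96
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - z_crit * se_hat, p_hat + z_crit * se_hat

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
print(f"95% CI for p: ({ci_low:.3f}, {ci_high:.3f})")
```

Roughly speaking, any assumed p that falls inside the interval is one the data can't rule out at that confidence level.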

AresProphet · August 2007

senor_x wrote: »

You may try looking more into Normal Distributions. They're a little more complicated to set up, but all the calculations are tabularized and easy once you normalize everything. For your second post, it looks like you're getting into Random Variable territory. You can find some Confidence Intervals and perform some Hypothesis Testing for the means and stuff to get a sense of the "goodness" of the data. Considering that I've taken two college Statistics & Probability for Engineers courses and one refresher course for work, I should be able to provide more explicit guidance, but I'm a systems engineer now and haven't calculated anything in a non-academic setting in seven years.

This is what I vaguely remember having to do for this kind of problem: if the true mean (the peak of the curve) is a certain number, I can make assumptions about the shape of the curve (calculate the standard deviation) and then get a confidence interval that includes the mean I got from my data (which has a rather small sample size).

I just don't remember how to go about doing it. I'm doing a lot of Googling on the subject, and I can't remember what the hell all the variables in stats are supposed to be.

Edit: whipped out my old TI-83+ to help with this. If anyone knows how to do what I need on this, that'd work too.

Edit 2: there's another way to think of this problem. The data is from running multiple trials that each have a binary (success/failure) outcome. I only have data collected from 5 points in this test (after 29, 71, 121, 148, and 163 runs), but each run was done independently. So it's like a coin toss with a weighted coin that lands on one side more often than the other. I'm trying to calculate just how weighted the coin might be, based on my data; I want to be able to assume anything from a 1:9 to a 4:6 ratio of tosses.
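One way to frame the weighted-coin version, sketched here in Python as an illustration only: for each candidate weighting, use the exact binomial sum to ask how likely a result at least as extreme as 43 out of 163 would be. The "double the smaller tail" rule for a two-sided figure is one common convention among several, and the function names and the list of candidate p values are my own choices.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_tail(successes, n, p):
    """Chance of a count at least as extreme as the one observed,
    taken as twice the smaller tail (capped at 1)."""
    lower = sum(binom_pmf(k, n, p) for k in range(successes + 1))
    upper = sum(binom_pmf(k, n, p) for k in range(successes, n + 1))
    return min(1.0, 2 * min(lower, upper))

successes, n = 43, 163                             # pooled totals from the data
candidates = [0.10, 0.20, 0.25, 0.26, 0.30, 0.40]  # 1:9 through 4:6 weightings

results = {p: two_sided_tail(successes, n, p) for p in candidates}
for p, value in results.items():
    print(f"assumed p = {p:.2f}: two-sided tail probability = {value:.4f}")
```

A small value means that weighting is hard to square with the data; a large one means the data can't distinguish it from the observed .263.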

Penny Arcade
