Can you please give me a concise picture of the difference between population variance and sample variance, and why that difference means the sample variance is computed by dividing by n-1 rather than by n, the way the population variance is?
The textbook is being unbelievably vague about it, just spouting formulas. It also goes on about degrees of freedom and bias and I am totally lost. Wikipedia is absolutely no help either.
I just want to understand conceptually why they are computed differently, considering the sample is just a subset of the population.
When you compute a sample variance, what you're really trying to do is estimate the variance of the population. Because you know the estimate won't be perfect, you want to be conservative, which in this context means stating the variance as larger rather than smaller: most conclusions you'd want to reach are helped by a small variance, so you shouldn't grant yourself that edge. So you divide by a smaller number than you otherwise would.
That's the justification for it. I cannot even come close to remembering the math behind the justification 8! years after learning it.
That n-1 is known as Bessel's correction. Now you may be thinking "Correction?? That sounds a lot like a fudge factor." You would be correct. Statistics is all voodoo math, man, it's just highly useful voodoo math with some really good justifications behind it.
The best informal and fairly intuitive explanation for Bessel's correction is mentioned briefly by the wikipedia article: when you use a sample mean to compute a sample variance you are effectively removing one degree of freedom. When computing population variance, your "sample" is the entire population, and so your "sample" mean is in fact the actual mean of the distribution. It's easy to see, modifying the calculation on the wikipedia article, that the n/(n-1) fudge factor is not necessary if you can use the actual mean of the distribution rather than a sample mean.
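If you'd rather see that than derive it, here's a minimal simulation sketch (Python with NumPy; the normal distribution, the sample size, and the trial count are all arbitrary choices just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_var = 0.0, 4.0      # draws from N(0, 4), so sigma^2 = 4
n, trials = 10, 200_000

known_mean = np.empty(trials)       # divide by n, deviations from the TRUE mean
naive = np.empty(trials)            # divide by n, deviations from the sample mean
bessel = np.empty(trials)           # divide by n-1, deviations from the sample mean

for i in range(trials):
    x = rng.normal(true_mean, np.sqrt(true_var), n)
    known_mean[i] = np.sum((x - true_mean) ** 2) / n
    naive[i] = np.sum((x - x.mean()) ** 2) / n
    bessel[i] = np.sum((x - x.mean()) ** 2) / (n - 1)

# Averaged over many trials:
print(known_mean.mean())  # ~4.0 -- dividing by n is fine when the true mean is known
print(naive.mean())       # ~3.6 -- biased low by exactly (n-1)/n
print(bessel.mean())      # ~4.0 -- the n-1 divisor undoes that bias
```

The sample mean always sits closer to its own data than the true mean does, which is exactly the lost degree of freedom; the naive average comes out low by exactly the factor (n-1)/n that Bessel's correction cancels.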
Wow, that's a really hacky trick, I didn't know that myself.
Basically, it's an unbiased estimator (its expected value equals the true variance, for any sample size) and the correction is bigger when n is small (which is a related point).
So if n=5, you multiply by 5/4 = 1.25, since you don't have a lot of samples. If n=1000, you multiply by 1000/999 ≈ 1.001, which is barely anything. So it's a function of n that shrinks toward 1 non-linearly as you get more data. As n goes to infinity, n/(n-1) approaches 1, so you're hardly correcting at all; by that point even the uncorrected estimator's bias, a factor of (n-1)/n, has essentially vanished.
Basically: it has all of the properties that a fudge factor should have, and it works decently in practice. Voodoo math but with a little reasoning sprinkled on top to taste.
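If you want to watch the fudge factor fade out, a throwaway snippet (plain Python; the n values are picked arbitrarily):

```python
# Bessel's factor n/(n-1) shrinks toward 1 as the sample grows.
for n in (2, 5, 10, 30, 100, 1000):
    print(f"n = {n:4d}   correction factor = {n / (n - 1):.4f}")
```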
The basic idea is this: within the population there will be a few examples of extreme values. For example, if you're talking about height, there are a small number of very tall people, and a small number of very short people. These extreme values tend to increase the standard deviation and variance, because they are far from the norm.
OK, so take a sample from the population. Grab 25 people at random. Chances are very good that you won't get a super-tall or super-short person, because they're rare. So your sample values will sit closer to the average, and the raw sample variance will come out artificially small. It won't match the population variance, which is what you're trying to estimate. Dividing by n-1 instead of n compensates: dividing by a smaller number inflates the result just enough that, on average, the two match.
There are more complex methods to account for that difference, but the n-1 works well for most situations.
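Here's a sketch of that experiment (Python with NumPy; the height distribution and all the numbers are invented just to make the point):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented population: one million heights in cm.
population = rng.normal(170.0, 10.0, 1_000_000)
pop_var = population.var()          # population variance: divide by N

n, trials = 25, 50_000
naive = np.empty(trials)
corrected = np.empty(trials)
for i in range(trials):
    sample = rng.choice(population, n)          # grab 25 people at random
    dev2 = (sample - sample.mean()) ** 2
    naive[i] = dev2.sum() / n                   # divide by n: runs small
    corrected[i] = dev2.sum() / (n - 1)         # Bessel's correction

print(pop_var)            # ~100
print(naive.mean())       # ~96, i.e. low by a factor of (n-1)/n = 24/25
print(corrected.mean())   # ~100, matching the population variance on average
```

Any single sample still misses, of course; the correction only fixes the average behavior, not the luck of the draw.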
This page goes into the meat of the matter.
One reason for "why n-1?" is that you are using the sample mean, xbar = (x_1 + x_2 + ... + x_n) / n, to compute the sample variance, instead of the true mean. Bessel's correction just makes it so the sample variance is an unbiased estimator of the true variance, which means the expected value of the estimator is equal to the true value that is being estimated.
Here is another derivation on wikipedia showing that the corrected sample variance is an unbiased estimator:
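In outline, for independent draws X_1, ..., X_n with mean mu and variance sigma^2, the standard argument goes:

```latex
% Unbiasedness of the (n-1)-divisor sample variance, in outline.
% Assume X_1, ..., X_n are i.i.d. with mean mu and variance sigma^2.
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 \;-\; n(\bar{X} - \mu)^2 \\
\mathbb{E}\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n\sigma^2 \;-\; n \cdot \frac{\sigma^2}{n}
  && \text{since } \operatorname{Var}(\bar{X}) = \sigma^2 / n \\
  &= (n-1)\,\sigma^2
\end{align*}
% Dividing the sum of squares by n-1 therefore gives expected value sigma^2.
```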
Note how the variance of the sample mean has to be taken into account when computing the overall expected value of the estimator.
Degrees of freedom also come up when your sample size is small (the usual rule of thumb is under 30). You keep n-1 as the degrees of freedom, but a different table comes into play: Student's t table replaces the Z table, and it's considerably smaller and easier to use.
I don't know the full explanation behind it, but from the other replies I'd deduce it's to further compensate for the extra uncertainty in small samples.
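To see how that t table relates to the Z table, a quick sketch (Python, using SciPy, which is assumed to be available):

```python
from scipy.stats import norm, t

# Two-sided 95% critical values: Student's t (indexed by degrees of
# freedom) approaches the normal Z value of ~1.96 as the sample grows.
z = norm.ppf(0.975)
for df in (4, 9, 29, 99, 999):
    print(f"df = {df:4d}   t = {t.ppf(0.975, df):.3f}   z = {z:.3f}")
```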