
Lies, Damned Lies, and Statistics

    Syrdon Registered User regular
    edited June 2011
    Feral wrote: »
    Oh, I've got another one.

    Let's say you have a medical test for an exotic disease and you're testing the general population. If the likelihood of a false positive on the test is higher than the prevalence of that disease, then a positive result is more likely to be wrong than it is right.

    In other words, let's say there's a new disease called Fabricatosis. 1% of the people in the world have fabricatosis; there's a blood test with a false positive rate of 5% and a false negative rate of 5%. If we test 10,000 people, we would expect:

    95 true positives
    495 false positives
    5 false negatives
    9,405 true negatives
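    Checking that arithmetic with a quick Python sketch (same prevalence and error rates as above, nothing else assumed):

        population = 10_000
        prevalence = 0.01   # 1% of people have fabricatosis
        fpr, fnr = 0.05, 0.05

        sick = population * prevalence        # 100 people
        healthy = population - sick           # 9,900 people

        true_pos = sick * (1 - fnr)           # 95
        false_neg = sick * fnr                # 5
        false_pos = healthy * fpr             # 495
        true_neg = healthy * (1 - fpr)        # 9,405

        # How often is a positive result actually right?
        ppv = true_pos / (true_pos + false_pos)
        print(f"{ppv:.1%}")                   # ~16.1%

    So even though the test is right 95% of the time on any given person, a positive result is wrong about 84% of the time.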

    There are a couple of ways (that I know of) to get around this.

    1) Don't bother testing random people. Only test people who have symptoms or have been exposed. The prevalence of fabricatosis in the general population might be 1%, but we can presume that the prevalence is actually much higher among the subset of people who show the symptoms of fabricatosis. This is what we do with mononucleosis.

    2) Develop a different test and require a positive on both before calling it a positive. If the tests' errors are independent, the false positive rates multiply: two 5% tests give a combined false positive rate of 0.25%. This is what we do for HIV.
    It relates to your first option, but you can also just use the test as a rule-out. That is, never trust the test when it says you have the disease, only when it says you don't. That way you end up with 9,405 correct results, 590 unsure, and 5 wrong. You are left with the question of what to do with the 590 unsure results, but you can at least salvage some data from a fairly bad system.
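    Both of Feral's fixes, and the rule-out approach, are easy to sanity-check with the same numbers. The 30% pretest probability and the independence of the two tests' errors are invented assumptions for illustration:

        # Option 1: only test people with symptoms. Suppose (invented number)
        # 30% of symptomatic people actually have the disease:
        pre = 0.30
        ppv_symptomatic = pre * 0.95 / (pre * 0.95 + (1 - pre) * 0.05)
        print(f"{ppv_symptomatic:.0%}")   # ~89%: a single positive is now credible

        # Option 2: require two independent tests to both come back positive:
        sick, healthy = 100, 9_900
        tp2 = sick * 0.95 * 0.95          # ~90 true positives pass both tests
        fp2 = healthy * 0.05 * 0.05       # ~25 false positives pass both
        print(f"{tp2 / (tp2 + fp2):.0%}") # ~78%

        # Rule-out only: of the 9,410 negatives, 9,405 are correct:
        print(f"{9405 / 9410:.2%}")       # ~99.95%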

    edit: As an additional thing, what's the best way to handle very uncertain events? For example, how do you reasonably handle something that occurs one time in a thousand? You basically need a sample size that is at least an order of magnitude larger, right?
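    A back-of-the-envelope answer (a rough rule of thumb, not a proper power calculation): the relative error of an estimated rate shrinks like one over the square root of the number of events you expect to observe, so what matters is expected events, not raw sample size:

        from math import sqrt

        p = 0.001   # the one-in-a-thousand event
        for n in (1_000, 10_000, 100_000, 1_000_000):
            rel_se = sqrt(p * (1 - p) / n) / p    # standard error relative to p
            print(f"n={n:>9,}: ~{n * p:>5.0f} expected events, relative error ~{rel_se:.0%}")

    So one order of magnitude past the 1-in-1,000 scale (n = 10,000, about 10 expected events) still leaves roughly ±30% error; you need n around 100,000 to get within ~10%.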

    second edit: My former workplace used to put up graphs of customer satisfaction scores for the different call centers that worked for the company, as a way of encouraging competition between the centers (and hopefully improving the scores). Every single one of the graphs had the leading group not quite at the top of the graph and the trailing group not quite at the bottom, so you ended up with a 5-6 inch difference when it was printed out. Actual difference in the scores? Usually 3-5 points out of a possible 100. Also missing from the graph? The margin of error on the customer survey, estimated at 10% last I saw. Every quarter I got to point out to my boss and the other people in the meeting that there was a scale with numbers on the left side of the graph, and that those numbers told a different story than the picture.
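    For anyone who wants to see the trick, here's a sketch in Python with matplotlib; the scores are invented to match the 3-5 point spread described above:

        import matplotlib.pyplot as plt

        centers = ["Center A", "Center B", "Center C", "Center D"]
        scores = [87, 88, 90, 91]   # invented: a ~4-point spread out of 100

        fig, (trunc, honest) = plt.subplots(1, 2, figsize=(8, 3))
        trunc.bar(centers, scores)
        trunc.set_ylim(86, 92)      # axis clipped around the data: gap looks huge
        trunc.set_title("As printed")
        honest.bar(centers, scores)
        honest.set_ylim(0, 100)     # full 0-100 scale: gap nearly disappears
        honest.set_title("Full scale")
        plt.tight_layout()
        plt.show()

    Add the ±10% margin of error on top and the left-hand ranking is mostly noise.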

    curby_f Registered User regular
    edited June 2011
    Yar wrote: »
    When a certain piece of information is reported as fact, the amount of time, intelligence, dedication, and education it takes to actually research it for yourself and determine its validity can be beyond the means of most people. There is a wealth of information in the media, books, etc., which is presented as rigorous statistical evidence, or even as science, but which after painstaking follow-up research can be reasoned to be completely misleading or bogus. Most people can't sort this out. People like us on this board can, but even then only on a limited set of information. People are left with few options, and an attractive option for most is just to believe what they want to believe.

    Some might argue for "consensus of experts" or such, but this falls into the same problem. Without painstakingly looking to see if it is in fact the consensus of experts, rather than just being reported as such, you don't know. Perhaps what you do know is that the same childhood friend who told you that the world would run out of oil before the year 2000, and that the planet would be completely deforested by 2010, is now the one telling you that climate is undergoing dramatic and damaging changes and we'll all be X by the year Y.

    This is where peer review comes into play. The thing is that the news cycle and the publishing industry don't like to wait for peer-reviewed articles before promoting the results of a study. Sometimes, especially with controversial findings, preliminary results are sent to the media before the original paper has even been released, so we have no real understanding of the methodology or assumptions.

    But back on the topic of statistics, here are a couple of pick-up lines for you all:
    • You know what they say: it’s not the size of the p-value, but how you interpret it!
    • When I saw you walk in, I had to adjust my R-squared!
    • You make the slope of my utility function strictly convex!

    Shamelessly stolen from: http://www.drewconway.com/zia/?p=336
