Lies, Damned Lies, and Statistics

  • DarkPrimusDarkPrimus Registered User regular
    edited June 2011
    DarkPrimus wrote: »
    At work, they display the secret shopper score ratings on a chart next to the punch clock. The thing is that they start the ratings at 70% and end the ratings at 120%. There is no way for the secret shopper score to go past 100%. If they ended the chart at 100% the difference between the various scores would be all the more apparent.

    Presumably that's one instance where percentages above 100 are actually useful? I.e. a 120% score being above the level expected of the employee/shop?

    No, it isn't useful, because the entire point of the chart is to showcase the bar graph representing the differences between the last half-dozen or so secret shopper scores. Setting the high water mark above what is actually possible does nothing but make the different bars on the chart look more similar, instead of making the 3~7 percent difference stand out more.

    It is counter to the entire purpose of what the graph is supposed to achieve.

    DarkPrimus on
  • CervetusCervetus Registered User regular
    edited June 2011
    ronya wrote: »
    A house with fewer rooms has larger rooms, and it seems quite plausible that differing layouts like these are correlated with other factors (geography, proximity to desirable features?) that may be correlated with house price.

    i.e., omitted variable?

    just a guess, anyway

    In the end, we came to the same conclusion you did: we just didn't have enough variables, mostly because we were running off of publicly available data and simply didn't have that many variables to work with. The story, thankfully for my grade, has a happy ending. Our model presentation was pretty funny due to the silly numbers we kept getting and, perhaps more importantly, we only took a few minutes after a long string of twenty-to-thirty-minute presentations by people who were trying far too hard to be funny. We all ended up getting an A on the project.

    I would also guess it has something to do with a lack of variance between houses, that they're all going to have 1-3 bedrooms and 1-3 bathrooms of fairly similar sizes, and if you zoom in on a curved graph like that it might look straight.

    Cervetus on
  • GoumindongGoumindong Registered User regular
    edited June 2011
    ronya wrote: »
    A house with fewer rooms has larger rooms, and it seems quite plausible that differing layouts like these are correlated with other factors (geography, proximity to desirable features?) that may be correlated with house price.

    i.e., omitted variable?

    just a guess, anyway

    Sorry, Nope! [sorry Ronya, but I love correcting you, more so than anyone else here]

    Omitted variable bias would move the estimate in a specific direction by a set amount. I.e. the amount of bias is the true beta on the omitted variable multiplied by the coefficient you would get from regressing the omitted variable on the included one [closely tied to their covariance/correlation]. The final value for the beta of the variable we want to look at is its true beta + the bias.
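    A minimal simulated illustration of that bias formula (all variable names and numbers here are made up):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(size=n)                       # regressor we keep (say, bathrooms)
    z = 0.5 * x + rng.normal(size=n)             # omitted variable, correlated with x
    y = 2.0 * x + 3.0 * z + rng.normal(size=n)   # "true" model

    slope_short = np.polyfit(x, y, 1)[0]         # regress y on x alone (z omitted)
    delta = np.polyfit(x, z, 1)[0]               # regress the omitted z on x

    print(slope_short)         # ~3.5, not the true 2.0
    print(2.0 + 3.0 * delta)   # true beta + beta_z * delta: the bias formula, also ~3.5
    ```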

    But we expect all of those variables [size of the house, number of bathrooms] to have fairly strong correlation with each other [this is why we have to include them all to get accurate measures]. As such, anything that is going to correlate with one is also likely to correlate with the rest [and don't you say "but they don't have to, see instrumental variables", I know that. I just can't think of a variable that would be omitted that would correlate with one, but not the others].

    Because of this we expect the size of the bias to be roughly the same for each variable, so if we see that the beta for bathrooms is -200k when we expect it to be positive, we have to wonder why the beta for everything else isn't -200k either [or +200k, respectively, if we were expecting negative correlation].

    Unless our bias was multiplicative rather than additive and we had other, confounding errors on signs [or simply unlikely results] to complicate things.

    The most likely errors are these:

    1. Rooms =/= rooms, i.e. recording errors. Not sure what effect this is going to have. But let us say that we record the number of rooms in total as well as the number of bedrooms and bathrooms, and that one listing recorded rooms as "rooms that are not bedrooms or bathrooms" while others recorded "total rooms in the house". Not sure exactly what kind of errors this will cause, but it should make things go all kinds of wonky.

    2. Multicollinearity: let's say again that we are looking at rooms, bedrooms, and bathrooms. In a lot of situations these are going to be nearly linearly related [total rooms is roughly bedrooms plus bathrooms plus a few others], indicating a high level of correlation among the regressors. If the houses chosen happened to be similar in total room counts, the confidence intervals could balloon to beyond insane numbers.

    It is important to realize that your numbers aren't biased, per se. But since your CI is so large you could just have a relatively unlikely sample. This seems like backwards logic, but it really isn't. If you have a large negative number but a large CI [let's say one that extends to a positive hypothesized point], it could just be the case that you got unlucky.

    Near-perfect collinearity can also creep in unnoticed if you aren't using a statistical package strong enough to flag it.
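    A quick sketch of how that near-collinearity balloons the standard errors (simulated, made-up numbers):

    ```python
    import numpy as np

    def ols_se(X, y):
        """OLS coefficient standard errors (intercept first)."""
        Xd = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        sigma2 = resid @ resid / (len(y) - Xd.shape[1])
        return np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))

    rng = np.random.default_rng(1)
    n = 200
    bedrooms = rng.integers(1, 5, n).astype(float)
    rooms_collinear = bedrooms + rng.normal(0, 0.1, n)    # total rooms ~= bedrooms
    rooms_indep = rng.integers(4, 10, n).astype(float)    # total rooms unrelated to bedrooms

    price_a = 50 * bedrooms + 20 * rooms_collinear + rng.normal(0, 30, n)
    price_b = 50 * bedrooms + 20 * rooms_indep + rng.normal(0, 30, n)

    print(ols_se(np.column_stack([bedrooms, rooms_collinear]), price_a))  # huge SEs
    print(ols_se(np.column_stack([bedrooms, rooms_indep]), price_b))      # modest SEs
    ```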

    3. Misapplication of fixed-effects models due to various factors such as serial correlation. These can also bias your results. Fixed-effects models seem like they would be appropriate here, as would difference-in-differences models. Truthfully, the easiest way to look at this would have been to use two rather than three periods of data and just use a difference-in-differences estimator.
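    A sketch of the two-period difference-in-differences idea (everything here is hypothetical):

    ```python
    import numpy as np

    # Houses that gained a bathroom ("treated") vs houses that didn't,
    # with prices (in $1000s) observed before and after.
    rng = np.random.default_rng(2)
    n = 500
    treated = rng.random(n) < 0.5
    price_before = 300 + 40 * treated + rng.normal(0, 10, n)   # treated houses start pricier
    trend = 15                                                 # market-wide appreciation
    effect = 25                                                # true value of the extra bathroom
    price_after = price_before + trend + effect * treated + rng.normal(0, 10, n)

    did = ((price_after[treated].mean() - price_before[treated].mean())
           - (price_after[~treated].mean() - price_before[~treated].mean()))
    print(did)   # ~25: the common trend cancels out, leaving the treatment effect
    ```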

    4. Misreading of the results. I.e. let's say you're looking at rooms, bathrooms, and bedrooms. If rooms = all rooms, and not [all rooms minus bathrooms/bedrooms], then going from 2 bathrooms to 1 [holding rooms fixed] means the house has GAINED an additional non-bathroom, non-bedroom room. The margin on "bathrooms" then is not "add one bathroom", it's "add one bathroom and subtract one other room"!

    5. Misspecification of non-linearities in the model. The poster seems to indicate that they looked at the addition of a bathroom and found that it subtracted 200k. But a more reasonable look at a house does not affix the value of a bathroom at a flat 200k, but rather at some % of the sales price.

    I.e. the model is not linear-linear, but rather log-linear or log-log [or log-linear/log, wooo!].

    I am actually not sure what kind of errors this type of misspecification will cause and frankly don't want to consider it right now.
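    A minimal sketch of the two specifications side by side (simulated data, made-up coefficients):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 1_000
    sqft = rng.uniform(800, 4000, n)
    bathrooms = rng.integers(1, 4, n)
    # Suppose the "true" world is log-linear: each extra bathroom adds ~10% to price.
    price = np.exp(12 + 0.8 * np.log(sqft / 1000) + 0.10 * bathrooms + rng.normal(0, 0.1, n))

    X_lin = np.column_stack([np.ones(n), sqft, bathrooms])
    X_log = np.column_stack([np.ones(n), np.log(sqft), bathrooms])
    beta_lin, *_ = np.linalg.lstsq(X_lin, price, rcond=None)
    beta_log, *_ = np.linalg.lstsq(X_log, np.log(price), rcond=None)

    print(beta_lin[2])   # linear-linear: a flat dollar amount per bathroom
    print(beta_log[2])   # log-linear: ~0.10, i.e. each bathroom adds ~10% to price
    ```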

    6. Error term correlated with the regressors [i.e. a non-zero conditional mean].

    The error in our model includes "quality", which has an effect on price per sq-ft. Bathrooms in higher price/sq-ft houses are more expensive. Houses with more bathrooms tend to be of higher quality. Ohhhh shit! And, you know, ditto for all our other variables.

    ronya wrote: »
    emnmnme wrote: »
    I want to know how the right wingers can conjure up one set of statistics to support their claims and the left wingers can conjure up another set of statistics to support their claims. Math is supposed to be impartial.

    In the absence of plausibly controlled experiments, a lot of statistics as applied to the social sciences rely on theory to fill in the gaps. Theory varies by ideology.

    No, theory should not make significant changes to the statistics. It will change the proposed models, but it should not change the results of those models.

    The real answer is that it is a lot easier to get things wrong in statistics than we like to admit [and the errors usually tend to follow our biases]. More things get published with bad statistics/methods/models than is in any way acceptable.

    In addition if we know how things get wrong we can manipulate the way data is presented and analyzed in order to achieve the desired results, or choose a specification we know is going to show what we want.

    And to complicate things further, statistics are just games of probability. Run enough samples and you'll [probably] get the results you're looking for. With the sheer amount of research out there, this means that it's very easy to "pick and choose" the studies you want to look at.

    The dirtiest secret in statistics is probably this:

    All of the statistics you see rely on the proposed model being correct. Small confidence intervals do not mean that the model is correct or close to correct; they just mean that "given the model is correct, these numbers are probably pretty close to the true values". That "given the model is correct" is a fucking whopper of an assumption though, especially given what we know about how model specifications change results.

    Choosing between specifications when you have no theoretical reason to do so is a very iffy proposition. There are tests, but there aren't any good tests [imo].
    Feral wrote: »

    I would like to understand this, but I am caught on the financial buzzwords. I'm also caught on the use of the word 'variance' to refer to a percentage. Does "variance" in investing mean the same thing as it does in stats?

    Yes, in this case since the return is in % the variance will be in % as well.

    2% +/- 1% would mean between 1-3%.

    Goumindong on
  • ClipseClipse Registered User regular
    edited June 2011
    Goumindong wrote: »
    Feral wrote: »

    I would like to understand this, but I am caught on the financial buzzwords. I'm also caught on the use of the word 'variance' to refer to a percentage. Does "variance" in investing mean the same thing as it does in stats?

    Yes, in this case since the return is in % the variance will be in % as well.

    2% +/- 1% would mean between 1-3%.

    I think what Feral was driving at (although I could be mistaken) is that there is a difference in the two %'s used here. The first is a percentage return on investment; the second is variance on percentage points of return on investment. There is a massive, staggering difference between those two meanings of %.

    Your reply basically agrees with my analysis, but does not really explain for the 'uninitiated' (so to speak) what is going on here.

    Clipse on
  • GoumindongGoumindong Registered User regular
    edited June 2011
    No, there was no difference in the %'s being talked about in the original post. It sure did look like it though, since the variances were larger than the expected value. It caught me off guard until I looked at it. I think feral was confused as well.

    The issue is just that because our return is listed in %, so is our variance.

    Goumindong on
  • ClipseClipse Registered User regular
    edited June 2011
    Goumindong wrote: »
    No, there was no difference in the %'s being talked about in the original post. It sure did look like it though, since the variances were larger than the expected value. It caught me off guard until I looked at it. I think feral was confused as well.

    The issue is just that because our return is listed in %, so is our variance.

    You are saying that because they are both %, they are both the same. That is not the case. The variance can be interpreted as either a percent-as-relative-portion or as a percent-as-difference-between-percentages. The difference is, I believe, quite important depending on how the investment is modeled.

    Clipse on
  • GoumindongGoumindong Registered User regular
    edited June 2011
    Clipse wrote: »
    Goumindong wrote: »
    No, there was no difference in the %'s being talked about in the original post. It sure did look like it though, since the variances were larger than the expected value. It caught me off guard until I looked at it. I think feral was confused as well.

    The issue is just that because our return is listed in %, so is our variance.

    You are saying that because they are both %, they are both the same. That is not the case. The variance can be interpreted as either a percent-as-relative-portion or as a percent-as-difference-between-percentages. The difference is, I believe, quite important depending on how the investment is modeled.

    No, it cannot. Variances are absolute amounts, not percentages. If you see someone listing it as a % of the total they are wrong. [for a number of reasons both relating to the theoretical construct of variances and the mathematical convention]

    Edit: sometimes we want to look at relative variance but we do not call it variance when we look at it in that manner, there is another name for it.

    Goumindong on
  • soxboxsoxbox Registered User regular
    edited June 2011
    kaleedity wrote: »
    For one thing, it'd help if the texts were done a bit better. At least in my second semester college course, there was actually a homework problem assigned that involved comparing the rate of different types of paint drying. I am not sure if the statisticians were trying to troll students or something, but god man, really?

    A lot of the problems with learning and teaching statistics come from the entirely inconsistent language used within different statistical fields, and, as in a lot of maths subjects, people are expected to understand the terminology more than the actual application of the statistics.

    I studied my way through most of a degree as an actuary (before realising I didn't want to be at uni any more when there was a full-time computer programmer job for me), where you get taught statistics by going to the actual math department statistics classes, so I've got a decent grasp of a lot of statistical concepts. But when a friend of mine asked for help with her psychology statistics class, the entire thing was just incomprehensible gobbledygook.

    soxbox on
  • ClipseClipse Registered User regular
    edited June 2011
    Goumindong wrote: »
    Clipse wrote: »
    Goumindong wrote: »
    No, there was no difference in the %'s being talked about in the original post. It sure did look like it though, since the variances were larger than the expected value. It caught me off guard until I looked at it. I think feral was confused as well.

    The issue is just that because our return is listed in %, so is our variance.

    You are saying that because they are both %, they are both the same. That is not the case. The variance can be interpreted as either a percent-as-relative-portion or as a percent-as-difference-between-percentages. The difference is, I believe, quite important depending on how the investment is modeled.

    No, it cannot. Variances are absolute amounts, not percentages. If you see someone listing it as a % of the total they are wrong. [for a number of reasons both relating to the theoretical construct of variances and the mathematical convention]

    Edit: sometimes we want to look at relative variance but we do not call it variance when we look at it in that manner, there is another name for it.

    You keep stating the point that I think Feral was making, and then explaining why it is not a point. I will attempt one last time:

    Not everyone knows that variance is always an absolute amount, and expressing it in a potentially ambiguous manner serves only to confuse those who do not. My default assumption would be that the % in the variance was percent-as-difference-of-percentages, because I'm well aware that variance should be expressed as an absolute amount in the units of the distribution (in this case, percentages-as-relative-portions). But when you express a problem in that manner and then boggle at why anyone would misunderstand it, it comes across as either myopic (i.e., your understanding of basic probability and statistics is so focused on your own domain that you can't see the ambiguity) or deliberately misleading. Neither of which is particularly flattering.

    This is only a single instance of the general problem (at least in English; anyone know if other languages make a more solid distinction?) of a lack of rigorous distinction between the two possible meanings of "percent".

    Clipse on
  • GlorfindelGlorfindel Registered User regular
    edited June 2011
    Goumindong wrote: »
    Clipse wrote: »
    Goumindong wrote: »
    No, there was no difference in the %'s being talked about in the original post. It sure did look like it though, since the variances were larger than the expected value. It caught me off guard until I looked at it. I think feral was confused as well.

    The issue is just that because our return is listed in %, so is our variance.

    You are saying that because they are both %, they are both the same. That is not the case. The variance can be interpreted as either a percent-as-relative-portion or as a percent-as-difference-between-percentages. The difference is, I believe, quite important depending on how the investment is modeled.

    No, it cannot. Variances are absolute amounts, not percentages. If you see someone listing it as a % of the total they are wrong. [for a number of reasons both relating to the theoretical construct of variances and the mathematical convention]

    Edit: sometimes we want to look at relative variance but we do not call it variance when we look at it in that manner, there is another name for it.

    The correct answer is to take the square roots of those variance figures (to get standard deviations) and throw them into a Sharpe ratio to get a risk-adjusted measure of return.

    Glorfindel on
  • KakodaimonosKakodaimonos Code fondler Helping the 1% get richerRegistered User regular
    edited June 2011
    Andrew Lo would like to have a chat with you about Sharpe ratios. :)

    And for people who don't know what a Sharpe ratio is, it is the ratio of return to risk: you take the difference between the investment's return and a risk-free return, and divide by the std dev of that excess return.

    However, like all measures, it has some issues. It does not deal with fat-tail events very well. You can have an investment with a great Sharpe ratio, such as selling out-of-the-money puts on the S&P index, that every year or two can have a catastrophic risk event and blow out all your working capital.
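    A minimal sketch of that calculation (hypothetical yearly returns; a real version would also worry about return frequency and annualization):

    ```python
    import numpy as np

    def sharpe_ratio(returns, risk_free):
        """Mean excess return divided by the std dev of the excess return."""
        excess = np.asarray(returns) - risk_free
        return excess.mean() / excess.std(ddof=1)

    # Made-up yearly returns for a fund, with a 2% risk-free rate.
    returns = [0.08, 0.12, -0.03, 0.10, 0.07]
    print(sharpe_ratio(returns, 0.02))
    ```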

    Kakodaimonos on
  • GoodOmensGoodOmens Registered User regular
    edited June 2011
    emnmnme wrote: »
    I want to know how the right wingers can conjure up one set of statistics to support their claims and the left wingers can conjure up another set of statistics to support their claims. Math is supposed to be impartial.

    The key word there is "supposed to be." Remember that, in the vast majority of cases, data are drawn not from a population, but from a sample drawn from that population. You don't have data about, say, unemployment rates for everyone in America, but rather unemployment rates for some subset of people in America because getting data for every single person is essentially impossible. Each sample is going to be at least slightly different, whether by chance or by intent. It's very easy to cherry-pick a sample which will yield the data that you want. If I want to show that Americans support gun rights, I might decide to gather my data from a city hosting the NRA national convention.

    Ideally, when some talking head is on Meet the Press, he should say exactly what sample he is using and how it was selected. Same thing with polls in Time magazine or whatever. But they know that nearly everyone watching or reading either doesn't know about sampling, or doesn't give a shit.

    GoodOmens on
    IOS Game Center ID: Isotope-X
  • GlorfindelGlorfindel Registered User regular
    edited June 2011
    Kakodaimonos wrote: »
    Andrew Lo would like to have a chat with you about Sharpe ratios. :)

    And for people who don't know what a Sharpe ratio is, it is the ratio of return to risk: you take the difference between the investment's return and a risk-free return, and divide by the std dev of that excess return.

    However, like all measures, it has some issues. It does not deal with fat-tail events very well. You can have an investment with a great Sharpe ratio, such as selling out-of-the-money puts on the S&P index, that every year or two can have a catastrophic risk event and blow out all your working capital.

    Yeah, the Sharpe, the Treynor, and a third (I forget its name) all have issues with them. Still, the concept of risk-adjusted returns is a useful one for investors to know, even if the specific measures themselves are not.

    Glorfindel on
  • DiannaoChongDiannaoChong Registered User regular
    edited June 2011
    GoodOmens wrote: »
    emnmnme wrote: »
    I want to know how the right wingers can conjure up one set of statistics to support their claims and the left wingers can conjure up another set of statistics to support their claims. Math is supposed to be impartial.

    The key word there is "supposed to be." Remember that, in the vast majority of cases, data are drawn not from a population, but from a sample drawn from that population. You don't have data about, say, unemployment rates for everyone in America, but rather unemployment rates for some subset of people in America because getting data for every single person is essentially impossible. Each sample is going to be at least slightly different, whether by chance or by intent. It's very easy to cherry-pick a sample which will yield the data that you want. If I want to show that Americans support gun rights, I might decide to gather my data from a city hosting the NRA national convention.

    Ideally, when some talking head is on Meet the Press, he should say exactly what sample he is using and how it was selected. Same thing with polls in Time magazine or whatever. But they know that nearly everyone watching or reading either doesn't know about sampling, or doesn't give a shit.

    Agreed, the source of information is always important, and many people freak the fuck out when you explain the source of a statistic because they realize the source doesn't fit their criteria. You see this a lot in business numbers. (Note, I am not talking about financials necessarily, just rudimentary statistics on basic business information like active/repeat customers, etc.)

    The "freak the fuck out" part is generally important here, because when one political side doesn't like the numbers of another political side, or wants to spin them, they first attack the source. That is why we hear "well, unemployment is X, but they aren't counting people not looking for a job anymore". While that was the way we did the metric for a long time, it's actually reasonable to look at it both that way and the way we do it now. If we can determine (and I haven't seen how we came up with this number, stats built on stats) that after 18 months of being unemployed some high percentage would have stopped looking for work, then you are no longer competing for a job with those people if you are looking. It makes sense not to count those people, because otherwise we should probably count 15-year-olds on up because, hey, they're jobless. But if you want to look at joblessness on the whole, or at people who would get off their ass and walk into a job if one were available, you would want to include these people.

    But in the above, it isn't about getting a metric to try and spot/solve a problem (in the straight sense), it's about trying to increase a number to make another faction look as terrible as possible, when under the same circumstances they would be pushing the opposite. It isn't about being "right" or doing the right thing (except in the sense that they think anything they touch is gold), it's about winning so they "can do right".

    The best thing I run into in business every day is someone taking a number, popping it into a formula, and trying to extrapolate data from that, with no idea what the source number meant and no documentation to support their finding; as long as someone says it is what it is, that's good enough for them. I run into this basically daily, usually with people trying to do it with the results from my work, and they have no idea what the results mean because they just got a number instead of understanding the number. I am a bit jaded on that one.

    DiannaoChong on
  • acidlacedpenguinacidlacedpenguin Institutionalized Safe in jail.Registered User regular
    edited June 2011
    mcdermott wrote: »
    Statistics: the art of torturing numbers until they tell you what you want to hear.


    Definitely agree on base probabilities. Every time I see some publicized study that touts a ZOMG 200% INCREASE!!1! I immediately start looking for it...because that nearly always means a jump from 0.1% to 0.3%.

    yeah like when the news media reported about the radiation increase at Fukushima being like 100 million times higher than the acceptable dose rate, but failed to mention that the reported dose amount changed from approximately 0% to 0% and that the effect would be a cancer risk increase from like 1% to 1.05%

    hey look I just manipulated statistics too!
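    The same trick in a few lines, echoing the 0.1% to 0.3% example above (numbers made up):

    ```python
    baseline = 0.001   # 0.1% baseline risk
    exposed = 0.003    # 0.3% risk after exposure

    relative_increase = (exposed - baseline) / baseline   # 2.0 -> "a 200% increase!"
    absolute_increase = exposed - baseline                # 0.002 -> 0.2 percentage points
    print(f"{relative_increase:.0%} relative, {absolute_increase:.1%} absolute")
    ```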

    acidlacedpenguin on
    GT: Acidboogie PSNid: AcidLacedPenguiN
  • YarYar Registered User regular
    edited June 2011
    Guys, what do you expect when 50% of people in this country are below-average intelligence?

    Actually, I recently engaged in a back-and-forth email conversation with the guy who wrote the Politifact article on Jon Stewart. We agreed that the one study that seemed to indicate Fox News viewers were misinformed was actually itself a very misleading and confusing study, one that really only showed that people on both sides of the spectrum and among all media audiences will tend to choose the answer that supports their politics as the correct answer, whether it's correct or not.

    Yar on
  • Premier kakosPremier kakos Registered User, ClubPA regular
    edited June 2011
    furlion wrote: »
    I am a fan of statistics and using them to convey information in a useful manner. It really pisses me off when people use them to trick other people though. For instance while my wife was pregnant she saw an article in a magazine saying that women who had a mother or sister with preterm delivery were 50% more likely to have a preterm delivery. She was quite upset about it since her sister had two. Upon carefully perusing the article I noticed at the bottom in very very small print it said the incidence went from 2.7% to 4%. So yes, a roughly 50% increase, but that is not the number they should have emphasized. Taking advantage of people's ignorance like that is disgusting.

    Thank you for using the word "peruse" correctly. <3

    Premier kakos on
  • Edith_Bagot-DixEdith_Bagot-Dix Registered User regular
    edited June 2011
    Feral wrote: »
    Oh, I've got another one.

    Let's say you have a medical test for an exotic disease and you're testing the general population. If the likelihood of a false positive on the test is higher than the prevalence of that disease, then a positive result is more likely to be wrong than it is right.

    In other words, let's say there's a new disease called Fabricatosis. 1% of the people in the world have fabricatosis; there's a blood test with a false positive rate of 5% and a false negative rate of 5%. If we test 10,000 people, we would expect:

    95 true positives
    495 false positives
    5 false negatives
    9,405 true negatives
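    The same arithmetic as a quick sketch (just Bayes' rule on the numbers above):

    ```python
    # 1% prevalence, 5% false positive rate, 5% false negative rate, 10,000 people.
    population = 10_000
    prevalence = 0.01
    false_pos_rate = 0.05
    false_neg_rate = 0.05

    sick = population * prevalence                          # 100
    true_pos = sick * (1 - false_neg_rate)                  # 95
    false_neg = sick * false_neg_rate                       # 5
    false_pos = (population - sick) * false_pos_rate        # 495
    true_neg = (population - sick) * (1 - false_pos_rate)   # 9,405

    ppv = true_pos / (true_pos + false_pos)
    print(f"Chance a positive result is real: {ppv:.0%}")   # ~16%
    ```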

    There are a couple of ways (that I know of) to get around this.

    1) Don't bother testing random people. Only test people who have symptoms or have been exposed. The prevalence of fabricatosis in the general population might be 1%, but we can presume that the prevalence is actually much higher among the subset of people who show the symptoms of fabricatosis. This is what we do with mononucleosis.

    2) Develop a different test and use the two tests together. If the tests' errors are roughly independent, this cuts the combined false positive rate multiplicatively (roughly the product of the two individual rates). This is what we do for HIV.

    It's fun to consider this sort of problem when considering the effectiveness of the additional security screening measures that have been implemented for air travelers.

    Edith_Bagot-Dix on


    Also on Steam and PSN: twobadcats
  • electricitylikesmeelectricitylikesme Registered User regular
    edited June 2011
    The title of this thread is the thing which really annoys me. You can go to some effort to present data in an accurate, informed way and make the whole thing as above board as possible and if it doesn't agree with someone's world view the very first thing they'll retort with is "well you can make statistics say anything!".

    electricitylikesme on
  • DetharinDetharin Registered User regular
    edited June 2011
    I am reminded of a discussion I got into with some random pro-Obama campaigner back before the election. You know the type who has picked a candidate before they even looked at the issues and knows jack all about their candidate's campaign promises, positions, or anything else. They can wave a sign and hand out buttons like no one's business, though. Anyway, the conversation went something like this:

    Them "You should vote for Obama because he is going to make everything safer by banning guns!"
    Me "You know that really hasn't worked out well for areas like Washington DC, and Britain has just seen an increase in stabbings."
    Them "Well my parents say we would be safer without guns, they are Statisticians who work in Washington DC and they have the numbers"
    Me "Do you know what a Statistician does?"
    Them "Ummm..."
    Me "Ever heard the saying there are lies, damn lies, and statistics?"
    Them "Are you calling my parents liars?"

    At that point I wandered off laughing. 50% of the people in that conversation apparently believe everything shown to them using numbers.

    Detharin on
  • ElJeffeElJeffe Moderator, ClubPA mod
    edited June 2011
    electricitylikesme wrote: »
    The title of this thread is the thing which really annoys me. You can go to some effort to present data in an accurate, informed way and make the whole thing as above board as possible and if it doesn't agree with someone's world view the very first thing they'll retort with is "well you can make statistics say anything!".

    The problem is likely that the person is informed enough to know that people misuse statistics all the time, but not informed enough to know what properly-used statistics actually look like. Which is understandable, because honestly, the difference between proper statistics and an obfuscatory number orgy is hard to pick out. And it doesn't help that a lot of studies that are performed, or polls that are run, are done either by unscrupulous folks who are trying to generate a specific result, or by folks who honestly don't know what the hell they're doing (but are probably hoping for a specific result).

    The upshot is that a lot of people just ignore anything expressed as a percentage, and it's hard to blame them.

    ElJeffe on
    I submitted an entry to Lego Ideas, and if 10,000 people support me, it'll be turned into an actual Lego set! If you'd like to see and support my submission, follow this link.
  • TheBigEasyTheBigEasy Registered User regular
    edited June 2011
    Without meaningless and empty statistics, all of our sports analysts would be out of a job.


    "How can you trade Thompson?! He leads the team in 8th-inning doubles in the month of July!"

    "Watch out for Ramirez this season. He's never lost when his team has been up by 21 or more points going into the fourth quarter."


    Also, the use of statistics should have made professional sports drafts worthless, yet they persist.

    "We're picking this shitty kid for QB with our first pick, despite the fact that we already have a quarterback and this kid is really, really shitty, in a really obvious way. But what can we do? He went to Notre Dame!"

    Since I only watch the NBA, I never encountered it in the other sports ... do they do that in football and baseball as well?

    I can live with stuff like "his 4th quarter scoring average is x points" ... but when it ventures into "they are 11-0 in games where they led by 5 or more with 3 minutes or less to go" I am like "WTF? way to try and explain something by anecdotal evidence".

    On the other hand, the announcers would be sitting there saying nothing for large portions of a game if they didn't have those useless statistics as filler.

    I'd love for someone to just make stuff up during a telecast. "Whenever he eats wheaties before a game and then is behind by 5 or more points, he always misses the first 5 shots in the fourth quarter" ... totally deadpan. He'd be out of a job soon, but some of that announcing stuff is so ingrained by now, it is jarring every time you hear it.

    Same with those useless interviews between quarters ... "Why are you winning right now?" Can some coach please answer with "Because we are scoring more points than the other team"?

    But enough ranting ... back to your regularly scheduled statistics.

    TheBigEasy on
  • OptimusZedOptimusZed Registered User regular
    edited June 2011
    ElJeffe wrote: »
    The title of this thread is the thing which really annoys me. You can go to some effort to present data in an accurate, informed way and make the whole thing as above board as possible and if it doesn't agree with someone's world view the very first thing they'll retort with is "well you can make statistics say anything!".

    The problem is likely that the person is informed enough to know that people misuse statistics all the time, but not informed enough to know what properly-used statistics actually look like. Which is understandable, because honestly, the difference between proper statistics and an obfuscatory number orgy is hard to pick out. And it doesn't help that a lot of studies that are performed, or polls that are run, are done either by unscrupulous folks who are trying to generate a specific result, or by folks who honestly don't know what the hell they're doing (but are probably hoping for a specific result).

    The upshot is that a lot of people just ignore anything expressed as a percentage, and it's hard to blame them.
    This phenomenon combines rather stabtacularly with confirmation bias to give us those people who uncritically accept any sort of analysis that tells them what they want to hear and simply cannot be convinced of the veracity of the data in situations where it contradicts their worldview.

    The anti-vaxers and global warming deniers being prime examples.

    OptimusZed on
    We're reading Rifts. You should too. You know you want to. Now With Ninjas!

    They tried to bury us. They didn't know that we were seeds. 2018 Midterms. Get your shit together.
  • YarYar Registered User regular
    edited June 2011
    When a certain piece of information is reported as fact, the amount of time, intelligence, dedication, and education it takes to actually research it for yourself and determine its validity can be beyond the means of most people. There is a wealth of information in the media, books, etc., which is presented as rigorous statistical evidence, or even as science, but which after painstaking follow-up research can be reasoned to be completely misleading or bogus. Most people can't sort this out. People like us on this board can, but even then only on a limited set of information. People are left with few options, and an attractive option for most is just to believe what they want to believe.

    Some might argue for "consensus of experts" or such, but this falls into the same problem. Without painstakingly looking to see if it is in fact the consensus of experts, rather than just being reported as such, you don't know. Perhaps what you do know is that the same childhood friend who told you that the world would run out of oil before the year 2000, and that the planet would be completely deforested by 2010, is now the one telling you that climate is undergoing dramatic and damaging changes and we'll all be X by the year Y.

    Yar on
  • hippofanthippofant ティンク Registered User regular
    edited June 2011
    My favourite is that since we scientists use p < 0.05, one in every 20 scientific studies is completely junk.

    Which then explains why it seems everything is trying to kill you with cancer.

    hippofant on
  • ElJeffeElJeffe Moderator, ClubPA mod
    edited June 2011
    @Yar - Bingo. Which is why I generally don't fault people too much for being ignorant. It's very difficult to not be ignorant when you're surrounded by the sort of media barrage we have. Me, I just pretty much ignore any claim that might be even the slightest bit controversial until I can look into it myself. Because the odds of the media taking something complicated and reporting on it accurately are probably 50-50 at best. And even if they get the facts right, they've probably muffed the implications.

    ElJeffe on
    I submitted an entry to Lego Ideas, and if 10,000 people support me, it'll be turned into an actual Lego set! If you'd like to see and support my submission, follow this link.
  • SliderSlider Registered User regular
    edited June 2011
    I use the words "more than likely" rather than using statistics or listing a certain percentage. In college, I was forced to take a Statistics course and had to work my ass off to get a B-.

    Slider on
  • PotatoNinjaPotatoNinja Fake Gamer Goat Registered User regular
    edited June 2011
    Statistics are very useful, but like all forms of technical information they require context and narrative in order to be useful in establishing an opinion.

    It would be nice for more people to be familiar with common logical fallacies, but if we're making wishes I'd also like a unicorn.

    PotatoNinja on
    Two goats enter, one car leaves
  • FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    edited June 2011
    Clipse wrote: »
    You keep stating the point that I think Feral was making, and then explaining why it is not a point.

    I wasn't making a point. I was simply sincerely confused. Like, I didn't know if the variance was stated as a percentage of your expected mean return or as a percentage of your total investment or what.

    Goum cleared me up and the rest of this discussion has been enlightening!

    But I really wasn't trying to make an argument there.

    Feral on
    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    edited June 2011
    hippofant wrote: »
    My favourite is that since we scientists use p < 0.05, one in every 20 scientific studies is completely junk.

    Which then explains why it seems everything is trying to kill you with cancer.

    Well, not exactly 1 in 20. It depends on the effect you're looking for and your sample size. I'm too lazy to do the math right now, but even with a p < 0.05, the possibility of a false positive gets infinitesimally small with an arbitrarily large sample size.

    But your overall point is correct. Studies looking for very subtle effects in small sample sizes (which describes, unfortunately, a lot of experiments in medicine and the social sciences) have a large enough possibility of false positives that we shouldn't take a single positive result and run with it.

    Feral on
    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • PotatoNinjaPotatoNinja Fake Gamer Goat Registered User regular
    edited June 2011
    I do find about 1/2 of the "statistics gotcha" tricks are really just semantic games that give me a 40% desire to punch a statistician 100% in the face.

    But plenty of them are useful thought exercises at least.

    PotatoNinja on
    Two goats enter, one car leaves
  • YarYar Registered User regular
    edited June 2011
    This thread is totally the best thread, by 57%.

    Yar on
  • GoumindongGoumindong Registered User regular
    edited June 2011
    Feral wrote: »
    Well, not exactly 1 in 20. It depends on the effect you're looking for and your sample size. I'm too lazy to do the math right now, but even with a p < 0.05, the possibility of a false positive gets infinitesimally small with an arbitrarily large sample size.

    Nope. Arbitrarily large sample size simply changes the place at which a p-value of .05 occurs. If you set your critical p-value at .05 then you are defining the false positive rate to be 5%.

    But note what a p-value actually means. It means, specifically, that "if the null hypothesis is true, we will reject it when we really ought not to have done so 5% of the time". So it isn't quite right to say 1/20 studies will be false positives if everyone uses a 5% p-value, but rather "assuming the null hypothesis is true, 1/20 studies will be false positives if everyone uses a 5% p-value".

    The truth is that the type 2 error probability [vs type 1, which is what the p-value controls] is typically unknown, and that fact may be even more problematic than the 5% critical value.

    Note that the 5% critical value does mean that "if we have a bias toward testing null hypotheses we already deem correct, then we will make that mistake 5% of the time" is roughly true. And that should be worrying, because a lot of people do tend to test things we might view as true, and results that don't reject the null are less likely to be published than those that do.
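    A quick simulation of that point (assuming scipy is available): under a true null, roughly 5% of tests reject no matter how big the samples get.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    trials = 2000
    for n in (30, 300, 3000):
        rejections = 0
        for _ in range(trials):
            a = rng.normal(size=n)
            b = rng.normal(size=n)        # same distribution: the null is true
            _, p = stats.ttest_ind(a, b)
            rejections += p < 0.05
        print(n, rejections / trials)     # ~0.05 at every sample size
    ```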

    Goumindong on
  • Edith_Bagot-DixEdith_Bagot-Dix Registered User regular
    edited June 2011
    Yar wrote: »
    This thread is totally the best thread, by 57%.

    It needs to be about 20% cooler.

    Edith_Bagot-Dix on


    Also on Steam and PSN: twobadcats
  • spool32spool32 Contrary Library Registered User regular
    edited June 2011
    This thread is definitely 57% cooler than the next coolest one.


    So I'm really thinking about seeing if I can get the charter school my kids attend to let me do a week-long lecture on this topic, for outgoing seniors. Has anyone ever tried to do anything like this?

    spool32 on
  • FeralFeral MEMETICHARIZARD interior crocodile alligator ⇔ ǝɹʇɐǝɥʇ ǝᴉʌoɯ ʇǝloɹʌǝɥɔ ɐ ǝʌᴉɹp ᴉRegistered User regular
    edited June 2011
    Goumindong wrote: »
    Feral wrote: »
    Well, not exactly 1 in 20. It depends on the effect you're looking for and your sample size. I'm too lazy to do the math right now, but even with a p < 0.05, the possibility of a false positive gets infinitesimally small with an arbitrarily large sample size.

    Nope. Arbitrarily large sample size simply changes the place at which a p-value of .05 occurs. If you set your critical p-value at .05 then you are defining the false positive rate to be 5%.

    Yeah, you're right. I wasn't thinking. It's the possibility of a false negative that gets smaller with a larger sample size. :P

    Feral on
    every person who doesn't like an acquired taste always seems to think everyone who likes it is faking it. it should be an official fallacy.

    the "no true scotch man" fallacy.
  • KakodaimonosKakodaimonos Code fondler Helping the 1% get richerRegistered User regular
    edited June 2011
    Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "We got it! We got it!"

    Confirmation and survivorship bias are always a big issue with hedge funds & mutual funds. You never see the funds run by a management firm that got shut down due to poor returns. So if you're looking only at the investment funds that are still around, you may get the idea that the majority of investment funds are profitable.

    Kakodaimonos on
  • darklite_xdarklite_x I'm not an r-tard... Registered User regular
    edited June 2011
    My favorite statistics are always those scare-tactic ones like 30% of alcohol drinkers die in car wrecks or 27% of all voters disapprove of the president (made up statistics). They're always worded to make you feel a certain way but like, 27% disapprove? Well what the fuck about the other 73%? The president must be doing a damn good job. What I'm getting at here is fuck news outlets for doing everything they can to skew statistics and word them in such a way as to manipulate people instead of just presenting them as facts.

    darklite_x on
    Steam ID: darklite_x Xbox Gamertag: Darklite 37 PSN:Rage_Kage_37 Battle.Net:darklite#2197
  • acidlacedpenguinacidlacedpenguin Institutionalized Safe in jail.Registered User regular
    edited June 2011
    darklite_x wrote: »
    My favorite statistics are always those scare-tactic ones like 30% of alcohol drinkers die in car wrecks or 27% of all voters disapprove of the president (made up statistics). They're always worded to make you feel a certain way but like, 27% disapprove? Well what the fuck about the other 73%? The president must be doing a damn good job. What I'm getting at here is fuck news outlets for doing everything they can to skew statistics and word them in such a way as to manipulate people instead of just presenting them as facts.

    but 60% of the time those manipulations work every time!
    STATS PANTHER

    acidlacedpenguin on
    GT: Acidboogie PSNid: AcidLacedPenguiN