## Posts

No, it isn't useful, because the entire point of the chart is to showcase the bar graph representing the differences between the last half-dozen or so secret shopper scores. Setting the high water mark above what is actually possible does nothing but make the different bars on the chart look more similar, instead of making the 3~7 percent difference stand out more.

It is counter to the entire purpose of what the graph is supposed to achieve.
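To put a rough number on the axis complaint, here is a minimal sketch. The scores and axis limits are made up for illustration (they are not from the chart being discussed), and `visible_gap` is a hypothetical helper, not anything from a charting library.

```python
def visible_gap(score_a, score_b, axis_max):
    """Fraction of the chart's height taken up by the gap between two bars."""
    return abs(score_a - score_b) / axis_max

# Hypothetical secret shopper scores, in percent (made up for illustration).
low, high = 90.0, 95.0

gap_tight = visible_gap(low, high, axis_max=100.0)  # axis stops at the maximum possible score
gap_loose = visible_gap(low, high, axis_max=150.0)  # axis runs past what is possible

print(f"axis to 100: gap is {gap_tight:.1%} of chart height")
print(f"axis to 150: gap is {gap_loose:.1%} of chart height")
```

The same 5-point difference takes up less of the picture once the axis is stretched past the achievable maximum, which is the effect being objected to.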

I would also guess it has something to do with a lack of variance between houses: they're all going to have 1-3 bedrooms and 1-3 bathrooms of fairly similar sizes, and if you zoom in on a curved graph like that it might look straight.

Sorry, Nope! [sorry Ronya, but I love correcting you, more so than anyone else here]

Omitted variable bias would move in a specific direction by a set amount. I.e., the amount of bias is the true expected beta on the omitted variable multiplied by the slope you'd get from regressing the omitted variable on the included one [their covariance divided by the included variable's variance; for standardized variables this is just the correlation]. The final value for the beta of the variable we want to look at is its expected beta + the bias.

But we expect all of those variables given [size of the house, number of bathrooms] to have fairly strong correlation [this is why we have to include them all to get accurate measures]. As such, anything that is going to correlate with one is also likely to correlate with the rest [and don't you say "but they don't have to, see instrumental variables", I know that. I just can't think of a variable that would be omitted that would correlate with one, but not others].

Because of this we expect the size of the bias to be roughly the same for each variable, so if we see that the beta for bathrooms is -200k and we expect it to be positive we have to wonder why the beta for everything else isn't -200k either. [or +200k respectively if we were expecting negative correlation]

Unless our bias was multiplicative rather than additive and we had other, confounding errors on signs [or simply unlikely results] to complicate things.
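For anyone who wants to see the additive bias rather than take it on faith, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the true betas, the 0.5 link between the included variable `x` and the omitted variable `z`, and the single-regressor setup. In the one-regressor case the textbook result is that the short regression's slope lands at the true beta plus the omitted beta times the slope of `z` on `x`.

```python
import random

def slope(xs, ys):
    """OLS slope from regressing ys on xs (single regressor plus intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)
n = 20000
beta_x, beta_z = 2.0, 3.0  # hypothetical true effects

x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # omitted variable, correlated with x
y = [beta_x * xi + beta_z * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

short_beta = slope(x, y)                   # regression that leaves z out
predicted = beta_x + beta_z * slope(x, z)  # true beta + omitted beta * slope of z on x

print(f"estimated beta on x with z omitted: {short_beta:.3f}")
print(f"true beta plus predicted bias:      {predicted:.3f}")
```

The estimate sits near 3.5 rather than the true 2.0, so the bias has a definite sign and size rather than just adding noise.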

The most likely errors are these.

1. Rooms =/= Rooms. I.E. recording errors. Not sure what effect this is going to have. But let us say that we record the number of rooms in total as well as the number of bedrooms and bathrooms. Let us say that one listing recorded rooms as "rooms that are not bedrooms and bathrooms" and others as "total rooms in the house". Not sure what kind of errors this will cause, but it should make things go all kinds of wonky.

2. Autocorrelation: Let's say again that we are looking at rooms, bedrooms, and bathrooms. For a lot of situations these are going to sum to the same number, indicating a high level of correlation among them. If the houses chosen happened to be similar in total room numbers, you could have ballooned the confidence interval to an insane width.

It is important to realize that your numbers aren't biased, per se. But since your CI is so large you could just have a relatively unlikely sample. This seems like backwards logic, but it really isn't. If you have a large negative number but a large CI [let's say one that reaches a positive hypothesized point], it could just be the case that you got unlucky.

Autocorrelation can also creep in if you aren't using a strong statistical package.

3. Misapplication of fixed effect models due to various factors such as serial correlation. These can also bias your results. Fixed effects models seem like they would be appropriate here as would difference in difference models. Truthfully the easiest way to look at this would have been to use two rather than three period data and just use a difference in difference estimator.

4. Misreading of the results. I.e., let's say you're looking at rooms, bathrooms, and bedrooms. If rooms = all rooms, and not [all rooms minus bathrooms/bedrooms], then going from 2 to 1 bathroom at a fixed room count indicates that the house has GAINED an additional non-bathroom, non-bedroom room. The margin on "bathrooms" is then not "add one bathroom", it's "add one bathroom and subtract one other room"!

5. Misspecification of non-linearities in the model. The poster seems to indicate that they looked at the addition of a bathroom subtracting 200k. But a more reasonable look at a house does not affix a value of 200k to the bathroom, but rather some % of the sales price.

I.E. the model is not linear-linear, but rather log-linear or log-log [or log-linear/log wooo!].

I am actually not sure what kind of errors this type of misspecification will cause and frankly don't want to consider it right now.

6. Non-zero error term.

The error in our model comes from "quality" having an effect on price/sq-ft. Bathrooms in higher price/sq-ft houses are more expensive. Houses with more bathrooms tend to be of higher quality. Ohhhh shit! And, you know, ditto for all our other variables.
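Point 2 in the list above (regressors that move together blowing up the confidence interval without biasing the estimate) can be sketched with a small simulation. Note that correlation among the regressors themselves is usually called multicollinearity. This is only a sketch under stated assumptions: two regressors, true betas of 1.0 each, and a hypothetical helper `beta1` that uses the closed-form two-regressor OLS solution.

```python
import random
import statistics

def beta1(x1, x2, y):
    """OLS coefficient on x1 when regressing y on x1 and x2 (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

def simulate(rho, rng, n=50, reps=300):
    """Mean and std dev of the x1 estimate across repeated samples, given regressor correlation rho."""
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [1.0 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(beta1(x1, x2, y))
    return statistics.mean(estimates), statistics.stdev(estimates)

rng = random.Random(1)
mean_low, spread_low = simulate(0.0, rng)     # uncorrelated regressors
mean_high, spread_high = simulate(0.99, rng)  # nearly collinear regressors

print(f"rho=0.00: mean estimate {mean_low:.2f}, spread {spread_low:.2f}")
print(f"rho=0.99: mean estimate {mean_high:.2f}, spread {spread_high:.2f}")
```

Both setups average out near the true value of 1.0, but with nearly collinear regressors any single sample can land far away, including on the wrong sign.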

No, theory should not make significant changes to the statistics. It will change the proposed models, but it should not change the results of those models.

The real answer is that it is a lot easier to get things wrong in statistics than we like to admit [and the errors usually tend to follow our biases]. More things get published with bad statistics/methods/models than is in any way acceptable.

In addition if we know how things get wrong we can manipulate the way data is presented and analyzed in order to achieve the desired results, or choose a specification we know is going to show what we want.

And to complicate things further, statistics are just games of probability. Run enough samples and you'll get the results you're looking for [probably]. With the sheer amount of research out there this means that it's very easy to "pick and choose" the studies you want to look at.

The dirtiest secret in statistics is probably this:

All of the statistics you see rely on the model proposed being correct. Small confidence intervals do not mean that the model is correct or close to correct; they just mean that "given the model is correct, these numbers are probably pretty close to the results". That "given the model is correct" is a fucking whopper of an assumption though, especially given what we know about how model specifications change results.

Choosing between specifications when you have no theoretical reason to do so is a very iffy proposition. There are tests, but there aren't any good tests [imo].

Yes, in this case since the return is in % the variance will be in % as well.

2% +/- 1% would mean between 1-3%.

I think what Feral was driving at (although I could be mistaken) is that there is a difference in the two %'s used here. The first is a percentage return on investment; the second is variance on percentage points of return on investment. There is a massive, staggering difference between those two meanings of %.

Your reply basically agrees with my analysis, but does not really explain for the 'uninitiated' (so to speak) what is going on here.

The issue is just that because our return is listed in %, so is our variance.

You are saying that because they are both %, they are both the same. That is not the case. The variance can be interpreted as either a percent-as-relative-portion or as a percent-as-difference-between-percentages. The difference is, I believe, quite important depending on how the investment is modeled.

No, it cannot. Variances are absolute amounts, not percentages. If you see someone listing it as a % of the total they are wrong. [for a number of reasons both relating to the theoretical construct of variances and the mathematical convention]

Edit: sometimes we want to look at relative variance but we do not call it variance when we look at it in that manner, there is another name for it.

A lot of the problems with learning and teaching statistics come from the entirely inconsistent language used within different statistical fields, and like a lot of maths subjects people are expected to understand the terminology more than the actual application of the statistics.

I studied my way through most of a degree as an actuary (before realising I didn't want to be at uni any more when there was a full-time computer programmer job for me), where you get taught statistics by going to the actual math department statistics classes, so I've got a decent grasp of a lot of statistical concepts. But when a friend of mine asked for help on her psychology statistics class, the entire thing was just incomprehensible gobbledygook.

You keep stating the point that I think Feral was making, and then explaining why it is not a point. I will attempt one last time:

My default assumption would be that the % in variance was percent-as-difference-of-percentages because I'm well aware that variance should be expressed as an absolute amount in the units of the distribution (in this case, percentages-as-relative-portions). But when you express a problem in that manner and then boggle at why anyone would misunderstand it, it comes across as either myopic (i.e., your understanding of basic probability and statistics is so focused on your own domain that you can't see the ambiguity) or deliberately misleading. Neither of which is particularly flattering.

Not everyone knows that variance is always an absolute amount, and expressing it in a potentially ambiguous manner serves only to confuse those who do not.

This is only a single instance of the general problem (at least in English; anyone know if other languages make a more solid distinction?) of a lack of rigorous distinction between the two possible meanings of "percent".
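The two readings of "2% +/- 1%" being argued over can be spelled out in a few lines. This is just an illustrative sketch; the function names are hypothetical and the numbers are the 2% and 1% used earlier in the thread.

```python
def interval_points(ret_pct, plus_minus_points):
    """Read '+/- 1%' as percentage points of return."""
    return ret_pct - plus_minus_points, ret_pct + plus_minus_points

def interval_relative(ret_pct, plus_minus_pct):
    """Read '+/- 1%' as a relative share of the return itself."""
    delta = ret_pct * plus_minus_pct / 100.0
    return ret_pct - delta, ret_pct + delta

points_low, points_high = interval_points(2.0, 1.0)          # 1% to 3%
relative_low, relative_high = interval_relative(2.0, 1.0)    # roughly 1.98% to 2.02%

print(f"percentage points: {points_low:.2f}% to {points_high:.2f}%")
print(f"relative percent:  {relative_low:.2f}% to {relative_high:.2f}%")
```

The first reading is the one the thread settles on (an absolute amount in the units of the return); the second is the misreading the ambiguity invites, and it gives a range fifty times narrower.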

Correct answer is to find the square roots of those variance figures (for standard deviation) and throw it into a Sharpe Ratio to get a risk-adjusted measure of return.

And for people who don't know what a Sharpe ratio is, it is the ratio of return to risk. You take the difference between the investment's return and a risk-free return, and divide by the standard deviation of the excess return.

However, like all measures, it has some issues. It does not deal with fat-tail events very well. You can have an investment that has a great Sharpe ratio, such as selling out-of-the-money puts on the S&P index, that every year or two has a catastrophic risk event and blows out all your working capital.
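As a minimal sketch of the calculation described above: the return series and risk-free rate below are made up for illustration, and this version uses the sample standard deviation of per-period excess returns with no annualization, which is one common convention rather than the only one.

```python
import statistics

def sharpe_ratio(returns, risk_free):
    """Mean excess return divided by the sample std dev of the excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical per-period returns and risk-free rate, as decimal fractions.
steady = [0.02, 0.03, 0.04, 0.03, 0.02]
bumpy = [0.10, -0.06, 0.12, -0.04, 0.02]
rf = 0.01

sharpe_steady = sharpe_ratio(steady, rf)
sharpe_bumpy = sharpe_ratio(bumpy, rf)

print(f"steady: {sharpe_steady:.2f}")
print(f"bumpy:  {sharpe_bumpy:.2f}")
```

Both series have the same average return, so the steadier one gets the higher ratio. Note that nothing in the formula sees a blow-up that hasn't happened yet in the sample, which is the fat-tail problem mentioned above.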

The key word there is "supposed to be." Remember that, in the vast majority of cases, data are drawn not from a population, but from a sample drawn from that population. You don't have data about, say, unemployment rates for everyone in America, but rather unemployment rates for some subset of people in America because getting data for every single person is essentially impossible. Each sample is going to be at least slightly different, whether by chance or by intent. It's very easy to cherry-pick a sample which will yield the data that you want. If I want to show that Americans support gun rights, I might decide to gather my data from a city hosting the NRA national convention.

Ideally, when some talking head is on Meet the Press, he should say exactly what sample he is using and how it was selected. Same thing with polls in Time magazine or whatever. But they know that nearly everyone watching or reading either doesn't know about sampling, or doesn't give a shit.

Yeah, the Sharpe, Treynor and a third (I forget its name) all have issues with them. Still, the concept of risk-adjusted returns is a useful one for investors to know, if not the specific measures themselves.

Agreed, source of information is always important, and many people freak the fuck out when you explain the source of a statistic because they realize the source doesn't fit their criteria. You see this a lot in business numbers. (Note, I am not talking about financials necessarily, just rudimentary statistics on basic business information like active/repeat customers, etc.)

The "freak the fuck out" part is generally important here, because when one political side doesn't like the numbers of another political side, or they want to spin it, they first attack the source. That is why we hear "well unemployment is X, but they aren't counting people not looking for a job anymore". While this was the way we did the metric for a long time, it's actually reasonable to try and look at it both this way and the way we do it. If we can determine (and I haven't seen how we came up with this number, stats built on stats) that after 18 months of being unemployed, x% (a high %) would have stopped looking for work, you are no longer competing for a job with those people if you are looking. It makes sense not to count those people, because otherwise we should probably count 15-year-olds on up because, hey, they're jobless. But if you want to look at joblessness on the whole, or at people who would get off their ass and walk into a job if one were available, you would want to include these people.

But in the above, it isn't about getting a metric to try and spot/solve a problem (in the straight sense), it's about trying to increase a number to make another faction look as terrible as possible, when under the same circumstances they would be pushing the opposite. It isn't about being "right" or doing the right thing (except in the sense that they can think anything they touch is gold), it's about winning so they "can do right".

The best thing I run into in business every day is where someone will take a number, and just pop it into a formula and try to extrapolate data from that. And they have no idea what the source number meant, or any documentation to support their finding, and as long as someone just says it is what it is, that's good enough for them. I run into this basically daily, usually with people trying to do it with the results from my work, and they have no idea what the results meant because they just got a number instead of understanding the number. I am a bit jaded on that one.

yeah like when the news media reported about the radiation increase at Fukushima being like 100 million times higher than the acceptable dose rate, but failed to mention that the reported dose amount changed from approximately 0% to 0% and that the effect would be a cancer risk increase from like 1% to 1.05%

hey look I just manipulated statistics too!

Actually, I recently engaged in a back-and-forth email conversation with the guy who wrote the Politifact article on Jon Stewart, in which we agreed that the one study that seemed to indicate Fox News viewers were misinformed was actually itself a very misleading and confusing study that really only showed that people on both sides of the spectrum and among all media audiences will tend to choose the answer that supports their politics as the correct answer, whether it's correct or not.

Thank you for using the word "peruse" correctly.

It's fun to consider this sort of problem when considering the effectiveness of the additional security screening measures that have been implemented for air travelers.

very first thing they'll retort with is "well you can make statistics say anything!".

Them "You should vote for Obama because he is going to make everything safer by banning guns!"

Me "You know that really hasn't worked out well for areas like Washington DC, and Britain has just seen an increase in stabbings."

Them "Well my parents say we would be safer without guns, they are Statisticians who work in Washington DC and they have the numbers"

Me "Do you know what a Statistician does?"

Them "Ummm..."

Me "Ever heard the saying there are lies, damn lies, and statistics?"

Them "Are you calling my parents liars?"

At that point I wandered off laughing. 50% of the people in that conversation apparently believe everything shown to them using numbers.

The problem is likely that the person is informed enough to know that people misuse statistics all the time, but not informed enough to know what properly-used statistics actually look like. Which is understandable, because honestly, the difference between proper statistics and an obfuscatory number orgy is hard to pick out. And it doesn't help that a lot of studies that are performed, or polls that are run, are done either by unscrupulous folks who are trying to generate a specific result, or by folks who honestly don't know what the hell they're doing (but are probably hoping for a specific result).

The upshot is that a lot of people just ignore anything expressed as a percentage, and it's hard to blame them.

Since I only watch the NBA, I never encountered it in the other sports ... do they do that in football and baseball as well?

I can live with stuff like "his 4th quarter scoring average is x points" ... but when it ventures into "they are 11-0 in games where they led by 5 or more with 3 minutes or less to go" I am like "WTF? way to try and explain something by anecdotal evidence".

On the other hand, the announcers would be sitting there saying nothing for large portions of a game, if they don't have those useless statistics as a filler.

I'd love for someone to just make stuff up during a telecast. "Whenever he eats wheaties before a game and then is behind by 5 or more points, he always misses the first 5 shots in the fourth quarter" ... totally deadpan. He'd be out of a job soon, but some of that announcing stuff is so ingrained by now, it is jarring every time you hear it.

Same with those useless interviews between quarters ... "Why are you winning right now?" Can some coach please answer with "Because we are scoring more points than the other team"?

But enough ranting ... back to your regularly scheduled statistics.

The anti-vaxers and global warming deniers being prime examples.

Some might argue for "consensus of experts" or such, but this falls into the same problem. Without painstakingly looking to see if it is in fact the consensus of experts, rather than just being reported as such, you don't know. Perhaps what you do know is that the same childhood friend who told you that the world would run out of oil before the year 2000, and that the planet would be completely deforested by 2010, is now the one telling you that climate is undergoing dramatic and damaging changes and we'll all be X by the year Y.

Which then explains why it seems everything is trying to kill you with cancer.

not be ignorant when you're surrounded by the sort of media barrage we have. Me, I just pretty much ignore any claim that might be even the slightest bit controversial until I can look into it myself. Because the odds of the media taking something complicated and reporting on it accurately are probably 50-50 at best. And even if they get the facts right, they've probably muffed the implications.

It would be nice for more people to be familiar with common logical fallacies, but if we're making wishes I'd also like a unicorn.

I wasn't making a point. I was simply sincerely confused. Like, I didn't know if the variance was stated as a percentage of your expected mean return or as a percentage of your total investment or what.

Goum cleared me up and the rest of this discussion has been enlightening!

But I really wasn't trying to make an argument there.

Well, not exactly 1 in 20. It depends on the effect you're looking for and your sample size. I'm too lazy to do the math right now, but even with a p < 0.05, the possibility of a false positive gets infinitesimally small with an arbitrarily large sample size.

But your overall point is correct. Studies looking for very subtle effects in small sample sizes (which describes, unfortunately, a lot of experiments in medicine and the social sciences) have a large enough possibility of false positives that we shouldn't take a single positive result and run with it.

But plenty of them are useful thought exercises at least.

Nope. Arbitrarily large sample size simply changes the place at which a p value of .05 occurs. If you set your p-value at .05 then you are defining a false positive 5% of the time.

But note what a p-value actually means. It means, specifically that "If the null hypothesis is true, we will reject it, when we really ought not have done so 5% of the time". So it isn't quite right to say 1/20 studies will be false positives if everyone uses a 5% p-value, but "assuming the null hypothesis is true, 1/20 studies will be false positives if everyone uses a 5% p-value".

The truth is that the type 2 error probability [vs type 1, which is set by the p-value threshold] is typically unknown, and that fact may be even more problematic than the 5% critical value.

Note that the 5% critical value does mean that the statement "if we have a bias toward testing null hypotheses that are in fact correct, then we will make that mistake 5% of the time" is roughly true. And that should be worrying, because a lot of people do tend to test things that we might view as true, and results that fail to reject the null are less likely to be printed than those that do.
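The claim that sample size doesn't touch the false positive rate is easy to check by simulation. This is a rough sketch under simplifying assumptions: a two-sided z-test with a known standard deviation of 1, a 1.96 cutoff for the 5% level, and a made-up true effect of 0.2 for the runs where the null is false.

```python
import random

def rejection_rate(n, true_mean, rng, trials=2000, z_crit=1.96):
    """Share of simulated studies that reject H0: mean = 0 (two-sided z-test, known sd = 1)."""
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, 1) for _ in range(n)) / n
        z = sample_mean * n ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rng = random.Random(42)

# Null is true: every rejection is a false positive (type 1 error).
false_pos_small = rejection_rate(10, 0.0, rng)
false_pos_large = rejection_rate(200, 0.0, rng)

# Null is false (hypothetical effect of 0.2): every rejection is a correct detection.
power_small = rejection_rate(10, 0.2, rng)
power_large = rejection_rate(200, 0.2, rng)

print(f"false positive rate, n=10:  {false_pos_small:.3f}")
print(f"false positive rate, n=200: {false_pos_large:.3f}")
print(f"detection rate, n=10:       {power_small:.3f}")
print(f"detection rate, n=200:      {power_large:.3f}")
```

The false positive rate hovers around 5% at both sample sizes because the cutoff defines it. What the bigger sample buys is a higher detection rate when there is a real effect, meaning fewer type 2 errors, and even that can only be computed here because the effect size was assumed.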

It needs to be about 20% cooler.

So I'm really thinking about seeing if I can get the charter school my kids attend to let me do a week-long lecture on this topic, for outgoing seniors. Has anyone ever tried to do anything like this?

Yeah, you're right. I wasn't thinking. It's the possibility of a false negative that gets smaller with a larger sample size.

Confirmation and survivor bias are always a big issue with hedge funds & mutual funds. You never see the funds run by the management firm that get shut down due to poor returns. So if you're looking at only the investment funds that are still profitable, you may get the idea that the majority of the investment funds are profitable.
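A toy version of the survivor effect, for illustration only. The assumptions are entirely hypothetical: every fund's return is pure luck with a true expected value of zero, and the firm quietly closes any fund that lost more than 5%.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical funds whose true expected return is zero (pure luck, no skill).
fund_returns = [rng.gauss(0.0, 0.10) for _ in range(5000)]

# Assumption: the firm shuts down any fund that lost more than 5%.
survivors = [r for r in fund_returns if r > -0.05]

avg_all = statistics.mean(fund_returns)
avg_survivors = statistics.mean(survivors)
share_profitable_all = sum(r > 0 for r in fund_returns) / len(fund_returns)
share_profitable_survivors = sum(r > 0 for r in survivors) / len(survivors)

print(f"average return, all funds:  {avg_all:.3f}")
print(f"average return, survivors:  {avg_survivors:.3f}")
print(f"share profitable, all:      {share_profitable_all:.1%}")
print(f"share profitable, survivors: {share_profitable_survivors:.1%}")
```

Nothing about the funds changed between the two views. Dropping the closed funds from the sample is enough to make the remaining ones look skilled.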

but 60% of the time those manipulations work every time!