The [ECONOMY]

Posts
  • Goumindong (Registered User regular)
    I am not sure it's true to the extent you're claiming. I mean, sure, CNN has a few doctors there, but they aren't able to do what you're asking us to do for economics.

  • Goumindong (Registered User regular)
    Goumindong wrote: »
    hippofant wrote: »
    I think someone missed the part of the scandal where it was revealed the paper was never even peer reviewed. In fact, other economists savaged the work when it first came out.

    It only got to this point because there was an easy-to-understand error (Excel fuckup) wrapped around a good news hook (intrepid grad student from a state school exposes Harvard eggheads). If it had been a famous academic questioning the results based on their empirical effects, the media would have shrugged and either ignored it or done a "both sides have a side" muddle of a story.

    At heart, this is less a story about science and more an example of how fucked the media's priorities are.

    Well, dude, how do you want this to work exactly? The typical retraction is pretty fricking hard to understand, often only after thoroughly reading the paper and related background material, never mind being uninteresting. I've noticed mathematical errors in papers before, like a negative sign where they meant a positive, or algorithm errors where a variable went uninstantiated, but exactly how does one turn something like this into an interesting media piece? Like... the lack of cross-study multiple-hypothesis correction (i.e. Bonferroni) means that at least 1 out of every 20 random studies will find that something either causes cancer or prevents it, purely because we use 0.05 as the significance threshold for p-values. How does one explain that to a layperson in a newspaper article, without them either a) falling asleep, or b) shooting themselves in the face?
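    To make that 1-in-20 point concrete, here's a throwaway simulation (entirely made-up "studies" of pure noise, nothing to do with any real dataset). It runs a pile of t-tests on data with no real effect in it and counts how many clear p < 0.05, with and without a Bonferroni correction:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies = 1000   # imaginary independent "studies"
    alpha = 0.05

    p_values = []
    for _ in range(n_studies):
        # two groups of 50 drawn from the SAME distribution, so any
        # "significant" difference is a false positive by construction
        a = rng.normal(size=50)
        b = rng.normal(size=50)
        p_values.append(stats.ttest_ind(a, b).pvalue)

    p_values = np.array(p_values)
    print("significant at p < 0.05:   ", (p_values < alpha).sum(), "of", n_studies, "(~5% expected)")
    print("after Bonferroni (0.05/N): ", (p_values < alpha / n_studies).sum(), "of", n_studies)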

    This paper was the backbone for a global political movement that impacted millions of lives. Jon Stewart had clip after clip of politicians and pundits talking about this paper. The media is a multi-billion-dollar industry that employs MDs, PhDs, MBAs and JDs as writers, and has thousands more experts in its address book.

    There are lots of reasons why this didn't get covered. An inability to find someone who could understand and explain it isn't the reason.
    MDs, MBAs, and JDs aren't really qualified to comment on things like this. If you wanted an infrastructure to deal with papers like this, you would need to employ multiple PhDs in multiple subfields of every field in order to write a handful of reviews marginally better than what you could get by just asking PhDs who have other jobs (and so have to do their own research or perish).

    Edit: It is both true and bad that no one wants to spend money to pay qualified people to fact-check all the science that is done. But how would we do that without a really massive government agency/bureaucracy, and would such an organization not have its own issues?

    We're not talking about an obscure paper. These guys were name-checked repeatedly in the US presidential election, in the British Parliament, in Brussels, and by dozens of conservative pundits. It was THE paper for the dominant political and economic movement of the early 21st century - austerity. I'll bet that if I fired up LexisNexis I'd find thousands of references to the paper and its authors.

    And it took a man bites dog story for anyone to fact check it. That's more than a bit worrisome.

    No, people were trying to replicate it for a long time. This is not the first "failure to replicate" problem the paper has had; it's the first "we figured out why you failed to replicate" paper, combined with "and those reasons are pretty damning."

    The paper's status among economists was not particularly high, and many economists were not particularly shy about saying so. For example, this piece (written by a non-economist but citing two economists) is pretty clearly negative about both R+R and their paper, back in 2011:

    http://www.businessinsider.com/reinhart-and-rogoff-dangerous-debt-ceiling-2011-8

  • Inquisitor77 (2x Penny Arcade Fight Club Champion; A fixed point in space and time; Registered User regular)
    Out of curiosity, is there actually some sort of guild-like professional association for "practicing" economists? For example, not all psychologists treat people, but the ones that do have various professional associations and credentialing they need to go through. Even industrial/social/etc. psychologists have to go through professional associations before they're allowed to, say, survey a global organization. Ditto medical doctors, psychiatrists, and lawyers.

    For a science that ranks pretty high up there in terms of difficulty in running controlled, falsifiable, and verifiable experiments, it would kind of make sense to have some standards about who is and who isn't allowed to talk to powerful people about major policy decisions.

    Not because the people in question aren't qualified, but because the people in question seem to never have had the discussion in the first place.

  • Goumindong (Registered User regular)
    Did you get a PhD from an accredited university? And/or are you employed at one? And/or have you published in a high-ranking journal? Then you can probably call yourself an economist. There isn't really such a thing as a "practicing economist" for the stuff we are talking about. They're all PhDs, and their accreditation comes from their work and degree.

    The type of work we are talking about doesn't really have an analogue to a professional organization, since most economists working for a firm will either be in a consultant position or will be subject to another profession's professional bodies (like if they were working as an actuary).

    The NBER is probably the closest thing to a professional organization.

    R+R are members of it and presented their paper as an NBER working paper (which are not peer-reviewed, IIRC, and are generally circulated by the NBER for review).

  • Seruko (Ferocious Kitten of The Farthest North; Registered User regular)
    Inquisitor77 wrote: »
    Out of curiosity, is there actually some sort of guild-like professional association for "practicing" economists? For example, not all psychologists treat people, but the ones that do have various professional associations and credentialing they need to go through. Even industrial/social/etc. psychologists have to go through professional associations before they're allowed to, say, survey a global organization. Ditto medical doctors, psychiatrists, and lawyers.

    For a science that ranks pretty high up there in terms of difficulty in running controlled, falsifiable, and verifiable experiments, it would kind of make sense to have some standards about who is and who isn't allowed to talk to powerful people about major policy decisions.

    Not because the people in question aren't qualified, but because the people in question seem to never have had the discussion in the first place.


    Of course not.

    "How are you going to play Dota if your fingers and bitten off? You can't. That's how" -> Carnarvon
    "You can be yodeling bear without spending a dime if you get lucky." -> reVerse
    "In the grim darkness of the future, we will all be nurses catering to the whims of terrible old people." -> Hacksaw
    "In fact, our whole society will be oriented around caring for one very decrepit, very old man on total life support." -> SKFM
    I mean, the first time I met a non-white person was when this Vietnamese kid tried to break my legs but that was entirely fair because he was a centreback, not because he was a subhuman beast in some zoo ->yotes
  • The Ender (Registered User regular)
    Dude, I'm a (computational) biologist. Nobody does replication studies. Seriously. It's a common topic of discussion amongst us, the fact that nobody is willing to fund studies that attempt to reproduce the results of other studies, so nobody does them, and consequently, for all we know, bad research gets through all the time. Our quality control mostly comes from people running other, similar experiments or experiments using the already-published data and finding later inconsistencies.

    ...Which is what 'replication' means, even if you're not doing a formal replication study. I mean, if you can't replicate a result, you can't create applications from it, so the research is worthless anyway (assuming you're working in a commercial / industrial capacity).


  • hippofant (ティンク; Registered User regular)
    The Ender wrote: »
    Dude, I'm a (computational) biologist. Nobody does replication studies. Seriously. It's a common topic of discussion amongst us, the fact that nobody is willing to fund studies that attempt to reproduce the results of other studies, so nobody does them, and consequently, for all we know, bad research gets through all the time. Our quality control mostly comes from people running other, similar experiments or experiments using the already-published data and finding later inconsistencies.

    ...Which is what 'replication' means, even if you're not doing a formal replication study. I mean, if you can't replicate a result, you can't create applications from it, so the research is worthless anyway (assuming you're working in a commercial / industrial capacity).

    Right... so... how is what happens in economics worse than what happens in biology? Didn't what just happened serve as evidence for the system working (sorta) as opposed to it not working?

  • Inquisitor77 (2x Penny Arcade Fight Club Champion; A fixed point in space and time; Registered User regular)
    The Ender wrote: »
    Dude, I'm a (computational) biologist. Nobody does replication studies. Seriously. It's a common topic of discussion amongst us, the fact that nobody is willing to fund studies that attempt to reproduce the results of other studies, so nobody does them, and consequently, for all we know, bad research gets through all the time. Our quality control mostly comes from people running other, similar experiments or experiments using the already-published data and finding later inconsistencies.

    ...Which is what 'replication' means, even if you're not doing a formal replication study. I mean, if you can't replicate a result, you can't create applications from it, so the research is worthless anyway (assuming you're working in a commercial / industrial capacity).


    Right. For example, in a social psychology setting, any applied usage is treated as "white paper material", and never given the same credence as a controlled experiment. You can ask 50 different companies the same survey question and pretend that the aggregate response is some sort of benchmark, but no one in any academic setting is going to treat that as anything more than cherry-picked corporate data, because the sources are too varied and uncontrolled. You see this all the time, even for hot new things like "predictive analytics" where a retail company will try to figure out what you will or won't buy - even if it works fantastically for that company, it has to be replicated in a controlled setting for it to be treated as actual research. Otherwise it's applied science, akin to engineering. Great for white papers, sharing cool new ideas, and prompting new research, but not valid in and of itself.

  • Mill (Registered User regular)
    Correct me if I'm wrong, but isn't the whole point of peer review to make sure that only accurate papers get widely circulated? Isn't it also to make sure that papers that are completely pulled out of someone's opinionated ass stay labeled as opinion or unproven hypothesis, while also making sure you don't have bugs or flaws (like incorrect coding for spreadsheets) going through unnoticed? As someone with a fair bit of schooling in a soft science field, I get that some stuff is wishy-washy, but I'd argue that economics is one of those areas where there is enough hard science and number-related data that there should be some peer review to make sure the numerical evidence is in order; especially if people are going to bandy it about as a blueprint for public policy.

  • The Ender (Registered User regular)
    hippofant wrote: »
    Right... so... how is what happens in economics worse than what happens in biology? Didn't what just happened serve as evidence for the system working (sorta) as opposed to it not working?

    Well, okay, let's take a look at Reaganomics - something that is taught seriously in economics schools - for example. Here is one of the core observations made to substantiate the idea that the 'nanny state' only worsens the conditions of the poor / stretches the state beyond its means to function:

    "She has eighty names, thirty addresses, twelve Social Security cards and is collecting veteran's benefits on four non-existing deceased husbands. And she is collecting Social Security on her cards. She's got Medicaid, getting food stamps, and she is collecting welfare under each of her names. Her tax-free cash income is over $150,000."

    Quoted from Reagan in '76, and a tenet that is still held as a rigorous observation to this day.

    In the old prairie town I used to live in, we might call this, "A load of bullshit," or, "A racist load of bullshit," depending on whether or not the context of the quote is also explained (Reagan demonizing African American communities in Chicago).


    This is the type of crap that economic 'theories' tend to be built around. Whatever you want to call it, it's not science. A lot of it is supposition, most of it does not include predictive statements, and without question a person's results are going to be informed by their ideology, not the other way around. Because economies are human constructs, we can manipulate them to get whatever results we want with, essentially, whatever means we want - it's not like dealing with a fixed universal constant that can't be manipulated and will be observed in the same state no matter which researcher or lab is looking at it.

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    wow what economics schools teach that

  • The Ender (Registered User regular)
    ronya wrote: »
    wow what economics schools teach that

    This one, for starters.


    EDIT: I mean, if economics is comparable to, say, physics or biology in terms of its empirical nature, can you explain why the field has so many mutually exclusive theoretical frameworks surrounding how economies work? Not just disagreement over which are the best, but over how economies fundamentally operate. Why isn't there an economics equivalent to the theory of natural selection, the big bang theory, quantum mechanics, relativity, etc.?

  • Zephiran (Registered User regular)
    Mill wrote: »
    Correct me if I'm wrong, but isn't the whole point of peer review to make sure that only accurate papers get widely circulated? Isn't it also to make sure that papers that are completely pulled out of someone's opinionated ass stay labeled as opinion or unproven hypothesis, while also making sure you don't have bugs or flaws (like incorrect coding for spreadsheets) going through unnoticed? As someone with a fair bit of schooling in a soft science field, I get that some stuff is wishy-washy, but I'd argue that economics is one of those areas where there is enough hard science and number-related data that there should be some peer review to make sure the numerical evidence is in order; especially if people are going to bandy it about as a blueprint for public policy.

    Peer review is always in order. If it can't be independently checked and verified, it's not science - that's the general rule of thumb. Shit, I'm a soft-science politics minor and we're always told how important peer review is, and how we're supposed to use studies that have been reviewed above everything else. There's always room for peer review even if you claim to study a soft science.

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    a dominant theoretical framework exists; it is variously called neoclassical, walrasian, neo-walrasian, etc. various forms of the neoclassical synthesis have prevailed since 1937, when j. r. hicks wrote 'mr. keynes and the classics'. it spans all the current 'mainstream' schools of macro and micro thought, from chicago to MIT as real institutions, and the so-called 'saltwater' and 'freshwater' schools in their vague groupings. it includes the old american keynesians under paul samuelson and the cowles-commission titans of 20th-century economics, the entire monetarist revolution, and the entire new keynesian consensus that dominates virtually all central banks on the planet today, save rare exceptions like argentina and such.

    so... yeah.

    it doesn't include the anglo-italian schools and their descendants - the post-keynesians, who are aligned to the political left - and it doesn't include the austrians and their descendants, aligned to the political right. but these are non-mainstream fringe schools of thought.

    it is actually kinda weird to be slapped in the face with 'chicago' nowadays, because chicago is no longer associated with monetarism so much as with real business cycle theory.

    e: http://web.archive.org/web/20090427073721/http://homepage.newschool.edu/het/thought.htm#neoclassical gives a good overview if you want to read up the details, but it offers a slightly different classification than I have (e.g., it groups the neo-keynesian synthesis under 'alternative' despite conceding that the neo-keynesian synthesis is pretty dang neoclassical, and it reserves 'walras' as a term to keep track of the descendants of 19th century battles).

  • Yougottawanna (Registered User regular)
    The Ender wrote: »
    Right... so... how is what happens in economics worse than what happens in biology? Didn't what just happened serve as evidence for the system working (sorta) as opposed to it not working?

    Well, okay, let's take a look at Reaganomics - something that is taught seriously in economics schools - for example. Here is one of the core observations made to substantiate the idea that the 'nanny state' only worsens the conditions of the poor / stretches the state beyond its means to function:

    "She has eighty names, thirty addresses, twelve Social Security cards and is collecting veteran's benefits on four non-existing deceased husbands. And she is collecting Social Security on her cards. She's got Medicaid, getting food stamps, and she is collecting welfare under each of her names. Her tax-free cash income is over $150,000."

    Quoted from Reagan in '76, and a tenet that is still held as a rigorous observation to this day.

    In the old prairie town I used to live in, we might call this, "A load of bullshit," or, "A racist load of bullshit," depending on whether or not the context of the quote is also explained (Reagan demonizing African American communities in Chicago).


    This is the type of crap that economic 'theories' tend to be built around. Whatever you want to call it, it's not science. A lot of it is supposition, most of it does not include predictive statements, and without question a person's results are going to be informed by their ideology, not the other way around. Because economies are human constructs, we can manipulate them to get whatever results we want with, essentially, whatever means we want - it's not like dealing with a fixed universal constant that can't be manipulated and will be observed in the same state no matter which researcher or lab is looking at it.

    The welfare queen story is a well-known steaming pile. I'd be curious how you figure that it's "held as a rigorous observation." And what economic theory is based around it?

  • enc0re (Registered User regular)
    Mill wrote: »
    Correct me if I'm wrong, but isn't the whole point of peer review to make sure that only accurate papers get widely circulated? Isn't it also to make sure that papers that are completely pulled out of someone's opinionated ass stay labeled as opinion or unproven hypothesis, while also making sure you don't have bugs or flaws (like incorrect coding for spreadsheets) going through unnoticed? As someone with a fair bit of schooling in a soft science field, I get that some stuff is wishy-washy, but I'd argue that economics is one of those areas where there is enough hard science and number-related data that there should be some peer review to make sure the numerical evidence is in order; especially if people are going to bandy it about as a blueprint for public policy.

    1. Peer reviewers are lazy, not just in economics. If there's a table of summary statistics in a submitted article, how many professors do you think replicate it from the original data? And if there's something fancier, like a non-parametric regression or a BACE regression? If you are proving a theorem, peer reviewers may check the math. That's the best you can hope for (a sketch of what even the summary-stats check would involve is after point 2).

    2. The R-R paper in question was not peer-reviewed. It was a working paper.
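    For what it's worth, the summary-stats check from point 1 is about the cheapest piece of refereeing imaginable. A sketch of what it amounts to, with made-up file and column names:

    import pandas as pd

    # hypothetical replication file; ideally this would ship with the submission
    df = pd.read_csv("replication_data.csv")

    # rebuild the manuscript's summary-statistics table from the raw data
    summary = df[["gdp_growth", "debt_to_gdp"]].describe().T[["count", "mean", "std", "min", "max"]]
    print(summary.round(2))

    # then compare against the numbers printed in the submitted draft
    reported_mean_growth = 3.1  # value as claimed in the draft (made up here)
    assert abs(summary.loc["gdp_growth", "mean"] - reported_mean_growth) < 0.05, \
        "summary statistics do not match the manuscript"

    And even that rarely happens.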

  • nexuscrawler (Registered User regular)
    Wait is Ender implying the welfare queen thing is legit?

    It's known bullshit and has been for decades. That woman never existed.

  • Gnome-Interruptus (Registered User regular)
    nexuscrawler wrote: »
    Wait is Ender implying the welfare queen thing is legit?

    It's known bullshit and has been for decades. That woman never existed.

    He is saying that the welfare queen is an accepted premise in certain schools of economics.

  • Jebus314 (Registered User regular)
    enc0re wrote: »
    Mill wrote: »
    Correct me if I'm wrong, but isn't the whole point of peer review to make sure that only accurate papers get widely circulated? Isn't it also to make sure that papers that are completely pulled out of someone's opinionated ass stay labeled as opinion or unproven hypothesis, while also making sure you don't have bugs or flaws (like incorrect coding for spreadsheets) going through unnoticed? As someone with a fair bit of schooling in a soft science field, I get that some stuff is wishy-washy, but I'd argue that economics is one of those areas where there is enough hard science and number-related data that there should be some peer review to make sure the numerical evidence is in order; especially if people are going to bandy it about as a blueprint for public policy.

    1. Peer reviewers are lazy, not just in economics. If there's a table of summary statistics in a submitted article, how many professors do you think replicate it from the original data? And if there's something fancier, like a non-parametric regression or a BACE regression? If you are proving a theorem, peer reviewers may check the math. That's the best you can hope for.

    2. The R-R paper in question was not peer-reviewed. It was a working paper.

    I'm not sure I would call it lazy. It is a massive amount of time/effort to replicate someone's work, and the nature of peer review means you're giving that work to someone who already has a full-time job trying to figure out their own research. Peer review is not really for digging into specific details so much as it is a sanity check: is this person blatantly making shit up, yes/no? Also, "peer reviewed" really just means reviewed by a peer's grad student. Like me. Basically, reviewing papers sucks is what I am getting at.

    I think what happened here is actually pretty much how science is supposed to work. You publish something, people work with it for a bit until they find it isn't working right. They figure out where you fucked up, fix it, add some new insights, and now they can publish this new work. The problem is when you have people trying to make drastic, wide-reaching decisions based on the most cutting-edge research, when in reality cutting-edge research is some dude's kinda-sorta-thought-out idea that hasn't been proven wrong yet.

    "The world is a mess, and I just need to rule it" - Dr Horrible
  • Options
    enc0reenc0re Registered User regular
    edited April 2013
    I understand why the system is how it is. However, let's design Science! from scratch here for a second. This is what I would consider ideal.

    Researcher writes paper based on data X and method Y. He then has to package data X and method Y into a script that is submitted along with the paper to peer reviewers. Each reviewer checks X, checks Y, and then runs Y on X; along with reading the paper and making sure it all matches.

    As far as I am concerned, the fact that data and code aren't an integral part of scientific articles is a major weakness in how we currently do it.
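    Concretely, the package wouldn't need to be fancy. A minimal sketch of the kind of script I have in mind (every file name, variable, and checksum here is a placeholder, not anyone's actual study):

    # Replication package: regenerate every number cited in the paper.
    # data.csv is the data (X); this script is the method (Y).
    import hashlib
    import pandas as pd
    import statsmodels.formula.api as smf

    # 1. Check X: confirm the reviewer is running the exact data the authors used
    EXPECTED_SHA256 = "replace-with-the-checksum-published-in-the-appendix"
    digest = hashlib.sha256(open("data.csv", "rb").read()).hexdigest()
    assert digest == EXPECTED_SHA256, "data.csv does not match the published checksum"

    # 2. Run Y on X: the actual estimation, not a prose description of it
    df = pd.read_csv("data.csv")
    model = smf.ols("growth ~ debt_ratio + inflation", data=df).fit()

    # 3. Match the paper: every coefficient in the results table should fall out of this
    print(model.summary())

    If a reviewer can't get from the top of that script to the paper's tables, something is wrong.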

  • Inquisitor77 (2x Penny Arcade Fight Club Champion; A fixed point in space and time; Registered User regular)
    Don't want to derail the thread too much, but given recent discussion: http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html

    Psychology, as one of the social sciences, runs into its own fair share of problems, even when attempting to validate "reproducible" experiments.

  • nexuscrawler (Registered User regular)
    Gnome-Interruptus wrote: »
    nexuscrawler wrote: »
    Wait is Ender implying the welfare queen thing is legit?

    It's known bullshit and has been for decades. That woman never existed.

    He is saying that the welfare queen is an accepted premise in certain schools of economics.

    It isn't. It is political theater masquerading as an economic idea. It has zilch to do with actual economics and everything to do with racial fear and class warfare against the poor.

  • Jebus314 (Registered User regular)
    enc0re wrote: »
    I understand why the system is how it is. However, let's design Science! from scratch here for a second. This is what I would consider ideal.

    Researcher writes paper based on data X and method Y. He then has to package data X and method Y into a script that is submitted along with the paper to peer reviewers. Each reviewer checks X, checks Y, and then runs Y on X; along with reading the paper and making sure it all matches.

    As far as I am concerned, the fact that data and code aren't an integral part of scientific articles is a major weakness in how we currently do it.

    To be fair, all of the information necessary to replicate the work is an integral part of scientific articles. It's just that no one wants to see your thousands of lines of code or gigantic spreadsheets. What I want is a concise summary of what you found, with enough information that I can double-check the results if I were so inclined. Research is based on code that only outputs numbers. It does me no good to take your code wholesale, press run, and see that, yep, I do get the same thing. If I want to check your work I have to go line by line, through every step, and decide whether or not you've done everything correctly. It's just not feasible to check everyone's work.

    The great thing is that you usually don't have to. Correct and incorrect work gets published and used for further research. New research based on the correct work does what you expect it to, and things keep progressing without the need for massive oversight. Research based on incorrect work, meanwhile, tends to lead to problems, which leads to deeper investigations anyway. You could even argue that's what happened here. We had some bad research, we tried to apply it, and that led to massive economic problems. That seems like pretty good justification for going back and reviewing said research, but instead we get one side who says damn the evidence, this paper agrees with my philosophies so it must be perfect, and another side who says that it doesn't even matter if the paper was right, it's not conclusive proof.

    "The world is a mess, and I just need to rule it" - Dr Horrible
  • Options
    GoumindongGoumindong Registered User regular
    Don't want to derail the thread too much, but given recent discussion: http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html

    Psychology, as one of the social sciences, runs into its own fair share of problems, even when attempting to validate "reproducible" experiments.

    Reading that, I find it funny that he probably would not have been caught if he had just used a computer to generate the data sets.

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    nobody wants to let other people access the gigantic spreadsheets. skeptics always do yell "just give me the *.CSVs", it is disingenuous to pretend that nobody wants to see the 1000 lines of code.

    manipulation might be easiest at that point - it is easy to leave a paper trail that you've run an experiment, it is harder to detect whether you fooled around in the statistical analysis. in the case of suspected misconduct, it is crucial that someone does take the code wholesale and press run to see that you get the graphs, because there is where the analysis is most vulnerable to deliberate censorship, and the supposedly unprocessed noise-filled raw data most easily subject to tests for manipulation.

    for good or ill, the model of scientific research in almost all lab-driven work is to give each lab ownership of the raw data, because it's really expensive to collect the data. If labs could simply access each other's empirical work, nobody would do the tiresome experiments and everyone would want to do the statistical analysis.

    the present pressure for, say, climate research to upload all their raw data in a publicly-interpretable format is actually very rare amongst sciences! this is why initial AGW denier requests for "raw data" were met with "fuck off, we don't owe you shit". and this is still the case in other politicized sciences, e.g., the Lenski creationist-request-for-"raw-data" affair. Because this was exactly what was considered reasonable amongst almost all sciences. Drilling for ice cores in the Arctic is expensive and difficult, but running the math later is not.

    academics and laymen use "raw data" differently... if I were called upon to worry about R&R's "raw data" before this broke out, I would have worried about how R&R compiled and regularized public statistics into a comparable form to begin with, not how they ran their numbers on this data. The latter is supposed to be so simple that a competent, non-malicious researcher doesn't mess it up. The typical layman concern - easily seen in virtually all public discussion of the fiasco - exactly reverses these worries. They assume that the numbers on the spreadsheet are sacrosanct but the transformation from spreadsheet to conclusion is most doubtful.

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    The key to why Stapel got away with his fabrications for so long lies in his keen understanding of the sociology of his field. “I didn’t do strange stuff, I never said let’s do an experiment to show that the earth is flat,” he said. “I always checked — this may be by a cunning manipulative mind — that the experiment was reasonable, that it followed from the research that had come before, that it was just this extra step that everybody was waiting for.” He always read the research literature extensively to generate his hypotheses. “So that it was believable and could be argued that this was the only logical thing you would find,” he said. “Everybody wants you to be novel and creative, but you also need to be truthful and likely. You need to be able to say that this is completely new and exciting, but it’s very likely given what we know so far.”

    ha.

  • enc0re (Registered User regular)
    Jebus314 wrote: »
    To be fair, all of the information necessary to replicate the work is an integral part of scientific articles. It's just that no one wants to see your thousands of lines of code or gigantic spreadsheets. What I want is a concise summary of what you found, with enough information that I can double-check the results if I were so inclined. Research is based on code that only outputs numbers. It does me no good to take your code wholesale, press run, and see that, yep, I do get the same thing. If I want to check your work I have to go line by line, through every step, and decide whether or not you've done everything correctly. It's just not feasible to check everyone's work.

    The great thing is that you usually don't have to. Correct and incorrect work gets published and used for further research. New research based on the correct work does what you expect it to, and things keep progressing without the need for massive oversight. Research based on incorrect work, meanwhile, tends to lead to problems, which leads to deeper investigations anyway. You could even argue that's what happened here. We had some bad research, we tried to apply it, and that led to massive economic problems. That seems like pretty good justification for going back and reviewing said research, but instead we get one side who says damn the evidence, this paper agrees with my philosophies so it must be perfect, and another side who says that it doesn't even matter if the paper was right, it's not conclusive proof.

    Graduate students! I would argue that having them run the code in the first year, and check it for errors in later years, would make for very valuable training and give students an opportunity to contribute and get something published. Ironically, it was of course exactly a grad student exercise that led to the R-R thing being discovered.

    Only that poor guy had to try to replicate the work 'from scratch' a whole bunch first. It wasn't until R-R sent him their spreadsheet (code and data!) that he could find their mistake.
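    And that kind of mistake - a formula range that quietly skips some rows - is basically invisible in prose but obvious once the spreadsheet is in front of you. A toy version of that class of bug (invented numbers, nothing to do with the real dataset):

    # Toy illustration of a "range misses some rows" averaging bug.
    growth_by_country = {
        "A": 2.1, "B": 1.7, "C": -0.3, "D": 3.9, "E": 2.5,
        "F": 0.8, "G": 4.2,
    }
    values = list(growth_by_country.values())

    correct_mean = sum(values) / len(values)        # AVERAGE over all rows
    buggy_mean = sum(values[:5]) / len(values[:5])  # range stops two rows short

    print(f"all rows included: {correct_mean:.2f}")
    print(f"rows left out:     {buggy_mean:.2f}")
    # No individual cell is wrong, but the headline number changes -- exactly the
    # sort of thing you can only catch with the spreadsheet (or code) in hand.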

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    I tried replicating and then extending someone's work last year, as my undergrad thesis

    I am now of the opinion that even in fields of economics where the data is entirely synthetic - 100% simulation - replication is simply not something that is readily supported. it was rare to find papers that even fully described the whole simulation, never mind the dream of running code (haha, you wish).

  • enc0re (Registered User regular)
    Would you agree then that for great science, code and data should come as a bundle with the paper as a published work these days? (Let's conveniently ignore the issue of proprietary data for a second. A lot of econ is done on public stuff.)

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    I think that would be ideal. Acemoglu is great about this, you can pick up the stata do files off his site. It doesn't list all his publications, but it sure lists a lot of them.

    But, hell, certain names were screaming when the AEA dared to put forth the notion that researchers should have to disclose conflicts of interest, so...

  • Harry Dresden (Registered User regular)
    ronya wrote: »
    wow what economics schools teach that

    They also do weird shit like this.

    http://www.bloomberg.com/news/2011-05-05/schools-find-ayn-rand-can-t-be-shrugged-as-donors-build-courses.html

    Fuck university donors like this. This shit should be illegal.

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    ronya wrote: »
    wow what economics schools teach that

    They also do weird shit like this.

    http://www.bloomberg.com/news/2011-05-05/schools-find-ayn-rand-can-t-be-shrugged-as-donors-build-courses.html

    Fuck university donors like this. This shit should be illegal.

    wow
    In one of the more ambitious demands made by a donor, hedge-fund manager Jim Simons tried to use his pledge to change tuition practices within the entire State University of New York system. In July, Simons’s pledge of $150 million to SUNY’s Stony Brook campus seemed like a life buoy thrown to a drowning institution.

    Tuition Law

    SUNY was facing $210 million in budget reductions. Before writing the check, Simons, 73, the founder of Renaissance Technologies LLC, demanded that the state legislature pass a law allowing the 64 SUNY campuses to set their own tuition for the purpose of reducing their dependence on state aid. The legislature rejected the proposal in August.

    "you should have less state aid, so you can be bent more to my own conditions on aid!"

  • Jebus314 (Registered User regular)
    ronya wrote: »
    nobody wants to let other people access the gigantic spreadsheets. skeptics always do yell "just give me the *.CSVs", it is disingenuous to pretend that nobody wants to see the 1000 lines of code.

    manipulation might be easiest at that point - it is easy to leave a paper trail that you've run an experiment, it is harder to detect whether you fooled around in the statistical analysis. in the case of suspected misconduct, it is crucial that someone does take the code wholesale and press run to see that you get the graphs, because there is where the analysis is most vulnerable to deliberate censorship, and the supposedly unprocessed noise-filled raw data most easily subject to tests for manipulation.

    for good or ill, the model of scientific research in almost all lab-driven work is to give each lab ownership of the raw data, because it's really expensive to collect the data. If labs could simply access each other's empirical work, nobody would do the tiresome experiments and everyone would want to do the statistical analysis.

    the present pressure for, say, climate research to upload all their raw data in a publicly-interpretable format is actually very rare amongst sciences! this is why initial AGW denier requests for "raw data" were met with "fuck off, we don't owe you shit". and this is still the case in other politicized sciences, e.g., the Lenski creationist-request-for-"raw-data" affair. Because this was exactly what was considered reasonable amongst almost all sciences. Drilling for ice cores in the Arctic is expensive and difficult, but running the math later is not.

    academics and laymen use "raw data" differently... if I were called upon to worry about R&R's "raw data" before this broke out, I would have worried about how R&R compiled and regularized public statistics into a comparable form to begin with, not how they ran their numbers on this data. The latter is supposed to be so simple that a competent, non-malicious researcher doesn't mess it up. The typical layman concern - easily seen in virtually all public discussion of the fiasco - exactly reverses these worries. They assume that the numbers on the spreadsheet are sacrosanct but the transformation from spreadsheet to conclusion is most doubtful.

    I don't think the bolded point is true at all. The whole point of publication is to release your raw data. What would be the point of a publication where you made a bunch of conclusions without showing any data to back them up? Obviously you wouldn't want to release the data before you published, as that would be silly, but once you published, the data is already out there. In my experience people are more than happy to give you the data for research that has already been published, as the only thing they're really doing is giving you a more exact representation than you would get from just extracting the data from the publication. Even in the case where not all the data is published for the sake of brevity, I don't think there is a motivation for labs to keep the data secret. The examples you posted seem to fall more in the category of not handing out unpublished data, which makes sense. Maybe that's unique to my field though which is more engineering/chemistry based.

    Also, I didn't mean to imply that no one would want to see the code behind the research, so to speak. It just isn't what you want to present in a publication, nor should it be. If there are errors obvious enough to notice without spending large amounts of time replicating the work, you wouldn't need the code. As for the small errors, the information given in the paper should be enough to reconstruct the work and determine if there are any problems. Which is generally what happens anyway when researchers want to continue or expand upon the published work. I was just trying to point out that using the code as part of the review process is most likely impossible, as it would take way too long to review anything.

    edit- Actually, the more I think about it, the more I think materials-type research is just fundamentally different from economic-type research. If I run an experiment on a core sample from the Antarctic, there's really only one way to analyze it. You run, say, X-ray diffraction, and you get one spectrum. That's it. However, it feels like if I run a study where I collect data on people's incomes, as well as a bunch of other factors that might be related, then there would be lots of analyses you could do with that single data set. Which may make you more hesitant to release the raw data, even after publication.

  • ronya (Arrrrrf.; the ivory tower's basement; Registered User regular)
    Jebus314 wrote: »
    ronya wrote: »
    nobody wants to let other people access the gigantic spreadsheets. skeptics always do yell "just give me the *.CSVs", it is disingenuous to pretend that nobody wants to see the 1000 lines of code.

    manipulation might be easiest at that point - it is easy to leave a paper trail that you've run an experiment, it is harder to detect whether you fooled around in the statistical analysis. in the case of suspected misconduct, it is crucial that someone does take the code wholesale and press run to see that you get the graphs, because there is where the analysis is most vulnerable to deliberate censorship, and the supposedly unprocessed noise-filled raw data most easily subject to tests for manipulation.

    for good or ill, the model of scientific research in almost all lab-driven work is to give each lab ownership of the raw data, because it's really expensive to collect the data. If labs could simply access each other's empirical work, nobody would do the tiresome experiments and everyone would want to do the statistical analysis.

    the present pressure for, say, climate research to upload all their raw data in a publicly-interpretable format is actually very rare amongst sciences! this is why initial AGW denier requests for "raw data" were met with "fuck off, we don't owe you shit". and this is still the case in other politicized sciences, e.g., the Lenski creationist-request-for-"raw-data" affair. Because this was exactly what was considered reasonable amongst almost all sciences. Drilling for ice cores in the Arctic is expensive and difficult, but running the math later is not.

    academics and laymen use "raw data" differently... if I were called upon to worry about R&R's "raw data" before this broke out, I would have worried about how R&R compiled and regularized public statistics into a comparable form to begin with, not how they ran their numbers on this data. The latter is supposed to be so simple that a competent, non-malicious researcher doesn't mess it up. The typical layman concern - easily seen in virtually all public discussion of the fiasco - exactly reverses these worries. They assume that the numbers on the spreadsheet are sacrosanct but the transformation from spreadsheet to conclusion is most doubtful.

    I don't think the bolded point is true at all. The whole point of publication is to release your raw data. What would be the point of a publication where you made a bunch of conclusions without showing any data to back them up? Obviously you wouldn't want to release the data before you published, as that would be silly, but once you published, the data is already out there. In my experience people are more than happy to give you the data for research that has already been published, as the only thing they're really doing is giving you a more exact representation than you would get from just extracting the data from the publication. Even in the case where not all the data is published for the sake of brevity, I don't think there is a motivation for labs to keep the data secret. The examples you posted seem to fall more in the category of not handing out unpublished data, which makes sense. Maybe that's unique to my field though which is more engineering/chemistry based.

    Also, I didn't mean to imply that no one would want to see the code behind the research, so to speak. It just isn't what you want to present in a publication, nor should it be. If there are errors obvious enough to notice without spending large amounts of time replicating the work, you wouldn't need the code. As for the small errors, the information given in the paper should be enough to reconstruct the work and determine if there are any problems. Which is generally what happens anyway when researchers want to continue or expand upon the published work. I was just trying to point out that using the code as part of the review process is most likely impossible, as it would take way too long to review anything.

    edit- Actually, the more I think about it, the more I think materials-type research is just fundamentally different from economic-type research. If I run an experiment on a core sample from the Antarctic, there's really only one way to analyze it. You run, say, X-ray diffraction, and you get one spectrum. That's it. However, it feels like if I run a study where I collect data on people's incomes, as well as a bunch of other factors that might be related, then there would be lots of analyses you could do with that single data set. Which may make you more hesitant to release the raw data, even after publication.

    are you a fellow academic? you do realize that when you mention the word "raw data", most non-academics interpret this as your unprocessed spreadsheet, not your wonderful regression numbers, which no doubt have very good t-values. "We obtained a value for θ of -6.31" is not "raw data" in the normal sense of the words. It is replicable - but only if someone else is willing to pay the very high barrier of entry to redo all of your work, when what may really be suspected is a fiddling of the numbers coughed up by standardized equipment and process.

    If you work in an actual wet lab, you probably have a paper notebook or such where you write down any quick readings or observations when you're working in the fume hood. In that case it would be the contents of that notebook that is being held as suspicious. Did you really write down anything at all? Did you write down a lot of stuff and then handpick results?

    It is highly implausible that you are colluding with the manufacturer of the crystallography machine to make it produce data files of a readily manipulable nature. But it is all too plausible that the analysis software interprets the spectra and reports a peak at -5.31, and then you write down -6.31 in the paper. There's no nice way to put it: the charge here is that you are lying.

    And the problems created by your criterion for when people should be entitled to see the details of your work are neatly highlighted by the Stapel affair above, where the misconduct was carefully tuned to only produce expected results.

  • Jebus314 (Registered User regular)
    ronya wrote: »
    Jebus314 wrote: »
    ronya wrote: »
    nobody wants to let other people access the gigantic spreadsheets. skeptics always do yell "just give me the *.CSVs", it is disingenuous to pretend that nobody wants to see the 1000 lines of code.

    manipulation might be easiest at that point - it is easy to leave a paper trail that you've run an experiment, it is harder to detect whether you fooled around in the statistical analysis. in the case of suspected misconduct, it is crucial that someone does take the code wholesale and press run to see that you get the graphs, because there is where the analysis is most vulnerable to deliberate censorship, and the supposedly unprocessed noise-filled raw data most easily subject to tests for manipulation.

    for good or ill, the model of scientific research in almost all lab-driven work is to give each lab ownership of the raw data, because it's really expensive to collect the data. If labs could simply access each other's empirical work, nobody would do the tiresome experiments and everyone would want to do the statistical analysis.

    the present pressure for, say, climate research to upload all their raw data in a publicly-interpretable format is actually very rare amongst sciences! this is why initial AGW denier requests for "raw data" were met with "fuck off, we don't owe you shit". and this is still the case in other politicized sciences, e.g., the Lenski creationist-request-for-"raw-data" affair. Because this was exactly what was considered reasonable amongst almost all sciences. Drilling for ice cores in the Arctic is expensive and difficult, but running the math later is not.

    academics and laymen use "raw data" differently... if I were called upon to worry about R&R's "raw data" before this broke out, I would have worried about how R&R compiled and regularized public statistics into a comparable form to begin with, not how they ran their numbers on this data. The latter is supposed to be so simple that a competent, non-malicious researcher doesn't mess it up. The typical layman concern - easily seen in virtually all public discussion of the fiasco - exactly reverses these worries. They assume that the numbers on the spreadsheet are sacrosanct but the transformation from spreadsheet to conclusion is most doubtful.

    I don't think the bolded point is true at all. The whole point of publication is to release your raw data. What would be the point of a publication where you made a bunch of conclusions without showing any data to back them up? Obviously you wouldn't want to release the data before you published, as that would be silly, but once you published, the data is already out there. In my experience people are more than happy to give you the data for research that has already been published, as the only thing they're really doing is giving you a more exact representation than you would get from just extracting the data from the publication. Even in the case where not all the data is published for the sake of brevity, I don't think there is a motivation for labs to keep the data secret. The examples you posted seem to fall more in the category of not handing out unpublished data, which makes sense. Maybe that's unique to my field though which is more engineering/chemistry based.

    Also, I didn't mean to imply that no one would want to see the code behind the research, so to speak. It just isn't what you want to present in a publication, nor should it be. If there are errors obvious enough to notice without spending large amounts of time replicating the work, you wouldn't need the code. As for the small errors, the information given in the paper should be enough to reconstruct the work and determine if there are any problems. Which is generally what happens anyway when researchers want to continue or expand upon the published work. I was just trying to point out that using the code as part of the review process is most likely impossible, as it would take way too long to review anything.

    edit- Actually, the more I think about it, the more I think materials-type research is just fundamentally different from economic-type research. If I run an experiment on a core sample from the Antarctic, there's really only one way to analyze it. You run, say, X-ray diffraction, and you get one spectrum. That's it. However, it feels like if I run a study where I collect data on people's incomes, as well as a bunch of other factors that might be related, then there would be lots of analyses you could do with that single data set. Which may make you more hesitant to release the raw data, even after publication.

    are you a fellow academic? you do realize that when you mention the word "raw data", most non-academics interpret this as your unprocessed spreadsheet, not your wonderful regression numbers, which no doubt have very good t-values. "We obtained a value for θ of -6.31" is not "raw data" in the normal sense of the words. It is replicable - but only if someone else is willing to pay the very high barrier of entry to redo all of your work, when what may really be suspected is a fiddling of the numbers coughed up by standardized equipment and process.

    If you work in an actual wet lab, you probably have a paper notebook or such where you write down any quick readings or observations when you're working in the fume hood. In that case it would be the contents of that notebook that is being held as suspicious. Did you really write down anything at all? Did you write down a lot of stuff and then handpick results?

    It is highly implausible that you are colluding with the manufacturer of the crystallography machine to make it produce data files of a readily manipulable nature. But it is all too plausible that the analysis software interprets the spectra and reports a peak at -5.31, and then you write down -6.31 in the paper. There's no nice way to put it: the charge here is that you are lying.

    And the problems created by your criterion for when people should be entitled to see the details of your work are neatly highlighted by the Stapel affair above, where the misconduct was carefully tuned to only produce expected results.

    You're assuming, though, that the intent is to mislead. Obviously if I am trying to pull a fast one then I will be hesitant to let you see the proof that I lied to you. For a crystallographic spectrum, I would assume the expectation is that you show it in your publication. You don't normally see someone reporting results for experimentation like that without just showing the actual spectrum, because there's no reason to exclude it. It's compact, and it makes my case that much better because you can see the actual result rather than having to believe me.

    As for the notebook, I would assume that yes, I am giving you all the information that is in my notebook. If I feel as though the conclusion isn't supported by the totality of my findings, then it would be wrong for me to ignore certain aspects and just report the things that seem to agree. I don't think that's the norm; I think that is bad science. That isn't to say that you won't have to pick representative spectra sometimes, at which point you will probably pick the most flattering one. But again, in my experience people are more than willing to share the rest of their data, because it either all fits within their conclusions or they wouldn't have published those results.

    I just don't think you can take the approach of assuming researchers are lying. The sheer volume of work in any field that is constantly being produced means there is simply not enough time to thoroughly review it all to the point of complete replication, even if you had literally everything that the original researcher had. You have to operate from a standpoint of assuming the researcher is accurately reporting their findings and let future researchers determine whether or not the work correctly predicts future observations.

    edit- I think maybe I skipped over one of your points though. It does seem like you could run into difficulty asking a researcher for data, where the implication is that you believe they have done something wrong. Not because they are trying to protect their intellectual property, but because they have egos and don't like being doubted.

    Jebus314 on
    "The world is a mess, and I just need to rule it" - Dr Horrible
  • Options
    ronyaronya Arrrrrf. the ivory tower's basementRegistered User regular
    edited April 2013
    Jebus314 wrote: »
    ronya wrote: »
    Jebus314 wrote: »
    ronya wrote: »
    nobody wants to let other people access the gigantic spreadsheets. skeptics always do yell "just give me the *.CSVs", it is disingenuous to pretend that nobody wants to see the 1000 lines of code.

    manipulation might be easiest at that point - it is easy to leave a paper trail that you've run an experiment, it is harder to detect whether you fooled around in the statistical analysis. in the case of suspected misconduct, it is crucial that someone does take the code wholesale and press run to see that you get the graphs, because there is where the analysis is most vulnerable to deliberate censorship, and the supposedly unprocessed noise-filled raw data most easily subject to tests for manipulation.

    for good or ill, the model of scientific research in almost all lab-driven work is to give each lab ownership of the raw data, because it's really expensive to collect the data. If labs could simply access each other's empirical work, nobody would do the tiresome experiments and everyone would want to do the statistical analysis.

    the present pressure for, say, climate research to upload all their raw data in a publicly-interpretable format is actually very rare amongst sciences! this is why initial AGW denier requests for "raw data" were met with "fuck off, we don't owe you shit". and this is still the case in other politicized sciences, e.g., the Lenski creationist-request-for-"raw-data" affair. Because this was exactly what was considered reasonable amongst almost all sciences. Drilling for ice cores in the Arctic is expensive and difficult, but running the math later is not.

    academics and laymen use "raw data" differently... if I were called upon to worry about R&R's "raw data" before this broke out, I would have worried about how R&R compiled and regularized public statistics into a comparable form to begin with, not how they ran their numbers on this data. The latter is supposed to be so simple that a competent, non-malicious researcher doesn't mess it up. The typical layman concern - easily seen in virtually all public discussion of the fiasco - exactly reverses these worries. They assume that the numbers on the spreadsheet are sacrosanct but the transformation from spreadsheet to conclusion is most doubtful.

    I don't think the bolded point is true at all. The whole point of publication is to release your raw data. What would be the point of a publication where you made a bunch of conclusions without showing any data to back them up? Obviously you wouldn't want to release the data before you published, as that would be silly, but once you published, the data is already out there. In my experience people are more than happy to give you the data for research that has already been published, as the only thing they're really doing is giving you a more exact representation than you would get from just extracting the data from the publication. Even in the case where not all the data is published for the sake of brevity, I don't think there is a motivation for labs to keep the data secret. The examples you posted seem to fall more in the category of not handing out unpublished data, which makes sense. Maybe that's unique to my field though which is more engineering/chemistry based.

    Also, I didn't mean to imply that no-one would want to see the code behind the research so to speak. It just isn't what you want to present in a publication, nor should it be. If there are errors obvious enough to notice without spending large amounts of time replicating the work you wouldn't need the code. As for the small errors, the information given in the paper should be enough to reconstruct the work and determine if there are any problems. Which is generally what happens anyway when researchers want to continue or expand upon the published work. I was just trying to point out that using the code as part of the review process is most likely impossible as it would take way too long to review anything.

    edit- Actually, the more I think about it, the more I think materials type research is just fundamentally different than economic type research. If I run an experiment on a core sample from the antarctic, there's really only one way to analyze it. You run say x-ray diffraction, and you get one spectrum. That's it. However, it feels like if I run say a study where I collect data on people's incomes, as well as a bunch of other factors that might be related, then there would be lots of analysis you could do with that single data set. Which may make you more hesitant to release the raw data even after publication.

    are you a fellow academic? you do realize that when you mention the word "raw data", most non-academics interpret this as your unprocessed spreadsheet, not your wonderful regression numbers, which no doubt have very good t-values. "We obtained a value for θ of -6.31" is not "raw data" in the normal sense of the words. It is replicable - but only if someone else is willing to pay the very high cost of entry to redo all of your work, when what may really be suspected is a fiddling of the numbers coughed up by standardized equipment and process.

    If you work in an actual wet lab, you probably have a paper notebook or such where you write down any quick readings or observations while you're working in the fume hood. In that case it would be the contents of that notebook that is being held as suspicious. Did you really write down anything at all? Did you write down a lot of stuff and then handpick results?

    It is highly implausible that you are colluding with the manufacturer of the crystallography machine to make it produce data files of a readily manipulable nature. But it is all too plausible that the analysis software interprets the spectra and reports a peak at -5.31, and then you write down -6.31 in the paper. There's no nice way to put it: the charge here is that you are lying.

    And the problems created by your criterion for when people should be entitled to see the details of your work are neatly highlighted by the Stapel affair above, where the misconduct was carefully tuned to only produce expected results.

    You're assuming, though, that the intent is to mislead. Obviously if I am trying to pull a fast one then I will be hesitant to let you see the proof that I lied to you. For a crystallographic spectrum, I would assume the expectation is that you show it in your publication. You don't normally see someone reporting results for experimentation like that without just reporting the actual spectrum, because there's no reason to exclude it. It's compact, and it makes my case that much better because you can see the actual result rather than having to believe me.

    As for the notebook, I would assume that yes, I am giving you all the information that is in my notebook. If I feel as though the data isn't supported by the totality of my findings then it would be wrong for me to ignore certain aspects and just report the things that seem to agree. I don't think that's the norm; I think that is bad science. That isn't to say that you won't have to pick representative spectra sometimes, at which point you will probably pick the most flattering one. But again, in my experience people are more than willing to share the rest of their data, because either it all fits within their conclusions or they wouldn't have published those results.

    I just don't think you can take the approach of assuming researchers are lying. The sheer volume of work in any field that is constantly being produced means there is simply not enough time to thoroughly review it all to the point of complete replication, even if you had literally everything that the original researcher had. You have to operate from a standpoint of assuming the researcher is accurately reporting their findings and let future researchers determine whether or not the work correctly predicts future observations.

    And the point being made here is that, well, no, in such-and-such field there is so much suggestion of misconduct and bad faith that we want the paper trail to be included with the published work, so that you can't claim in three years' time that you've lost the original logged data, should your work happen to suddenly attract interest.

    At that point, a data series reverse-engineered from the printed graph could not be tested with, say, Benford's law of digit distribution. So the data generated even before you have what you call "raw data" must be archived as well.

    Naturally you would know whether you are lying, but who else would know?

    There's a reason why climate science got blindsided by the rage and fury of the AGW denialist crowd: they started with a view of data disclosure very similar to yours ("the data is in the paper! The data is in the paper!"), and only relatively recently have they begun acting more like economists and logging their entire time series onto public archives. The nature of politicization is that your opponents automatically suspect you of bad conduct. Economics is, naturally, almost always politicized.

    ronya on
    aRkpc.gif
  • Options
    Jebus314Jebus314 Registered User regular
    ronya wrote: »
    And the point being made here is that, well, no, in such-and-such field there is so much suggestion of misconduct and bad faith that we want the paper trail to be included with the published work, so that you can't claim in three years' time that you've lost the original logged data, should your work happen to suddenly attract interest.

    At that point, a data series reverse-engineered from the printed graph could not be tested with, say, Benford's law of digit distribution. So the data generated even before you have what you call "raw data" must be archived as well.

    Naturally you would know whether you are lying, but who else would know?

    But what's the point? If I am going to lie about my data, then I can just make up the "raw" data as well. There's no reason for me to limit myself to manipulating the regression values, as I can fairly easily go back to the original spreadsheets and change the entered values. There simply isn't a very good way of ensuring through peer review that research isn't a total fabrication. Instead you have to rely on other researchers to confirm or deny that your results line up with the data that they have collected as well.

    "The world is a mess, and I just need to rule it" - Dr Horrible
  • Options
    ronyaronya Arrrrrf. the ivory tower's basementRegistered User regular
    edited April 2013
    Jebus314 wrote: »
    But what's the point? If I am going to lie about my data, then I can just make up the "raw" data as well. There's no reason for me to limit myself to manipulating the regression values, as I can fairly easily go back to the original spreadsheets and change the entered values. There simply isn't a very good way of ensuring through peer review that research isn't a total fabrication. Instead you have to rely on other researchers to confirm or deny that your results line up with the data that they have collected as well.

    Because people who fabricate entered values usually do it badly. Stapel got caught because his entered values had repeated entries, and Benford's law and other tests of how much noise ought to be present have caught fraudulent researchers before. All this just raises the difficulty of misconduct.
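
    To make that concrete, a first-digit check is only a few lines of code. This is a rough sketch with a made-up file name and column, and with the usual caveat that Benford's law is only a sensible benchmark for data spanning several orders of magnitude.

    ```python
    # Rough sketch of a Benford first-digit check on a column of reported values.
    # The file name and column are made up.
    import csv
    import math
    from collections import Counter

    def first_significant_digit(x: float) -> int:
        for ch in f"{abs(x):.15g}":
            if ch.isdigit() and ch != "0":
                return int(ch)
        return 0  # x was zero

    def benford_chi_square(values) -> float:
        counts = Counter(d for d in map(first_significant_digit, values) if d > 0)
        n = sum(counts.values())
        if n == 0:
            return float("nan")
        chi2 = 0.0
        for d in range(1, 10):
            expected = n * math.log10(1 + 1 / d)  # Benford's expected share of digit d
            observed = counts.get(d, 0)
            chi2 += (observed - expected) ** 2 / expected
        return chi2  # compare against the chi-square critical value with 8 d.o.f. (~15.5 at p = 0.05)

    with open("archived_measurements.csv") as f:  # hypothetical archived raw data
        values = [float(row["reading"]) for row in csv.DictReader(f)]
    print("chi-square against Benford:", benford_chi_square(values))
    ```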

    Or, you know, it could just be the case that your Excel formula, the one that reports the final value of interest, accidentally drops the bottom three cells... that's not so much malevolence as incompetence, to be sure. Still, it is misconduct.
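
    For what it's worth, the damage a formula like that can do is easy to see with toy numbers (invented here, and nothing to do with the actual spreadsheet):

    ```python
    # Toy illustration of how an average that silently stops a few rows short
    # changes the headline number. The growth figures are invented.
    growth = [3.1, 2.8, 2.5, 2.2, 1.9, 1.6, 1.3, 1.0, -0.2, -1.5]

    full_mean = sum(growth) / len(growth)
    short_mean = sum(growth[:-3]) / len(growth[:-3])  # formula "stops three cells short"

    print(f"mean over every row:         {full_mean:.2f}")   # 1.47
    print(f"mean with three rows missed: {short_mean:.2f}")  # 2.20
    ```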

    ronya on
    aRkpc.gif
  • Options
    Inquisitor77Inquisitor77 2 x Penny Arcade Fight Club Champion A fixed point in space and timeRegistered User regular
    I can certainly understand how, if obtaining the data is the bulk of the cost and effort in the work, labs (in any scientific discipline) would be loath to give up that information immediately, because it would be more difficult for them to reap the benefits of that investment if everyone had access to it. That being said, there is probably some sort of middle ground that could and should be reached, such as delayed release of data X years after initial publication. Of course, that implies that more rigorous standards are held across the board when it comes to experiments.

    Re: Economics, it seems like a lot of the original data gathering is done by governments and non-profit organizations (on the macro end). [Corporations generally only care about their specific data (customers, employees, etc.), and would pretty much never release that data for public consumption anyway.]

    If that is true, then it stands to reason that replication of economic models and studies should actually not be all that difficult - it's a matter of tracking down the original sources and replicating whatever normalization/manipulation was done to the data set, and then running that data through whatever computations are being described. My understanding is that the RR paper was in this vein, which is why the graduate student was tasked with replicating the work to begin with. Obviously, if his advisor didn't think that he could get his hands on some form of the source data, he wouldn't have been told to even bother with the attempt.

    When RR handed over their "spreadsheets", what they were doing was providing the graduate student insight into the type of data manipulation they performed in order to put those numbers into their model. (Keep in mind that "manipulation" should have a neutral connotation here, as any time you pull together multiple data sets you must normalize them somehow.) It's when the graduate student was looking at those next-step numbers that he noticed something was really weird.
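
    For readers unfamiliar with what such "next-step numbers" can look like, here is a rough sketch of a bucket-and-average computation of the general kind being discussed. Everything here - the file, the column names, the thresholds - is invented for illustration and is not the actual RR spreadsheet logic.

    ```python
    # Rough sketch of a "group country-years by debt ratio, then summarize growth"
    # computation. The file, column names and thresholds are invented.
    import pandas as pd

    panel = pd.read_csv("debt_growth_panel.csv")  # hypothetical: country, year, debt_to_gdp, real_growth

    bins = [0, 30, 60, 90, float("inf")]
    labels = ["<30%", "30-60%", "60-90%", ">90%"]
    panel["debt_bucket"] = pd.cut(panel["debt_to_gdp"], bins=bins, labels=labels, right=False)

    # Part of the replication argument is precisely about which summary statistic
    # gets reported and how country-years are weighted, so show more than one.
    summary = panel.groupby("debt_bucket", observed=True)["real_growth"].agg(["mean", "median", "count"])
    print(summary)
    ```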

    Micro-economic experiments, on the other hand, are almost directly analogous to psychology experiments (and often done in the same departments or multi-disciplinary groups), so if someone wanted to replicate those results, generally speaking, they would have to put in the same investment (most likely with an eye towards tweaking the parameters to see if the results are replicable). A common type of this work is the experiment where you put a bunch of people in a room and have them participate in some sort of game or competition involving money - those experiments are replicated constantly because researchers are curious to see if the same experiment done in India produces results that are different from the original one performed in the U.S.

  • Options
    ronyaronya Arrrrrf. the ivory tower's basementRegistered User regular
    edited April 2013
    There is quite a lot of work done in making sure that data provided by different governments are actually comparable to each other, and in collecting ancient data from periodicals that may never have made it onto the Internet. Not every country has a FRED2 that neatly collates everything for you into a CSV.

    And that's just for standard macro aggregates. Much of the time, in economics, empirical work entails going out there and getting some data. The volunteer lab is widely mocked but it's done, as you said; sometimes one has to sift through data provided by a company or a public agency and censor it into a non-privacy-violating form. In these cases, an issue that may be familiar to those paying attention to climate science denialism can occur, whereby one agency making up a component of the data set insists that their data is proprietary and so forces you to hide quite a lot of it.

    The reason replicating R&R is difficult is that the available measures of public debt are shoddy and wildly inconsistent, so regularizing them into something that actually fits the economic notion of government debt is genuinely hard. Many countries have different accounting standards, different classifications for debt, and different classifications for what constitutes a private versus a public body. The US alone has a Social Security Trust Fund, and all sorts of state-level debt and pension funds, and so forth. It's not just "download this data from this agency and run this command". In a pattern that is fairly standard across sciences, R&R compiled a database of public debt and then kept it private for a few years to mine a lot of papers out of it.
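
    To give a flavour of what that regularization involves, here is a minimal sketch of splicing two hypothetical sources that define "government debt" differently. Every file and column name is invented; the point is that each step embodies the kind of judgement call that is invisible in the published tables.

    ```python
    # Minimal sketch of regularizing debt figures from two hypothetical sources
    # that use different units and different definitions of "government".
    import pandas as pd

    # Source A: general government gross debt, already in % of GDP
    a = pd.read_csv("source_a.csv")          # columns: country, year, debt_pct_gdp
    a["coverage"] = "general_government"

    # Source B: central government debt in local currency units, plus nominal GDP
    b = pd.read_csv("source_b.csv")          # columns: country, year, debt_lcu, gdp_lcu
    b["debt_pct_gdp"] = 100 * b["debt_lcu"] / b["gdp_lcu"]
    b["coverage"] = "central_government"     # narrower definition, flagged explicitly

    cols = ["country", "year", "debt_pct_gdp", "coverage"]
    panel = pd.concat([a[cols], b[cols]], ignore_index=True)

    # Judgement call: prefer the broader general-government series where both exist.
    preference = {"general_government": 0, "central_government": 1}
    panel["pref"] = panel["coverage"].map(preference)
    panel = (panel.sort_values(["country", "year", "pref"])
                  .drop_duplicates(subset=["country", "year"], keep="first")
                  .drop(columns="pref"))
    print(panel.head())
    ```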

    Notice that nobody has yet criticized the quality of R&R's raw data; what is criticized is how they interpreted it. It is one incident where the un-interpreted raw data would really have made a difference.

    ronya on
    aRkpc.gif