
Organizing data: help me be a respectable scientist.

Australopitenico Registered User regular
So, I am currently trying to get a PhD (operative word: TRYING) in neuroscience. So far, so good, but I am running into trouble when organizing all my data. My boss always used Excel, but I loathe that program and getting data in there generates infinite workbooks where you have to scroll forever until the sweet release of death embraces you. It's also very inefficient, since you need a cell for each variable and not all types of data have all the variables (those different types of data still need to stick together, though).

So, since there seems to be a good number of knowledgeable people here about technology and such, I was wondering whether any of you knows of some tool that could make this easier. What I need is some kind of hierarchical organizer. Each particular data point must have a number of "sub points" (each with their particular variables) stored within it and must have some metadata attached to it.

Is there anything out there that I can use that does not require l33t programming skills? I'm literate in matlab, but I don't know if I have the time to learn a programming language from scratch.

Posts

  • tynic PICNIC BADASS Registered User, ClubPA regular
    Well, in theory you could use matlab, by making clever use of structures. The problem there is that accessing and viewing your data becomes a little unintuitive, though of course you can always write subfunctions to read it in sensibly and later print it out all nicely for you.

    That said, there are definitely other programs; likely much better. Sorry I can't suggest anything specific; I spent my PhD time juggling matlab and excel :P

  • k-maps I wish I could find the Karnaugh map for love. 2^<3 Registered User regular
    edited March 2013
    It's the 21st century; being able to program is being literate. Okay, maybe I'm being a little harsh. But, Python (which is not my favorite, by far) is easy and will make your life so much happier. What you're describing is something any decent language can do in less than 5 lines.

    I love my job security and everything but it pains me to see that scientists aren't trained to learn even rudimentary programming skills. Don't be like that.

    EDIT: I missed the MATLAB part here. It should suffice for at least CSV format.

  • Australopitenico Registered User regular
    edited March 2013
    k-maps wrote: »
    It's the 21st century; being able to program is being literate. Okay, maybe I'm being a little harsh. But, Python (which is not my favorite, by far) is easy and will make your life so much happier. What you're describing is something any decent language can do in less than 5 lines.

    I love my job security and everything but it pains me to see that scientists aren't trained to learn even rudimentary programming skills. Don't be like that.

    It's not about lack of will, you see, it's about time. I have to dedicate my time to my actual primary purpose, which is research. I already design my own Matlab analysis tools and if I had to learn a new language for every tool I want to use I would never get any actual work done.

    So I'm sorry for your pain, but if I can find an easier way to do this than learning Python, I totally will use it.



  • mts Dr. Robot King Registered User regular
    Honestly, you need to suck it up and use Excel. If your boss is using Excel and that is how he wants it, then you need to do it that way. The thing with Excel is that everyone knows how to use it, and you can just send someone a file and they should be able to open it on any computer. You can't say that about MATLAB etc.

  • k-maps I wish I could find the Karnaugh map for love. 2^<3 Registered User regular
    It's not about lack of will, you see, it's about time. I have to dedicate my time to my actual primary purpose, which is research. I already design my own Matlab analysis tools and if I had to learn a new language for every tool I want to use I would never get any actual work done.

    So I'm sorry for your pain, but if I can find an easier way to do this than learning Python, I totally will use it.

    This is a classic "teach a man to fish" type problem. You're employing a false economy to think that finding a "quick-fix" solution in the form of some GUI tool is going to solve your problem. How long is it going to take you to learn this new tool? Deal with its bugs? What if your requirements vary slightly from the specifications, and now you're scouring some dedicated help forums on another website for hours or days waiting for some hacky patch or plugin.

    Your stated primary purpose, research, is increasingly becoming computational. I'm not saying you have to be a machine learning maven, but you're seriously crippling yourself by not knowing at least one general programming language; you don't need to "learn a new language for every tool you want to use." Not trying to start a D&D thread, but I see fellow scientists struggling with this all the time, and this is my honest advice for them. The alternative is to become a professor and use a high-level language (i.e., your grad students) to do it for you :P.

  • Daenris Registered User regular
    k-maps wrote: »
    I'm not saying you have to be a machine learning maven, but you're seriously crippling yourself by not knowing at least one general programming language; you don't need to "learn a new language for every tool you want to use."

    He already said he knows MATLAB, which is a general programming language and widely used in Neuroscience (and other) research. Based on what I've seen in the neuroimaging labs I've worked in, he's already ahead of the curve if he knows MATLAB.

    Australopitenico, you could try using a database to organize the data (MySQL/PostgreSQL are both common open source options, but even an Access database could work fine). For most of our questionnaire/demographic/etc data, we have it entered in a MySQL database based on subject ID, visit date or number, etc. Though honestly, for the majority of work that actually gets done with the data, it ends up in either Excel, SPSS, or just text CSV files (for analysis in MATLAB or R) eventually.
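As a rough illustration of the "keyed on subject ID and visit number" idea, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL/PostgreSQL so that it runs without a server. The table names, column names, and sample values (subject, visit, score, S01, and so on) are invented for the example and are not taken from any real lab schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical subject and visit tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # One row per subject: demographic-style data that never repeats.
    conn.execute(
        "CREATE TABLE subject (subject_id TEXT PRIMARY KEY, sex TEXT, notes TEXT)"
    )
    # One row per (subject, visit): measurements that repeat over time.
    conn.execute(
        """CREATE TABLE visit (
               subject_id   TEXT NOT NULL REFERENCES subject(subject_id),
               visit_number INTEGER NOT NULL,
               score        REAL,
               PRIMARY KEY (subject_id, visit_number)
           )"""
    )
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?, ?)",
        [("S01", "F", None), ("S02", "M", "left-handed")],
    )
    conn.executemany(
        "INSERT INTO visit VALUES (?, ?, ?)",
        [("S01", 1, 12.5), ("S01", 2, 14.0), ("S02", 1, 9.0)],
    )
    conn.commit()
    return conn


def visits_for(conn, subject_id):
    """Return (visit_number, score) rows for one subject, ordered by visit."""
    cur = conn.execute(
        "SELECT visit_number, score FROM visit "
        "WHERE subject_id = ? ORDER BY visit_number",
        (subject_id,),
    )
    return cur.fetchall()


conn = build_demo_db()
print(visits_for(conn, "S01"))
```

The point of the two tables is that subject-level facts are stored once, while each visit only carries the columns that apply to it. The result of a query like this can then be exported to CSV for Excel, SPSS, MATLAB or R.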

  • ceres When the last moon is cast over the last star of morning And the future has past without even a last desperate warning Registered User, Moderator Mod Emeritus
    k-maps wrote: »
    It's not about lack of will, you see, it's about time. I have to dedicate my time to my actual primary purpose, which is research. I already design my own Matlab analysis tools and if I had to learn a new language for every tool I want to use I would never get any actual work done.

    So I'm sorry for your pain, but if I can find an easier way to do this than learning Python, I totally will use it.

    This is a classic "teach a man to fish" type problem. You're employing a false economy to think that finding a "quick-fix" solution in the form of some GUI tool is going to solve your problem. How long is it going to take you to learn this new tool? Deal with its bugs? What if your requirements vary slightly from the specifications, and now you're scouring some dedicated help forums on another website for hours or days waiting for some hacky patch or plugin.

    Your stated primary purpose, research, is increasingly becoming computational. I'm not saying you have to be a machine learning maven, but you're seriously crippling yourself by not knowing at least one general programming language; you don't need to "learn a new language for every tool you want to use." Not trying to start a D&D thread, but I see fellow scientists struggling with this all the time, and this is my honest advice for them. The alternative is to become a professor and use a high-level language (i.e., your grad students) to do it for you :P.

    What you need to understand about biologists and biology-based scientists is that unless the term "engineer" comes after the name, programming is not usually part of training or education or even something you'll need day-to-day in the workplace pretty much ever, so saying it's part of basic scientific literacy is just mind-blowingly off. You should probably can the reproachful tone since it sounds like you don't actually have an idea that doesn't include "learn programming". I'll admit that my statement comes off sounding a lot more like a suggestion than it actually is, but I assure you it is not one.

    And it seems like all is dying, and would leave the world to mourn
  • k-maps I wish I could find the Karnaugh map for love. 2^<3 Registered User regular
    ceres wrote: »
    What you need to understand about biologists and biology-based scientists is that unless the term "engineer" comes after the name, programming is not usually part of training or education or even something you'll need day-to-day in the workplace pretty much ever, so saying it's part of basic scientific literacy is just mind-blowingly off. You should probably can the reproachful tone since it sounds like you don't actually have an idea that doesn't include "learn programming". I'll admit that my statement comes off sounding a lot more like a suggestion than it actually is, but I assure you it is not one.

    This is H&A, right? Through undergrad and into graduate school I worked with several "non-engineers" on various multidisciplinary projects including with "non-scientists" in humanities such as linguistics. I gave my honest advice about what this guy should do given what I've seen in these projects. Any further discussion on this will be a debate. I'll be happy to defend my position any day, but it is obvious that because people don't like this opinion then it will be construed as a form of trolling.

    This is an extremely violent response to what basically boils down to me encouraging someone to learn a new skill. If I'm working on a new project involving linguistics, I don't get all huffy if someone suggests that I take a basic course about grammars and lexicons. I'm guessing I would get a similar response if I wanted to work on a neuroscience-related topic, but people wouldn't be nearly as vehement about touting the impracticality of me, the engineer, learning something about the structure of the brain. In the same vein, I don't think it's ludicrous of me to suggest he take a course, or mini-course, on some programming language, regardless of what they do in the "workplace."

    So, agree to disagree?

  • mts Dr. Robot King Registered User regular
    While I agree with ceres's point that bio people don't use programming languages, I disagree that k-maps was off in the way he made his suggestion. I have seen way worse.

  • Australopitenico Registered User regular
    edited March 2013
    Well, I didn't mean to spark anything here. k-maps, you are totally right that knowing more computer languages would be useful, as would knowing many other things. However, learning a new computer language for the sole purpose of building my own database program, despite being trivial in your eyes, would mean a lot of time and effort for me, time that I don't currently have.

    The question of whether everyone should know how to program, and to what extent, is something for another thread. I'll just say that if there exists a tool that already does what I want, I am not going to build a new one from scratch, no matter how many engineers I piss off, the same way I send my defective hardware to the electronics workshop when it breaks down instead of learning electronics from scratch, as useful a skill as it may be.


  • k-maps I wish I could find the Karnaugh map for love. 2^<3 Registered User regular
    edited March 2013
    Well, I didn't mean to spark anything here. k-maps, you are totally right that knowing more computer languages would be useful, as would knowing many other things. However, learning a new computer language for the sole purpose of building my own database program, despite being trivial in your eyes, would mean a lot of time and effort for me, time that I don't currently have.

    The question of whether everyone should know how to program, and to what extent, is something for another thread. I'll just say that if there exists a tool that already does what I want, I am not going to build a new one from scratch, no matter how many engineers I piss off, the same way I send my defective hardware to the electronics workshop when it breaks down instead of learning electronics from scratch, as useful a skill as it may be.

    Oh man, I wasn't suggesting that you write your own database program, at all. That would be stupid even if you were a "l33t programmer." If I wasted time writing my own database, not only would I not get anything done, I would probably get fired. Maybe this is why we have a misunderstanding? It was more along the lines of writing a small script (not even an application) to store/retrieve from whatever data format you choose (be it SQL, CSV, etc.). While I dislike MATLAB with a passion, it should be sufficient for this purpose. As suggested, it would be optimal for you to use a relational database of some sort, but I think the learning curve for that is quite a bit steeper than for, let's say, a language like Python IMO.

    If you're sick of Excel, improving your programming skills is the next logical step. To take your analogy of defective hardware...Excel does not have any actual bugs that are stopping you, it's just that you want more control. It would be like if you came to me and said that you're sick of all the extra crap that comes with buying a pre-built computer. Then, naturally I would suggest that you might want to look into building your own. Yet, no one in this thread would decry me for being an elitist douchebag for suggesting such a thing, because for some reason that is relatively common.

    A language like Python would be ideal, but MATLAB is totally fine for what you're describing. I'm not a MATLAB expert, so I can't help you without going through the documentation myself; but, if you'd like a sample of a simple Python + DB solution you can PM me (you can output from the DB to whatever standard format you want).

    EDIT: I guess what I'm trying to say is that if you hate Excel, then learning a high-level language, or improving your skills in MATLAB is the next step. It's the only "game in town" AFAIK. Either that, or as others have mentioned, bite the bullet and use Excel. If anyone has another solution, I, personally, would be very interested in hearing it. As this tool would be very useful for my research as well.

  • ceres When the last moon is cast over the last star of morning And the future has past without even a last desperate warning Registered User, Moderator Mod Emeritus
    You're fine. I actually agree with mts and think you should try to figure out a way to do what you need to in Excel, just because it's what your boss already uses. Have you asked him what his system is for dealing with this kind of data generation? Back in the day mine actually sat and explained to me how he wanted things organized and equations entered in and so forth for my projects, to keep things consistent with the way he was running his data. We weren't dealing with spreadsheets anywhere near the size you're talking about though.

    And it seems like all is dying, and would leave the world to mourn
  • Essee The pinkest of hair. Victoria, BC Registered User regular
    If what you really hate about Excel happens to be the interface (maybe it isn't?) you can always try the Calc program in LibreOffice (a free office suite which branched off from OpenOffice) and see if that makes you less angry. Because it sure as hell makes ME a lot less angry. I hate using all Microsoft Office programs, but Excel is especially painful (the only thing that beats it is IE before IE7, ugh). LibreOffice also contains Base for making databases, which... well, I've never used it, but you can take a look and see if that's something you need. Corel also has the WordPerfect suite (and apparently a separate product called Corel Office, but I have no idea what the difference is between the two), which is what I used to use before OpenOffice/LibreOffice came around, but you'd have to buy that and I'm not sure it'd be worth it for you. But it's a thought. Both programs should be compatible with Microsoft Office, but you'll want to double-check that things are talking to each other properly when you start using them.

  • Twenty Sided Registered User regular
    edited March 2013
    Sigma Plot?
    It's basically a more powerful Excel.
    It has the same general cell layout as Excel but with quite a number of handy functions stapled onto it.

    The thing I've discovered about scientists is that they are as lazy about computer software as anybody else. Some don't like doing a lot more math than they can get away with either.

    Engineers and mathematicians also have all manner of specialized commercial software for more rigorous data crunching.
    Maple and Mathcad are examples where calculus can get involved. Basically they can spit out symbolic evaluations of whatever thing it is you're trying to do (at varying degrees of effectiveness). Maple seems like it can handle pretty complicated graphing tasks, though I haven't played with that program yet. Those might be what you're looking for.

    I'm pretty sure those two programs are tip-of-the-iceberg though.

  • Fuzzy Cumulonimbus Cloud Registered User regular
    What could you possibly be doing for a neuro project that requires something more advanced than excel? I worked in a physical chemistry lab. We used matlab all the time but matlab is not really appropriate for data storage, and thus, our results were put into excel. Matlab is appropriate for running simulations but it is not a very good data processor if you want to archive a bunch of stuff. I'm in the life sciences and I either store my data in the program/instrument I used or compile it in Excel.

    1) Is the issue the VOLUME of data? If so, you can set up a template and try to hide parameters that your instrument gives you that you do not need.
    2) Is the issue the TYPE of data? Is this numerical data? If so, it probably belongs in excel. If it is not numerical data, it might just belong in your lab notebook until you do a manuscript.

    Can you give an example of what you are generating and why you need to make a hierarchy out of it? Are you sure you even need to do that? I feel like (lol neuro people do this all the time :P) you might be unnecessarily reinventing the wheel.

  • Fuzzy Cumulonimbus Cloud Registered User regular
    @Ceres

    I wish in my heart of hearts that we had an H/A subforum for scientists. It would be like all those molbioforums except everyone would be intelligible. :P

  • Frysdisken Registered User regular
    One alternative to start looking into is Qlikview, most of the programming done is by scripting.

  • ceres When the last moon is cast over the last star of morning And the future has past without even a last desperate warning Registered User, Moderator Mod Emeritus
    @Ceres

    I wish in my heart of hearts that we had an H/A subforum for scientists. It would be like all those molbioforums except everyone would be intelligible. :P

    It's a neat idea, but I don't think it would see enough use. I toyed for about 6 months with the idea of starting a math help thread, but ultimately decided that it didn't come up often enough that there would be much traffic.

    I did once start a science-y thread in SE but it only went for like 4 pages or something. I figure if it couldn't last there it probably wouldn't last here.

    And it seems like all is dying, and would leave the world to mourn
  • mts Dr. Robot King Registered User regular
    Honestly, as someone who is applying for faculty positions now, if a grad student came up to me complaining about Excel I would probably laugh at them.

    I would give them two options:

    1. find a better system that is easier, faster, and more convenient
    2. Suck it up

    SigmaPlot is great, but it's really more of a graphing program. It's a good one, but I wouldn't want to store anything in there since it has some weird rules to it.

  • ClearlyNotAGoomba Registered User regular
    As a fellow Ph.D.-seeker (Zoology), I can't imagine what kind of data you could have that couldn't be most easily handled (both for storage and analysis purposes) in a spreadsheet format. Generally if you have "sub-points" (?) for your data, you would have a separate column for that variable. Your value for your first variable would be the same for each row, and your second variable would be different. And you could just have a separate comment field (or fields) for metadata. For instance:
    individual     time     body mass     comment
    1              0        5
    1              1        6.2
    1              2        10            ate a lot
    

    This makes conversion into a CSV and input into some other analysis program (like R, which I use) super easy. Perhaps this isn't what you're looking for, but I think you may be overcomplexifying things. And always make sure you know how you're analyzing your data before you start to collect it in any particular format, because converting it later is a huge pain in the arse.
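To make the long-format layout above concrete, here is a small sketch that writes those three rows to CSV text and reads them back using Python's standard csv module. The column name body_mass (with an underscore) and the helper names are choices made for this example; everything else mirrors the little table in the post.

```python
import csv
import io

# Column order for the long-format table: one row per individual per time point.
FIELDS = ["individual", "time", "body_mass", "comment"]

ROWS = [
    {"individual": 1, "time": 0, "body_mass": 5, "comment": ""},
    {"individual": 1, "time": 1, "body_mass": 6.2, "comment": ""},
    {"individual": 1, "time": 2, "body_mass": 10, "comment": "ate a lot"},
]


def to_csv_text(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(ROWS)
loaded = from_csv_text(csv_text)
print(csv_text)
```

Note that CSV has no types, so numbers come back as strings and need converting (for example with float) before analysis. Writing to a real file instead of a string only requires swapping the StringIO buffer for open(path, "w", newline="").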

  • Kipling Registered User regular
    I thought Matlab had some limited object-oriented design. I remember looking through their GUI maker code and seeing something similar. Since IT banned Matlab for being a programming language, I don't have it to check and confirm anymore.

    Another speculative idea is to use VAMPP to run a SQL database on a local machine, and pull a Matlab file from FileExchange for SQL-Matlab communications. But make sure you lock VAMPP down to localhost access if you do that.

    Just don't use Excel for graphs in publications. Please.

    3DS Friends: 1693-1781-7023
  • Fuzzy Cumulonimbus Cloud Registered User regular
    Kipling wrote: »
    I thought Matlab has some limited object oriented design. I remember looking through their GUI maker code and see something similar. Since IT banned Matlab for being a programming language, I don't have it to check and confirm anymore.

    Another speculative idea is to use VAMPP to run a SQL database on a local machine, and pull a Matlab file from FileExchange for SQL-Matlab communications. But make sure you lock VAMPP down to localhost access if you do that.

    Just don't use Excel for graphs in publications. Please.

    It does. You can make horrific GUIs to your heart's desire. I really don't think it would be necessary to code an entire GUI in the awful Matlab language and then code a SQL database and then input data. :P

  • Gdiguy San Diego, CA Registered User regular
    ceres wrote: »
    @Ceres

    I wish in my heart of hearts that we had an H/A subforum for scientists. It would be like all those molbioforums except everyone would be intelligible. :P

    It's a neat idea, but I don't think it would see enough use. I toyed for about 6 months with the idea of starting a math help thread, but ultimately decided that it didn't come up often enough that there would be much traffic.

    I did once start a science-y thread in SE but it only went for like 4 pages or something. I figure if it couldn't last there it probably wouldn't last here.

    I'd support this as well - I basically avoid SE like the plague, so I definitely wouldn't have seen it there.

    As to the original question - can you give an example of what kind of data structure you're talking about? It does (unfortunately) sound like some sort of programming language might be the easiest answer (since Matlab is ungodly terrible at dealing with text structures), but honestly learning enough Perl to do basic parsing if you know Matlab shouldn't be very difficult at all. This will depend a lot on how many layers & subgroups of data features you're talking about, though

    If it's just storage, it may be that you want something like http://www.filemaker.com/products/filemaker-pro/ , which I know at least a few labs use to maintain strain / plasmid / etc databases, and doesn't require much back-end knowledge to get up & running

  • Twenty Sided Registered User regular
    ceres wrote: »
    @Ceres

    I wish in my heart of hearts that we had an H/A subforum for scientists. It would be like all those molbioforums except everyone would be intelligible. :P

    It's a neat idea, but I don't think it would see enough use. I toyed for about 6 months with the idea of starting a math help thread, but ultimately decided that it didn't come up often enough that there would be much traffic.

    I did once start a science-y thread in SE but it only went for like 4 pages or something. I figure if it couldn't last there it probably wouldn't last here.

    Oh I'm sure we can provisionally include math help into this thread?
    Is that not a thing?
    I could use a service like that.

  • ceres When the last moon is cast over the last star of morning And the future has past without even a last desperate warning Registered User, Moderator Mod Emeritus
    I'll take it under advisement. :) We don't get many math/science questions for the moment really, and H/A just doesn't see the traffic that other parts of the forum do to the point where the individual threads would get lost. Also, since H/A isn't so much intended for discussion per se it wouldn't really be a place where we could just all hang out and chat science. That's why I was hoping the SE thread would last a little longer than it did; I really was hoping for a thread where I could sit and talk science-y things with science-y people that would be school/job related more than news headlines or something, which I don't necessarily enjoy discussing. I am much more interested in people's day-to-day lives with it... shit that happens at work, sort of thing. I would have to figure out how to frame something like this, basically, and right now I'm not sure how to do it in a way that makes sense for H/A.

    If you have an idea how it might work, what you'd like to see, or just want to express a general interest in the idea, feel free to PM me about it and we'll see what happens if I get enough interest. For now let's let this thread get back on topic. :)

    And it seems like all is dying, and would leave the world to mourn
  • Twenty Sided Registered User regular
    Sorry, I said, "thread."
    I meant "forum."

    Blah.

  • Pure Din Boston-area Registered User regular
    I've never posted in SE, but it would be awesome to have a place to chat about scientist things. It's just that I don't tend to notice threads outside of H/A because I'm shy to post much on less strictly moderated forums (have had trouble with online harassment in the past :( )

    Anyway, OP, it seems like you already have a pretty good idea of how you want your data to be formatted, but are having trouble finding or building a tool that will convert data from one format to another. If I were you I'd start with the "bribe an undergrad CS major with pizza and beer" approach. If it turns out that the job is actually bigger than that, tell your advisor that it would help you be more efficient and do better research if he would cough up the $10/hour to pay a decent CS undergrad to do it right. If every 2 hours of undergrad work saves you 1 hour of noodling around, that's money well spent. However this would only work if you are very certain about what you want and need, otherwise it would just be another time sink.

  • Australopitenico Registered User regular
    Wow, lots of answers, great :D.

    @k-maps, it was definitely a misunderstanding, your suggestion is very sound and it would also make a good practice exercise (I had actually already started to learn Python a bit).

    I will also check MySQL, Libre Office and some of the other alternatives that have been proposed here.

    @mts, as you can see, I AM looking for an alternative that's easier, faster and more convenient, that is exactly the point of this post. The fact that my boss uses Excel does not mean anything, my boss was using some really unnecessarily complex data analysis procedures that I am already optimizing so that any clueless undergrad can click a button and get what we need. I don't think it's wrong to streamline the whole data processing and storage systems if I feel they are making our life harder for no reason. If I don't find any good alternatives of course I will "suck it up".

    As for the data examples some guys asked me for. You see, on the one hand there is a handful of detailed data for each animal. Then for each animal you have a number of recording sites, a.k.a the main data points, which have their own plethora of associated variables. The kicker is that those variables are each gathered from specific recordings (one file type gives you X, another gives you Y). The particular characteristics and identifier of these files must also be known and be easily associated with their particular data point (the summary of the data point is the "meat", is what you use for figures, statistical analysis etc.). Last but not least, the raw data from each recording are kept in a matlab file on a separate folder, and sometimes, depending on the analysis, they have their own small associated Excel spreadsheet.

    Of course it can be done with Excel, but I just find the current method of having the animals on one spreadsheet, the associated files in another, and the SPSS-ready datapoints somewhere else impractical. I still might have to do it, but I just wanted to check how other people did it.

  • k-maps I wish I could find the Karnaugh map for love. 2^<3 Registered User regular

    As for the data examples some guys asked me for. You see, on the one hand there is a handful of detailed data for each animal. Then for each animal you have a number of recording sites, a.k.a the main data points, which have their own plethora of associated variables. The kicker is that those variables are each gathered from specific recordings (one file type gives you X, another gives you Y). The particular characteristics and identifier of these files must also be known and be easily associated with their particular data point (the summary of the data point is the "meat", is what you use for figures, statistical analysis etc.). Last but not least, the raw data from each recording are kept in a matlab file on a separate folder, and sometimes, depending on the analysis, they have their own small associated Excel spreadsheet.

    Of course it can be done with Excel, but I just find the current method of having the animals on one spreadsheet, the associated files in another, and the SPSS-ready datapoints somewhere else impractical. I still might have to do it, but I just wanted to check how other people did it.

    Yeah, you're describing an object-oriented data structure. This means that your instincts are naturally leading you to a more representationally(?)/computationally complex problem. This is a great thing! I think a lot of new generation scientists tend to think this way (probably from growing up on complex strategy/rpg video games).

    I second Pure Din that you should bribe a cs undergrad. I would have happily done this for you as an undergrad...that sort of thing is a great resume builder for someone with little experience, and it is a great small project for any decent sophomore+.

    But, if you're already teaching yourself Python, great! :D. Just know that Python's object-oriented features suck, but learning Java or Scala instead would be a PITA if you have no experience. At any rate you can probably get by with just using lists and dictionaries, maybe stored as JSON. Although if you're learning SQL, even better. I would recommend SQLite over MySQL for now, as it requires zero work setting up.
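
    To make the lists-and-dictionaries idea concrete, here is a minimal sketch, for illustration only. The animal, site and file names and all the numbers are made up, and the field names (animal_id, sites, summary, files) are assumptions, not anything taken from the actual lab setup. It nests recording sites inside animals, saves the whole thing as JSON text, and flattens it back into one row per site for a stats package.

```python
import json

# Hypothetical example data: every name and value below is invented.
animals = [
    {
        "animal_id": "rat01",
        "sex": "F",
        "weight_g": 310,
        "sites": [
            {
                "site_id": "rat01_s1",
                "summary": {"firing_rate_hz": 12.5, "latency_ms": 8.0},
                "files": [
                    {"name": "rat01_s1_spont.mat", "type": "spontaneous"},
                    {"name": "rat01_s1_evoked.mat", "type": "evoked"},
                ],
            },
            {
                "site_id": "rat01_s2",
                # Not every site has every variable; missing keys are just absent.
                "summary": {"firing_rate_hz": 7.1},
                "files": [{"name": "rat01_s2_spont.mat", "type": "spontaneous"}],
            },
        ],
    },
]


def flatten_sites(animal_list):
    """Return one flat dict per recording site, tagged with its animal."""
    rows = []
    for animal in animal_list:
        for site in animal["sites"]:
            row = {"animal_id": animal["animal_id"], "site_id": site["site_id"]}
            row.update(site["summary"])
            rows.append(row)
    return rows


# Round-trip through JSON text, as you would when saving to and loading from a file.
text = json.dumps(animals, indent=2)
restored = json.loads(text)

rows = flatten_sites(restored)
for row in rows:
    print(row)
```

    The nice part is that a site with fewer variables costs nothing; the less nice part is that nothing stops you from misspelling a key, which is where a real database starts to pay off.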

    Good luck with your endeavors, and PM me if you run into any problems.

  • mtsmts Dr. Robot King Registered User regular
    yea, i totally see where you are coming from. finding the perfect management tool is like finding a griffin. maybe see if you guys have a license for SPSS or whatever they are calling it now. it can do a shit ton of variables and the benefit is you can do all your stats in it. requires a bit of tinkering setting everything up but once you get it going you can just copy and paste directly from excel

    most universities will have an enterprise license for it, though if that fails you can get a student license i think.

    that may actually be your best bet. plus with some basic programming you can do some complex shit with it, though i may be thinking of mplus

  • acidlacedpenguinacidlacedpenguin Institutionalized Safe in jail.Registered User regular
    maybe you could 'contract out' having a tool created for you by a CS or CE student? I imagine if you could work it out you could pay them in pizza/beer, actual money (if you can budget for it), or even in having them do it as a project for one of their courses.

    GT: Acidboogie PSNid: AcidLacedPenguiN
  • Bliss 101Bliss 101 Registered User regular
    I'm a biologist working with fairly large datasets, and I use R (http://www.r-project.org/) for everything. It hits that sweet spot of being sufficiently powerful and flexible without being overly complicated to learn. You do need to have a bit of a programming mindset to use it properly, but not much. It is free, you can transfer data between R and Excel easily, it'll handle all your statistical analysis and graphing needs, you can google up free packages capable of handling almost any statistical problem, and while there is a bit of a learning curve, it isn't all that complicated.

  • YoSoyTheWalrusYoSoyTheWalrus Registered User regular
    R is great but it is definitely more difficult to learn than many other programs considering the lack of any kind of GUI.

    I'd offer a few suggestions:
    1) Rather than finding a CS major, see if there is anyone in your department or nearby that would be willing to handle your data management issues in exchange for an author listing or an acknowledgement. This almost always worked for me before I learned for myself.

    2) SAS is very easy to learn and very powerful. Your uni likely has a deal where your lab can get it for ~$50, and The Little SAS Book will teach you everything you need to know. It can handle its own language or SQL, and will be a valuable tool in the future. It may be overkill for your current dilemma but IMO it is the easiest program to learn, and it has a nice GUI that will do a lot of your work for you.

    3) I disagree respectfully but strongly about biologists not needing to be able to handle large datasets. Granted, my background is genetics, but in my opinion Big Data will become more and more important in the near future. Knowing basic SQL, Python or Perl will make you infinitely more marketable. I know it's not pertinent to the current problem, but it's my two cents.

  • BenditBendit Cømþü†€r Šýš†emš Anålýš† Ðeñv€r¸ ColørådøRegistered User regular
    Each particular data point must have a number of "sub points" (each with their particular variables) stored within it and must have some metadata attached to it.

    IMO, there's the kicker. I work with those types of data structures pretty often. Excel would not be the right tool for this.

    I think that a database with different tables is what you need.

    Here is an example of a file system table that can contain metadata for each file stored. Files, in this case, are exactly like the files and folders stored on your hard drive. One file can reside in another file (which is a folder), for example. Folder item A has File item B under it, therefore B has a parent of A. Sort of a pyramid architecture, know what I mean?

    Here is my SQL table definition (scaled down):
    CREATE TABLE [dbo].[tblFileSystemItem](
    	[ItemID] [uniqueidentifier] NOT NULL,
    	[ParentItemID] [uniqueidentifier] NOT NULL,
    	[isDirectory] [bit] NOT NULL,
    	[Extension] [nvarchar](50) NULL,
    	[FriendlyName] [nvarchar](250) NULL,
    	[SizeBytes] [int] NULL,
    	[Description] [nvarchar](500) NULL,
    	[Active] [bit] NOT NULL
    )
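
    A hedged sketch of how that parent/child idea can be walked, using Python's built-in sqlite3 module rather than SQL Server. The item table and its columns are simplified stand-ins for the definition above: integer IDs instead of uniqueidentifier, and (as an assumption) a NULL parent for the top-level item. The recursive query collects everything underneath item A.

```python
import sqlite3

# In-memory database, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    item_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES item(item_id),  -- NULL for the top-level item
    name      TEXT NOT NULL
);
INSERT INTO item VALUES
    (1, NULL, 'A'),
    (2, 1,    'B'),
    (3, 2,    'C'),
    (4, 1,    'D');
""")

# Recursive common table expression: start at one item, then keep
# joining children onto the rows found so far.
descendants = conn.execute("""
    WITH RECURSIVE subtree(item_id, name, depth) AS (
        SELECT item_id, name, 0 FROM item WHERE item_id = ?
        UNION ALL
        SELECT i.item_id, i.name, s.depth + 1
        FROM item AS i
        JOIN subtree AS s ON i.parent_id = s.item_id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
""", (1,)).fetchall()

print(descendants)
```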
    


    Now, for your meta-data, you could extend and carry over the ItemID into another table (disregard the data types):
    CREATE TABLE [dbo].[tblMetaData](
    	[ItemID] [uniqueidentifier] NOT NULL,
    	[Weight] [nvarchar](50) NULL,
    	[FavoriteFood] [nvarchar](50) NULL,
    	[Color] [nvarchar](50) NULL,
    	[NumberOfAntennae] [nvarchar](50) NULL
    )
    


    You then need an engine and a front-end to manage this data. You could use SQL scripts to do it all. You could also use any language that can support SQL Server data access.
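
    As one possible "engine plus language" combination, here is a rough sketch using SQLite through Python's built-in sqlite3 module instead of SQL Server. The table and column names (animal, recording_site, source_file and so on) and all the values are hypothetical, loosely modelled on the animals / recording sites / files description earlier in the thread.

```python
import sqlite3

# In-memory database for the demo; pass a file path instead to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked

conn.executescript("""
CREATE TABLE animal (
    animal_id TEXT PRIMARY KEY,
    sex       TEXT,
    weight_g  REAL
);
CREATE TABLE recording_site (
    site_id        TEXT PRIMARY KEY,
    animal_id      TEXT NOT NULL REFERENCES animal(animal_id),
    firing_rate_hz REAL,
    latency_ms     REAL
);
CREATE TABLE source_file (
    file_id   INTEGER PRIMARY KEY,
    site_id   TEXT NOT NULL REFERENCES recording_site(site_id),
    file_name TEXT NOT NULL,
    file_type TEXT
);
""")

# Invented example rows. None becomes NULL: a variable this site does not have.
conn.execute("INSERT INTO animal VALUES (?, ?, ?)", ("rat01", "F", 310.0))
conn.executemany(
    "INSERT INTO recording_site VALUES (?, ?, ?, ?)",
    [("rat01_s1", "rat01", 12.5, 8.0), ("rat01_s2", "rat01", 7.1, None)],
)
conn.executemany(
    "INSERT INTO source_file (site_id, file_name, file_type) VALUES (?, ?, ?)",
    [
        ("rat01_s1", "rat01_s1_spont.mat", "spontaneous"),
        ("rat01_s1", "rat01_s1_evoked.mat", "evoked"),
        ("rat01_s2", "rat01_s2_spont.mat", "spontaneous"),
    ],
)
conn.commit()

# Which files belong to which data point?
files_per_site = conn.execute("""
    SELECT s.site_id, COUNT(f.file_id)
    FROM recording_site AS s
    LEFT JOIN source_file AS f ON f.site_id = s.site_id
    GROUP BY s.site_id
    ORDER BY s.site_id
""").fetchall()

print(files_per_site)
```

    With the foreign keys switched on, trying to add a recording site for an animal that does not exist fails instead of silently creating an orphan, which is the kind of bookkeeping separate spreadsheets cannot do for you.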

    My Live-Tracked Electronica: https://www.youtube.com/watch?v=XhSn2rozrIo
  • AustralopitenicoAustralopitenico Registered User regular
    edited March 2013
    Well, from what I see here you have convinced me to try and use SQL databases. They certainly look exactly like what I am looking for and you are absolutely right that knowing some SQL is a very useful skill to have in the future. I will also check with the computational guys if I run into trouble.

    Many thanks @k-maps and @YoSoyTheWalrus for the software recommendations, and to @Bendit for the snippets of code :D. And you are right, Walrus, in my past as a field biologist data structures were pretty simple, but neuroscience is a totally different animal.

    And about a science- or science careers-related D&D thread, I think it would be an awesome idea.

  • mtsmts Dr. Robot King Registered User regular
    edited March 2013

    And about a science- or science careers-related D&D thread, I think it would be an awesome idea.

    i don't know at this point, the best science careers advice right now is probably not to get into it. at least until things change funding wise

  • zagdrobzagdrob Registered User regular
    I work in Research Informatics (primarily Clinical)...so these questions come up quite a bit.

    Without knowing the full details of your study, I'm going to add another vote to the SQL camp. For what you are working with, it should be relatively straightforward to build a simple database. At most a couple dozen tables, probably far fewer depending on the specifics of your study...MySQL or PostgreSQL are good free options...plus you've always got the standard MS & Oracle offerings - getting them isn't usually an issue if you are public and doing research.

    If I were working with you, SQL would be a given, and all my questions would be based on your detailed requirements for data capture / entry and reporting / analysis. It's really just a matter of choosing the right tool for your application - there are countless ones out there, and they all work with the standard MySQL / PostgreSQL / Oracle database types.
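
    On the reporting / analysis side, a small sketch of what "getting the data back out" can look like. It uses SQLite via Python's sqlite3 module purely because it needs no setup; the same query would work against the other engines with their own connectors. The tables, columns and values are invented for the example. It joins animals to recording sites and writes one flat CSV row per site, the sort of file a stats package can import.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animal (animal_id TEXT PRIMARY KEY, sex TEXT);
CREATE TABLE recording_site (
    site_id        TEXT PRIMARY KEY,
    animal_id      TEXT REFERENCES animal(animal_id),
    firing_rate_hz REAL
);
INSERT INTO animal VALUES ('rat01', 'F'), ('rat02', 'M');
INSERT INTO recording_site VALUES
    ('rat01_s1', 'rat01', 12.5),
    ('rat01_s2', 'rat01', 7.1),
    ('rat02_s1', 'rat02', 9.0);
""")

# One row per recording site, with the animal-level data repeated alongside.
cursor = conn.execute("""
    SELECT a.animal_id AS animal_id,
           a.sex AS sex,
           s.site_id AS site_id,
           s.firing_rate_hz AS firing_rate_hz
    FROM recording_site AS s
    JOIN animal AS a ON a.animal_id = s.animal_id
    ORDER BY s.site_id
""")
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()

# Written to a string here; open a real file with newline="" to save it to disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```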

    A piece of advice...data management is a big part of any research, and it's only getting more important. In most labs, having a passing knowledge of your data tools is enough to make you the guru, and learning how they actually work is a major skill that offers a lot of benefits down the line. If you can develop your data management skills to a moderate level, you'll be able to use those skills and your degree to leverage a number of extremely lucrative positions. People who can do research are a dime a dozen. People who can do research and understand how to fully utilize their tools make A LOT of money and are always in high demand. At this point in history...don't discount having a fallback outside of research either. If you go IT with an unrelated PhD? Talk about $$$.

    Another thing that I find a bit painful to suggest...depending on the size of your data set, and your scalability needs, you may want to look at Access. It's easy to use, ubiquitous, and may be solid 'enough'. If this is just for you though, you're probably better off using MATLAB since you are already familiar with it.
