The new forums will be named Coin Return (based on the most recent vote)! You can check on the status and timeline of the transition to the new forums here.
The Guiding Principles and New Rules document is now in effect.

How do I database?

garroad_rangarroad_ran Registered User regular
I have to analyse some data from word association tests I've done with a number of subjects, and I'd like to learn how to use a database to help me organize the data cleanly. There may be a way to do this nicely using a simple spreadsheet as well, which I would not be opposed to, since I actually have some experience working with spreadsheets, and database work is completely new to me.

The only software I have right now is Open Office on my Macbook. I am not fully opposed to purchasing something if it's necessary, but I'm nearly certain I should be able to do what I want to do with what Open Office offers. I just can't seem to figure it out. This has to be possible, right?

I have some data for each subject of my study, and then their responses to a series of prompt words. So a simple table might look like

Subject Name - Age - Location - English Level - Response to Prompt 1 - Response to Prompt 2 - Response to Prompt 3

The issue is that for each response I'd also like to add an additional field wherein I classify the response according to various types. These might look like:

Response to Prompt 1 (black)
white (opposite)
horse (set phrase)
hack (similar sound)

I will then go and analyze the data based on a number of different criteria. For example, who gave the most "opposite" responses? Did younger subjects give more "similar sound" responses?

The number of subjects in this study is small enough that I could go through it all by hand and do this with a few different spreadsheets. But I suspect learning how to use database software will go a long way toward making this faster, as well as reducing the likelihood of errors.

Any advice on how to go about storing this data would be appreciated!

Posts

  • davidsdurionsdavidsdurions Your Trusty Meatshield Panhandle NebraskaRegistered User regular
    You've basically described the last comprehensive problem from an Intro to Database class.

    Work yourself through this tutorial and maybe have a look/listen at a youtube video, though my cursory search for one brought up a lot of useless amateur videos. :*(

    Basically that pdf tutorial I have linked will walk you through setting up a database, and you could probably just swap the info you want into the examples and just go from there. If you run into specific questions, though, that will be where we (or just me?) can help.

  • UncleChetUncleChet N00b Lancaster, PARegistered User regular
    Subject Name - Age - Location - English Level - Response to Prompt 1 - Response to Prompt 2 - Response to Prompt 3
    I see 2 tables. Table Subject with fields 1: auto number (primary key), name (first) name (last) Age Location English Level
    Table 2 Response 1 (drop down list) Response 2 (drop down list) Response 3 (drop down list)
    You could do this within a spread sheet as table 1 = worksheet 1, and table 2 = worksheet 2.
    Then, you can use filters to get the various responses (queries) to questions like age = 23 and response 1 = horse and english level = proficient.
    MySQL is free, and fairly easy to use to build from spreadsheets. OracleSQL is also free, but runs on java, and Finally Microsoft SQL is also free.
    I would build your spreadsheets (row 1 is header) and then import into any of these database programs.
    Then, for really fancy, you can use FlySpeed SQL Query to throw visual queries of the related tables and fields and it will give you both results and the SQL built to run those results (SQL = structured query language, or how do i ask database)

    I'm sometimes grumpy and random, feel free to overlook the strange man in the corner.
  • garroad_rangarroad_ran Registered User regular
    Ok, thanks for the responses guys. I tried working through the tutorial that davidsdurions linked, but I can't say that it really helped me understand anything at all. I think I'll try hunting around for a lower-level tutorial and see if that helps.

    As for UncleChet's advice, I'm not sure that I follow. I understand table 1 just fine, but I really don't understand table 2. Would I need to create a new table 2 for each prompt word?

  • UncleChetUncleChet N00b Lancaster, PARegistered User regular
    edited March 2015
    @garroad_ran Actually, I may be over complicating things. I think 1 table might work just fine, as there's no need to establish relationships between sets of data.

    UncleChet on
    I'm sometimes grumpy and random, feel free to overlook the strange man in the corner.
  • GhotiGhoti Registered User regular
    What would probably work best is 2 tables:

    Subject table with the following fields. These records would be unique per subject.
    SubjectID
    SubjectName
    Age
    EnglishLevel

    The second table would have the responses. There will be 3 records for every subject, per response.
    SubjectID
    PromptNumber
    Response
    ResponseType

    You would then link the tables in whatever queries you would use through the SubjectID. This way, you can aggregate total number of response types over the whole set, or to whatever level you like (response number, subject, etc.).

    Depending on what tool you might be using, I might be able to provide extra guidance.

  • ArchangleArchangle Registered User regular
    Okay, from your responses I'm going to take a step back from Fish and UncleChet's advice to table concepts. It may not be necessary to database it up, you might be fine with Calc (or whatever Open Office package you're using's Excel-equivalent).

    Basic databases are a series of tables of information that are linked somehow. You probably already know that. There's two main reasons why you'd want to do this.

    1. You have a lot of data.

    Imagine you're running a Gym of 20 members. For each member, you have 3 sets of data which are used for different purposes:

    A) Their gym routines (pullups, weights, cardio, etc.) - helps you decide what gear to buy, or whether enough people would be interested in group classes for a particular exercise etc.
    B) Their billing information - helps you decide if you should offer per session payments, monthly autopayments, annual memberships etc.
    C) Their demographics (age, gender, etc.) - helps you target your advertising and marketing to who seems to like your gym.

    You've only got 20 members, so you say "Screw it, I'm just going to jam it all in one table". One day you're approached by National Gym Franchise - "Your system is amazing! You always know what classes to schedule, where to advertise, and what billing packages to offer! We want you to take your system and put our 800,000 national members in your database".

    Now whenever you run your Excel-equivalent on your 1 Table to get your reports and analytics, you can go and have lunch in the time it takes for it to go through the Demographics AND the Billing AND the Routines - even though each individual report will ignore 2/3 of the information when it runs. "Sheesh", you think. "It would have been so much easier to break it into 3 chunks right back at the start so it'd run in a fraction of the time".

    In your case, you seem to have two main divisions of data - the demographics (1) and the test results (2). Since you mention you can enter this all by hand, it doesn't sound like you have enough data where multiple tables would give a performance bump, but people who regularly work with databases would probably do it anyway because (a) It's good practice, and (b) You never know when you might get bought out by Facebook.

    2. Your data is collected and laid out in different formats.

    Now, instead of a Gym imagine you're running a Bank of 20 members. When each member signed up, they were assigned a Bank Account number, you took all their contact information, assigned them an account type, and gave them a Bank of Garroad_Ran creditcard.

    Every day, these members go about their daily routines... with their creditcards. Some may use it hardly ever, some may use it multiple times per day. At the end of each day, your system spits out a list of all the transactions which your members have undertaken during the day that you have to settle. So you have 2 tables - 1 of all your members and their personal details, 1 of all the transactions (in which members may be listed multiple times or not at all) - laid out completely differently but still linked because both contain the Bank Account numbers.

    In your case, all the information is collected at the same time in the same format so it's not particularly necessary to do the extra work to separate them out.

    Excel and Excel-Equivalents (like Calc) can act as multitable databases with their Sheets. Sheet 1 would just contain all the Demographics, while Sheet 2 would contain all the Test Scores (with each sheet having their SubjectID on the left column so when you manipulate the data you can still work out which test scores go with which demographic data). If you're going to use Pivot Tables (or equivalent), it may be easier just to keep it all in one sheet.

    Hope that's useful!

  • garroad_rangarroad_ran Registered User regular
    Ok, thanks for the suggestions guys. It's all super useful stuff for me. As I said, I'm interested in learning how to do this not only so that it will help me with this data analysis, but because I want to get at least a basic grip on database work so that if I ever -really- need to use it, I'll have that knowledge. I'm working through this series of tutorials which are pretty good at getting me going from point zero.

    Unfortunately, due to travel and lack of time it's going to take me a few weeks to really make this happen. I'll post again if and when I get it all figured out!

  • UncleChetUncleChet N00b Lancaster, PARegistered User regular
    I'd also recommend hitting Udemy/Stack Social (both .com) as they have bundles and freebies of online/youtube classes that a lot of them are really well done. I have the front end/back end and java developers series, as well as a few SQL.

    I'm sometimes grumpy and random, feel free to overlook the strange man in the corner.
  • Dr. FrenchensteinDr. Frenchenstein Registered User regular
    if you just want to get shit done and bang it through, a pivot table might give you what you need, but best practice would probably be a simple database. i think Open Office comes with an Access clone, right? see if you can play around in there and get what you are looking for.

  • YogoYogo Registered User regular
    I can recommend Google Spreadsheets if you want a free alternative to Open Office.

    Reasons I enjoy Google Spreadsheet:
    • It's free.
    • Google has integrated a lot of functions with their other products (Google Forms for external data input).
    • It allows scripting (you can find really usable scripts by others or make your own).
    • It allows pseudo SQL queries in your code when pulling data from different sheets.
    • Very active community for help.

    The most basic con is that it's Google and that there are some limitations in regards to how much data you can call upon in Spreadsheets. In my own work, I am experiencing some tendency for my code to 'hang' due to server capacity limitations on Google's end (but then again, it's free so I can't really complain).

  • HypatiaHypatia Registered User regular
    edited April 2015
    Archangle wrote: »
    Basic databases are a series of tables of information that are linked somehow. You probably already know that. There's two main reasons why you'd want to do this.

    1. You have a lot of data.
    2. Your data is collected and laid out in different formats.

    There are a few other reasons to use a database, for example:
    - if there may be multiple sources that are all trying to modify data and/or read data at the same time
    - databases can generally guarantee that you won't lose data because your machine crashes
    - if you need to run queries on your data very quickly and want it to handle a lot of the optimization for you

    The thing that strikes me about your post is it's a little unclear what you're hoping to get help on other than organizing your data, which is why some of the responses went directly into the table schema (organization).

    It seems like there might be several things you might be asking about:
    - how to set up a database to use (e.g., what software to use and how to get it configured)
    - how to organize your data in the database (e.g., the schema)
    - how to "use" a database (e.g., the commands to actually get the data into the database and how to access your data)

    Is it just one of those things or all of them or some subset? We might be able to get you connected to better resources for any of those steps if you can provide a bit more detail on where you're at :)

    Hypatia on
Sign In or Register to comment.