As was foretold, we've added advertisements to the forums! If you have questions, or if you encounter any bugs, please visit this thread: https://forums.penny-arcade.com/discussion/240191/forum-advertisement-faq-and-reports-thread/

Software for large amounts of data?

Ravenhpltc24Ravenhpltc24 So RavenRegistered User regular
I've been tasked with finding a solution for my work's huge data management problem. Currently, we use Excel for everything, including making summary reports from huge amounts of data (bordering on 1 million rows). For context, Excel shut down on me at least 50 times today, it got old really fast!

Does anyone have any recommendations for software that I should look into purchasing? Preferably something user-friendly, that can perform calculations for lots of data and present them clearly? We also use Access for some of the larger client databases, but many of the formulas we use in Excel are too complicated for queries.

Any pointers in the right direction would be appreciated! Thanks!

(V) ( ;,,; ) (V)

Posts

  • oldsakoldsak Registered User regular
    It sounds like you might need a database.

    Access is a database and it's supposed to be easy to use. I am not really familiar with access so I can't really comment on your statement that the formulas you use are too complicated for queries.

    Two free database solutions are MySQL and PostgreSQL. I think PostgreSQL is probably a little easier to set up and administer. You will have to learn some SQL if you don't have a handle on it already. You can import your excel data to your database(s) if you save your excel tables as .csv.

    As for presenting the data, you have plenty of options. The simplest would probably be to use Excel to connect to a database and pull in tables or the output of queries, then use Excel to present your data. There are also a number of reporting and business intelligence tools that will work for either database system, but that is a whole discussion unto itself.

  • DaenrisDaenris Registered User regular
    We also use Access for some of the larger client databases, but many of the formulas we use in Excel are too complicated for queries.

    This could be true -- although you can do a LOT with queries when you really spend the time to learn relational algebra -- but if so, that just means you'd move to something like stored procedures or triggers to do the calculations, which gives you a lot more flexibility in what calculations you can perform. It does seem that you'd be well served moving to a database solution, though I don't have enough experience with recent versions of Access to say whether it would meet your needs or if you need to move to another option (MySQL, PostgreSQL, MS SQL).

    Alternatively, if it's more about calculations on one-off datasets rather than reporting on continuously updating datasets, then something like R might be a good option. It can easily read in very large CSV files (or other formats) and then you could perform your calculations, create graphs/visualizations, etc. Of course R could be used for calculations/visualizations/etc even if you're using a database, but if the complexity isn't too bad a simple database frontend (like Access) could handle it.

  • Dis'Dis' Registered User regular
    We also use Access for some of the larger client databases, but many of the formulas we use in Excel are too complicated for queries.

    Any pointers in the right direction would be appreciated! Thanks!

    If you know some VBA you could set up Excel to pull the data piecemeal from Access and then execute the formulas on a reduced dataset, or do running calculations as you go through the data a piece at a time without swamping excel's memory.

    As Daenris said R is also pretty good for free software assuming you're comfortable with command line interfaces.

  • MrDelishMrDelish Registered User regular
    Power Query for Excel should be able to help with the loading from Access part. It's a free plugin from Microsoft.

  • hsuhsu Registered User regular
    Microsoft SQL Server Express
    It's Microsoft's free SQL offering, all the same great software as their full blown offering (MS SQL Server is easily the most friendly, easiest to use SQL software out there), just limited to 10gb database size with 1gb memory usage.
    It'll work perfectly fine with your small 1 million row Excel workbook after you import it, as that's 100-300mb in size, at most.
    And by the way, SQL can easily do all your Excel calculations.

    iTNdmYl.png
  • bowenbowen How you doin'? Registered User regular
    MSSQL would be the obvious choice since you're already working with Excel.

    I'd avoid Access because it's just not that great, and you'd be better off learning to do it the right way then partially doing it both ways.

    MySQL and Postgre are more open sourced, you'd see them on linux based systems more frequently, and they're both a pain in the butt to get working compared to MSSQL Express.

    You should be able to do all the Excel stuff you do in SQL, but there's an obvious learning curve there.

    You'd want to look into stored procedures and functions.

    First Link:
    http://www.microsoft.com/en-us/server-cloud/products/sql-server-editions/sql-server-express.aspx

    Stored procedure example:
    http://msdn.microsoft.com/en-us/library/ms345415.aspx

    Function example:
    http://msdn.microsoft.com/en-us/library/ms191320.aspx

    There isn't a huge amount of difference between stored procedures and functions other than what you put in and out, and how you call them. You'll want to install this on a server environment. You can run a query on any machine, honestly. There's ways to get this data back into your spreadsheets if you need them there, but you really need to move to a database, excel is for small data sets (like 1000 or so tops). You'll want the management studio on your local machine. I think the first link has both, if you install it on your local machine, just do the management studio by itself.

    not a doctor, not a lawyer, examples I use may not be fully researched so don't take out of context plz, don't @ me
  • Ravenhpltc24Ravenhpltc24 So Raven Registered User regular
    Thank you for the suggestions, everyone! I am gathering that SQL is the way to go. I'll get to work on that right away, it'll make the weekly reports a lot less painful. :)

    (V) ( ;,,; ) (V)
  • useless4useless4 Registered User regular
    Couple commercial solutions, the big one is 1010 data. It's basically "endless excel" or some sort of similar marketing slogan. As a power excel guy doing 500k rows by 40-50 columns I understand your pain and 1010 data looked like a winner but I never got very far into "what does it cost"

Sign In or Register to comment.