The new forums will be named Coin Return (based on the most recent
vote)! You can check on the status and timeline of the transition to the new forums
here.
The Guiding Principles and New Rules
document is now in effect.
Software for large amounts of data?
I've been tasked with finding a solution for my work's huge data management problem. Currently, we use Excel for everything, including making summary reports from huge amounts of data (bordering on 1 million rows). For context, Excel shut down on me at least 50 times today, it got old really fast!
Does anyone have any recommendations for software that I should look into purchasing? Preferably something user-friendly, that can perform calculations for lots of data and present them clearly? We also use Access for some of the larger client databases, but many of the formulas we use in Excel are too complicated for queries.
Any pointers in the right direction would be appreciated! Thanks!
(V) ( ;,,; ) (V)
0
Posts
Access is a database and it's supposed to be easy to use. I am not really familiar with access so I can't really comment on your statement that the formulas you use are too complicated for queries.
Two free database solutions are MySQL and PostgreSQL. I think PostgreSQL is probably a little easier to set up and administer. You will have to learn some SQL if you don't have a handle on it already. You can import your excel data to your database(s) if you save your excel tables as .csv.
As for presenting the data, you have plenty of options. The simplest would probably be to use Excel to connect to a database and pull in tables or the output of queries, then use Excel to present your data. There are also a number of reporting and business intelligence tools that will work for either database system, but that is a whole discussion unto itself.
This could be true -- although you can do a LOT with queries when you really spend the time to learn relational algebra -- but if so, that just means you'd move to something like stored procedures or triggers to do the calculations, which gives you a lot more flexibility in what calculations you can perform. It does seem that you'd be well served moving to a database solution, though I don't have enough experience with recent versions of Access to say whether it would meet your needs or if you need to move to another option (MySQL, PostgreSQL, MS SQL).
Alternatively, if it's more about calculations on one-off datasets rather than reporting on continuously updating datasets, then something like R might be a good option. It can easily read in very large CSV files (or other formats) and then you could perform your calculations, create graphs/visualizations, etc. Of course R could be used for calculations/visualizations/etc even if you're using a database, but if the complexity isn't too bad a simple database frontend (like Access) could handle it.
If you know some VBA you could set up Excel to pull the data piecemeal from Access and then execute the formulas on a reduced dataset, or do running calculations as you go through the data a piece at a time without swamping excel's memory.
As Daenris said R is also pretty good for free software assuming you're comfortable with command line interfaces.
It's Microsoft's free SQL offering, all the same great software as their full blown offering (MS SQL Server is easily the most friendly, easiest to use SQL software out there), just limited to 10gb database size with 1gb memory usage.
It'll work perfectly fine with your small 1 million row Excel workbook after you import it, as that's 100-300mb in size, at most.
And by the way, SQL can easily do all your Excel calculations.
I'd avoid Access because it's just not that great, and you'd be better off learning to do it the right way then partially doing it both ways.
MySQL and Postgre are more open sourced, you'd see them on linux based systems more frequently, and they're both a pain in the butt to get working compared to MSSQL Express.
You should be able to do all the Excel stuff you do in SQL, but there's an obvious learning curve there.
You'd want to look into stored procedures and functions.
First Link:
http://www.microsoft.com/en-us/server-cloud/products/sql-server-editions/sql-server-express.aspx
Stored procedure example:
http://msdn.microsoft.com/en-us/library/ms345415.aspx
Function example:
http://msdn.microsoft.com/en-us/library/ms191320.aspx
There isn't a huge amount of difference between stored procedures and functions other than what you put in and out, and how you call them. You'll want to install this on a server environment. You can run a query on any machine, honestly. There's ways to get this data back into your spreadsheets if you need them there, but you really need to move to a database, excel is for small data sets (like 1000 or so tops). You'll want the management studio on your local machine. I think the first link has both, if you install it on your local machine, just do the management studio by itself.