So for reasons not interesting enough to explain, I find myself needing to deal with a bunch of data on a bunch of excel spreadsheets. About 30, with ~20,000 entries each.
I've never been that big on excel because it hasn't really been necessary tool for anything I've done, but I know enough to shove the whole lot into a pivot table and sort out what data I actually need displayed.
The problem I have is one of the data points I'm most in need of is times, and these all come in the form of '1 h 23 m' or '5 h 12 m' etc.
All that matters to me is the hour intervals, set to the next highest hour interval. So I don't care about '1 h 23 m' and '1 h 55 m', they're both just '2 hours' to me. I can't however, figure out how to tell excel to sort these into ranges of 0-1, 1.01-2, 2.01-3, etc.
I suspect part of the problem here is that entering times as 'X h Y m' is not an excel valid format, unfortunately that's how they are presented and I can't do anything about it.
Anyone know how to sort this mess out?
Posts
=LEFT(A1, FIND(" h", A1))+IF(VALUE(MID(A1, FIND("h", A1)+1, FIND("m", A1)-FIND("h", A1)-1))>0, 1, 0)
EDIT: That is to say, you would add a column/row, copy this formula in and fix the reference as needed, then paste it all the way down the column or across the row.
=IFERROR(IFERROR(IF(INT(MID(A1,LEN(A1)-3,2))>0,INT(LEFT(A1,LEN(A1)-7))+1,INT(LEFT(A1,LEN(A1)-7))),IF(INT(MID(A1,LEN(A1)-2,1))>0,INT(LEFT(A1,LEN(A1)-6))+1,INT(LEFT(A1,LEN(A1)-6)))),"Something Wrong")
I like IFERROR.
Thanks, Orogogus, for getting me to think a little deeper about this.
1 | h | 00 | m
3 | h | 45 | m
etc format.
Well now, that is a far more elegant solution. Good thinking.
=IFERROR(IFERROR(IF(INT(MID(A1,LEN(A1)-3,2))>0,INT(LEFT(A1,LEN(A1)-7))+1,INT(LEFT(A1,LEN(A1)-7))),IF(INT(MID(A1,LEN(A1)-2,1))>0,INT(LEFT(A1,LEN(A1)-6))+1,INT(LEFT(A1,LEN(A1)-6)))),"Something Wrong")
And it almost worked perfectly.
There seems to be some inconsistencies though, '9 h 20 m' appears (incorrectly) as 9, but '9 h 18 m' correctly appears as 10.
I also failed to mention times sub 1 hour just appear as XX m, which is also throwing it off. The '-' segments are fine because they aren't included in the count anyway.
Any ideas on fixing this? It's really close right now.
=IF(MID(A1,3,1)="h",LEFT(A1,2),"")
Blizzard: Pailryder#1101
GoG: https://www.gog.com/u/pailryder
The backstory behind this, btw, is that one of the departments where I work has been shitting the bed for 3 years and it just came to light in a big way. So a lot of people are getting fired and some people from other departments (like me) are being brought in to sort out 3 years of data that hasn't been processed. If I can get this conversion working though it will literally save me ~2 weeks of tedious data handling.
The iferror function looks for an excel result that gives a #N/A error due to a formula problem and returns a specific result if that error occurs. It's of the form of =iferror(function,result) where result only displays if the function errors out.
Also, a trick to make your spreadsheets look nice is to recognize that Excel looks at blanks as a double quote, so "" means blank. You can use that to write a function such as:
=if(A1="","",function)
This will generate a blank if the cell is blank, and the function otherwise.
Maybe just combine Heffling's solution with some basic math?
1) Use text-to-columns to sort out the data in hours and minutes.
2) Do a quick filter or pivot table of each column to ensure that there were no errors in the conversion process. This is highly likely given the errors you encountered with the formula - I'm betting the formatting was inconsistent or there are typos somewhere along the line that need to be resolved. Based on the errors you find, I'm betting you can do Find-Replace to get rid of the vast majority of them, and then deal with the rest by hand.
3) In another column, use Excel to multiply the hours column by 60 to convert them into minutes.
4) Add a new column again, and use Excel to add the converted hours column and the minutes column in to a single consolidated minutes value.
I feel like this process is important. It gives you a clean data value. As you can see, managing custom text strings in Excel is a pain in the ass. What you want is a single value that gives you everything you need that can then be easily converted into something else. If this is something your company needs to manage over the long term, they should find a way to stick with minutes as the underlying unit, if at all possible.
You can then use this consolidated minutes value in any number of ways that don't require you to parse out what someone meant when they put in "8 h 022 m" instead of "8h 22m" or "8 h 22m" within the formula itself.
An alternative thought: If you are intent on continuing the progress you've made and don't care to clean up the existing data and just want to move forward with the interval values, is to filter the column by errors so that you can look for their patterns. Then you can do a Find/Replace for each error pattern on the original data to clean it up so that it no longer returns errors.
For example, just replacing all the columns with no hour values with "1 h 00 m" so that they are read correctly.
If you have trailing spaces, you can use the TRIM function to get rid of the trailing spaces. E.g., =TRIM(A1)
IFERROR(A,B) means that if A gives you an error in excel, it returns B. So for a simple example, a formula "=A1/B1" will give you an error if B1 is 0. The formula "=IFERROR(A1/B1,"Cannot divide by 0")" will return A1/B1 so long as B1 isn't zero. If B1 is zero, the formula will return "Cannot divide by 0".
I didn't consider that you could just have minutes, but that makes sense. That can be solved by working in additional IF statements, perhaps.
=IFERROR(IFERROR(IFERROR(IF(INT(MID(A1,LEN(A1)-3,2))>0,INT(LEFT(A1,LEN(A1)-7))+1,INT(LEFT(A1,LEN(A1)-7))),IF(INT(MID(A1,LEN(A1)-2,1))>0,INT(LEFT(A1,LEN(A1)-6))+1,INT(LEFT(A1,LEN(A1)-6)))),IF(INT(LEFT(A1,LEN(A1)-2))>0,1,0)),"Something Wrong")
My official recommendation is to parse the data with Python first and then put it in Excel.
The TRIM fixed the times rounding down, and that revised IFERROR works on the sub 1 hour formats
Thanks so much guys, this is going to save me a lot of time that I can spend sorting out this dumpster fire instead of manually processing 3 years of data.