Statistics problem

Jimothy · April 2011

I'm working on a bit of a side project at the moment, and it involves some math I'm very rusty on. I thought maybe the intelligent folks on these forums might be able to help out.

A company is trying to reduce their turn time, and they've got a spreadsheet of orders that have yet to be completed. The orders that they've had the longest have priority, and the newer an order is, the longer it is allowed to wait. In addition, new orders are going to come in throughout the day.

If they shift their focus so that some of the newer orders are completed immediately, while not allowing the oldest to get left out and drag them down, then their turn time can improve.

I need to first find a way to determine what the uppermost threshold is on older outlying orders (which ones need to be completed to prevent the turn time's average from being dragged down), and then come up with a formula (preferably something that could be used in Excel) to determine the right combination of newer and older orders to give new priority to in order to improve turn time.

Anyone have any good ideas? Or at least a starting point or useful reference? I can provide some specific numbers if needed.

Richy · April 2011

I don't have an exact algorithm at hand. But for what you're describing, you might want to read up on scheduling algorithms, like those they use to schedule tasks to run on CPUs in operating systems.

From what you're describing, it sounds like they had a FIFO scheduling system (first-in-first-out, the oldest task goes first). That one does have the worst turnaround time, in part because of the convoy effect (an extra-long task came in first, takes a lot of time to complete, meanwhile everything else in the system has to wait).

Which does raise a question: do all the orders the company gets take the same amount of time/work to complete, or are some longer while others can be done quickly and easily? In the second case, that's probably something you want to take into account in your scheduling algorithm.

Another question, while we're on it, is can you interrupt the execution of an order and get to work on another order? Or, once an order is started, does it have to be completed before you can move on to something else? That too will define what your scheduling function can be.

Getting back on topic, what you're describing is a priority-based scheduling system with aging. The priority is function of how recently an order was entered in the system, so newest orders have the highest priority, but aging means that older orders have their priority gradually increased to avoid starvation (an order that never gets done because there's always something with a higher priority ahead of it).

There's a number of ways of doing that. The simplest thing would be for the priority to be a function of the date and time the order was made. The most recent has the highest priority. Meanwhile, the aging would give a date-bonus to older tasks in function of how long they've been in, to make them look more recent than they really are.

An easy implementation that I can think of right off the bat would be for the priority to be the absolute time difference between the current time and the order time, with no regard for the date. Lowest absolute value is the highest priority. So if right now it's 3PM and I have an order that came in today at 2PM and one that came in yesterday at 4PM, they both have an equal 1h priority. Half an hour later, it's 3:30PM, the 2PM order has a priority of 1h30 and the yesterday 4PM one has a priority of 30 mins and wins. That kind of circular ordering works well with two days and if the orders came in during business hours; but for example an order that came in at midnight will always be a large time away and therefore a low priority. That's where date-based ageing might come in, giving for example a 2-hours reduction per day the order has been waiting (that figure was just pulled out of my ass). So the midnight task would have an 8h priority at 8am the next morning, but a 6h priority at 8am two days after it came in, and a 4h priority at 8am the day after that, 2h the next day, and 0h after five days, at which point it is guaranteed to run right away.

Jimothy · April 2011

I think I follow you pretty well on the absolute value part of that, but how would you make it account for the aging? Is it as simple as l (current time) - (order received) l - 2? Or is there another step in there that I'm missing?

I did a quick search for scheduling algorithms and came up with something called the Critical Path Algorithm. Does that sound like it might be in the right direction?

At any rate, thanks a ton. This seems like a big help.

Richy · April 2011

Yeah, the idea I described was simply l (current time) - (order received) l - 2*(days waiting). I can't guarantee it will work perfectly (in fact it probably won't since I don't know enough about your problem and some of these values I just threw in randomly as examples), but it's a starting point.

The Critical Path Algorithm is for scheduling activities in a project. The thing to note here is that activities in a project are related - some activities can only be done after other activities have been completed, some activities can be done concurrently with each other, some activities might be alternatives to each other. I don't know how this relates to your orders?

Jimothy · April 2011

In that case, I don't think it does. They would be pretty distinct.

Richy · April 2011

Project scheduling is certainly an interesting topic to read about and it might give you some ideas, but in that case it won't be directly applicable to your problem.

What I had in mind was CPU scheduling in operating systems - where the jobs being scheduled can be done in any order, but they must all be done, and they are independent of each other (except in some problem cases, like the convoy effect I already mentioned, or priority inversion or deadlocks, but I don't think those will happen in your system from what I understand). That said, reading about those will also be good to give you ideas but will not be directly applicable - modern OS are optimized for response time and interactiveness, not for throughput, and they assume that processes can be preempted, or interrupted at random intervals and resumed later on, which may not be your case.

Anyway, that's all I can say for now, without knowing more about your system.

Since you're doing this for a company, a simple thing you should try to help your development is get their work order spreadsheets for a few days, noting when orders came in and how long each order takes to complete. With that information you can make work simulations, and show how each scheduling algorithm you try changes the output of the company. But while you do that, be careful not to do the common mistake of over-specializing your system to your development data; in other words, of developing a scheduling algorithm that's perfectly fine-tuned to give the best output for the simulation you have, but is no longer general enough to work well for other work orders.

Penny Arcade

Quick Links

Statistics problem

Posts