I do hope the title is somewhat helpful, but I've basically got a question about z-scores vs. raw scores when building Application Scores to rank applicants for acceptance into our dental program.
What we do now is we judge applicants on three scores:
Academic Average is worth 65% of their Application Score
Dental Aptitude Test (DAT) is worth 15%
Interview score is worth 20%
We add all the portions together and get their final Applicant score, which we rank against the other applicants.
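In code terms, our current process is just a plain weighted sum, something like the sketch below (the weights are real, but the applicant names and marks are made up, and for simplicity every component is treated as a mark out of 100):

```python
# A minimal sketch of our current ranking: a weighted sum of raw component marks.
# The weights are real; the applicants and their marks are invented for illustration.
WEIGHTS = {"academic": 0.65, "dat": 0.15, "interview": 0.20}

def applicant_score(marks):
    """Weighted sum of raw component marks (each assumed to be out of 100)."""
    return sum(WEIGHTS[part] * marks[part] for part in WEIGHTS)

applicants = {
    "Applicant A": {"academic": 88, "dat": 82, "interview": 75},
    "Applicant B": {"academic": 84, "dat": 95, "interview": 80},
}

# Rank applicants from highest to lowest Applicant Score.
ranking = sorted(applicants, key=lambda name: applicant_score(applicants[name]), reverse=True)
print(ranking)  # ['Applicant B', 'Applicant A']: B's DAT and interview edge out A's academic lead
```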
Someone has mentioned that we may be doing things wrong and that we should be z-transforming all our scores before adding them together, because the way we do it now is like adding apples to oranges to grapefruits. This person also mentioned that our weighting may not be playing out the way we intend and claim it does: the DAT, for example, may be playing a larger part than we think.
Now, I sort of understand z-scores and deriving them, and I get how they can be used to determine who actually did "better" compared to his peers if you're looking at Bob's 97 in Geography and Sam's 93 in English. What I'm not getting is how the z-scores will actually help our process here, and it may just be that I'm not applying them correctly.
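For example, here's roughly how I understand the Bob-vs-Sam comparison (the class means and SDs are made up, just to show the calculation I have in mind):

```python
# How I understand the Bob-vs-Sam comparison; the class means and SDs are invented.
def z_score(mark, class_mean, class_sd):
    return (mark - class_mean) / class_sd

bob = z_score(97, class_mean=90, class_sd=6)   # Geography: ~1.17 SDs above his class
sam = z_score(93, class_mean=82, class_sd=5)   # English:   ~2.20 SDs above his class
# Sam's raw mark is lower, but relative to his peers he did "better" than Bob.
```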
You could reasonably do the z-score for the DAT, since standardized tests have a published standard deviation and mean for national test-takers' performance. I don't think the z-score is practical for academic average and interview scores given the ridiculously small sample sizes (you only have access to the records of those applying, usually). You'd have no objective basis for the deviation/mean and thus could greatly overweight a relatively good/absolutely terrible academic record or distort the data distribution. I think.
Heh, "z-transform" means something entirely different, at least to us electrical engineers.
Yeah, it sounds like z-transforming and standardising mean different things to different groups (statisticians, engineers, psychologists, etc).
Right, so I could get a proper z-score for the DAT, and we can do that for our Interview because we have the whole population of marks. Should we be doing that, though? And if so, why? I don't see how it is necessary for our process, and it's that explanation that I've yet to hear from anyone who has been a proponent of z-transforming and all that.
I'll give you an example of what may be happening, since I couldn't think of how to explain it. For simplicity all scores are out of 100.
Say the average score on the DAT is 80, with an SD of 5. If someone scores a 90, that's a z-score of 2, which puts them at roughly the 98th percentile. So if you weight their raw score, you'll get .15 * 90 = 13.5 for that component; if you weight their percentile instead, you'll give them about 14.7. A difference of roughly 1.2 "points" in their application score.
See what I'm getting at? Using the same numbers but switching to the interview, the difference is about 1.5 points (.2 * 90 vs .2 * 97.7). Again with the same numbers, switching to academic performance yields a difference of about 5 points.
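Spelled out in code (using a normal-curve percentile and the same made-up mean and SD as above):

```python
# The same made-up example, spelled out: mean 80, SD 5, an applicant who scores 90.
from scipy.stats import norm

mean, sd, raw = 80, 5, 90
z = (raw - mean) / sd            # 2.0
pct = norm.cdf(z) * 100          # ~97.7, i.e. roughly the 98th percentile

for part, weight in [("DAT", 0.15), ("interview", 0.20), ("academic", 0.65)]:
    raw_points = weight * raw
    pct_points = weight * pct
    print(f"{part}: {raw_points:.1f} raw vs {pct_points:.1f} relative "
          f"(difference {pct_points - raw_points:.1f})")
# DAT:       13.5 vs 14.7  (difference 1.2)
# interview: 18.0 vs 19.5  (difference 1.5)
# academic:  58.5 vs 63.5  (difference 5.0)
```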
I think this is what the person was trying to tell you.
If that's fine with you, then you don't need to change anything; relative vs. raw only really makes a difference if your data is weird in some way. If the applicant academic, DAT, and interview scores each followed similar bell curves straddling 50%, there would be little reason for you to switch from raw to relative.
Just because the Academic Average is worth 65% doesn't mean it always has to be the deciding factor.
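To make that concrete, here's a rough simulation (all the means and SDs are invented): whichever component has the most spread among applicants ends up driving the ranking, regardless of its nominal weight.

```python
# Rough simulation of how nominal weights can differ from effective influence.
# All distributions here are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 500
pool = {
    "academic":  rng.normal(85, 2, n),   # tightly clustered marks
    "dat":       rng.normal(80, 6, n),   # widely spread marks
    "interview": rng.normal(75, 4, n),
}
weights = {"academic": 0.65, "dat": 0.15, "interview": 0.20}

# The spread each weighted component contributes to the final score:
for part, w in weights.items():
    print(part, round(w * pool[part].std(), 2))
# academic ~1.3, dat ~0.9, interview ~0.8: the DAT's nominal 15% ends up with
# almost as much pull on the ranking as the academic average's 65%, because the
# DAT marks vary far more across applicants. z-scoring each component first
# would make the effective influence match the nominal weights.
```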
Anyway, none of this is a very serious problem for anyone yet... it's just something that's come up, and I've unofficially been tasked with finding out what the deal is and whether or not we should be changing our methods. I ran the numbers from the last admission cycle to compare what would have happened if we had used z-scores instead of raw scores, and while there was some slight movement, only three of the thirty offers would have changed significantly, i.e. someone else would have been offered admission ahead of the person who was originally offered that spot.
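For what it's worth, the comparison I ran was essentially the sketch below, just with our real applicant file instead of synthetic data (the numbers here are generated, since the actual records stay in-house):

```python
# A sketch of the audit: rank the pool by raw weighted score and by z-scored
# weighted score, then count how many of the 30 offers would change.
# The data here is synthetic; the real comparison used last cycle's applicants.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
pool = pd.DataFrame({
    "academic":  rng.normal(85, 3, 200),
    "dat":       rng.normal(80, 5, 200),
    "interview": rng.normal(75, 6, 200),
})
weights = {"academic": 0.65, "dat": 0.15, "interview": 0.20}

def ranking(transform):
    score = sum(w * transform(pool[part]) for part, w in weights.items())
    return score.rank(ascending=False)

raw_rank = ranking(lambda col: col)                            # raw scores
z_rank = ranking(lambda col: (col - col.mean()) / col.std())   # z-scored

offers_raw = set(pool.index[raw_rank <= 30])
offers_z = set(pool.index[z_rank <= 30])
changed = offers_raw - offers_z
print(f"{len(changed)} of the 30 offers would go to someone else under z-scores")
```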
The truth of the matter is your weighted-score process isn't really all that sound anyway, so it doesn't really matter. Unless the Academic Average and Interview Scores are placed in a systematically objective context and are also weighted (i.e. you have some established process by which scores are weighted appropriately and transformed into a standard distribution), the 15% weight of the DAT isn't going to do much. (The irony here being that the DAT is probably the place in your overall scoring process where you can get the most objective data.) The fact that your own audit shows only very slight movement would seem to back up my argument.
Ideally, the answer to your question is yes: you should be accounting for how scores on the DAT are distributed, and if that data is readily available you may as well take it into account. But you will probably get better mileage out of seriously evaluating the weighting in the first place, and ensuring that the other two parts of the overall score are being calculated appropriately for what you want to accomplish.
As someone who works in Human Capital Metrics for a living, I can tell you that, for a laundry list of reasons I won't get into here, interviews or even open-ended comment questions are often treated as qualitative data - that is, they are used to provide more insight into something that has been more objectively, quantitatively established. A good interview process would allow you to differentiate between two candidates who are, based on standard metrics, roughly equal according to your established criteria (aptitude tests, income level, years of experience, etc.). As it stands, it sounds like you're just having a bunch of people interview folks and then give them scores on some sort of scale. Unless you have rigorous training methods to establish equally-applied standards, a scoring system designed to push people towards the mean, and a whole lot of scoring data from each interviewer, you can never be sure that the interview process is methodologically sound.
The real questions you should be asking yourself are:
1. What is this scoring system intended to accomplish?
2. What criteria do we want this system to utilize (i.e., what factors should this system take into account, and how should they be weighted)?
3. Based on past admissions, how well does our current system match up to #1 and #2 based on class performance and admissions metrics? (Examples: If you want 20% Latino class composition, is that what you are getting each year? If you want students who average X score on Y test, is that how they are performing?)
For my two cents, I think your weighting scheme is way off, but I'm not going to pretend to know the specifics of your situation, so maybe it's perfectly adequate for what it is trying to do. Hope this helps!