I do hope the title is somewhat helpful, but I've basically got a question about z-scores vs. raw scores when building Application Scores to rank applicants for acceptance into our dental program.
What we do now is we judge applicants on three scores:
Academic Average is worth 65% of their Application Score
Dental Aptitude Test (DAT) is worth 15%
Interview score is worth 20%
We add all the portions together and get their final Applicant score, which we rank against the other applicants.
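In code terms, our current process is just a plain weighted sum, something like the sketch below (the weights are real, but the applicant names and marks are made up, and for simplicity every component is treated as a mark out of 100):

```python
# A minimal sketch of our current ranking: a weighted sum of raw component marks.
# The weights are real; the applicants and their marks are invented for illustration.
WEIGHTS = {"academic": 0.65, "dat": 0.15, "interview": 0.20}

def applicant_score(marks):
    """Weighted sum of raw component marks (each assumed to be out of 100)."""
    return sum(WEIGHTS[part] * marks[part] for part in WEIGHTS)

applicants = {
    "Applicant A": {"academic": 88, "dat": 82, "interview": 75},
    "Applicant B": {"academic": 84, "dat": 95, "interview": 80},
}

# Rank applicants from highest to lowest Applicant Score.
ranking = sorted(applicants, key=lambda name: applicant_score(applicants[name]), reverse=True)
print(ranking)  # ['Applicant B', 'Applicant A']: B's DAT and interview edge out A's academic lead
```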
Someone has mentioned that we may be doing things wrong and that we should be z-transforming all our scores before adding them together, because the way we do it now is like adding apples to oranges to grapefruits. This person also mentioned that our weighting may not be playing out the way we intend and claim it does: the DAT, for example, may be playing a larger part than we think.
Now, I sort of understand z-scores and deriving them, and I get how they can be used to determine who actually did "better" compared to his peers if you're looking at Bob's 97 in Geography and Sam's 93 in English. What I'm not getting is how the z-scores will actually help our process here, and it may just be that I'm not applying them correctly.
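For example, here's roughly how I understand the Bob-vs-Sam comparison (the class means and SDs are made up, just to show the calculation I have in mind):

```python
# How I understand the Bob-vs-Sam comparison; the class means and SDs are invented.
def z_score(mark, class_mean, class_sd):
    return (mark - class_mean) / class_sd

bob = z_score(97, class_mean=90, class_sd=6)   # Geography: ~1.17 SDs above his class
sam = z_score(93, class_mean=82, class_sd=5)   # English:   ~2.20 SDs above his class
# Sam's raw mark is lower, but relative to his peers he did "better" than Bob.
```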
You could reasonably do the z-score for the DAT, since standardized tests have a published standard deviation and mean for national test-takers' performance. I don't think the z-score is practical for academic average and interview scores given the ridiculously small sample sizes (you only have access to the records of those applying, usually). You'd have no objective basis for the deviation/mean and thus could greatly overweight a relatively good/absolutely terrible academic record or distort the data distribution. I think.
Heh, "z-transform" means something entirely different, at least to us electrical engineers.
Yeah, it sounds like z-transforming and standardising mean different things to different groups (statisticians, engineers, psychologists, etc).
Right, so I could get a proper z-score for the DAT, and we can do that for our Interview because we have the whole population of marks. Should we be doing that, though? And if so, why? I don't see how it is necessary for our process, and it's that explanation that I've yet to hear from anyone who has been a proponent of z-transforming and all that.
I'll give you an example of what may be happening, since I couldn't think of how to explain it. For simplicity all scores are out of 100.
Say the average score on the DAT is 80, with an SD of 5. If someone scores a 90, that's a z-score of 2, which puts them at roughly the 98th percentile. So if you weight their raw score, you'll get .15 * 90 = 13.5 for that component; if you weight their percentile instead, you'll give them about 14.7. A difference of roughly 1.2 "points" in their application score.
See what I'm getting at? Using the same numbers but switching to the interview, the difference is about 1.5 points (.2 * 90 vs .2 * 97.7). Again with the same numbers, switching to academic performance yields a difference of about 5 points.
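Spelled out in code (using a normal-curve percentile and the same made-up mean and SD as above):

```python
# The same made-up example, spelled out: mean 80, SD 5, an applicant who scores 90.
from scipy.stats import norm

mean, sd, raw = 80, 5, 90
z = (raw - mean) / sd            # 2.0
pct = norm.cdf(z) * 100          # ~97.7, i.e. roughly the 98th percentile

for part, weight in [("DAT", 0.15), ("interview", 0.20), ("academic", 0.65)]:
    raw_points = weight * raw
    pct_points = weight * pct
    print(f"{part}: {raw_points:.1f} raw vs {pct_points:.1f} relative "
          f"(difference {pct_points - raw_points:.1f})")
# DAT:       13.5 vs 14.7  (difference 1.2)
# interview: 18.0 vs 19.5  (difference 1.5)
# academic:  58.5 vs 63.5  (difference 5.0)
```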
I think this is what the person was trying to tell you.
If that's fine with you, then you don't need to change anything; relative vs. raw only really makes a difference if your data is weird in some way. If the applicant academic, DAT, and interview scores each followed similar bell curves straddling 50%, there would be little reason for you to switch from raw to relative.
Just because the Academic Average is worth 65% doesn't mean it always has to be the deciding factor.
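To make that concrete, here's a rough simulation (all the means and SDs are invented): whichever component has the most spread among applicants ends up driving the ranking, regardless of its nominal weight.

```python
# Rough simulation of how nominal weights can differ from effective influence.
# All distributions here are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 500
pool = {
    "academic":  rng.normal(85, 2, n),   # tightly clustered marks
    "dat":       rng.normal(80, 6, n),   # widely spread marks
    "interview": rng.normal(75, 4, n),
}
weights = {"academic": 0.65, "dat": 0.15, "interview": 0.20}

# The spread each weighted component contributes to the final score:
for part, w in weights.items():
    print(part, round(w * pool[part].std(), 2))
# academic ~1.3, dat ~0.9, interview ~0.8: the DAT's nominal 15% ends up with
# almost as much pull on the ranking as the academic average's 65%, because the
# DAT marks vary far more across applicants. z-scoring each component first
# would make the effective influence match the nominal weights.
```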
Anyway, none of this is a very serious problem for anyone yet... it's just something that's come up, and I've unofficially been tasked with finding out what the deal is and whether or not we should be changing our methods. I ran the numbers from the last admission cycle to compare what would have happened if we had used z-scores instead of raw scores, and while there was some slight movement, only three of the thirty offers would have changed significantly, i.e. someone else would have been offered admission ahead of the person who was originally offered that spot.
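For what it's worth, the comparison I ran was essentially the sketch below, just with our real applicant file instead of synthetic data (the numbers here are generated, since the actual records stay in-house):

```python
# A sketch of the audit: rank the pool by raw weighted score and by z-scored
# weighted score, then count how many of the 30 offers would change.
# The data here is synthetic; the real comparison used last cycle's applicants.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
pool = pd.DataFrame({
    "academic":  rng.normal(85, 3, 200),
    "dat":       rng.normal(80, 5, 200),
    "interview": rng.normal(75, 6, 200),
})
weights = {"academic": 0.65, "dat": 0.15, "interview": 0.20}

def ranking(transform):
    score = sum(w * transform(pool[part]) for part, w in weights.items())
    return score.rank(ascending=False)

raw_rank = ranking(lambda col: col)                            # raw scores
z_rank = ranking(lambda col: (col - col.mean()) / col.std())   # z-scored

offers_raw = set(pool.index[raw_rank <= 30])
offers_z = set(pool.index[z_rank <= 30])
changed = offers_raw - offers_z
print(f"{len(changed)} of the 30 offers would go to someone else under z-scores")
```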
The truth of the matter is your weighted-score process isn't really all that sound anyway, so it doesn't really matter. Unless the Academic Average and Interview Scores are placed in a systematically objective context and are also weighted (i.e. you have some established process by which scores are weighted appropriately and transformed into a standard distribution), the 15% weight of the DAT isn't going to do much. (The irony here being that the DAT is probably the place in your overall scoring process where you can get the most objective data.) The fact that your own audit shows only very slight movement would seem to back up my argument.
Ideally, the answer to your question is yes: you should be accounting for how scores on the DAT are distributed, and if that data is readily available you may as well take it into account. But you will probably get better mileage out of seriously evaluating the weighting in the first place, and ensuring that the other two parts of the overall score are being calculated appropriately for what you want to accomplish.
As someone who works in Human Capital Metrics for a living, I can tell you that, for a laundry list of reasons I won't get into here, interviews or even open-ended comment questions are often treated as qualitative data - that is, they are used to provide more insight into something that has been more objectively, quantitatively established. A good interview process would allow you to differentiate between two candidates who are, based on standard metrics, roughly equal according to your established criteria (aptitude tests, income level, years of experience, etc.). As it stands, it sounds like you're just having a bunch of people interview folks and then give them scores on some sort of scale. Unless you have rigorous training methods to establish equally-applied standards, a scoring system designed to push people towards the mean, and a whole lot of scoring data from each interviewer, you can never be sure that the interview process is methodologically sound.
The real questions you should be asking yourself are:
1. What is this scoring system intended to accomplish?
2. What criteria do we want this system to utilize (i.e., what factors should this system take into account, and how should they be weighted)?
3. Based on past admissions, how well does our current system match up to #1 and #2 based on class performance and admissions metrics? (Examples: If you want 20% Latino class composition, is that what you are getting each year? If you want students who average X score on Y test, is that how they are performing?)
For my two cents, I think your weighting scheme is way off, but I'm not going to pretend to know the specifics of your situation, so maybe it's perfectly adequate for what it is trying to do. Hope this helps!