Use the Right Rating Scale for your Performance Reviews
In our talent management consulting group, we focus on implementing technology that matches our clients’ maturity level and business needs. For most of our clients the starting point is a web-delivered performance management tool, which serves as the first introduction of talent management technology to all employees and as the move from HR-centered to employee-centered talent management. During implementation, we are often asked to provide the “best practices” performance rating scale. We normally seize that opportunity to educate our clients on the meaning of “best practices” and why there is no single best solution: the best practice for one organization may not be the best practice for another. We find that the performance rating scale, however, is an exception to that rule.
Commonly Used Rating Scales
We have seen many disparate practices in rating scales, from a two-point scale to “percentage of achievement” scales with hundreds of data points.
Two-point scales arise in organizational cultures that encourage binary thinking. Management drives the message that employees either get the job done or they don’t. Those who do can stay; those who don’t must go. This may work well for a lawn service provider with one employee, but for an organization that employs more than one or two people, a great deal of the value of good people management is lost.
Three-point scales arise in organizational cultures that want to differentiate high and low performers without imposing too much of a decision-making burden on managers. In my experience these organizations have been at the point where they are introducing automated talent management tools for the first time and are trying to keep things as simple as possible.
According to Steve Hunt, thought leader at SuccessFactors, Inc., a 3-point scale doesn’t allow for enough differentiation. The evaluator can choose only neutral or best, which leaves out potentially better scores and introduces inaccuracies. This rating scale often results in too many employees rated as neutral.
These scales also do not differentiate employees who are disengaged, non-performing, or misplaced from those who are not yet at full productivity because they need development. Nor do they differentiate those who do more than expected from those who should be identified as top talent. Those who need development, and the top talent who must be engaged and developed toward leadership, are, in my studied opinion, the groups that present the greatest opportunity to improve organizational performance. Although there may be informal differentiation in place, at the low end there is a greater risk of post-employment challenges, and at the high end a much greater risk of losing the talent that will lead the organization in the future.
Four and six point scales
None of the current literature or practices we reviewed makes a strong case for an even number of rating choices, but we frequently see this in practice. The most common rationale for four- and six-level scales is that an organization is attempting to compensate for a strong central tendency or to force managers to “make a choice.” While it is true that managers who are not engaged in the process may have a tendency to rate employees “down the middle” to avoid notice or controversy, we believe that attempting to use a rating tool to change the tendency avoids the real issues. There may be many reasons why this “down the middle” rating occurs:
- An organizational culture or process where rating at the extremes requires strong justification but does not train, support, or coach managers to be able to make the justifications.
- A process where any rating other than “satisfactory” requires strong justification. This is especially true in organizational units where literacy requirements for managers are low, such as in skilled trades or unskilled labor.
- Disengaged managers.
- Unionized environments where there is a strong cultural bias toward equity with regard to performance.
- Merit systems where budget constraints force managers to the middle.
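Before redesigning the scale, it is worth confirming that central tendency is actually occurring. As a minimal sketch (the manager names, ratings, and 70% threshold below are hypothetical, purely for illustration), a distribution check over a 5-point scale might look like this in Python:

```python
from collections import Counter

def midpoint_share(ratings, midpoint=3):
    """Fraction of ratings that sit exactly on the scale midpoint."""
    counts = Counter(ratings)
    return counts[midpoint] / len(ratings)

def flag_central_tendency(ratings_by_manager, threshold=0.7, midpoint=3):
    """Return managers whose share of midpoint ratings exceeds the threshold."""
    return {
        manager: round(midpoint_share(ratings, midpoint), 2)
        for manager, ratings in ratings_by_manager.items()
        if midpoint_share(ratings, midpoint) > threshold
    }

# Hypothetical ratings on a 5-point scale, keyed by manager.
team_ratings = {
    "avery": [3, 3, 3, 3, 3, 3, 4, 2],  # heavy central tendency
    "blake": [2, 4, 3, 5, 1, 4, 3, 2],  # well-differentiated ratings
}
print(flag_central_tendency(team_ratings))  # {'avery': 0.75}
```

A report like this can tell HR where the “down the middle” pattern lives before anyone concludes that removing the midpoint is the cure.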
Dr. Hunt offers this advice: “A 4-point scale lacks a neutral midpoint, forcing evaluations to be lopsided towards one end or the other. Generally, evaluations veer toward the high end, resulting in excessive positive evaluations.”
There are many other considerations we have not listed here, but these are common. Since I am biased toward differentiation and organizational excellence, my thinking is that HR should address these issues head-on and solve them at their source rather than manipulating rating scales to remove the midpoint.
Five point scales
The most common use of rating scales is for performance evaluation, and the most common scale in use is a five-point Likert-like scale. At first glance, this may seem like an acceptable practice – after all, Likert scales are very useful for measuring variances in opinion on a mass scale and for many other information-gathering purposes. When applied to individuals, however, this practice falls far short of optimum results when evaluating performance of job competencies. Take this example from DDI:
| Rating | Label | Definition |
|---|---|---|
| 5 | Much more than acceptable | Significantly above criteria required for successful job performance |
| 4 | More than acceptable | Generally exceeds criteria relative to quality and quantity of behavior required |
| 3 | Acceptable | Meets criteria relative to quality and quantity of behavior required |
| 2 | Less than acceptable | Generally does not meet criteria relative to quality and quantity of behavior required |
| 1 | Much less than acceptable | Significantly below criteria required for successful job performance |

Table 1. Source: William C. Byham, Ph.D. and Reed P. Moyer, Ph.D.: Using Competencies to Build a Successful Organization, DDI, 1996
This scale values performance around a central “acceptable” level. Dick Grote, considered by many to be the “godfather” of effective performance management, has written extensively on the perception of “average.” The breakdown in this central norm is that most of us perceive ourselves to be “above average,” and individuals will be disappointed at being labeled “acceptable.” Think about this – you are probably reading this because you want to expend some extra effort to improve your knowledge about performance management. Is that “acceptable” behavior? The term reeks of minimum.
The scale breaks down completely in the remaining levels. What is the exact meaning of “significantly above criteria?” What two individuals will agree exactly on the meaning of “generally exceeds criteria?”
I see two key factors in making a rating scale effective:
- Precision of language. The language used in defining levels of performance must be precise enough that there is no question in the mind of the rater, the ratee, or any other party that the performance level is clearly differentiated from other levels.
- Valuing the midpoint. The organizational culture and the language at the midpoint must clearly define the norm to be a highly valued contributor to the organizational effort. This assumes that the criteria for quality and quantity of behavior required have been specifically defined for the individual being evaluated. The concept of the midpoint as a highly valued performer must be reinforced in respect, relationships, opportunities, and compensation. Being evaluated as a highly valued contributor will have little meaning if the individual is not treated as such.
Percent of achievement
Rating scales based on percent of achievement require the rater to assign a percentage value to performance factors that are not quantifiable. Unless you have actual metrics, this can only be guesswork at best.
Recommended rating scale
Steve Hunt recommends a 5-point scale. He states, “The greatest advantage of using a 5-point rating scale is that it has a midpoint and allows for just enough differentiation without introducing scores that are too close to be of much value. In a 5-point scale, 3 is a neutral midpoint. One score up means better, two scores up means best.”
Once the midpoint is clearly defined, we can examine the next higher level and describe it as a person who contributes more than what is required of a highly valued contributor. This will be a person who completes his/her own work and then seeks out ways to help others complete their tasks, do their tasks better, or be more effective. It will also include those who volunteer (this is the key word) ideas and effort to improve organizational results. Once it is clearly established that these self-starters are valued more highly than valued contributors, you will see a shift toward more employees being evaluated at this level, with a corresponding increase in employee engagement and productivity.
Next consider those who have a measurable and lasting impact on organizational results. These will be individuals who change the way business is done in such a way that the organizational norm of performance is shifted to a higher level. This becomes the basis for identifying your top talent and high potentials.
Now consider those who are not performing at the level of a highly valued performer but can be trained, coached, developed and encouraged to achieve that level. The clear differentiation between this level and the lowest rating is potential for improvement in terms of aptitude and attitude. These individuals will require focused attention to ensure that they know they are both valued and expected to improve.
The final rating is reserved for those who have demonstrated that they will not be successful. They should be moved out of the organization as soon as practicable bearing in mind that the reason for failure may be a job mismatch. The individual may be valuable in another role in the organization.
Careful attention to differentiating language gives us a scale such as this:
| Rating | Label | Definition |
|---|---|---|
| 5 | Distinguished Contributor | Effects measurable and lasting improvements in organizational performance |
| 4 | Performance Improver | Contributes more than effective performance of essential functions and enhances the performance of self and others |
| 3 | Valued Contributor | Performs all essential functions of the job effectively |
| 2 | Improving Contributor | Learning the essential functions of the job, or improving toward effective performance of all essential functions |
| 1 | Non-contributor | Not able or willing to perform the essential functions of the job |
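If an organization wanted to configure a scale like this in a performance management tool, one way to encode it is as an enumeration with anchor text per level. This is a minimal sketch; the class and constant names are illustrative and not taken from any particular product:

```python
from enum import IntEnum

class Rating(IntEnum):
    """The five-level scale described above, highest to lowest."""
    DISTINGUISHED_CONTRIBUTOR = 5
    PERFORMANCE_IMPROVER = 4
    VALUED_CONTRIBUTOR = 3
    IMPROVING_CONTRIBUTOR = 2
    NON_CONTRIBUTOR = 1

# Anchor text shown to raters alongside each level.
ANCHORS = {
    Rating.DISTINGUISHED_CONTRIBUTOR:
        "Effects measurable and lasting improvements in organizational performance",
    Rating.PERFORMANCE_IMPROVER:
        "Contributes more than effective performance of essential functions "
        "and enhances the performance of self and others",
    Rating.VALUED_CONTRIBUTOR:
        "Performs all essential functions of the job effectively",
    Rating.IMPROVING_CONTRIBUTOR:
        "Learning the essential functions of the job, or improving toward "
        "effective performance of all essential functions",
    Rating.NON_CONTRIBUTOR:
        "Not able or willing to perform the essential functions of the job",
}

# Every level must carry an anchor so raters never face a bare number.
assert set(ANCHORS) == set(Rating)
```

Keeping the anchor text in the configuration, rather than in training slides alone, ensures every rater sees the same differentiating language at the moment of evaluation.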
Behaviorally Anchored Rating Scales (BARS)
Many organizations attempt to achieve a greater degree of accuracy by clearly defining the specific behaviors for the competency or behavioral dimension being rated and for each of the rating levels. The effectiveness of this approach is dependent on clear, easily understood language defining each competency or behavioral dimension. Common errors include:
- Overlapping definitions
- Specificity that excludes dimensions of some roles
- Lack of the specificity needed to clearly define behavior
- Confusing or unclear language
Byham and Moyer point out these problems associated with BARS:
- provides no better validity than a numerical rating scale;
- generally is more difficult to construct;
- is more difficult to keep current;
- can cause raters much confusion; and
- requires considerable training.
While I agree that the complexity and therefore cost of using behaviorally anchored rating scales makes them a less desirable alternative than a numerical scale, I disagree for the reasons stated above that the DDI rating scale presented in Table 1 is appropriate for evaluating performance.
Erik Berggren’s team of analysts at SuccessFactors has provided the following key points on the number of levels in rating scales and conclusions on rating scale quality. SuccessFactors has analyzed the performance data of thousands of customers and millions of users.
- The most accurate ratings are associated with 5- or 7-point scales, but 7-point scales tend to provide more accurate ratings only with appropriate training.
- Rating scales are much more accurate if the scale includes behavioral anchors that define what good and poor performance look like in terms of observable, well-defined behaviors. Avoid scales that use emotional adjectives that sound good but mean nothing, such as “passion”, “winning attitude”, and “A player”.
- Providing rater training significantly increases the accuracy of ratings. Without training, ratings are often more a function of the person doing the rating than the person being rated.
Dr. Hunt adds: “When would a 7-point scale make sense? An optimal use for a 7-point scale is to treat the 1 and 7 as exclamation points, meaning a discussion should take place immediately and the need for action is imminent. The midpoint discussion of the 5-point scale still applies, and score distribution will be heaviest in the 2–5 range. All employees who are rated a 1 are candidates to be aggressively worked out of the organization (‘there is SUCH a problem here we should probably act now’). Conversely, employees rated a 7 are ‘so good: this person is one of the best I have seen; we should aggressively plan for this person’s future here’.”
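In a reporting tool, this “exclamation point” treatment of 1s and 7s amounts to a simple filter over the ratings. A hypothetical sketch (the employee names and scores below are invented for illustration):

```python
def exclamation_points(scores_by_employee, low=1, high=7):
    """Split out the scores treated as exclamation points on a 7-point scale:
    a `low` score means act now; a `high` score means plan aggressively
    for this person's future."""
    act_now = [e for e, s in scores_by_employee.items() if s == low]
    fast_track = [e for e, s in scores_by_employee.items() if s == high]
    return act_now, fast_track

scores = {"casey": 4, "drew": 1, "emery": 7, "finley": 5}
act_now, fast_track = exclamation_points(scores)
print(act_now, fast_track)  # ['drew'] ['emery']
```

Everyone in the 2–5 range falls through the filter, which matches the expectation that the bulk of the distribution sits there.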
Next Steps – Calibration
A growing practice among the best organizations is calibration. Dick Grote recommends calibration sessions in which managers are required to justify the ratings of their subordinates to their peers. These sessions can be an opportunity for HR to serve as a trusted consultant, providing valuable insight into rater error without becoming the “evaluation police,” as groups of managers take on the responsibility for fair and accurate evaluations.
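One lightweight input to a calibration session is a comparison of each manager’s average rating against the organization-wide average, which surfaces lenient and severe raters for discussion. A minimal illustration (the manager names and ratings below are hypothetical):

```python
from statistics import mean

def rater_bias(ratings_by_manager):
    """Each manager's mean rating minus the organization-wide mean:
    a positive value suggests leniency, a negative value severity.
    Either extreme is worth discussing in a calibration session."""
    all_ratings = [r for rs in ratings_by_manager.values() for r in rs]
    org_mean = mean(all_ratings)
    return {m: round(mean(rs) - org_mean, 2)
            for m, rs in ratings_by_manager.items()}

manager_ratings = {
    "gray":   [4, 5, 4, 5, 4],  # rates noticeably high
    "harper": [3, 2, 3, 3, 2],  # rates noticeably low
    "indigo": [3, 4, 3, 3, 4],  # close to the organizational norm
}
print(rater_bias(manager_ratings))
# {'gray': 0.93, 'harper': -0.87, 'indigo': -0.07}
```

A gap like gray’s or harper’s is not proof of rater error (their teams may genuinely differ), but it gives the peer group a concrete starting point for the justification conversation Grote recommends.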
At least for the present, performance evaluations are here to stay. A well-written, clearly defined scale combined with effective calibration sessions can pay big dividends in manager engagement, employee engagement, and productivity.