Kindle Notes & Highlights
We developed deliciously complicated formulae to make sure that if your rating was a hair higher than someone else’s, you’d be rewarded with a slightly higher raise.
But it didn’t matter. For all the time we spent on assigning ratings, when it came time to set raises or bonuses, managers or later reviewers changed the pay outcomes two-thirds of the time.
Our managers were spending thousands of hours every three months assigning ratings that were ludicrously precise but that weren’t a...
The same goes for measuring performance four ...
we found that we were spending up to twenty-four weeks each year either assigning ratings, calibrating ratings
a waste to have to review fifty thousand people so that you could find the five hundred who were struggling.
Ultimately, three things were clear:
Consensus was impossible.
In the absence of clear evidence, everyone became an expert and there were constituencies arguing ...
Even when making changes to the least popular process at Google, it was impossible to find a solution that made everyone happy.
It seemed that even though many people disliked the current system, they disliked every other option even more!
People took performance managemen...
The clearest trend was a desire for seriousness and clarity, not whimsy.
Experimentation was vital.
people screamed, people cried, people nearly quit.
because we afford Googlers so much freedom, because we are so data driven, and because Googlers care about fairness and how we treat one another, changes like this are Hercule...
Every team we approached was frustrated with the current system, and every team was resist...
We were relieved to see that the loss of “precision” didn’t hurt us.
Some Googlers had worried that the loss of the precision conveyed by a 41-point rating scale would mean that our ratings would become less useful and meaningful.
Instead, Googlers’ survey responses revealed what we’d suspected all along: The forty-one points created only an illusion of precision.
Most Googlers admitted that for many ratings it wasn’t possible to distinguish w...
“This created the possibility that the ratings were neither reliable nor valid. Managers would take the number and then ascribe real meaning to it, so if someone went from a 3.3 to a 3.5 it must be because they were improving, when in reality they could have been performing at the same level. And think of how much worse it would be if your rating dropped, and you were told it was because of your performance when in re...
Group B, despite having more performance labels, which they hoped would create more differentiation across people, actually had far less differentiation than Group A.
By simply having more rating categories to choose from, Group B unconsciously, inadvertently, and incorrectly decided that they had almost no star performers.
Without meaning to, they dropped 80 percent of their top performers (four out of five) out of their top performance category.
As of late 2013, it was still an experiment, but the early signs were good. First, it provided employees with more consequential feedback, replacing the murky differences between a 3.2 and a 3.3 rating. Second, it resulted in a wider performance distribution. As we shed performance rating categories, managers became more likely to use the extreme ends of the rating system.
having five categories was superior to having more in at least these two ways.
managers doubled their usage of the extremes of the rating system. Expanding the proportion of people receiving the top rating better reflected their actual performance
And reducing the stigma of being in the bottom performance category made it easier for managers to have direct, compassionate conversations with their struggling employees about how to improve.
On the other hand, the soul of performance assessment is calibration. It’s fair to say that without calibration, our rating process would be far less fair, trusted, and effective.
I believe that calibration is the reason why Googlers were twice as favorable toward our rating system as people at other companies were to theirs.
Google’s rating system was (and is) distinctive in that it isn’t just the direct man...
A manager assigns a draft rating to an employee—say, “exceeds expectations”—based on nailing OKRs but...
Before this draft rating becomes final, groups of managers sit down together and review all of their employees’ draft ratings together in a process we call calibration.
Calibration adds a step. But it is critical to ensure fairness. A manager’s assessments are compared to those of managers leading similar teams, and they review their employees collectively:
This allows us to remove the pressure managers may feel from employees to inflate ratings. It also ensures that the end results reflect a shared expectation of performance, since managers often have different expectations for their people and interpret performance standards in their own idiosyncratic manner—just
Calibration diminishes bias by forcing managers to justify their decisions to one another. It also increases perceptions of fairness among employees.
The power of calibration in assessing people for ratings is not that different from the power of having people compare no...
The goal is the same: to remove sources of i...
Even if you’re a small company, you’ll have better results, and happier employees, if assessments are based on a group discussion rath...
Even when calibrating, however, managers in group settings can make bad decisions. A host of errors in how we make decisi...
recency... is when you overweight a recent experience because it’s fr...
We addressed this by starting most calibration meetings with a single handout, describing the most common errors assessors make and how to fix them,
We’d start each calibration meeting by revisiting these errors.
simply focusing managers on these phenomena, even for a moment, was enough to eliminat...