a red flag goes up if members of different groups—say, boys and girls—who have the same score on the test as a whole show markedly different performance on some items. This is a sign of possible bias in those items, which would undermine the validity of inferences for one of the groups.
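The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production screening method: the function name `matched_item_gaps`, the group labels, and the toy data are all made up for the example, and weighting each score level by the size of one group is an assumption of the sketch. The idea is to sort test takers by total score, compare the two groups' proportion correct on each item within each score level, and average those gaps. Testing specialists call a large gap of this kind differential item functioning.

```python
from collections import defaultdict


def matched_item_gaps(responses, groups, focal, reference):
    """Per item, the average (focal - reference) gap in proportion correct
    among test takers with the same total score.

    responses: list of rows, each a list of 0/1 item scores
    groups:    list of group labels, one per row
    Score levels are weighted by the number of focal-group members in them
    (an assumption of this sketch).
    """
    n_items = len(responses[0])

    # total score -> group label -> rows of item scores
    strata = defaultdict(lambda: defaultdict(list))
    for row, group in zip(responses, groups):
        strata[sum(row)][group].append(row)

    gaps = []
    for i in range(n_items):
        weighted_sum = 0.0
        weight_total = 0
        for by_group in strata.values():
            foc = by_group.get(focal, [])
            ref = by_group.get(reference, [])
            if not foc or not ref:
                continue  # no matched comparison possible at this score
            p_foc = sum(r[i] for r in foc) / len(foc)
            p_ref = sum(r[i] for r in ref) / len(ref)
            weighted_sum += len(foc) * (p_foc - p_ref)
            weight_total += len(foc)
        gaps.append(weighted_sum / weight_total if weight_total else None)
    return gaps


# Toy data (invented): three items, six test takers.
responses = [
    [1, 1, 0], [1, 1, 0], [1, 0, 0],   # girls
    [1, 0, 1], [1, 0, 1], [1, 0, 0],   # boys
]
groups = ["girls", "girls", "girls", "boys", "boys", "boys"]

gaps = matched_item_gaps(responses, groups, focal="girls", reference="boys")
for item, gap in enumerate(gaps, start=1):
    print(f"item {item}: gap = {gap:+.2f}")
```

In this toy data the first item shows no gap, while the second and third show gaps of two-thirds in opposite directions even though the matched boys and girls have identical total scores. A gap like that is only a flag for closer review of the item, not proof of bias.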

