This is really nice work, but isn't it making an assumption that humans are always consistent? I think giving expert chess players the rotated examples might well yield similar results, for example, unless the transformed examples were somehow obvious, like being presented consecutively.
Evaluating superhuman models with consistency checks