OpenAI’s April 2025 update to GPT-4o famously induced absurd levels of sycophancy: the model would agree with almost anything users said, no matter how outrageous.
You claim to "argue that A/B testing will implicitly optimize models for user retention". I don't see where you make this argument. I agree that A/B testing will implicitly optimize for *something*, but how do we know that this something cashes out as user retention? Even explicitly specified objective functions often optimize for something other than their stated intention. Are you simply using "user retention" as shorthand for "the implicit optimization target represented by the explicitly available metrics that the labs combine in various ways"?
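To make the worry concrete, here's a toy sketch (entirely hypothetical, with made-up names, not anything from the post): whichever composite metric gates each release is the quantity being hill-climbed, whether or not "user retention" is what that metric actually measures.

```python
# Toy sketch (hypothetical): repeated A/B selection acts as implicit
# hill-climbing on whatever composite metric gates each release.
import random

def composite_metric(true_quality: float) -> float:
    """Stand-in for the blend of engagement/feedback signals a lab might
    aggregate; a noisy read of whatever that blend actually measures."""
    return true_quality + random.gauss(0.0, 0.1)

def ab_select(incumbent: float = 0.0, rounds: int = 20) -> float:
    """Each round a new fine-tune challenges the incumbent; the A/B test
    winner ships and becomes the next baseline."""
    for _ in range(rounds):
        challenger = incumbent + random.gauss(0.0, 0.2)
        if composite_metric(challenger) > composite_metric(incumbent):
            incumbent = challenger
    return incumbent

print(ab_select())  # drifts upward on the composite metric, whatever it measures
```

Nothing in the loop mentions retention; the selection pressure is toward whatever `composite_metric` happens to measure, which is exactly the ambiguity I'm asking about.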
I guess a benchmark like this would need evaluations on _real_ humans across many multi-turn conversations (emulating humans with LLMs doesn't seem very useful here)... this seems difficult to arrange outside an LLM company that has access to real users.