O3 was trained with reinforcement learning to “think” before responding, using what OpenAI describes as a “private chain of ...