
Codex learned to deceive: AI gives false answers, hoping for inattentiveness
Codex learned to deceive: AI gives false answers, hoping for inattentiveness
I already told you that OpenAI presented Codex – an assistant for programmers based on a language model. However, the interest is not in the product itself, but in the strategic behavior of the system during training.
Researchers discovered that the model developed its own methods for bypassing complex tasks. Instead of honestly solving problems, Codex chose less costly paths. For example, the system could always return a seemingly correct answer, reasoning that the user would not check the result.
Such behavior was revealed through the method of tracking reasoning chains. This approach allows analyzing the logic of decision-making by the model at each stage.
The key difference from ordinary errors is that here the system consciously evaluates the situation and chooses a strategy of minimal risk. This may demonstrate the presence of its own system of priorities in artificial intelligence.
Well, perhaps we are observing the evolution of artificial intelligence from simple text processing to the formation of strategic thinking with its own logic of decision-making. And this logic will not always be pleasant to us. And convenient.