Kimi-K2 with 1 trillion parameters surpassed GPT-4.1 in programming

Chinese technology company Moonshot AI has introduced a new player in the AI arena: meet Kimi-K2. It is an open-source large language model, ready to challenge recognized industry leaders such as Claude Sonnet 4 and GPT-4.1. Such a loud and powerful start is reminiscent of DeepSeek's debut.

The technical specifications of this model are impressive. Kimi-K2 has 1 trillion parameters and packs in a colossal volume of knowledge. Its most important advantage is its open weights, which make the model accessible for research, fine-tuning and adaptation to specific tasks.

The Kimi-K2-Instruct version, optimized for real-world use, demonstrates exceptional results on standard benchmarks. On the highly demanding SWE-bench Verified test, it scored 65.8% in agent mode. That result is only slightly below Claude Sonnet 4's, but significantly above GPT-4.1's.

Particularly impressive is that Kimi-K2 leads on specialized programming benchmarks, scoring 53.7% on LiveCodeBench and 27.1% on OJBench. The model can generate games and applications and, working as an agent, plan trips using dozens of tools in the browser.

The model also handles tasks in mathematics and the natural sciences brilliantly, surpassing competitors on such difficult tests as AIME, GPQA-Diamond and MATH-500. It has also already joined the elite group of top models on multilingual tests. For now, it looks like the new king of neural networks.
