Offline tests showed worse results for AI models

Artificial intelligence has officially surpassed the average human in IQ. GPT-5 Pro scored between 110 and 138 points, while the average human IQ is 100 points. But let's look at what this really means.

The Tracking AI portal ran large-scale IQ testing of AI models. At the top were two versions of GPT-5 Pro, one with computer vision enabled and one with it disabled. They are followed by Gemini 2.5 Pro, Claude Opus 4 and Grok 4.

But there's an important nuance. Unlike humans, the AI models had no time limit and could make up to 10 attempts at each task. This was done to get around safety filters that sometimes block prompts containing words like "exam" or "training".

Two types of tests were used. The first was the official Mensa Norway test: 35 tasks in 25 minutes. The second was a custom offline test, written from scratch and never published on the internet. This ruled out the possibility that the models had seen these tasks during training.

The models were tested with computer vision both enabled and disabled. In the second case, the tasks were described entirely in text. And you know what? Almost all models showed worse results on the offline test. This shows that quality benchmarks for artificial intelligence need to be created from scratch and protected from leaking onto the internet.

Artificial intelligence has formally surpassed humans on classical intelligence tests. But the comparison is flawed. IQ tests measure a specific type of pattern recognition in which artificial intelligence is naturally strong. On top of that, up to 10 attempts with no time limit are completely different testing conditions from the ones humans face. It looks like these "researchers" are deliberately creating conditions to make artificial intelligence look smarter.

Author: AIvengo
For 5 years I have been working with machine learning and artificial intelligence. And this field never ceases to amaze, inspire and interest me.