
Google and Cambridge taught AI to think with images instead of text

Scientists from Google and Cambridge have presented a fundamentally new approach to how artificial intelligence reasons, called “Visual Planning”. Its key feature is that the model thinks not in text but in images, which is much closer to natural human thinking, especially when solving spatial and mathematical tasks.

The researchers published a paper titled “Visual Planning: Let’s Think Only with Images”, in which they describe training a model to navigate mazes using only visual thinking, without any textual reasoning. This approach mimics people’s ability to think in diagrams and pictures when solving complex tasks.

Training proceeded in two stages. First, the model was shown many maze images and taught to predict any valid next step. For example, if the picture shows the agent in cell B, the model must generate a new image in which the agent has moved to one of the available neighboring cells.
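To make the first-stage target concrete, here is a minimal sketch assuming a symbolic grid maze in place of real images. The grid encoding (`#` for walls, `.` for free cells, `A` for the agent) and the helpers `valid_next_positions`, `render` and `next_step_targets` are hypothetical illustrations, not the researchers' code: they only enumerate which "next image" frames would count as legal single moves.

```python
from typing import List, Tuple

Pos = Tuple[int, int]

# Hypothetical text stand-in for a maze image: '#' = wall, '.' = free cell.
# The real model works on pictures; this only shows which next frames are legal.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def valid_next_positions(maze: List[str], pos: Pos) -> List[Pos]:
    """Return every free neighboring cell the agent can step into from pos."""
    rows, cols = len(maze), len(maze[0])
    r, c = pos
    result = []
    for dr, dc in MOVES.values():
        nr, nc = r + dr, c + dc
        # Stay inside the grid and never step onto a wall.
        if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#":
            result.append((nr, nc))
    return result


def render(maze: List[str], agent: Pos) -> List[str]:
    """Draw the agent as 'A': a text analogue of one maze image."""
    grid = [list(row) for row in maze]
    grid[agent[0]][agent[1]] = "A"
    return ["".join(row) for row in grid]


def next_step_targets(maze: List[str], agent: Pos) -> List[Tuple[List[str], List[str]]]:
    """All (current frame, next frame) pairs that show one legal move."""
    current = render(maze, agent)
    return [(current, render(maze, nxt)) for nxt in valid_next_positions(maze, agent)]
```

For the maze `["..#", ".#.", "..."]` with the agent at `(0, 0)`, `valid_next_positions` returns `[(1, 0), (0, 1)]`, so there are exactly two legal next frames the model could be trained to produce.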

In the second stage, the researchers applied reinforcement learning. The model received a positive reward for a correct step, zero for an incorrect one, and a negative reward for an inadmissible action. In this way, it gradually learned to choose optimal paths through the maze, relying only on visual images.
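The reward scheme described above can be sketched as a scoring function over positions. This is an assumption-laden illustration: the article gives only the signs of the rewards, so the magnitudes (+1, 0, -1) are invented here; a "correct" step is interpreted as one that shortens the shortest-path distance to the goal (computed with breadth-first search); and the agent's position is assumed to have already been read off the generated image. The names `distances_to_goal` and `step_reward` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

# Assumed magnitudes: the article only says positive / zero / negative.
R_PROGRESS = 1.0      # "correct" step: gets closer to the goal
R_NO_PROGRESS = 0.0   # legal move that does not get closer
R_INVALID = -1.0      # inadmissible: into a wall, off the grid, or not one step


def distances_to_goal(maze: List[str], goal: Pos) -> Dict[Pos, int]:
    """BFS from the goal: shortest step count from every reachable free cell."""
    rows, cols = len(maze), len(maze[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def step_reward(maze: List[str], goal: Pos, current: Pos, proposed: Pos) -> float:
    """Score one proposed move under the hypothetical three-level scheme."""
    rows, cols = len(maze), len(maze[0])
    r, c = proposed
    # A legal move is exactly one orthogonal step onto a free cell in the grid.
    is_adjacent = abs(r - current[0]) + abs(c - current[1]) == 1
    if not (is_adjacent and 0 <= r < rows and 0 <= c < cols and maze[r][c] != "#"):
        return R_INVALID
    dist = distances_to_goal(maze, goal)
    if dist.get(proposed, float("inf")) < dist.get(current, float("inf")):
        return R_PROGRESS
    return R_NO_PROGRESS
```

For example, in the maze `["...", ".#.", "..."]` with the goal at `(2, 2)`, stepping from `(0, 0)` to `(0, 1)` scores 1.0, stepping into the wall at `(1, 1)` scores -1.0, and stepping back from `(0, 1)` to `(0, 0)` scores 0.0.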

The results exceeded expectations! Visual Planning outperforms even an advanced model like Gemini 2.5 Pro in thinking mode by a factor of one and a half to two on tasks that require spatial thinking. I am pleasantly shocked!

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.

Latest News

OpenAI and Anthropic researchers criticized xAI safety

Researchers from OpenAI, Anthropic and other organizations publicly criticized the safety culture at Elon Musk's company xAI, calling it "reckless" and "completely irresponsible".

Meta invests hundreds of billions in superclusters with 5 GW power consumption

Mark Zuckerberg announced unprecedented investments in next-generation AI infrastructure! The company will invest hundreds of billions of dollars in building server superclusters that will each consume 1 to 5 GW. That is as much power as an entire nuclear power plant!

Paradise for introverts: AI will talk to company employees

Google released a new feature that lets artificial intelligence call local businesses on your behalf to find out prices and service availability. You no longer need to pick up the phone and talk to employees yourself. This is exactly what an introvert's paradise looks like.

OpenAI combined ChatGPT, Deep Research and Operator in one agent

OpenAI introduced ChatGPT Agent, a powerful combination of ChatGPT, Deep Research and Operator in a single solution. The principle is extremely simple: you set a goal, for example sending emails, creating tables, buying tickets or booking hotels. ChatGPT Agent independently breaks this goal into separate tasks, navigates to the necessary websites, searches for information and fills in forms. Before critical actions such as payment, publishing or sending, the agent always asks for your confirmation.

Only 1 programmer in the world could beat OpenAI's AI

Imagine a world where artificial intelligence competes with the best programmers on the planet. Exactly such a showdown took place at the prestigious AtCoder World Tour Finals, one of the most elite programming competitions in the world and one that is extremely difficult to qualify for.