
Salesforce test: Gemini 2.5 Pro solves only 58% of business tasks

The Salesforce CRMArena-Pro test shows that even leading artificial intelligence models face serious limitations when solving everyday business tasks.

Imagine: the flagship model Gemini 2.5 Pro successfully completes only 58% of tasks that take a single query. And what happens in multi-step dialogue? The success rate drops sharply to 35%!

CRMArena-Pro tests large language models in realistic sales, customer service and pricing scenarios. Researchers created 4,280 unique tasks across 19 types of business operations using synthetic Salesforce data.

Particularly revealing are the results in multi-step dialogues, a key element of any business interaction. Almost half of Gemini 2.5 Pro's failed attempts stem from the model not requesting critically important information. Models that ask more clarifying questions achieve significantly better results.

The highest performance came in automating simple workflows: 83% success in routing support requests. However, tasks requiring deep text understanding or following complex rules remain a serious challenge for modern artificial intelligence technologies.

Author: AIvengo
I have been working with machine learning and artificial intelligence for 5 years, and this field never ceases to amaze, inspire and interest me.

Latest News

Imagry created a self-driving system without HD maps

Imagry has created a unique autonomous driving technology that works without HD maps. This engineering marvel is based on a bio-inspired approach that mimics human perception and decision-making. Imagine: the system sees the road through cameras that act as eyes and makes decisions with a neural network that acts as a brain, just like an experienced driver!

US Department quietly replaced report containing AI-faked quotes

The US Department of Health and Human Services report on chronic childhood diseases found itself at the center of a scientific scandal. Experts discovered that the document "Make Our Children Healthy Again Assessment" contains falsified quotes and non-existent studies.

Meta AI blocks thousands of Facebook groups for "terrorism"

Earlier I told you that Meta's artificial intelligence was blocking Instagram accounts en masse. Now it is blocking Facebook groups too, in an unprecedented wave: thousands of Facebook groups have been hit with unfounded sanctions from the moderation system. This technical failure affected communities both in the USA and abroad, across a wide range of topics.

Google released Gemini CLI: AI agent for code

Google has presented Gemini CLI, an official agent for writing code with artificial intelligence directly from the command line. The tool provides access to all capabilities of the Gemini 2.5 Pro model.
