New platform for fair AI competition in science

The Allen Institute for Artificial Intelligence (Ai2) has launched a new platform called SciArena. It is similar to Chatbot Arena but designed specifically for comparing neural networks on scientific problems. For study or research, you can now get two answers to the same question for free, each backed by references to scientific sources.

How is model performance evaluated? The platform uses the AI2 ScholarQA search engine to find articles related to your query in the Semantic Scholar database. Then two randomly selected models receive the same input: your question and the retrieved scientific papers. Each model must write a detailed response, backing up every claim with a citation.
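The matchup setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SciArena's actual code: the `Matchup` class, the `make_matchup` function, and the model names in the usage note are hypothetical, and paper retrieval is assumed to have already happened.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Matchup:
    """One head-to-head comparison: both models see identical input."""
    question: str
    papers: Tuple[str, ...]
    model_a: str
    model_b: str


def make_matchup(
    question: str,
    papers: Sequence[str],
    models: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Matchup:
    """Pick two distinct models at random and pair them on the same data.

    Hypothetical helper: `papers` stands in for whatever the search
    step returned for the question.
    """
    if len(models) < 2:
        raise ValueError("need at least two models to form a matchup")
    if rng is None:
        rng = random.Random()
    # sample() draws without replacement, so the two models always differ.
    model_a, model_b = rng.sample(list(models), 2)
    return Matchup(question, tuple(papers), model_a, model_b)
```

For example, `make_matchup("How do mRNA vaccines work?", ["paper-1", "paper-2"], ["model-x", "model-y", "model-z"])` returns a `Matchup` whose two models differ while the question and papers are shared.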

Currently, 23 models from OpenAI, Google, Anthropic, Alibaba, and other companies are ranked in SciArena. Before the launch, 102 experts judged more than 13,000 matchups to build the initial leaderboard.
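The article does not say how the matchup results are turned into a ranking. Arena-style leaderboards commonly use an Elo-type rating, so here is a minimal sketch assuming plain Elo with a starting rating of 1000 and K = 32. The function names, the vote format, and these constants are assumptions made for illustration, not SciArena's documented method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A vote is (model_a, model_b, winner), where winner is "a", "b" or "tie".
Vote = Tuple[str, str, str]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_leaderboard(
    votes: Iterable[Vote], k: float = 32.0, start: float = 1000.0
) -> List[Tuple[str, float]]:
    """Replay votes in order and return models sorted by final rating."""
    ratings: Dict[str, float] = defaultdict(lambda: start)
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in votes:
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        score_a = outcome[winner]
        # Zero-sum update: what A gains, B loses.
        delta = k * (score_a - exp_a)
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

With hypothetical votes such as `[("x", "y", "a"), ("x", "z", "a"), ("y", "z", "tie")]`, model `x` ends up first because it won both of its matchups. Plain Elo depends on the order of the votes, which is one reason order-independent fits such as Bradley-Terry are also used for this kind of leaderboard.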

At present, OpenAI o3 leads the rankings. This model consistently delivers top results in all categories, from engineering to medicine. Claude 4 Opus and Gemini 2.5 Pro round out the top three. You can ask your question in Russian, but note that some models respond only in English.

Author: AIvengo
I have been working with machine learning and artificial intelligence for five years, and this field never ceases to amaze, inspire, and interest me.