New platform for fair AI competition in science
The Allen Institute for AI (Ai2) has launched a new platform called SciArena. The link is in the description. It is similar to Chatbot Arena but designed specifically for comparing AI models on scientific questions. For study or research, you can now get two answers to the same question for free, each backed by references to scientific sources.
How is model performance evaluated? The platform uses the AI2 ScholarQA search engine to find articles related to your query in the Semantic Scholar database. Then two randomly selected models receive the same input: your question and the retrieved scientific papers. Each model must write a detailed response, backing up every claim with a citation. You then vote for the answer you find better, and those votes feed the rankings.
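The matchup flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SciArena's actual code: `retrieve_papers` is a hypothetical stand-in that returns canned data instead of calling a real search service, and the model names and prompt wording are made up.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    snippet: str


def retrieve_papers(question: str) -> list[Paper]:
    # Hypothetical stand-in for the retrieval step. A real system would
    # query a literature search service here; this returns canned data.
    return [
        Paper("Placeholder paper A", "Placeholder snippet relevant to the question."),
        Paper("Placeholder paper B", "Another placeholder snippet."),
    ]


def build_prompt(question: str, papers: list[Paper]) -> str:
    # Number the sources so a model can cite them as [1], [2], ...
    sources = "\n".join(
        f"[{i}] {p.title}: {p.snippet}" for i, p in enumerate(papers, start=1)
    )
    return (
        f"Question: {question}\n\nSources:\n{sources}\n\n"
        "Answer in detail and cite a source number for every claim."
    )


def make_matchup(question: str, models: list[str], rng: random.Random) -> dict[str, str]:
    # Pick two distinct models at random and give both the identical prompt.
    papers = retrieve_papers(question)
    prompt = build_prompt(question, papers)
    model_a, model_b = rng.sample(models, 2)
    return {model_a: prompt, model_b: prompt}


matchup = make_matchup(
    "How do perovskite solar cells degrade?",
    ["model-x", "model-y", "model-z"],  # made-up model names
    random.Random(0),
)
```

The point of the design is that both models see exactly the same question and the same retrieved papers, so any difference between the answers comes from the models themselves rather than from the inputs.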
Currently, 23 models from OpenAI, Google, Anthropic, Alibaba, and other companies are ranked in SciArena. Before the launch, 102 experts judged over 13,000 matchups to build the initial leaderboard.
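The post does not say how matchup results are turned into a ranking. As a rough illustration of how pairwise results can produce a leaderboard, here is a minimal Elo-style sketch. The choice of Elo, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions for the example, not details from SciArena.

```python
from collections import defaultdict


def elo_ratings(matchups, k=32.0, base=1000.0):
    # matchups: iterable of (winner, loser) pairs, processed in order.
    # Every model starts at `base` (assumed value, not from the source).
    ratings = defaultdict(lambda: base)
    for winner, loser in matchups:
        # Probability the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An unexpected win moves the ratings more than an expected one.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Made-up votes between made-up models.
votes = [("model-x", "model-y"), ("model-x", "model-z"), ("model-y", "model-z")]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

With these three votes, `leaderboard` orders the models as model-x, model-y, model-z. Each update takes from the loser exactly what it gives the winner, so the total of all ratings stays constant.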
At present, OpenAI o3 leads the rankings. The model delivers top results across all categories, from engineering to medicine. Claude 4 Opus and Gemini 2.5 Pro round out the top three. You can ask your question in Russian, but note that some models respond only in English.