ai benchmark - Search News

Allen Institute for AI challenges DeepSeek on key benchmarks with big new open-source AI model

Amid the industry fervor over DeepSeek, the Seattle-based Allen Institute for AI (Ai2) released a significantly larger version of its Tülu 3 AI model, aiming to further advance the field of open-source artificial intelligence and demonstrate its own techniques for enhancing the capabilities of AI models.

Digit · 1h

Leave DeepSeek, China’s new AI model Kimi k1.5 also surpasses ChatGPT in key benchmarks

Moonshot AI's Kimi k1.5 outperforms OpenAI's GPT-4o and Claude 3.5 Sonnet in key areas, showcasing superior multimodal abilities.

TechCrunch on MSN · 3d

DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks

Chinese AI lab DeepSeek has released an open version of DeepSeek-R1, its so-called reasoning model, that it claims performs as well as OpenAI’s o1 on certain AI benchmarks. R1 is available from the AI dev platform Hugging Face under an MIT license ...

11h · on MSN

Ai2 says its new AI model beats one of DeepSeek’s best

Move over, DeepSeek. Seattle-based nonprofit AI lab Ai2 has released a benchmark-topping model called Tulu3-405B.

'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?

A new academic benchmark aims to 'test the limits of AI knowledge at the frontiers of human expertise.' So far, these LLMs ...

3d · on MSN

How DeepSeek achieved its AI breakthrough, Benchmark partner Chetan Puttagunta explains

Chinese AI startup DeepSeek is sending tech stocks plunging as the market digests what its cheaper and more efficient model ...

7d · on MSN

Even some of the best AI can’t beat this new benchmark

The nonprofit Center for AI Safety and Scale AI have released a challenging new benchmark for frontier AI systems.

Alibaba’s Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI

Alibaba's Qwen2.5-Max AI model sets new performance benchmarks in enterprise-ready artificial intelligence, promising reduced ...

Alibaba releases AI model it says surpasses DeepSeek

Max's release points to the pressure DeepSeek's meteoric rise in the past three weeks has placed on overseas rivals and ...

3d · on MSN

DeepSeek’s new model shows that AI expertise might matter more than compute in 2025

Created by DeepSeek, a Chinese AI startup that emerged from the High-Flyer hedge fund, their flagship model shows performance ...

Revealed AMD Ryzen AI Max “Strix Halo” benchmarks could be bad news for Nvidia

AMD has revealed new gaming benchmarks for the Ryzen AI Max "Strix Halo" APU via Wccftech, implying the integrated Radeon ...

20h

93% of IT leaders will implement AI agents in the next two years

Surpassing most projections for AI adoption, organizations are leveraging digital labor across all lines of business, according to a new report from MuleSoft and Deloitte Digital.

1d · on MSN

DeepSeek: everything you need to know about the AI that dethroned ChatGPT

Chinese startup DeepSeek has been taking the AI industry by storm with a new chatbot rivaling ChatGPT and Gemini that uses a ...

Hosted on MSN · 6d

OpenAI Accused of Manipulating Benchmark Results as Chinese Models Close AI Performance Gap

It was recently revealed that OpenAI secretly funded and accessed data related to the FrontierMath AI benchmark. The ...
