Anthropic says its new Claude 3 AI chatbot scores better on key benchmarks than GPT-4

Claude 3’s most powerful ‘Opus’ model has ‘near-human’ abilities in some benchmarks, the company claims.

Steve Dent

Tue, Mar 5, 2024


The battle between AI chatbots is more than a two-horse race. Anthropic, the company formed by several ex-OpenAI employees, claims its new Claude 3 language model outperforms ChatGPT and Google’s Gemini in several key industry benchmarks. It even hit “near-human” levels on some tasks, the company wrote in a blog post.

There are three new models under the Claude 3 umbrella: Haiku, Sonnet, and Opus. Sonnet powers the Claude.ai chatbot and is offered for free with an email sign-in. Meanwhile, Opus is the largest and most powerful LLM and will be available with a $20 per month subscription via the “Claude Pro” service. It’s also multi-modal, so it can work with both text and image inputs, unlike past versions.

All Claude 3 models “can power live customer chats, auto-completions and data extraction tasks where responses must be immediate and in real-time,” the company said. On top of promising “near-instant results,” they can supposedly handle longer, multi-step instructions with increased accuracy.

Opus showed better graduate-level reasoning than GPT-4, scoring 14.7 percentage points higher on that test. It also beat OpenAI’s chatbot in tasks involving math, coding, reasoning and knowledge.

The new models also top past Claude models. “For the vast majority of workloads, Sonnet is 2x faster than Claude 2 and Claude 2.1 with higher levels of intelligence. It excels at tasks demanding rapid responses, like knowledge retrieval or sales automation. Opus delivers similar speeds to Claude 2 and 2.1, but with much higher levels of intelligence,” according to Anthropic.

Meanwhile, Anthropic describes Haiku, the smallest version of Claude 3, as “the fastest and most cost-effective model on the market.” The company says it can read a dense research paper, complete with charts and graphs, in under three seconds.

The company also noted that Claude 3 “can process a wide range of visual formats, including photos, charts, graphs and technical diagrams,” aiding companies that use PDFs, flowcharts, or presentation slides. It’ll also be less likely to refuse harmless content thanks to a more nuanced understanding of requests, while still recognizing “real harm.”

Anthropic has said that Claude AI is guided by 10 secret foundational pillars of fairness. Claude 3 was trained on both nonpublic internal and public-facing data, using hardware from Amazon Web Services (AWS) and Google Cloud (Amazon recently invested $4 billion in Anthropic).

Claude 3 Opus and Claude 3 Sonnet are available now through Anthropic’s API, with Haiku set to follow soon. Sonnet is also accessible through Amazon Bedrock and in private preview on Google Cloud’s Vertex AI Model Garden.

Engadget is a web magazine with obsessive daily coverage of everything new in gadgets and consumer electronics
