Mercury 2 icon

Mercury 2

Mercury 2 is Inception’s diffusion-based reasoning language model for production AI workflows that need low latency, OpenAI API compatibility, and structured output support. It is aimed at coding, agentic, voice, and retrieval-heavy applications.

Mercury 2

What Mercury 2 is

Mercury 2 is Inception’s fastest reasoning language model, built on diffusion rather than autoregressive decoding. It is presented as a production-oriented LLM for teams that need reasoning-level quality inside tight latency budgets.

The product is aimed at workflows where latency compounds across many steps, such as coding assistants, agentic loops, voice interfaces, and retrieval pipelines. Inception says Mercury 2 is available now through its API and chat experience, and that it is OpenAI API compatible for easier adoption in existing stacks.

Key capabilities

Diffusion-based reasoning

Mercury 2 uses diffusion-based generation rather than left-to-right token decoding, refining outputs in parallel over a small number of steps.

High-throughput generation

The models page describes Mercury 2 as running at 1,000+ tokens per second on commercial NVIDIA GPUs, and the launch post cites 1,009 tokens per second on NVIDIA Blackwell GPUs.

Drop-in API compatibility

The model is presented as OpenAI API compatible and usable as a drop-in replacement for existing LLM workflows.

Production-oriented output controls

Inception lists tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output for production workflows.

Latency-focused design

The launch post says Mercury 2 is designed to keep p95 latency, turn-to-turn behavior, and throughput stable under high concurrency.

General-purpose reasoning model

Inception positions Mercury 2 as its most powerful model for complex applications where both performance and speed matter.

Where Mercury 2 fits

  • Coding and editing

    For autocomplete, next-edit suggestions, refactors, and interactive coding agents, Mercury 2 is positioned to keep suggestions responsive enough to stay in the developer’s flow.

  • Agentic loops

    For workflows that chain many inference calls, such as subagents or campaign optimization, lower per-call latency can change how many steps are practical to run.

  • Real-time voice and interaction

    For voice interfaces and real-time conversation systems, Inception frames Mercury 2 as a way to keep text generation aligned with natural speech cadence.

  • Search and RAG pipelines

    For multi-hop retrieval, reranking, and summarization pipelines, Mercury 2 can add reasoning without pushing the end-to-end latency outside a practical range.

Pros and Cons

Pros

  • Designed for real-time or near-real-time applications where latency matters at every step.
  • Uses diffusion-based generation to produce multiple tokens in parallel instead of sequential decoding.
  • Supports tunable reasoning, tool use, structured JSON output, and a 128K context window.
  • Presented as OpenAI API compatible, which can reduce migration work for existing teams.
  • Supported by documented use cases across coding, agents, voice, and search workflows.

Cons

  • The public pricing page was unavailable, so plan details beyond the site’s pricing references are not fully documented on the source page.
  • The source materials do not provide a formal integration directory or deployment matrix.

FAQ

How do teams access Mercury 2?

Mercury 2 is available now through Inception’s API and chat interface. The site also says it is OpenAI API compatible, so it can be dropped into existing stacks without rewrites.

What is Mercury 2 best suited for?

Inception positions Mercury 2 for latency-sensitive work such as coding and editing, agentic workflows, real-time voice and interaction, and search or RAG pipelines.

What capabilities are described on the product page?

The product page lists tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output. It also emphasizes fast generation through diffusion-based parallel refinement.

What pricing options are shown on the site?

The pricing page on the site was not available, but the models page shows Free, Developer, and Enterprise access paths. Free includes access to all models and 10 million free tokens; Developer is usage-based; Enterprise adds custom rate limits, SLA guarantees, security and privacy, and volume-based pricing.

Which integrations are documented?

The source materials do not list a formal integration catalog. They do say Mercury 2 is OpenAI API compatible and supported through libraries including AISuite, LiteLLM, and LangChain.

Quick Facts

Category
AI model / developer tool
Product
Mercury 2
Company
Inception
Platform
API and chat
Compatibility
OpenAI API compatible
Context window
128K

Mercury 2 Alternativen

OpenAI icon

OpenAI

OpenAI is an AI research and deployment company centered on ChatGPT, the API, Platform tools, and Codex. The site helps individuals, developers, and businesses explore conversational AI, build with models, and follow product and research updates.

AakarDev AI icon

AakarDev AI

AakarDev AI helps teams manage AI provider access, project-level setups, logs, and analytics from one dashboard. It supports BYOK workflows and lists providers including OpenAI, Google Gemini, Anthropic, Groq, Mistral AI, and Perplexity AI.

DDS Hub icon

DDS Hub

DDS Hub is an AI API platform for Claude and OpenAI-family model workflows, with token-based pricing, model selection, and Claude Code setup guidance. It is aimed at developers who want API access, usage-based billing, and basic troubleshooting in one place.

LiteLLM icon

LiteLLM

LiteLLM provides an OpenAI-compatible way to call and manage 100+ LLMs through a Python SDK or proxy server. It helps teams route requests, track spend, and work across multiple providers from one interface.

AI Writers Assistants icon

AI Writers Assistants

AI Writers Assistants is a token-based AI writing and content platform for drafting text, generating keywords, chatting, coding, and paraphrasing. It suits users who want several AI workflows in one place rather than separate tools.

NavtoAI API icon

NavtoAI API

NavtoAI API is a unified AI API gateway that lets developers and teams route requests across 200+ models through one account and one API shape. The collected pages also show API key usage lookup, routing controls, and centralized management for keys, quota, billing, users, and observability.

Mercury 2 - AI Tool, Features, Use Cases & Alternatives | Findings24