Diffusion-based reasoning
Mercury 2 uses diffusion-based generation rather than left-to-right token decoding, refining outputs in parallel over a small number of steps.
Mercury 2 is Inception’s fastest reasoning language model, built on diffusion rather than autoregressive decoding. It is presented as a production-oriented LLM for teams that need reasoning-level quality inside tight latency budgets.
The product is aimed at workflows where latency compounds across many steps, such as coding assistants, agentic loops, voice interfaces, and retrieval pipelines. Inception says Mercury 2 is available now through its API and chat experience, and that it is OpenAI API compatible for easier adoption in existing stacks.
Mercury 2 uses diffusion-based generation rather than left-to-right token decoding, refining outputs in parallel over a small number of steps.
The models page describes Mercury 2 as running at 1,000+ tokens per second on commercial NVIDIA GPUs, and the launch post cites 1,009 tokens per second on NVIDIA Blackwell GPUs.
The model is presented as OpenAI API compatible and usable as a drop-in replacement for existing LLM workflows.
Inception lists tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output for production workflows.
The launch post says Mercury 2 is designed to keep p95 latency, turn-to-turn behavior, and throughput stable under high concurrency.
Inception positions Mercury 2 as its most powerful model for complex applications where both performance and speed matter.
For autocomplete, next-edit suggestions, refactors, and interactive coding agents, Mercury 2 is positioned to keep suggestions responsive enough to stay in the developer’s flow.
For workflows that chain many inference calls, such as subagents or campaign optimization, lower per-call latency can change how many steps are practical to run.
For voice interfaces and real-time conversation systems, Inception frames Mercury 2 as a way to keep text generation aligned with natural speech cadence.
For multi-hop retrieval, reranking, and summarization pipelines, Mercury 2 can add reasoning without pushing the end-to-end latency outside a practical range.
Mercury 2 is available now through Inception’s API and chat interface. The site also says it is OpenAI API compatible, so it can be dropped into existing stacks without rewrites.
Inception positions Mercury 2 for latency-sensitive work such as coding and editing, agentic workflows, real-time voice and interaction, and search or RAG pipelines.
The product page lists tunable reasoning, a 128K context window, native tool use, and schema-aligned JSON output. It also emphasizes fast generation through diffusion-based parallel refinement.
The pricing page on the site was not available, but the models page shows Free, Developer, and Enterprise access paths. Free includes access to all models and 10 million free tokens; Developer is usage-based; Enterprise adds custom rate limits, SLA guarantees, security and privacy, and volume-based pricing.
The source materials do not list a formal integration catalog. They do say Mercury 2 is OpenAI API compatible and supported through libraries including AISuite, LiteLLM, and LangChain.
AakarDev AI helps teams manage AI provider access, project-level setups, logs, and analytics from one dashboard. It supports BYOK workflows and lists providers including OpenAI, Google Gemini, Anthropic, Groq, Mistral AI, and Perplexity AI.
Happycapy is a browser-based agent platform that lets users run Claude Code, manage skills, and delegate tasks inside a secure sandbox. It offers a free tier plus paid plans for more automation, email handoff, and larger workloads.
Nichesim 允许您在公开之前测试您的想法,模拟真实社区对您的产品、内容或活动的反应,涵盖成千上万的 AI 人物。
FreeLLMAPI 是一款兼容 OpenAI 的代理,通过单一 `/v1` 端点路由多个 LLM 提供商的免费额度请求,支持自动故障转移、加密密钥存储和内置管理面板,适合个人实验。
Rumora is a TikTok and YouTube comment marketing platform that helps brands discover relevant videos, generate and post comments, and track visibility over time. It offers subscription plans for individuals, teams, and larger organizations.
Agentset 是一个开源平台,用于在私有或内部知识库之上构建 AI 聊天和搜索体验。支持带引用的生产级 RAG、图像/图表/表格等多模态文档,并提供免费版、Pro 版和企业版方案。