LLM and agent tracing
Langfuse is an open-source AI engineering platform for tracing, evaluating, and improving LLM applications and agents. It combines observability, prompt management, experiments, and human annotation in one workflow so teams can move from prototype to production using real usage data.
The platform is built around OpenTelemetry-compatible tracing, native SDKs, and broad integrations, which lets teams capture LLM and non-LLM activity without locking into a single framework. Langfuse can run as a managed cloud service or be self-hosted, and its core product is available under the MIT license.
Capture hierarchical traces for LLM calls, tool invocations, retrieval steps, and other application logic. Filter and inspect traces by user, session, cost, latency, or custom metadata.
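To make the idea of a hierarchical trace concrete, here is a minimal Python sketch, assuming a simplified in-memory model. The `Observation` and `Trace` classes and the `filter_traces` helper are hypothetical illustrations, not Langfuse's data model or SDK; they show how nested steps (retrieval, generation) roll up into trace-level cost and latency, which can then be filtered by user or custom metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Hypothetical data model for illustration only; not the Langfuse schema or SDK.
@dataclass
class Observation:
    name: str
    kind: str  # e.g. "span", "generation", "tool", "retrieval"
    latency_ms: float
    cost_usd: float = 0.0
    children: list[Observation] = field(default_factory=list)

    def total_cost(self) -> float:
        # Cost of this step plus all nested child steps.
        return self.cost_usd + sum(child.total_cost() for child in self.children)


@dataclass
class Trace:
    trace_id: str
    user_id: str
    session_id: str
    root: Observation
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def latency_ms(self) -> float:
        # Assumes the root observation spans the whole request.
        return self.root.latency_ms

    @property
    def cost_usd(self) -> float:
        return self.root.total_cost()


def filter_traces(traces, user_id=None, min_cost_usd=None, max_latency_ms=None, **metadata):
    """Keep traces matching every given criterion; extra kwargs match metadata keys."""
    matches = []
    for t in traces:
        if user_id is not None and t.user_id != user_id:
            continue
        if min_cost_usd is not None and t.cost_usd < min_cost_usd:
            continue
        if max_latency_ms is not None and t.latency_ms > max_latency_ms:
            continue
        if any(t.metadata.get(key) != value for key, value in metadata.items()):
            continue
        matches.append(t)
    return matches


# Usage: one request with a retrieval step followed by an LLM generation.
retrieval = Observation("vector-search", "retrieval", latency_ms=120)
generation = Observation("answer", "generation", latency_ms=900, cost_usd=0.0042)
root = Observation("handle-request", "span", latency_ms=1100, children=[retrieval, generation])
trace = Trace("t1", user_id="u-42", session_id="s-1", root=root, metadata={"env": "prod"})

print(trace.cost_usd)
print(len(filter_traces([trace], user_id="u-42", env="prod")))
```

Keeping cost on each observation and summing recursively means a trace's total stays correct however deeply tool calls or sub-agents are nested.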
Track multi-turn conversations as sessions and add user tracking for production debugging and usage analysis. Agents can also be represented as graphs for more complex workflows.
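A session is, in essence, a set of traces that share a session identifier. The sketch below assumes each trace carries hypothetical `session_id` and `user_id` fields and shows how grouping on them reconstructs a multi-turn conversation and a simple per-user usage count. It is an illustration of the concept, not how Langfuse stores or queries sessions.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified record; real traces carry far more fields.
@dataclass(frozen=True)
class TraceRecord:
    trace_id: str
    session_id: str
    user_id: str
    timestamp: float  # seconds since the Unix epoch
    user_input: str


def group_sessions(records):
    """Group traces by session ID, ordering each session's turns by time."""
    sessions = defaultdict(list)
    for record in records:
        sessions[record.session_id].append(record)
    for turns in sessions.values():
        turns.sort(key=lambda r: r.timestamp)
    return dict(sessions)


def sessions_per_user(records):
    """Count distinct sessions per user, a simple usage metric."""
    users = defaultdict(set)
    for record in records:
        users[record.user_id].add(record.session_id)
    return {user: len(ids) for user, ids in users.items()}


records = [
    TraceRecord("t2", "chat-1", "alice", 20.0, "And in TypeScript?"),
    TraceRecord("t1", "chat-1", "alice", 10.0, "How do I parse JSON?"),
    TraceRecord("t3", "chat-2", "alice", 30.0, "Different question"),
    TraceRecord("t4", "chat-3", "bob", 15.0, "Hello"),
]

for turn in group_sessions(records)["chat-1"]:
    print(turn.trace_id, turn.user_input)
print(sessions_per_user(records))  # {'alice': 2, 'bob': 1}
```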
Manage prompts separately from application code with versioning, deployments by label, rollbacks, prompt caching, and playground testing. Prompt history and change tracking help teams review how prompts evolve.
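Label-based deployment can be pictured as a movable pointer from a label to an immutable prompt version, so a rollback just moves that pointer back. The `PromptRegistry` class below is a hypothetical in-memory sketch of that idea, not the Langfuse prompt API; the `production` label name and the prompt texts are illustrative assumptions.

```python
class PromptRegistry:
    """Hypothetical in-memory prompt store illustrating versions and labels."""

    def __init__(self):
        self._versions = {}  # prompt name -> list of texts; version n is at index n - 1
        self._labels = {}    # (prompt name, label) -> version number

    def create(self, name, text, labels=()):
        """Append a new immutable version and optionally point labels at it."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        version = len(versions)
        for label in labels:
            self._labels[(name, label)] = version
        return version

    def set_label(self, name, label, version):
        """Point a label at an existing version; in this sketch, that is a rollback."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"{name!r} has no version {version}")
        self._labels[(name, label)] = version

    def get(self, name, label="production"):
        """Resolve a label to (version, text) at request time."""
        version = self._labels[(name, label)]
        return version, self._versions[name][version - 1]

    def history(self, name):
        """List every version so changes can be reviewed."""
        return list(enumerate(self._versions.get(name, []), start=1))


registry = PromptRegistry()
registry.create("support-answer", "Answer briefly: {question}", labels=["production"])
registry.create("support-answer", "Answer step by step: {question}", labels=["production"])
print(registry.get("support-answer"))  # (2, 'Answer step by step: {question}')

# Roll back: move the label to version 1 without deleting version 2.
registry.set_label("support-answer", "production", 1)
print(registry.get("support-answer"))  # (1, 'Answer briefly: {question}')
```

Because versions are never overwritten, the full history stays available for review even after a rollback.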
Run evaluations with LLM-as-a-judge, heuristic functions, or human review on production data or experiments. Support for datasets, experiments, evaluation scores, and human annotation helps compare changes over time.
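As a rough illustration of the heuristic-function path only (LLM-as-a-judge and human review are not shown), the sketch below scores two experiment runs against the same small dataset and averages each metric. The evaluator names, dataset, and outputs are hypothetical; this is not Langfuse's evaluation API.

```python
from statistics import mean


# Hypothetical heuristic evaluators; each maps (output, expected) to a score in [0, 1].
def exact_match(output, expected):
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def max_words(limit):
    def evaluator(output, expected):
        # `expected` is unused: this heuristic only checks answer length.
        return 1.0 if len(output.split()) <= limit else 0.0
    return evaluator


def score_run(outputs, dataset, evaluators):
    """Average each evaluator's score over a dataset for one experiment run."""
    if len(outputs) != len(dataset):
        raise ValueError("need exactly one output per dataset item")
    return {
        name: mean(fn(out, item["expected"]) for out, item in zip(outputs, dataset))
        for name, fn in evaluators.items()
    }


dataset = [
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "What is 2 + 2?", "expected": "4"},
]
evaluators = {"exact_match": exact_match, "concise": max_words(5)}

baseline = ["Paris", "5"]    # outputs from the current prompt
candidate = ["paris", "4"]   # outputs from a changed prompt
print(score_run(baseline, dataset, evaluators))   # {'exact_match': 0.5, 'concise': 1.0}
print(score_run(candidate, dataset, evaluators))  # {'exact_match': 1.0, 'concise': 1.0}
```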
Monitor quality, cost, and latency through dashboards, alerts, and trace-linked metrics. This makes it easier to understand the impact of prompt or model changes on production behavior.
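Dashboard metrics of this kind amount to grouping trace-level cost and latency by a dimension such as model or prompt version. A minimal sketch, assuming each trace is a plain dict with hypothetical `model`, `cost_usd`, and `latency_ms` keys and using nearest-rank percentiles; Langfuse's own metric definitions may differ.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(traces, group_by="model"):
    """Aggregate count, total cost, and latency percentiles per group."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace[group_by]].append(trace)
    summary = {}
    for key, items in groups.items():
        latencies = [t["latency_ms"] for t in items]
        summary[key] = {
            "count": len(items),
            "total_cost_usd": round(sum(t["cost_usd"] for t in items), 6),
            "p50_latency_ms": percentile(latencies, 50),
            "p95_latency_ms": percentile(latencies, 95),
        }
    return summary


# Hypothetical traces from before and after a model change.
traces = [
    {"model": "model-a", "cost_usd": 0.002, "latency_ms": 800},
    {"model": "model-a", "cost_usd": 0.002, "latency_ms": 1200},
    {"model": "model-a", "cost_usd": 0.003, "latency_ms": 950},
    {"model": "model-b", "cost_usd": 0.010, "latency_ms": 2100},
]
for model, stats in summarize(traces).items():
    print(model, stats)
```

Grouping by `"prompt_version"` instead of `"model"` (if traces carried such a key) would compare prompt changes the same way.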
Connect through native SDKs, OpenTelemetry, proxy-based logging, APIs, exports, and more than 100 integrations. The platform also supports self-hosting and data portability.
Instrument production LLM applications to inspect traces, sessions, and user-level behavior when debugging latency, cost spikes, or unexpected outputs.
Manage prompts as versioned assets, deploy them by label, roll back changes, and compare prompt variants in the playground before shipping updates.
Run offline evaluations on datasets or online evaluations on production traces, then compare experiments using LLM-as-a-judge, heuristics, or human review to assess quality changes.
Create human annotation queues and review traces to build golden datasets or validate model behavior with collaborators.
Use the same platform across prototypes and production systems to connect instrumentation, experimentation, and iteration in one workflow.
Langfuse is built for teams that want to trace LLM and agent workflows, manage prompts, run experiments, and evaluate outputs, including through human annotation, in one system.
Langfuse offers native SDKs for Python and JavaScript, OpenTelemetry support, 100+ integrations, and the option to capture traces through an LLM gateway such as LiteLLM. It is designed to work with existing stacks rather than require a single framework.
Langfuse supports traces, sessions, user tracking, prompt versioning, prompt deployments, playground testing, experiments, evaluation scores, datasets, and human review workflows. The docs describe it as a connected workflow from prototype to production.
Langfuse Cloud offers a free Hobby plan and paid managed-cloud plans with longer data retention, higher limits, and enterprise features such as SSO, SCIM, audit logs, and support options. A self-hosted option is also available.
The product is designed for LLM applications and agents. The documentation emphasizes tracing LLM and non-LLM calls, production debugging, prompt iteration, evaluation, and quality, cost, and latency monitoring.