Ollama speed comparison — fastest open LLMs by tokens per second
TokenDyno compares open-LLM providers on real, continuously-measured inference speed. The live leaderboard shows the current rankings; this page explains what the comparison measures, how it stays fair across providers, and how to read it.
What we're comparing
Four provider tiers, benchmarked side by side on the same engine:
- Ollama Pro — Ollama hosted API
- Ollama Free — Ollama free tier (runs a subset of models on a slower cadence to preserve quota)
- OpenCode Zen — pay-per-use API
- OpenCode Go — monthly subscription plan
Where the same model is offered by more than one provider, you can put them next to each other and the comparison is apples-to-apples — same prompt, same token cap, same measurement method. Only the provider varies.
What "tokens per second" measures
Tokens per second (TPS) is generation throughput — how fast the model emits output tokens during the decode phase, excluding the initial wait. It's the number that determines how a streaming response feels and how much throughput you get per dollar on a pay-per-use plan.
Time to first token (TTFT) is the wait before the first token arrives — network round-trip plus the provider's prompt processing. A model can have high TPS but painful TTFT, or vice versa, so the leaderboard reports both.
Speed is misleading without reliability, so each row also carries a 24-hour availability signal and a sparkline of recent samples. The full measurement spec, including the hybrid inter-token / wall-clock TPS method and the error taxonomy, is on the methodology page.
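To make the two definitions concrete, here is a minimal sketch of how TTFT and decode-phase TPS could be derived from the arrival times of a streamed response. It is a simplified illustration, not TokenDyno's hybrid method (that spec lives on the methodology page); the names `StreamTiming` and `measure` are hypothetical, and the timestamps are assumed to be monotonic seconds recorded by the client.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StreamTiming:
    ttft_s: float          # request sent -> first token received
    tps: Optional[float]   # decode-phase tokens per second, None if undefined


def measure(request_sent_s: float, token_arrivals_s: Sequence[float]) -> StreamTiming:
    """Compute TTFT and decode-phase TPS from monotonic timestamps (seconds).

    Assumption: one timestamp per output token, in arrival order.
    """
    if not token_arrivals_s:
        raise ValueError("no tokens received")

    first, last = token_arrivals_s[0], token_arrivals_s[-1]
    ttft = first - request_sent_s

    # Decode phase only: count the tokens after the first one and divide by
    # the time they took to arrive, so the initial wait is excluded.
    intervals = len(token_arrivals_s) - 1
    decode_time = last - first
    tps = intervals / decode_time if intervals > 0 and decode_time > 0 else None

    return StreamTiming(ttft_s=ttft, tps=tps)


# Illustrative timestamps, not real measurements.
timing = measure(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(timing.ttft_s, timing.tps)
```

Folding the first-token wait into the rate would instead divide by `last - request_sent_s`, which lowers the reported TPS for any response with a long TTFT. That is why the two numbers are kept separate here.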
Why a continuous comparison beats a one-off benchmark
Vendor TPS claims are measured under conditions the vendor controls — ideal prompts, empty queues, sometimes first-token time folded into the rate — and they almost never reflect what a real user sees on the plan they would buy. A single benchmark captures one moment and goes stale; a provider dashboard advertises a peak.
TokenDyno's worker hits the same endpoints on a roughly 10-minute cycle (the Ollama free tier runs slower to preserve quota) and writes what it finds. That produces a living record that tracks outages, throttling, and quiet regressions that vendors never announce — which is the whole point of comparing providers rather than trusting any one of them.
How the comparison stays fair
Every provider is benchmarked on the same engine with the same fixed prompt, the same 300-token max_tokens cap, and the same hybrid TPS measurement. The sampling cadence is also the same, except for the Ollama free tier, which runs less often to preserve quota. Because the method is otherwise identical, the numbers are directly comparable: an Ollama model next to an OpenCode Zen model next to an OpenCode Go model is a fair race.
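One way to picture "only the provider varies" is a single shared benchmark spec that every run points at. The sketch below is an assumption about how such a setup could be modelled, not TokenDyno's actual configuration: `BenchmarkSpec`, `ProviderRun`, `runs_for`, and `is_fair` are hypothetical names, and the prompt string is a placeholder because the real fixed prompt is not given on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSpec:
    prompt: str
    max_tokens: int


@dataclass(frozen=True)
class ProviderRun:
    provider: str
    model: str
    spec: BenchmarkSpec


# Placeholder prompt (hypothetical); the 300-token cap comes from the text above.
SHARED_SPEC = BenchmarkSpec(prompt="<fixed benchmark prompt>", max_tokens=300)

PROVIDERS = ("Ollama Pro", "Ollama Free", "OpenCode Zen", "OpenCode Go")


def runs_for(model: str, providers=PROVIDERS) -> list[ProviderRun]:
    """Build one run per provider, all sharing the same spec object."""
    return [ProviderRun(p, model, SHARED_SPEC) for p in providers]


def is_fair(runs) -> bool:
    """True when every run uses the same model and the same spec."""
    return len({(r.model, r.spec) for r in runs}) == 1


runs = runs_for("example-model")  # hypothetical model name
print(is_fair(runs))
```

Because the spec is frozen and shared, a run with a different prompt or token cap cannot slip in unnoticed: it would produce a second distinct `(model, spec)` pair and `is_fair` would return `False`.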
How to read the leaderboard
- Sort by TPS for raw generation speed.
- Sort by TTFT for responsiveness / interactivity.
- Check the 24-hour reliability bar and the sparkline trend — a fast model that drops out half the day is not a better pick than a slightly slower one that stays up.
- Cells live-refresh every ~60 seconds without a full reload, so the board reflects near-current conditions, not a static snapshot.
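The reading advice above can be sketched as a small ranking helper: filter out rows that fail a reliability floor, then sort by the metric you care about. This is an illustration only. The `Row` fields, the `rank` function, and the 0.95 availability floor are assumptions rather than anything the leaderboard exposes, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    provider: str
    tps: float            # tokens per second, higher is better
    ttft_s: float         # time to first token in seconds, lower is better
    availability: float   # share of successful samples over 24 hours, 0.0 to 1.0


def rank(rows, by="tps", min_availability=0.95):
    """Drop rows below the availability floor, then sort by the chosen metric."""
    if by not in ("tps", "ttft_s"):
        raise ValueError(f"unsupported sort key: {by}")
    reliable = [r for r in rows if r.availability >= min_availability]
    # TPS sorts descending (fastest first); TTFT sorts ascending (shortest wait first).
    return sorted(reliable, key=lambda r: getattr(r, by), reverse=(by == "tps"))


# Made-up rows for illustration, not measurements.
rows = [
    Row("model-a", "Provider 1", tps=120.0, ttft_s=0.9, availability=0.50),
    Row("model-b", "Provider 2", tps=95.0, ttft_s=0.4, availability=0.99),
    Row("model-c", "Provider 3", tps=80.0, ttft_s=0.3, availability=0.97),
]
print([r.model for r in rank(rows)])
print([r.model for r in rank(rows, by="ttft_s")])
```

In this example the nominally fastest row is excluded because it was up only half the day, which mirrors the point that a fast but flaky model is not the better pick.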
Browse every benchmarked model on the all models page, or open the live leaderboard for the current ranking.
Rankings move — don't trust a snapshot
Inference speed shifts with provider load, time of day, model updates, and queue depth. Any specific number quoted outside this site is a point-in-time reading and may already be out of date. The live leaderboard is the source of truth, not a figure reproduced in an article or a comment.
Who runs this and how it's funded
TokenDyno is independent and not affiliated with any of the providers benchmarked; no provider pays for ranking or placement. See the About page for who runs it, the independence and sponsorship policy, and how to support the project.