Grok 4.1 Fast

by xAI · speed tier · longest context on our ledger

The longest context window on our ledger, 2M tokens, at one of the lowest prices on the table.

Input $0.20 / 1M
Output $0.50 / 1M
Context 2M tokens
Specialty Long context
Source: xAI Platform · Updated June 10, 2026

§ API pricing

Per-token rates.

Input
$0.20/1M tokens
Prompt tokens
  • Matches Mistral Small's rate
  • Filling the 2M window costs $0.40
  • Cheapest way on our ledger to read 2M tokens in one call
Output
$0.50/1M tokens
Completion tokens
  • Lowest xAI output rate
  • Under Mistral Small's $0.60
  • Built for volume
Context
2M tokens
Window
  • Longest on our entire ledger
  • 2× GPT-5.5 and Fable 5
  • ~1.5M words in one call
Speed
Ultra-fast latency tier
Throughput
  • xAI's fastest model
  • Real-time product friendly
  • Pairs with Grok 4.20 for planning

Why Grok 4.1 Fast exists

One number defines this model: 2 million tokens of context, twice GPT-5.5 and Fable 5 and eight times Claude's standard window. At $0.20 per 1M input tokens, reading the entire window costs about forty cents. On our table only DeepSeek V4-Flash has a lower input rate ($0.14), and it stops at 1M tokens; no other model on our ledger reads 2M tokens in a single call.
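
The forty-cent figure is plain per-token arithmetic. A minimal sketch, using only the rates listed on this page; the function name `call_cost` is hypothetical, and the window check assumes the 2M limit applies to input tokens:

```python
# Rates copied from the pricing section of this page (USD per 1M tokens).
INPUT_RATE_PER_M = 0.20
OUTPUT_RATE_PER_M = 0.50
CONTEXT_WINDOW = 2_000_000  # tokens


def call_cost(input_tokens: int, output_tokens: int = 0) -> float:
    """Estimated USD cost of one call at the listed per-token rates.

    Assumption: the 2M limit is checked against input tokens only.
    """
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the 2M-token window")
    total = input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Filling the whole window with prompt tokens, no output counted.
    print(f"${call_cost(2_000_000):.2f}")
    # A 1M-token read with a 2,000-token answer.
    print(f"${call_cost(1_000_000, 2_000):.3f}")
```

Output tokens are billed at the higher rate, but a long read with a short answer is dominated by the input side.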

The design trade is explicit in the name: Fast. This is a throughput model, not a deliberator — xAI's answer for "read everything, answer quickly" rather than "think hard."

Capabilities

Strong at retrieval-style work over enormous inputs: find, summarize, cross-reference, and extract from document sets that other models need chunking pipelines for. Speed makes it viable in interactive products despite the giant window.

The honest weakness: deep reasoning over what it reads. It will find the clause across 1.5M words; working out that clause's subtler implications is a job for Grok 4.20 or a Claude/GPT flagship.

Typical use cases

  • Whole-corpus Q&A: contracts, discovery documents, log archives
  • Replacing chunking/RAG pipelines with single-call reads
  • High-volume summarization at speed
  • Real-time features needing big context and low latency
  • Cheap reader paired with a stronger planner model
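
Before replacing a chunking pipeline with a single-call read, it helps to check whether the corpus actually fits. A rough sketch, assuming the 0.75 words-per-token ratio implied by this page's "~1.5M words" per 2M tokens; real tokenizer counts will differ, and `estimate_tokens`, `plan_read`, and the output reserve are hypothetical names and values:

```python
import math

CONTEXT_WINDOW = 2_000_000  # tokens, from this page
WORDS_PER_TOKEN = 0.75      # implied by "~1.5M words" in 2M tokens; a heuristic


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def plan_read(documents: list[str], reserve_for_output: int = 8_000) -> dict:
    """Decide whether a document set fits in one call.

    reserve_for_output is an assumed allowance for the answer and
    instructions; calls_needed is a naive split that ignores
    document boundaries.
    """
    total = sum(estimate_tokens(d) for d in documents)
    budget = CONTEXT_WINDOW - reserve_for_output
    return {
        "estimated_tokens": total,
        "single_call": total <= budget,
        "calls_needed": max(1, math.ceil(total / budget)),
    }


if __name__ == "__main__":
    contracts = ["lorem ipsum " * 50_000, "dolor sit amet " * 20_000]
    print(plan_read(contracts))
```

If `single_call` comes back false, the corpus still needs some splitting, just with far fewer and larger pieces than a small-window model would require.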

Sibling and rival comparison

Model                   Input / 1M   Output / 1M   Context
Grok 4.1 Fast           $0.20        $0.50         2M
Grok 4.20               $2.00        $6.00         256K
Grok Code Fast 1        $0.20        $1.50         256K
Gemini 3.1 Flash-Lite   $0.25        $1.50         1M
DeepSeek V4-Flash       $0.14        $0.28         1M

Nothing else combines this window with this price — the 1M-window budget rivals (Gemini Flash-Lite, DeepSeek V4-Flash) offer half the context. If your bottleneck is "how much can the model see at once", Grok 4.1 Fast is currently the answer; if it's reasoning quality, step up to 4.20 or beyond.
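
To make the window-versus-price trade concrete, the table can be turned into a small calculation: how many calls each model needs to read a corpus, and what the input side costs. A sketch only; it treats 256K as 256,000 tokens and 1M as 1,000,000 (an assumption), ignores output tokens and any overlap between chunks, and `read_cost` is a hypothetical helper:

```python
import math

# (input $/1M, output $/1M, context tokens), copied from the table above.
# Assumption: "256K" = 256,000 tokens and "1M" = 1,000,000 tokens.
MODELS = {
    "Grok 4.1 Fast": (0.20, 0.50, 2_000_000),
    "Grok 4.20": (2.00, 6.00, 256_000),
    "Grok Code Fast 1": (0.20, 1.50, 256_000),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50, 1_000_000),
    "DeepSeek V4-Flash": (0.14, 0.28, 1_000_000),
}


def read_cost(model: str, corpus_tokens: int) -> tuple[int, float]:
    """Return (minimum calls, input cost in USD) to read the corpus once."""
    input_rate, _output_rate, context = MODELS[model]
    calls = math.ceil(corpus_tokens / context)
    cost = round(corpus_tokens * input_rate / 1_000_000, 2)
    return calls, cost


if __name__ == "__main__":
    for name in MODELS:
        calls, cost = read_cost(name, 2_000_000)
        print(f"{name:<22} {calls} call(s)  ${cost:.2f}")
```

For a 2M-token corpus this gives one call at $0.40 for Grok 4.1 Fast and two calls at $0.28 for DeepSeek V4-Flash: the rival is cheaper per token, but the corpus has to be split and the partial answers merged.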
