Grok 4.1 Fast

by xAI · speed tier · longest context on our ledger

The longest context window on our ledger, 2M tokens, at one of the lowest prices on the table.

Input $0.20 / 1M

Output $0.50 / 1M

Context 2M tokens

Specialty Long context

xAI Platform ↗ Updated July 23, 2026

§ API pricing

Per-token rates.

Input

$0.20/1M tokens

Prompt tokens

Matches Mistral Small's rate
Filling the 2M window costs $0.40
Cheapest 2M-window read on our ledger

Output

$0.50/1M tokens

Completion tokens

Lowest xAI output rate
Under Mistral Small's $0.60
Built for volume
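The per-token rates above turn into a per-call estimate with simple arithmetic. A minimal sketch follows, assuming flat pricing at the listed rates with no caching discounts, tiers, or tool fees (none are listed on this page). The function name `request_cost` is hypothetical.

```python
# Rates copied from the pricing section above, in USD per 1M tokens.
INPUT_RATE_PER_M = 0.20
OUTPUT_RATE_PER_M = 0.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call at the listed flat rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


# A full 2M-token prompt with a 2,000-token answer.
full_window = request_cost(2_000_000, 2_000)
print(f"${full_window:.4f}")  # prints $0.4010
```

At these rates the input side dominates any long-context call: the 2,000-token answer adds a tenth of a cent to a forty-cent read.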

Context

2M tokens

Window

Longest on our entire ledger
2× GPT-5.5 and Fable 5
~1.5M words in one call
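The "~1.5M words" figure implies roughly 0.75 words per token. A rough fit check under that assumption might look like the sketch below. The ratio varies with language and tokenizer, and whether output tokens count against the same window is also an assumption here, so treat the result as an estimate and measure with the real tokenizer before relying on it.

```python
import math

CONTEXT_WINDOW_TOKENS = 2_000_000
# Assumption: ratio implied by "~1.5M words" per 2M tokens; real text will vary.
WORDS_PER_TOKEN = 0.75


def estimated_tokens(word_count: int) -> int:
    """Rough token estimate for a given number of words."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_window(word_count: int, reserved_output_tokens: int = 0) -> bool:
    """True if the estimated prompt plus reserved output fits in one call.

    Assumes output tokens share the window with the prompt.
    """
    needed = estimated_tokens(word_count) + reserved_output_tokens
    return needed <= CONTEXT_WINDOW_TOKENS


print(estimated_tokens(1_200_000))
print(fits_in_window(1_200_000, reserved_output_tokens=4_000))
```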

Speed

Ultra-fast latency tier

Throughput

xAI's fastest model
Real-time product friendly
Pairs with Grok 4.20 for planning

Why Grok 4.1 Fast exists

One number defines this model: 2 million tokens of context, twice GPT-5.5 and Fable 5 and eight times Claude's standard window. At $0.20 per 1M input tokens, reading the entire window costs about forty cents (2M × $0.20/1M = $0.40). DeepSeek V4-Flash is cheaper per token at $0.14, but no other model on our table reads this much in a single call at anything close to this price.

The design trade is explicit in the name: Fast. This is a throughput model, not a deliberator — xAI's answer for "read everything, answer quickly" rather than "think hard."

Capabilities

Strong at retrieval-style work over enormous inputs: find, summarize, cross-reference, and extract from document sets that other models need chunking pipelines for. Speed makes it viable in interactive products despite the giant window.

The honest weakness: deep reasoning over what it reads. It will find the clause across 1.5M words; working out that clause's subtler implications is a job for Grok 4.20 or a Claude/GPT flagship.

Typical use cases

Whole-corpus Q&A: contracts, discovery documents, log archives
Replacing chunking/RAG pipelines with single-call reads
High-volume summarization at speed
Real-time features needing big context and low latency
Cheap reader paired with a stronger planner model
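The last use case, a cheap reader feeding a stronger planner, can be sketched as a two-stage function. Everything here is hypothetical scaffolding: `Reader` and `Planner` are plain callables that you would back with whichever API client you use, and the stand-in functions exist only so the sketch runs offline. No real SDK calls are shown.

```python
from typing import Callable

# Hypothetical interfaces: each callable wraps whichever API client you use.
Reader = Callable[[str, str], str]   # (corpus, question) -> relevant passages
Planner = Callable[[str, str], str]  # (passages, question) -> final answer


def answer_with_reader_and_planner(
    corpus: str, question: str, reader: Reader, planner: Planner
) -> str:
    """Run the cheap long-context read first, then hand the result to the planner."""
    passages = reader(corpus, question)
    if not passages.strip():
        # Skip the expensive call when the reader found nothing to reason about.
        return "No relevant passages found."
    return planner(passages, question)


# Stand-in functions so the sketch runs without any network access.
def keyword_reader(corpus: str, question: str) -> str:
    keyword = question.split()[-1].strip("?").lower()
    return "\n".join(line for line in corpus.splitlines() if keyword in line.lower())


def echo_planner(passages: str, question: str) -> str:
    return f"Q: {question}\nEvidence:\n{passages}"


corpus = "Clause 1: payment terms\nClause 2: termination notice\nClause 3: governing law"
print(answer_with_reader_and_planner(
    corpus, "What covers termination?", keyword_reader, echo_planner
))
```

The design point is that the planner only ever sees the short extract, so the expensive model's input bill stays small while the cheap model absorbs the full corpus.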

Sibling and rival comparison

Model	Input / 1M	Output / 1M	Context
Grok 4.1 Fast	$0.20	$0.50	2M
Grok 4.20	$2	$6	2M
Grok Code Fast 1	$0.20	$1.50	256K
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M
DeepSeek V4-Flash	$0.14	$0.28	1M

Nothing else combines this window with this price — the 1M-window budget rivals (Gemini Flash-Lite, DeepSeek V4-Flash) offer half the context. If your bottleneck is "how much can the model see at once", Grok 4.1 Fast is currently the answer; if it's reasoning quality, step up to 4.20 or beyond.
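To make the window-versus-price trade concrete, the sketch below takes the table's figures and estimates, for a corpus of a given size, how many calls each model would need and what the input tokens would cost. It assumes naive splitting with no overlap and no per-call prompt overhead, and it treats "256K" as 256,000 tokens. The names `Model` and `read_plan` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_per_m: float    # USD per 1M input tokens
    output_per_m: float   # USD per 1M output tokens
    context_tokens: int


# Figures copied from the comparison table; 256K is assumed to mean 256,000.
MODELS = [
    Model("Grok 4.1 Fast", 0.20, 0.50, 2_000_000),
    Model("Grok 4.20", 2.00, 6.00, 2_000_000),
    Model("Grok Code Fast 1", 0.20, 1.50, 256_000),
    Model("Gemini 3.1 Flash-Lite", 0.25, 1.50, 1_000_000),
    Model("DeepSeek V4-Flash", 0.14, 0.28, 1_000_000),
]


def read_plan(model: Model, corpus_tokens: int) -> tuple[int, float]:
    """Return (calls needed, input cost in USD) to read the corpus once."""
    calls = math.ceil(corpus_tokens / model.context_tokens)
    cost = corpus_tokens / 1_000_000 * model.input_per_m
    return calls, cost


for model in MODELS:
    calls, cost = read_plan(model, 2_000_000)
    print(f"{model.name:<24} calls={calls}  input=${cost:.2f}")
```

Under these assumptions a 2M-token corpus is one call on either 2M-window model, two on the 1M-window rivals, and eight on a 256K window. The split count matters because any question that spans chunks then needs a merge step the single-call read avoids.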
