The longest context window money can buy — 2M tokens — at one of the lowest prices on the table.
One number defines this model: 2 million tokens of context, twice GPT-5.5 and Fable 5 and eight times Claude's standard window. At $0.20 per million input tokens, filling the entire window costs about forty cents. Only DeepSeek V4-Flash is cheaper per input token on our table ($0.14), and it tops out at half the window.
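The forty-cent figure is simply the window size times the per-token input rate. A minimal sketch of that arithmetic, using the list price from the table below; `input_cost_usd` is a hypothetical helper name, and the sketch assumes a flat per-token rate with no caching discounts or other pricing tiers.

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a flat $/1M-token rate (assumed)."""
    return tokens * price_per_million / 1_000_000


# Filling Grok 4.1 Fast's full 2M-token window at $0.20 per 1M input tokens.
full_window = input_cost_usd(2_000_000, 0.20)
print(f"${full_window:.2f}")  # prints "$0.40"
```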
The design trade is explicit in the name: Fast. This is a throughput model, not a deliberator — xAI's answer for "read everything, answer quickly" rather than "think hard."
Strong at retrieval-style work over enormous inputs: finding, summarizing, cross-referencing, and extracting from document sets that other models can only handle through a chunking pipeline. Its speed makes it viable in interactive products despite the giant window.
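To put the chunking point in numbers, here is a rough sketch that counts how many separate calls a corpus needs for a given context window. The 10% headroom reserved for instructions and the answer is an illustrative assumption, not a provider figure, and real pipelines usually overlap chunks, which would push the count higher. `chunks_needed` is a hypothetical helper; the windows come from the table below, with "256K" read as 256,000 tokens.

```python
import math


def chunks_needed(corpus_tokens: int, window_tokens: int, headroom: float = 0.10) -> int:
    """Calls needed if each call may spend (1 - headroom) of the window on input.

    `headroom` is an assumed share reserved for the prompt and the model's answer.
    Chunks are assumed not to overlap.
    """
    usable = int(window_tokens * (1 - headroom))
    return math.ceil(corpus_tokens / usable)


corpus = 1_500_000  # a hypothetical 1.5M-token document set
for name, window in [
    ("Grok 4.1 Fast", 2_000_000),
    ("DeepSeek V4-Flash", 1_000_000),
    ("Grok 4.20", 256_000),
]:
    # prints 1, 2 and 7 calls respectively
    print(name, chunks_needed(corpus, window))
```

One call versus several is the difference between asking a question and building a retrieval-and-merge pipeline around it.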
The honest weakness: deep reasoning over what it reads. It will find the clause across 1.5M words; working out that clause's subtler implications is Grok 4.20's job, or a Claude/GPT flagship's.
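The 1.5M-word figure follows from a common rule of thumb of roughly 0.75 English words per token. The real ratio depends on the tokenizer and the text, so treat this as an estimate. A minimal sketch of the conversion in both directions (the helper names and the constant are assumptions for illustration):

```python
WORDS_PER_TOKEN = 0.75  # assumed rule of thumb for English prose; varies by tokenizer


def tokens_to_words(tokens: int) -> int:
    """Rough number of English words that fit in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Rough token count for `words` English words."""
    return round(words / WORDS_PER_TOKEN)


print(tokens_to_words(2_000_000))  # 1500000: about 1.5M words in a 2M-token window
print(words_to_tokens(1_000_000))  # rough token cost of a 1M-word corpus
```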
| Model | Input / 1M tokens | Output / 1M tokens | Context window (tokens) |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 | 2M |
| Grok 4.20 | $2.00 | $6.00 | 256K |
| Grok Code Fast 1 | $0.20 | $1.50 | 256K |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M |
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M |
Nothing else on the table combines this window with this price: the budget 1M-window rivals (Gemini 3.1 Flash-Lite, DeepSeek V4-Flash) offer half the context. If your bottleneck is how much the model can see at once, Grok 4.1 Fast is currently the answer; if it's reasoning quality, step up to Grok 4.20 (which drops to a 256K window) or beyond.
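The rule of thumb above can be made mechanical: given an input size, keep only the models whose window holds it, then take the cheapest input price. A sketch using the figures from the table; it reads "1M" and "256K" as 1,000,000 and 256,000 tokens, and it deliberately ignores output cost and reasoning quality, which is exactly the trade-off the paragraph warns about. `cheapest_fit` and the `Model` record are hypothetical names.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens
    context_tokens: int


# Figures copied from the comparison table above.
MODELS = [
    Model("Grok 4.1 Fast", 0.20, 0.50, 2_000_000),
    Model("Grok 4.20", 2.00, 6.00, 256_000),
    Model("Grok Code Fast 1", 0.20, 1.50, 256_000),
    Model("Gemini 3.1 Flash-Lite", 0.25, 1.50, 1_000_000),
    Model("DeepSeek V4-Flash", 0.14, 0.28, 1_000_000),
]


def cheapest_fit(input_tokens: int) -> Optional[Model]:
    """Cheapest model (by input price) whose context window holds `input_tokens`."""
    fits = [m for m in MODELS if m.context_tokens >= input_tokens]
    return min(fits, key=lambda m: m.input_per_million, default=None)


for size in (200_000, 800_000, 1_500_000, 2_500_000):
    best = cheapest_fit(size)
    print(size, best.name if best else "no single-call fit")
```

Below 1M tokens, DeepSeek V4-Flash wins on input price alone; between 1M and 2M, Grok 4.1 Fast is the only single-call option on this table; above 2M, nothing listed fits without chunking.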