Frontier multimodal at a discount. The cheapest top-tier model for requests under 200K tokens; a higher rate applies above that.
For short and medium prompts, Gemini 3.1 Pro is the cheapest way to access frontier-class quality. At $2 input and $12 output per 1M tokens, it undercuts GPT-5.5 by 60% on both input and output, and Opus 4.7 by about half. That price-to-quality ratio is why it sits inside the AI Pro consumer plan at $19.99/mo and why it shows up so often in cost-sensitive production stacks.
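The catch is the tier break. Once your request crosses 200K tokens, billing jumps to $4 input and $18 output per 1M for the entire call, not just for the tokens past the threshold. That still beats GPT-5.5's flat $5/$30, but the discount shrinks from 60% to 20% on input and 40% on output, so the gap narrows if you do a lot of long-context work. Engineer your prompts to stay under 200K when you can: crossing the line doubles the input rate and raises the output rate by 50%.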
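To make the tier break concrete, here is a minimal Python sketch of the billing rule described above. The rates come from this article; the function name `gemini_pro_cost` is hypothetical, and the sketch assumes the 200K threshold is measured on the input (prompt) side, which the provider's billing documentation should confirm.

```python
from dataclasses import dataclass

# Assumption: the tier is decided by the prompt (input) token count.
TIER_THRESHOLD = 200_000


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


STANDARD = Rates(input_per_m=2.0, output_per_m=12.0)      # requests up to 200K
LONG_CONTEXT = Rates(input_per_m=4.0, output_per_m=18.0)  # requests above 200K


def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call.

    The higher rate applies to the whole call once the threshold is
    crossed, not only to the tokens beyond it.
    """
    rates = LONG_CONTEXT if input_tokens > TIER_THRESHOLD else STANDARD
    return (
        input_tokens * rates.input_per_m
        + output_tokens * rates.output_per_m
    ) / 1_000_000


if __name__ == "__main__":
    for prompt_size in (150_000, 200_000, 200_001):
        print(prompt_size, round(gemini_pro_cost(prompt_size, 1_000), 4))
```

Under these assumptions, a single extra prompt token at the boundary roughly doubles the cost of the call, which is why trimming context to stay under the threshold pays off.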
Gemini 3.1 Pro is natively multimodal in a way the GPT and Claude families are not — it handles audio and video input as first-class citizens, alongside text and images. That makes it the default choice for transcription pipelines, video analysis, and any workflow that wants one model to read everything in a folder. Reasoning is competitive with GPT-5.4 on most benchmarks and improving fast.
The weakness is consistency. Gemini still shows more variance call-to-call than Opus or GPT-5.5, and the UX around tool use is less mature. For mission-critical agent loops, most teams pair Gemini with a stricter validator.
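One way to read "a stricter validator" is a wrapper that rejects malformed responses and retries. The sketch below is illustrative only: `call_model` is a hypothetical callable standing in for whatever client you use, and the check (valid JSON object containing a set of required keys) is an assumed policy, not a description of any particular team's setup.

```python
import json
from typing import Callable, Optional


def call_with_validation(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw text out
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Call the model until it returns a JSON object with the required keys."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # not JSON at all: retry
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        last_error = ValueError("response is missing required keys")
    raise RuntimeError(
        f"no valid response after {max_attempts} attempts"
    ) from last_error
```

Retrying on validation failure trades extra tokens for predictability, so the retry budget is worth keeping small when output pricing dominates the bill.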
| Model | Input / 1M tokens | Output / 1M tokens | Context window |
|---|---|---|---|
| Gemini 3.1 Pro (≤200K) | $2 | $12 | 1M |
| Gemini 3.1 Pro (>200K) | $4 | $18 | 1M |
| Gemini 3.5 Flash | $1.50 | $9 | 1M |
| GPT-5.5 | $5 | $30 | 1M |
Versus GPT-5.5 (the closest cross-family flagship), Gemini 3.1 Pro is meaningfully cheaper at every length. Versus its own Flash sibling, the gap closed after I/O 2026: at the ≤200K tier, Pro is only about 33% more expensive than 3.5 Flash, and Flash actually beats Pro on coding. Reach for Pro when you need the hardest reasoning or long-horizon planning, not for code.
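The percentages quoted in this comparison can be recomputed from the table. The following sketch hard-codes the table's prices; the `PRICES` dictionary and `relative_difference` helper are names introduced here for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Gemini 3.1 Pro (<=200K)": (2.00, 12.00),
    "Gemini 3.1 Pro (>200K)": (4.00, 18.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-5.5": (5.00, 30.00),
}


def relative_difference(model: str, baseline: str) -> tuple[float, float]:
    """Return (input, output) price of `model` relative to `baseline`.

    A value of -0.6 means 60% cheaper; +0.33 means 33% more expensive.
    """
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return (model_in / base_in - 1, model_out / base_out - 1)


if __name__ == "__main__":
    for model, baseline in [
        ("Gemini 3.1 Pro (<=200K)", "GPT-5.5"),
        ("Gemini 3.1 Pro (>200K)", "GPT-5.5"),
        ("Gemini 3.1 Pro (<=200K)", "Gemini 3.5 Flash"),
    ]:
        d_in, d_out = relative_difference(model, baseline)
        print(f"{model} vs {baseline}: input {d_in:+.0%}, output {d_out:+.0%}")
```

With the table's numbers, the lower Pro tier comes out 60% below GPT-5.5 on both input and output, the upper tier 20% and 40% below, and the lower tier about 33% above Flash.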