
Gemini 3.1 Pro

by Google DeepMind · frontier multimodal · included in AI Pro ($19.99/mo)

Frontier multimodal at a discount. Cheapest top-tier model for prompts up to 200K tokens; a higher rate tier applies above that.

Input $2 / 1M
Output $12 / 1M
Context 1M tokens
Above 200K $4 / $18
Google AI Studio · Updated June 10, 2026

§ API pricing

Per-token rates, with a tier break at 200K.

Input ≤200K
$2/1M tokens
Standard tier
  • Cheapest frontier-class input
  • 40% of Opus 4.8's input rate
  • Vision and audio billed as input tokens
Output ≤200K
$12/1M tokens
Standard tier
  • Cheapest frontier-class output
  • 40% of GPT-5.5 output
  • Includes thinking tokens
Above 200K
$4/$18 per 1M
Long-context tier
  • Kicks in once your prompt crosses 200K
  • Applies to the full request, not the overage
  • Still cheaper than GPT-5.5 at length
Context
1M tokens
Window
  • Native multimodal: text, image, audio, video
  • Legacy Gemini 2.5 Pro: 2M window at $1.25/$10
  • Strong long-context recall in Google evals

Why Gemini 3.1 Pro is the value play

For short and medium prompts, Gemini 3.1 Pro is the cheapest way to access frontier-class quality. At $2 input and $12 output per 1M tokens, it undercuts GPT-5.5 by roughly 60% and Opus 4.8 by about half. That price-to-quality ratio is why it sits inside the AI Pro consumer plan at $19.99/mo and why it shows up so often in cost-sensitive production stacks.

The catch is the tier break. Once your prompt crosses 200K tokens, billing jumps to $4 input and $18 output per 1M for the entire call, not just the overage. That still beats GPT-5.5's flat $5/$30, but the discount shrinks from 60% to 20% on input and 40% on output, so heavy long-context workloads see less of the benefit. Keep prompts under 200K when you can: a request just over the line pays double on every input token.
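To make the tier break concrete, here is a minimal Python sketch of a per-request cost estimate using the rates quoted on this page. It assumes the tier is chosen by prompt (input) token count, that exactly 200K still counts as the standard tier, and that the long-context rates apply to every token in the call. The names `Rates` and `request_cost_usd` are illustrative and not part of any Google SDK.

```python
from dataclasses import dataclass

# Assumption: the tier is selected by prompt size, and a prompt of
# exactly 200K tokens is still billed at the standard rate.
TIER_BREAK_TOKENS = 200_000


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


STANDARD = Rates(input_per_m=2.0, output_per_m=12.0)
LONG_CONTEXT = Rates(input_per_m=4.0, output_per_m=18.0)


def request_cost_usd(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call under the two-tier price list.

    The long-context rates apply to the whole request once the prompt
    exceeds the tier break, not only to the tokens above it.
    """
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = STANDARD if prompt_tokens <= TIER_BREAK_TOKENS else LONG_CONTEXT
    total = prompt_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    return total / 1_000_000
```

Under these assumptions, a 300K-token prompt with 2K output tokens comes to about $1.24, while a 150K-token prompt with 1K output tokens comes to about $0.31. Where a job can be split cleanly, two smaller calls may cost roughly half as much as one long one, though splitting is not always possible without losing context.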

One generation note, since both remain on the price list: the older Gemini 2.5 Pro is still sold at $1.25/$10 per 1M with a 2M-token window — cheaper than 3.1 Pro and with twice the context, but a generation behind on reasoning and multimodal quality. Pick it only when raw window size matters more than model quality; otherwise the current generation is the better spend.

Capabilities

Gemini 3.1 Pro is natively multimodal in a way the GPT and Claude families are not — it handles audio and video input as first-class citizens, alongside text and images. That makes it the default choice for transcription pipelines, video analysis, and any workflow that wants one model to read everything in a folder. Reasoning is competitive with GPT-5.4 on most benchmarks and improving fast.

The weakness is consistency. Gemini still shows more variance call-to-call than Opus or GPT-5.5, and the UX around tool use is less mature. For mission-critical agent loops, most teams pair Gemini with a stricter validator.
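One way to read "pair Gemini with a stricter validator" is a validate-and-retry wrapper around each model call. The sketch below is only an illustration of that pattern: `generate` is a hypothetical stand-in for whatever function sends the prompt to the model and returns its text, and the check used here (valid JSON object with a set of required keys) is an assumed example of a validation rule, not something this page prescribes.

```python
import json
from typing import Callable


def call_with_validation(
    generate: Callable[[str], str],  # hypothetical: prompt in, raw model text out
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Call the model until its output passes validation or attempts run out."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = required_keys - set(data)
        if missing:
            last_error = f"attempt {attempt}: missing keys {sorted(missing)}"
            continue
        return data
    raise ValueError(f"no valid response after {max_attempts} attempts; {last_error}")
```

Each retry is a full billed call, so the attempt limit doubles as a cost cap. A real agent loop would likely add schema-level checks and logging on top of this.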

Typical use cases

  • Cost-sensitive production chat and content generation
  • Video and audio analysis (native multimodal input)
  • Long-document workflows up to 1M tokens
  • Workspace-adjacent automations (Gmail, Docs, Drive)
  • NotebookLM-style research synthesis

Sibling and rival comparison

Model                     Input / 1M   Output / 1M   Context
Gemini 3.1 Pro (≤200K)    $2           $12           1M
Gemini 3.1 Pro (>200K)    $4           $18           1M
Gemini 3.5 Flash          $1.50        $9            1M
GPT-5.5                   $5           $30           1M

Versus GPT-5.5 (the closest cross-family flagship), Gemini 3.1 Pro is meaningfully cheaper at every length. Versus its own Flash sibling, the gap closed after I/O 2026: Pro is only about 33% more expensive than 3.5 Flash, and Flash actually beats Pro on coding — so reach for Pro when you need the hardest reasoning or long-horizon planning, not for code.
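The percentages quoted above follow directly from the table. As a quick check, this short Python sketch recomputes them from the listed prices; the `PRICES` dictionary and the function names are illustrative, and the figures are only as current as this page.

```python
# (input, output) in USD per 1M tokens, copied from the comparison table.
PRICES = {
    "Gemini 3.1 Pro (<=200K)": (2.00, 12.00),
    "Gemini 3.1 Pro (>200K)": (4.00, 18.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-5.5": (5.00, 30.00),
}


def pct_difference(price: float, baseline: float) -> float:
    """Signed percent difference of price relative to baseline."""
    return (price - baseline) / baseline * 100


def compare(model: str, baseline: str) -> tuple[float, float]:
    """Return (input %, output %) difference of model versus baseline.

    Negative values mean the model is cheaper than the baseline.
    """
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return pct_difference(model_in, base_in), pct_difference(model_out, base_out)
```

For example, `compare("Gemini 3.1 Pro (<=200K)", "GPT-5.5")` gives roughly -60% on both input and output, and `compare("Gemini 3.1 Pro (<=200K)", "Gemini 3.5 Flash")` gives roughly +33% on both.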
