← Back to API models

Gemini 3.5 Flash

by Google DeepMind · smarter multimodal Flash

The new mid-tier Gemini, launched at I/O 2026. Beats 3.1 Pro on coding, keeps the 1M context, and lands at roughly three-quarters of Pro's price. Cheap volume work moved down to Flash-Lite.

Input $1.50 / 1M
Output $9 / 1M
Context 1M tokens
Tier Mid (smarter)
Google AI Studio ↗ Updated May 20, 2026
§ API pricing

Per-token rates.

Input
$1.50/1M tokens
Prompt tokens
  • About 25% under 3.1 Pro's $2 input
  • Vision, audio, video billed as input
  • Tripled vs the old 3 Flash rate of $0.50
Output
$9/1M tokens
Completion tokens
  • 25% below 3.1 Pro's $12 output
  • Still 3× cheaper than GPT-5.5 output
  • Includes thinking tokens
Context
1Mtokens
Window
  • Full 1M with no tier break
  • Native multimodal across the window
  • Matches 3.1 Pro on length
Flash-Lite
$0.25/$1.50 per 1M
3.1 Flash-Lite
  • Now the cheap-volume tier
  • Same 1M window
  • Use it for classification and re-ranking

What Flash is for

Gemini 3.5 Flash, launched at Google I/O 2026, is not the cheap-volume Flash of last year. The old $0.50/$3 rate is gone; the new tier sits at $1.50 input and $9 output per 1M tokens with a flat 1M context. The right way to read it: this is the practical default for most production work — coding-strong, multimodal, frontier-length — at roughly three-quarters of Gemini 3.1 Pro's cost. If you genuinely need cheap volume, you should be on Flash-Lite or Gemini 2.5 Flash now, not here.

What you get for the price increase is real. Google's headline claim at I/O is that 3.5 Flash beats 3.1 Pro on coding benchmarks — which is the kind of inversion that doesn't happen often in a Pro/Flash split and is worth taking seriously when you're picking a model for an agent loop or an IDE assistant.

Capabilities

Flash inherits the native multimodality of the Gemini family: text, image, audio, and video all go in. The big change in 3.5 is reasoning — particularly on code generation and code editing, where Google's own evals put it above 3.1 Pro. General reasoning is up too, though Pro still wins on the hardest single-shot questions and long-horizon planning. Tool use is more reliable than in 3 Flash but still less mature than Anthropic's or OpenAI's.

The expected tradeoffs: the price tripled on input and output, so this is no longer the model you reach for to drive a free-tier chat product. Google's own free tier still defaults to Gemini 2.5 Flash for exactly that reason.

Typical use cases

  • Coding assistants and IDE integrations — the headline use case
  • Production chat where quality matters more than per-call cost
  • Agent loops and tool-using workflows that were borderline on 3 Flash
  • Long-document Q&A and contract review with native multimodal input
  • Video and audio analysis where 1M context plus reasoning both matter

Sibling and rival comparison

ModelInput / 1MOutput / 1MContext
Gemini 3.5 Flash$1.50$91M
Gemini 3.1 Flash-Lite$0.25$1.501M
Gemini 3.1 Pro$2$121M
GPT-5.4 mini$0.25$2272K

3.5 Flash now sits close enough to 3.1 Pro that the choice is mostly about workload shape: pick Pro for the hardest reasoning, pick 3.5 Flash for coding and most everything else. Flash-Lite is the genuine cheap tier — about a sixth of the input cost — and is where you should send classification, re-ranking, and bulk summarization. GPT-5.4 mini is cheaper still but loses on context length and on multimodal input.

← See all Google / Gemini plans