Gemini 3.5 Flash

Name: Gemini 3.5 Flash
Brand: Google DeepMind
Price: 1.50 USD

by Google DeepMind · smarter multimodal Flash

The new mid-tier Gemini, launched at I/O 2026. Beats 3.1 Pro on coding, keeps the 1M context, and lands at roughly three-quarters of Pro's price. Cheap volume work moved down to Flash-Lite.

Input $1.50 / 1M

Output $9 / 1M

Context 1M tokens

Tier Mid (smarter)

Google AI Studio ↗ Updated July 9, 2026

§ API pricing

Per-token rates.

Input

$1.50/1M tokens

Prompt tokens

About 25% under 3.1 Pro's $2 input
Vision, audio, video billed as input
Tripled vs the old 3 Flash rate of $0.50

Output

$9/1M tokens

Completion tokens

25% below 3.1 Pro's $12 output
Still 3× cheaper than GPT-5.5 output
Includes thinking tokens

Context

1Mtokens

Window

Full 1M with no tier break
Native multimodal across the window
Matches 3.1 Pro on length

Flash-Lite

$0.25/$1.50 per 1M

3.1 Flash-Lite

Now the cheap-volume tier
Same 1M window
Use it for classification and re-ranking

What Flash is for

Gemini 3.5 Flash, launched at Google I/O 2026, is not the cheap-volume Flash of last year. The old $0.50/$3 rate is gone; the new tier sits at $1.50 input and $9 output per 1M tokens with a flat 1M context. The right way to read it: this is the practical default for most production work — coding-strong, multimodal, frontier-length — at roughly three-quarters of Gemini 3.1 Pro's cost. If you genuinely need cheap volume, you should be on Flash-Lite ($0.25/$1.50), Gemini 2.5 Flash ($0.30/$2.50), or the rock-bottom Gemini 2.5 Flash-Lite ($0.10/$0.40) — not here.

What you get for the price increase is real. Google's headline claim at I/O is that 3.5 Flash beats 3.1 Pro on coding benchmarks — which is the kind of inversion that doesn't happen often in a Pro/Flash split and is worth taking seriously when you're picking a model for an agent loop or an IDE assistant.

Capabilities

Flash inherits the native multimodality of the Gemini family: text, image, audio, and video all go in. The big change in 3.5 is reasoning — particularly on code generation and code editing, where Google's own evals put it above 3.1 Pro. General reasoning is up too, though Pro still wins on the hardest single-shot questions and long-horizon planning. Tool use is more reliable than in 3 Flash but still less mature than Anthropic's or OpenAI's.

The expected tradeoffs: the price tripled on input and output, so this is no longer the model you reach for to drive a free-tier chat product. Google's own free tier still defaults to Gemini 2.5 Flash for exactly that reason.

Typical use cases

Coding assistants and IDE integrations — the headline use case
Production chat where quality matters more than per-call cost
Agent loops and tool-using workflows that were borderline on 3 Flash
Long-document Q&A and contract review with native multimodal input
Video and audio analysis where 1M context plus reasoning both matter

Sibling and rival comparison

Model	Input / 1M	Output / 1M	Context
Gemini 3.5 Flash	$1.50	$9	1M
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M
Gemini 3.1 Pro	$2	$12	1M
GPT-5.4 mini	$0.25	$2	272K

3.5 Flash now sits close enough to 3.1 Pro that the choice is mostly about workload shape: pick Pro for the hardest reasoning, pick 3.5 Flash for coding and most everything else. Flash-Lite is the genuine cheap tier — about a sixth of the input cost — and is where you should send classification, re-ranking, and bulk summarization. GPT-5.4 mini is cheaper still but loses on context length and on multimodal input.

← See all Google / Gemini plans