The new mid-tier Gemini, launched at I/O 2026. Google says it beats 3.1 Pro on coding; it keeps the 1M context and lands at roughly three-quarters of Pro's price. Cheap volume work moves down to Flash-Lite.
Gemini 3.5 Flash, launched at Google I/O 2026, is not the cheap-volume Flash of last year. The old $0.50/$3 rate is gone; the new tier sits at $1.50 input and $9 output per 1M tokens with a flat 1M context. The right way to read it: this is the practical default for most production work — coding-strong, multimodal, frontier-length — at roughly three-quarters of Gemini 3.1 Pro's cost. If you genuinely need cheap volume, you should be on Flash-Lite or Gemini 2.5 Flash now, not here.
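Both price claims in that paragraph can be checked directly from the quoted rates. The sketch below uses only the article's figures (USD per 1M tokens). The variable names are illustrative. `Fraction` keeps the ratios exact, so no float rounding is involved.

```python
from fractions import Fraction

# USD per 1M tokens, as quoted in this article.
OLD_FLASH = {"input": Fraction("0.50"), "output": Fraction(3)}
FLASH_35 = {"input": Fraction("1.50"), "output": Fraction(9)}
PRO_31 = {"input": Fraction(2), "output": Fraction(12)}

def ratio(a, b):
    """Per-direction price ratio a/b for input and output tokens."""
    return {k: a[k] / b[k] for k in ("input", "output")}

vs_old = ratio(FLASH_35, OLD_FLASH)
vs_pro = ratio(FLASH_35, PRO_31)
print(vs_old)  # {'input': Fraction(3, 1), 'output': Fraction(3, 1)}
print(vs_pro)  # {'input': Fraction(3, 4), 'output': Fraction(3, 4)}
```

The new rate is exactly 3x the old one in both directions. It is exactly 0.75x of 3.1 Pro in both directions, so "three-quarters" holds whatever your input/output mix is.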
What you get for the price increase is real. Google's headline claim at I/O is that 3.5 Flash beats 3.1 Pro on coding benchmarks — which is the kind of inversion that doesn't happen often in a Pro/Flash split and is worth taking seriously when you're picking a model for an agent loop or an IDE assistant.
Flash inherits the native multimodality of the Gemini family: text, image, audio, and video all go in. The big change in 3.5 is reasoning — particularly on code generation and code editing, where Google's own evals put it above 3.1 Pro. General reasoning is up too, though Pro still wins on the hardest single-shot questions and long-horizon planning. Tool use is more reliable than in 3 Flash but still less mature than Anthropic's or OpenAI's.
The tradeoff is price. Input and output rates both tripled ($0.50 to $1.50 and $3 to $9), so this is no longer the model you reach for to drive a free-tier chat product. Google's own free tier still defaults to Gemini 2.5 Flash for exactly that reason.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context |
|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9 | 1M |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M |
| Gemini 3.1 Pro | $2 | $12 | 1M |
| GPT-5.4 mini | $0.25 | $2 | 272K |
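To see what these rates mean for a real bill, you can price one workload across all four models. This is a minimal sketch that assumes flat per-token pricing at the listed rates. It ignores context caching, batch discounts, and any other pricing tiers. The 50M-input / 10M-output monthly volume is a hypothetical example, not a figure from this article.

```python
# Per-1M-token prices in USD (input, output), copied from the table above.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4 mini": (0.25, 2.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost for one workload at flat list prices (no caching or discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload: 50M input tokens and 10M output tokens per month.
for model in PRICES:
    print(f"{model:<22} ${cost_usd(model, 50_000_000, 10_000_000):>8,.2f}")
```

At that volume the monthly totals are $165 on 3.5 Flash, $220 on 3.1 Pro, $27.50 on Flash-Lite, and $32.50 on GPT-5.4 mini. Output-heavy workloads widen the gaps, because output is priced at six to eight times the input rate on every row.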
3.5 Flash now sits close enough to 3.1 Pro that the choice is mostly about workload shape: pick Pro for the hardest reasoning, and 3.5 Flash for coding and most everything else. Flash-Lite is the genuine cheap tier. It costs a sixth of 3.5 Flash's price on both input and output, so send classification, re-ranking, and bulk summarization there. GPT-5.4 mini matches Flash-Lite on input price and costs a little more on output ($2 versus $1.50). It also trails the Gemini tiers on context length (272K) and on multimodal input.
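The routing advice in that paragraph fits in a few lines of code. The sketch below assumes hypothetical task labels and a caller-supplied `hardest_reasoning` flag, since the article does not define what counts as "hardest." The tier values are the article's display names, not API model identifiers. It routes only among the three Gemini tiers.

```python
from enum import Enum

class Tier(str, Enum):
    # Display names from the article, not API model identifiers.
    PRO = "Gemini 3.1 Pro"
    FLASH = "Gemini 3.5 Flash"
    FLASH_LITE = "Gemini 3.1 Flash-Lite"

# Hypothetical task labels for cheap, high-volume work.
BULK_TASKS = {"classification", "reranking", "bulk_summarization"}

def pick_tier(task: str, hardest_reasoning: bool = False) -> Tier:
    """Pro for the hardest reasoning, Flash-Lite for bulk work,
    and 3.5 Flash (the default) for coding and everything else."""
    if hardest_reasoning:
        return Tier.PRO
    if task in BULK_TASKS:
        return Tier.FLASH_LITE
    return Tier.FLASH

print(pick_tier("code_edit").value)                          # Gemini 3.5 Flash
print(pick_tier("classification").value)                     # Gemini 3.1 Flash-Lite
print(pick_tier("planning", hardest_reasoning=True).value)   # Gemini 3.1 Pro
```

3.5 Flash is the fallthrough case, not one branch among several. That matches the article's framing of it as the default: anything you have not explicitly classified as bulk or as hardest-reasoning lands there.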