← Back to API models

Pixtral 12B

by Mistral AI · budget vision

The cheapest way to look at images with an LLM: $0.15 flat per 1M tokens, both directions.

Input $0.15 / 1M
Output $0.15 / 1M
Context 128K tokens
Specialty Vision
La Plateforme ↗ Updated June 10, 2026
§ API pricing

Per-token rates.

Input
$0.15/1M tokens
Prompt tokens
  • Cheapest vision input on our ledger
  • Images billed as tokens
  • Text input at the same rate
Output
$0.15/1M tokens
Completion tokens
  • Same rate both directions
  • No output-premium math
  • Simplest pricing on the table
Context
128Ktokens
Window
  • Dozens of images per call
  • ~96K words of text alongside
  • Standard Mistral budget window
Size
12Bparameters
Open weights
  • Small, fast, self-hostable
  • Apache-licensed release
  • Runs on a single GPU

Why Pixtral exists

Pixtral 12B answers one question cheaply: "what's in this image?" At a flat $0.15 per 1M tokens — the same rate in and out, the only flat-priced model on our table — it makes million-image pipelines affordable in a way frontier vision pricing never will.

It's a 12-billion-parameter open-weights model, which sets expectations correctly: this is a tool for volume vision tasks, not a frontier brain that happens to see. The open release also means you can self-host it on a single GPU when API economics stop making sense.

Capabilities

Solid image understanding — captioning, OCR-ish reading, chart and screenshot description, content tagging — plus ordinary text chat in the same call. The 128K window fits dozens of images per request, useful for batch processing.

The honest weakness: detail and reasoning. Fine-grained chart analysis, dense document layouts, and visual reasoning chains belong to Opus 4.8, Gemini 3.1 Pro, or GPT-5.5 — at 30–300× the price.

Typical use cases

  • Bulk image captioning and alt-text generation
  • Content moderation on image streams
  • Product-photo tagging for e-commerce catalogs
  • Screenshot triage and routing
  • Self-hosted vision pipelines on a single GPU

Sibling and rival comparison

ModelInput / 1MOutput / 1MContext
Pixtral 12B$0.15$0.15128K
Gemini 3.1 Flash-Lite$0.25$1.501M
GPT-5.4 mini$0.25$2272K
Claude Haiku 4.5$1$5200K
Mistral Small 3.1$0.20$0.60128K

Every multimodal rival charges more on output — Gemini Flash-Lite 10×, GPT-5.4 mini 13×. For pure "describe/tag/filter this image" volume, Pixtral is the price floor. The moment the task becomes "reason about this image", spend up.

← See the full Mistral lineup