← Back to API models

Pixtral 12B

Name: Pixtral 12B
Brand: Mistral AI
Price: 0.15 USD

by Mistral AI · budget vision

The cheapest way to look at images with an LLM: $0.15 flat per 1M tokens, both directions.

Input $0.15 / 1M

Output $0.15 / 1M

Context 128K tokens

Specialty Vision

La Plateforme ↗ Updated July 23, 2026

§ API pricing

Per-token rates.

Input

$0.15/1M tokens

Prompt tokens

Cheapest vision input on our ledger
Images billed as tokens
Text input at the same rate

Output

$0.15/1M tokens

Completion tokens

Same rate both directions
No output-premium math
Simplest pricing on the table

Context

128Ktokens

Window

Dozens of images per call
~96K words of text alongside
Standard Mistral budget window

Size

12Bparameters

Open weights

Small, fast, self-hostable
Apache-licensed release
Runs on a single GPU

Why Pixtral exists

Pixtral 12B answers one question cheaply: "what's in this image?" At a flat $0.15 per 1M tokens — the same rate in and out, the only flat-priced model on our table — it makes million-image pipelines affordable in a way frontier vision pricing never will.

It's a 12-billion-parameter open-weights model, which sets expectations correctly: this is a tool for volume vision tasks, not a frontier brain that happens to see. The open release also means you can self-host it on a single GPU when API economics stop making sense.

Capabilities

Solid image understanding — captioning, OCR-ish reading, chart and screenshot description, content tagging — plus ordinary text chat in the same call. The 128K window fits dozens of images per request, useful for batch processing.

The honest weakness: detail and reasoning. Fine-grained chart analysis, dense document layouts, and visual reasoning chains belong to Opus 5, Gemini 3.1 Pro, or GPT-5.5 — at 30–300× the price.

Typical use cases

Bulk image captioning and alt-text generation
Content moderation on image streams
Product-photo tagging for e-commerce catalogs
Screenshot triage and routing
Self-hosted vision pipelines on a single GPU

Sibling and rival comparison

Model	Input / 1M	Output / 1M	Context
Pixtral 12B	$0.15	$0.15	128K
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M
GPT-5.4 mini	$0.25	$2	272K
Claude Haiku 4.5	$1	$5	200K
Mistral Small 3.1	$0.20	$0.60	128K

Every multimodal rival charges more on output — Gemini Flash-Lite 10×, GPT-5.4 mini 13×. For pure "describe/tag/filter this image" volume, Pixtral is the price floor. The moment the task becomes "reason about this image", spend up.

← See the full Mistral lineup