Near the price floor of the entire market: $0.14 input / $0.28 output per 1M tokens, with a 1M-token window and real reasoning ability.
V4-Flash is the cheapest 1M-context model on our API table, with the lowest output price of any model listed, and it isn't a toy: it inherits the V4 family's reasoning lineage, keeps the full 1M context window, and costs $0.14 input / $0.28 output per 1M tokens. The cheapest US-hosted rival with a 1M window (Gemini 3.1 Flash-Lite) charges more than 5× as much on output ($1.50 vs. $0.28).
As with its big sibling V4-Pro, the main caveats are non-technical: the hosted API routes through Chinese infrastructure, which is a hard blocker for some compliance regimes and the reason open-weights self-hosting is part of DeepSeek's pitch.
For the price class, capability is absurd: usable reasoning on math, code, and logic, a 1M window, and throughput suited to volume pipelines. Most tasks that teams route to mini-tier US models run fine here at a fraction of the cost.
The honest weakness: polish and ecosystem. Tooling, SDK maturity, rate-limit headroom, and English prose quality all trail the US providers, and the hardest reasoning belongs to V4-Pro or a frontier model.
| Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M |
| DeepSeek V4-Pro | $0.435 | $0.87 | 1M |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M |
| GPT-5.4 nano | $0.05 | $0.40 | 272K |
| Mistral Small 3.1 | $0.20 | $0.60 | 128K |
Only GPT-5.4 nano beats it on input price, and it does so with roughly a quarter of the window (272K vs. 1M), a higher output price, and weaker reasoning. If Chinese hosting is acceptable (or you self-host), V4-Flash is the rational default for cheap volume; if not, Gemini Flash-Lite and Mistral Small are the compliant runners-up.
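To make the comparison concrete, here is a minimal sketch that turns the table's list prices into a cost for a given workload. The prices are copied from the table above; the function names and the 50M/50M token workload are assumptions for illustration, and the arithmetic ignores caching discounts, batch pricing, and rate limits.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "DeepSeek V4-Pro": (0.435, 0.87),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "GPT-5.4 nano": (0.05, 0.40),
    "Mistral Small 3.1": (0.20, 0.60),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the list-price cost in USD for a given token volume."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def breakeven_input_output_ratio(model_a, model_b):
    """Input:output token ratio at which both models cost the same.

    Solves in_a*I + out_a*O == in_b*I + out_b*O for I/O.
    Returns None when there is no positive crossover, i.e. one model
    is cheaper (or equal) at every input/output mix.
    """
    in_a, out_a = PRICES[model_a]
    in_b, out_b = PRICES[model_b]
    if in_a == in_b:
        return None
    ratio = (out_b - out_a) / (in_a - in_b)
    return ratio if ratio > 0 else None


if __name__ == "__main__":
    # Hypothetical workload: 50M input and 50M output tokens.
    for name in sorted(PRICES, key=lambda m: workload_cost(m, 50_000_000, 50_000_000)):
        print(f"{name:24s} ${workload_cost(name, 50_000_000, 50_000_000):8.2f}")
    print(breakeven_input_output_ratio("DeepSeek V4-Flash", "GPT-5.4 nano"))
```

At these list prices, V4-Flash and GPT-5.4 nano break even at roughly 1.33 input tokens per output token: output-heavy workloads favor V4-Flash, while input-heavy ones favor nano on price alone, before the smaller window and weaker reasoning are taken into account. Against the other models in the table there is no crossover, since V4-Flash is cheaper on both input and output.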