Template

Article generation

Long-form writing with moderately sized prompts and substantial output.

4k input · 2.5k output · 1.5k requests/day · 10% cache · Batch off
Template

Customer support

Short multi-turn support replies with high daily traffic and reusable system prompts.

1.8k input · 600 output · 120k requests/day · 35% cache · Batch off
Template

RAG Q&A

Retrieval-heavy prompts where context dominates cost and cache can matter.

6k input · 800 output · 30k requests/day · 45% cache · Batch off
Template

Code completion

Interactive coding assistance with medium prompts and short outputs.

2.5k input · 350 output · 50k requests/day · 20% cache · Batch off
Template

Batch summarization

Offline bulk summarization where batch mode and cache reuse are both realistic.

12k input · 1.6k output · 8k requests/day · 50% cache · Batch on

Workload assumptions

Display currency: USD (default)

Live FX rates come from Frankfurter. If conversion is unavailable, the calculator falls back to the model source currency and marks that row.
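The fallback rule above can be sketched as a small pure function. This is only an illustration under stated assumptions: `DisplayAmount`, `to_display`, and the `fell_back` flag are hypothetical names, and the rate is passed in as a plain number (or `None` when no rate could be fetched) rather than requested from Frankfurter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayAmount:
    amount: float
    currency: str
    fell_back: bool  # True when the row stays in the source currency and should be marked


def to_display(amount: float, source_currency: str, display_currency: str,
               rate: Optional[float]) -> DisplayAmount:
    """Convert an amount for display, falling back to the source currency.

    `rate` is units of display currency per one unit of source currency,
    or None when conversion is unavailable (hypothetical convention).
    """
    if source_currency == display_currency:
        # Nothing to convert, so the row is not marked.
        return DisplayAmount(amount, source_currency, False)
    if rate is None:
        # Conversion unavailable: keep the source currency and mark the row.
        return DisplayAmount(amount, source_currency, True)
    return DisplayAmount(amount * rate, display_currency, False)
```

For example, `to_display(10.0, "USD", "EUR", None)` keeps the amount in USD and sets `fell_back` so the row can be flagged.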

Models to compare

1 selected. If no model is selected, the calculator falls back to three defaults from the live snapshot.

Quick token estimate

Optional helper for rough sizing before you set request token numbers.

Current workload summary

Scenario template: Custom workload (directly edited inputs with no template preset)
Selected models: 1
Display currency: USD
Monthly request volume: 300,000
Request shape: 2,000 in · 1,000 out
Cache and batch: 20% cache · Batch off
Budget: Not set
FX mode: Source currency

Request details

{
  "modelCodes": ["gpt-4.1"],
  "inputTokens": 2000,
  "outputTokens": 1000,
  "dailyRequests": 10000,
  "activeDays": 30,
  "cacheHitRatio": 0.20,
  "useBatch": false,
  "monthlyBudget": null,
  "displayCurrencyCode": "USD"
}
From English words: 0
From Chinese chars: 0
From pages: 0
Estimated total tokens: 0

Estimate only. Default assumptions: 1 token ~= 0.75 English words, 1 token ~= 1.5 Chinese characters, 1 page ~= 500 English words.
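As a rough sketch of how those default assumptions turn into a token estimate, the three inputs can each be converted and summed. The function name `estimate_tokens`, the constant names, and the choice to add the three contributions and round the total are assumptions for illustration; the calculator's own rounding may differ.

```python
# Ratios copied from the note above; real tokenizers vary by model and text.
WORDS_PER_TOKEN = 0.75          # 1 token ~= 0.75 English words
CHINESE_CHARS_PER_TOKEN = 1.5   # 1 token ~= 1.5 Chinese characters
WORDS_PER_PAGE = 500            # 1 page ~= 500 English words


def estimate_tokens(english_words: int = 0, chinese_chars: int = 0, pages: int = 0) -> int:
    """Rough total token count from the three helper inputs (hypothetical helper)."""
    from_words = english_words / WORDS_PER_TOKEN
    from_chinese = chinese_chars / CHINESE_CHARS_PER_TOKEN
    # Pages are first converted to English words, then to tokens.
    from_pages = pages * WORDS_PER_PAGE / WORDS_PER_TOKEN
    return round(from_words + from_chinese + from_pages)
```

Under these ratios, 750 English words and 1,500 Chinese characters each come out at about 1,000 tokens.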

How the calculator interprets inputs

  • `Cache hit ratio` discounts only the cached share of input tokens.
  • `Batch discount` applies only when the stored snapshot lists a batch ratio for that model.
  • `Budget fit` means the maximum monthly requests you can afford with the exact request shape above.
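The three rules above can be sketched in a few lines. Everything named here (`Rates`, `cost_per_request`, `monthly_cost`, `budget_fit`) is hypothetical, and applying the batch ratio as a multiplier on the whole request cost is an assumption about how the discount works. The per-1M rates in the usage note ($2.00 input, $0.50 cached input, $8.00 output) are likewise illustrative assumptions, not values read from the page.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rates:
    input_per_m: float          # price per 1M uncached input tokens
    cached_input_per_m: float   # price per 1M cached input tokens
    output_per_m: float         # price per 1M output tokens
    batch_ratio: Optional[float] = None  # e.g. 0.5; None when the snapshot lists no batch ratio


def cost_per_request(rates: Rates, input_tokens: int, output_tokens: int,
                     cache_hit_ratio: float, use_batch: bool) -> float:
    # The cache discount touches only the cached share of input tokens.
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    cost = (uncached * rates.input_per_m
            + cached * rates.cached_input_per_m
            + output_tokens * rates.output_per_m) / 1_000_000
    # The batch discount applies only when a batch ratio exists for the model.
    if use_batch and rates.batch_ratio is not None:
        cost *= rates.batch_ratio
    return cost


def monthly_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float, use_batch: bool,
                 daily_requests: int, active_days: int) -> float:
    per_request = cost_per_request(rates, input_tokens, output_tokens,
                                   cache_hit_ratio, use_batch)
    return per_request * daily_requests * active_days


def budget_fit(rates: Rates, input_tokens: int, output_tokens: int,
               cache_hit_ratio: float, use_batch: bool,
               monthly_budget: float) -> int:
    # Maximum whole number of monthly requests affordable at this request shape.
    per_request = cost_per_request(rates, input_tokens, output_tokens,
                                   cache_hit_ratio, use_batch)
    return math.floor(monthly_budget / per_request)
```

With the assumed rates and the current workload (2,000 in, 1,000 out, 20% cache, batch off, 10,000 requests/day for 30 days), `monthly_cost` comes to about $3,420, and rerunning it with a cache hit ratio of 0 gives about $3,600. That difference matches the $180.00 cache savings in the results, which suggests the assumed rates are at least consistent with the snapshot.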

Shareable by URL

Scenario, workload inputs, selected models, display currency, and budget are all kept in the page's query string, so the current calculator view can be shared by copying the URL.
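One way to picture this is a function that serializes the calculator state into a query string. The sketch below is an assumption throughout: it reuses the field names from the request details as parameter keys, while the calculator's real query keys and encodings are not shown on the page. `build_share_query` is a hypothetical name.

```python
from urllib.parse import urlencode


def build_share_query(state: dict) -> str:
    """Serialize calculator state into a query string (hypothetical encoding)."""
    params = {}
    for key, value in state.items():
        if value is None:
            # Unset fields, such as an empty budget, are left out of the URL.
            continue
        if isinstance(value, bool):
            # Checked before the generic case so booleans become 1 or 0.
            params[key] = "1" if value else "0"
        elif isinstance(value, list):
            params[key] = ",".join(value)
        else:
            params[key] = str(value)
    return urlencode(params)


query = build_share_query({
    "modelCodes": ["gpt-4.1"],
    "inputTokens": 2000,
    "useBatch": False,
    "monthlyBudget": None,
})
```

Appending the resulting string to the page address after a `?` would give a link that restores the same inputs, provided the page reads the same keys back.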

Model: GPT-4.1 (gpt-4.1) · OpenAI
Updated: Mar 31, 09:56
Source: PricePerToken OpenAI (fallback source)
Currency: USD
Note: Fallback snapshot from PricePerToken because OpenAI official pricing pages currently return an anti-bot challenge to server-side crawlers. Source updated at 2026-03-28T08:30:28.409699Z.

Monthly cost: $3420.00
1k requests: $11.40
Blend / 1M: $3.50
Cache savings / month: $180.00
Batch savings / month: $0.0000 (no batch savings)
Budget fit: Set budget to see fit

Estimate only. Actual billing may differ by tokenizer behavior, cache hit rate, and provider rules.