LLM API Cost Comparison: OpenAI vs Anthropic vs Google (March 2026)

Ivan Horvatić 25 Apr 2026 5 min read

LLM API Cost Comparison: OpenAI vs Anthropic vs Google (March 2026)

---

Choosing an LLM API? **Cost** can make or break your budget. A naïve implementation can burn $10k/month where a smart one costs $500.

This guide breaks down **real pricing** (March 2026), shows cost-per-task examples, and reveals hidden tricks to slash your bill.

Quick Comparison Table

| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Speed | |----------|-------|----------------------|----------------------|----------------|-------| | **OpenAI** | GPT-5.4 | $15 | $60 | 128k | ⚡⚡⚡ | | **OpenAI** | GPT-5 Mini | $0.15 | $0.60 | 128k | ⚡⚡⚡⚡ | | **Anthropic** | Claude Opus 4.6 | $15 | $75 | 200k | ⚡⚡ | | **Anthropic** | Claude Sonnet 4.5 | $3 | $15 | 200k | ⚡⚡⚡ | | **Anthropic** | Claude Haiku 4.5 | $0.25 | $1.25 | 200k | ⚡⚡⚡⚡ | | **Google** | Gemini 3.1 Pro | $7 | $21 | 1M | ⚡⚡⚡ | | **Google** | Gemini 2.5 Flash | $0.075 | $0.30 | 1M | ⚡⚡⚡⚡⚡ | | **GitHub Copilot** | Claude Sonnet 4.5 | $0.50 | $2.00 | 200k | ⚡⚡⚡ | | **GitHub Copilot** | Claude Opus 4.6 | $0.50 | $2.00 | 200k | ⚡⚡ |

**Legend:** - ⚡ = Slow (10–30 tokens/sec) - ⚡⚡⚡ = Fast (40–70 tokens/sec) - ⚡⚡⚡⚡⚡ = Very fast (100+ tokens/sec)

Real-World Cost Examples

Example 1: Customer Support Chatbot

**Usage:** 100k messages/month, 500 tokens input + 200 tokens output each

| Provider | Model | Monthly Cost | |----------|-------|-------------| | OpenAI | GPT-5 Mini | $105 | | Anthropic | Claude Haiku 4.5 | $62.50 | | Google | Gemini 2.5 Flash | $9.75 | | **Winner** | **Gemini 2.5 Flash** | **$9.75** |

**Why Gemini wins:** 10x cheaper than competitors, 1M context handles long conversations.

---

Example 2: Code Generation Tool

**Usage:** 50k requests/month, 2k tokens input + 1k tokens output each

| Provider | Model | Monthly Cost | |----------|-------|-------------| | OpenAI | GPT-5.4 | $4,500 | | Anthropic | Claude Opus 4.6 | $5,250 | | Google | Gemini 3.1 Pro | $1,750 | | GitHub Copilot | Claude Sonnet 4.5 | $200 | | **Winner** | **GitHub Copilot (Sonnet)** | **$200** |

**Why Copilot wins:** Subsidized pricing (GitHub eats the cost). Only available to Copilot subscribers ($10–20/month).

---

Example 3: Document Analysis (Long Context)

**Usage:** 10k docs/month, 50k tokens input + 2k tokens output each

| Provider | Model | Monthly Cost | |----------|-------|-------------| | OpenAI | GPT-5.4 | $8,700 | | Anthropic | Claude Opus 4.6 | $9,000 | | Google | Gemini 3.1 Pro | $3,770 | | **Winner** | **Google Gemini 3.1 Pro** | **$3,770** |

**Why Gemini wins:** 1M context window = fewer API calls, lower input costs.

---

Example 4: Summarization Pipeline

**Usage:** 1M short texts/month, 200 tokens input + 50 tokens output each

| Provider | Model | Monthly Cost | |----------|-------|-------------| | OpenAI | GPT-5 Mini | $42 | | Anthropic | Claude Haiku 4.5 | $112.50 | | Google | Gemini 2.5 Flash | $21 | | **Winner** | **Google Gemini 2.5 Flash** | **$21** |

**Why Gemini wins:** Unbeatable pricing for simple tasks.

---

Hidden Costs to Watch

1. Prompt Caching (Anthropic Only)

**What it is:** Reuse repeated prompt prefixes, pay 10% of normal input cost.

**Example:** - Normal: 100k tokens input = $1.50 (Claude Opus) - With caching: 10k unique + 90k cached = $0.15 + $0.135 = **$0.285** (81% savings)

**When it helps:** Long system prompts, RAG contexts, repeated instructions.

**How to use:**

Anthropic API

response = anthropic.messages.create( model="claude-opus-4.6", messages=[...], system=[ {"type": "text", "text": "Long system prompt...", "cache_control": {"type": "ephemeral"}} ] )

**Savings:** Up to 90% on input costs.

---

2. Batch API (OpenAI)

**What it is:** Submit jobs in bulk, get 50% discount, results in 24h.

**When it helps:** Non-time-sensitive tasks (data labeling, summarization).

**Example:** - Standard API: $15/1M input = $1,500 for 100M tokens - Batch API: $7.50/1M input = **$750** (50% savings)

**How to use:**

OpenAI Batch API

client.batches.create( input_file=batch_file, endpoint="/v1/chat/completions", completion_window="24h" )

---

3. Output Token Costs (Often Overlooked)

**Reality check:** Output tokens cost 2–5x more than input tokens.

**Bad example:** - Generate 10k token response = $0.60 (GPT-5 output) - Could have used GPT-5 Mini = $0.006 (100x cheaper)

**Optimization:** Use smaller models for long outputs (summaries, reports).

---

Cost Optimization Strategies

Strategy 1: Tiered Model Routing

Route requests based on complexity:

Simple tasks → Gemini 2.5 Flash ($0.075 input) Medium tasks → Claude Haiku / GPT-5 Mini ($0.25 input) Hard tasks → GPT-5 / Claude Opus ($15 input)

**Tools:** LiteLLM, OpenRouter, custom routing logic.

**Savings:** 60–80% on total API costs.

---

Strategy 2: Prompt Compression

Compress prompts without losing context:

**Tools:** - [PromptCompressor](https://github.com/microsoft/LLMLingua) — 50–80% token reduction - Semantic caching (vector DB + similarity search)

**Example:** - Original: 5k tokens = $0.075 (GPT-5.4) - Compressed: 1.5k tokens = **$0.0225** (70% savings)

---

Strategy 3: Local + Cloud Hybrid

Run cheap tasks locally (Ollama), expensive tasks in cloud:

Draft generation → Ollama Mistral 7B (free) Final polish → Claude Sonnet 4.5 ($3 input)

**Savings:** 80–90% vs pure cloud.

---

Strategy 4: GitHub Copilot Arbitrage

If you have Copilot subscription ($10–20/month):

**Use Copilot API for everything:** - Claude Sonnet 4.5: $0.50 input (vs $3 direct) - Claude Opus 4.6: $0.50 input (vs $15 direct)

**Catch:** 10 req/min rate limit. Fine for low-volume personal projects.

---

Hidden Pricing Traps

❌ Free Tiers Are Marketing

- OpenAI: $5 free credits expire in 3 months - Anthropic: No free tier - Google: $300 credits (90 days) then charges

**Trap:** Free credits lure you in, then bills hit. Budget from day 1.

---

❌ Rate Limits Can Cost You

Hitting rate limits = retries = wasted tokens + latency.

**Tiers (OpenAI example):** - Tier 1 (new account): 500 req/min - Tier 5 ($1k+ spent): 10k req/min

**Solution:** Use multiple API keys, rotate providers, or pay for higher tier.

---

❌ Context Window Waste

**Bad example:** Send 50k token context, only need 5k.

**Cost:** - Wasted: 45k tokens × $15/1M = $0.675 per request - Over 100k requests = **$67,500 wasted**

**Solution:** Trim context, use RAG (only send relevant chunks).

---

Which Provider Should You Choose?

Choose OpenAI if:

- You need GPT-5 class performance - Speed matters (fastest inference) - Ecosystem matters (most integrations)

Choose Anthropic if:

- Long context (200k+ tokens) - Safety/refusal behavior matters (most aligned) - Prompt caching saves you money

Choose Google if:

- Cost is priority #1 (cheapest flagship + flash models) - 1M context window (process books, codebases) - Multimodal native (video, audio)

Choose GitHub Copilot if:

- You're already a Copilot subscriber - Low-volume personal/side projects - Want flagship models at 90% discount

---

Cost Calculator

**Try this formula:**

Monthly cost = (input_tokens × input_price) + (output_tokens × output_price)

**Example:** - 100M input, 20M output - GPT-5.4: (100 × $15) + (20 × $60) = **$2,700** - Gemini 3.1 Pro: (100 × $7) + (20 × $21) = **$1,120**

**Savings:** $1,580/month (58%)

---

Final Recommendations

**For most apps:** 1. Start with Gemini 2.5 Flash (cheapest, fast) 2. Upgrade to Gemini 3.1 Pro if quality suffers 3. Add Claude Sonnet 4.5 for edge cases

**For high-stakes apps:** 1. Use Claude Opus 4.6 or GPT-5.4 2. Implement prompt caching (Anthropic) 3. Route easy tasks to cheaper models

**For personal projects:** 1. Get GitHub Copilot ($10–20/month) 2. Use Copilot API for everything 3. Fallback to Ollama for free local inference

---

Resources

- [OpenAI Pricing](https://openai.com/pricing) - [Anthropic Pricing](https://anthropic.com/pricing) - [Google Vertex AI Pricing](https://cloud.google.com/vertex-ai/pricing) - [LiteLLM Cost Calculator](https://litellm.ai/calculator)

---

**What's your monthly API bill?** Drop it in the comments — let's compare strategies.

*(Affiliate disclosure: Some links may include referral codes. I only recommend tools I actually use.)*

LLM API Cost Comparison: OpenAI vs Anthropic vs Google (March 2026)

Quick Comparison Table

Real-World Cost Examples

Example 1: Customer Support Chatbot

Example 2: Code Generation Tool

Example 3: Document Analysis (Long Context)

Example 4: Summarization Pipeline

Hidden Costs to Watch

1. Prompt Caching (Anthropic Only)

Anthropic API

2. Batch API (OpenAI)

OpenAI Batch API

3. Output Token Costs (Often Overlooked)

Cost Optimization Strategies

Strategy 1: Tiered Model Routing

Strategy 2: Prompt Compression

Strategy 3: Local + Cloud Hybrid

Strategy 4: GitHub Copilot Arbitrage

Hidden Pricing Traps

❌ Free Tiers Are Marketing

❌ Rate Limits Can Cost You

❌ Context Window Waste

Which Provider Should You Choose?

Choose **OpenAI** if:

Choose **Anthropic** if:

Choose **Google** if:

Choose **GitHub Copilot** if:

Cost Calculator

Final Recommendations

Resources

Choose OpenAI if:

Choose Anthropic if:

Choose Google if:

Choose GitHub Copilot if: