ThePricer Media

How Much Does Gemini 3.5 Flash Cost?

Published on May 28, 2026 | Written by Alec Pow
This article was researched using 12 sources. See our methodology and corrections policy.

Google places Gemini 3.5 Flash in a developer stack that includes Google AI Studio, Google Cloud billing, Vertex AI access paths, context caching, Batch processing, and Google Search grounding; the model’s behavior is described in its developer model entry. What the buyer pays for is API activity: prompts, files, generated tokens, cached context, storage hours, and search-backed requests.

Gemini 3.5 Flash is billed as a developer API model, not a fixed consumer subscription. As of May 2026, the paid API rate is usage-based, with separate charges for input tokens, output tokens, cached content, Batch jobs, and grounded search.

The bill is built from the text, audio, image, video, or file content sent to the model and the response it returns. Exact monthly spend is specific to each project because token volume, thinking level, cache use, and platform path all change the invoice.

For a developer, Gemini 3.5 Flash spending is best read per million tokens, then translated into a monthly app budget. Tier choice matters less than the workload shape, because an agent that writes long answers can spend far more on output than a classifier that sends short prompts and gets short labels.

TL;DR: As of May 2026, the paid API rate table puts Gemini 3.5 Flash at $1.50 per million input tokens and $9.00 per million output tokens, with caching, Batch, and grounding changing the real bill.

Gemini 3.5 Flash Cost Card

  • Base input as of May 2026, $1.50 per 1M tokens.
  • Base output as of May 2026, $9.00 per 1M tokens.
  • Cached input as of May 2026, $0.15 per 1M tokens, plus $1.00 per 1M token-hours for storage.
  • Batch as of May 2026, $0.75 per 1M input tokens and $4.50 per 1M output tokens.
  • Grounding after the free shared monthly allowance, $14 per 1,000 search queries.

What you’re actually buying

Gemini 3.5 Flash is a Google AI model for developers who need fast responses, multimodal input, and agent-style tasks. A team sends prompts, files, images, audio, or video into the API, then receives text output from the model. It is different from the consumer Gemini app, where a person chats through Google’s interface rather than metering an application through API calls.

The model also differs from lighter Flash-Lite options and heavier Pro-style models. Its role is to pair speed with higher reasoning quality for interactive work such as coding agents, document handling, tool calls, and long context tasks. Google’s agent and coding profile says the model supports sub-agent deployment, multi-step workflows, and long-horizon work, so buyers should tie spend to the workload rather than to a single sticker price.

Pricing snapshot and model fit

The May 2026 price sheet separates Standard usage from Batch usage and cache lines. That split matters because the same app can pay one rate for live chat and another rate for delayed jobs. Batch is cheaper, but it fits back-office processing better than a live customer chat box.

Line item                      | Paid rate as of May 2026         | Budget signal
Standard input                 | $1.50 per 1M tokens              | Prompt size and file context
Standard output                | $9.00 per 1M tokens              | Answer length and thinking tokens
Cached input                   | $0.15 per 1M tokens              | Repeated context that can be reused
Batch input and output         | $0.75 and $4.50 per 1M tokens    | Delayed work that can wait
Grounded search over allowance | $14 per 1,000 search queries     | Research agents and live web answers

Nearby models also shape the choice. Going by the stable and preview labels in Google’s model index, Gemini 3.5 Flash is stable, Gemini 3 Flash is in preview, and Gemini 3.1 Flash-Lite is a lower-cost stable option in the same family. Buyers should match model strength to the job instead of using one model for every call.

Token billing

Input tokens are what the app sends in. Output tokens are what the model sends back, including reasoning or thinking tokens where the model uses them. This makes answer length the main swing item for many agent builds, because output is priced much higher than input.

A project that sends 10M input tokens and receives 1M output tokens would spend $24.00 before cache or grounding, because 10 × $1.50 equals $15.00 and 1 × $9.00 equals $9.00, using the rates in the May 2026 third-party model analysis. That same token count can be manageable for a production app if it supports many users, but expensive for a one-off script with long answers and no revenue tied to the work.
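The arithmetic above can be sketched in a few lines of Python. The function name `standard_cost` and the constant names are illustrative assumptions, not part of any Google SDK; the rates are the May 2026 figures quoted in this article and should be rechecked before use.

```python
INPUT_RATE_PER_M = 1.50   # USD per 1M input tokens (May 2026 rate quoted above)
OUTPUT_RATE_PER_M = 9.00  # USD per 1M output tokens, thinking tokens included


def standard_cost(input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Split a Standard-rate bill into input, output, and total (USD)."""
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return {
        "input": round(input_cost, 2),
        "output": round(output_cost, 2),
        "total": round(input_cost + output_cost, 2),
    }


bill = standard_cost(10_000_000, 1_000_000)
print(bill)  # {'input': 15.0, 'output': 9.0, 'total': 24.0}
```

In this run output is only 1 of the 11 million tokens, yet it accounts for $9.00 of the $24.00, or 37.5 percent of the bill.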

The practical control is not just prompt trimming. Teams can cap max output, use shorter system instructions, store repeated context, and send long jobs through Batch when speed is not needed. Developers comparing API tools may also want to benchmark this against Cursor usage pricing, because both products show how agentic workflows turn a fixed tool budget into a metered compute budget.
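One way to pick an output cap is to work backward from a budget. The sketch below is a planning calculation only: the helper `max_output_tokens` and the example budget are hypothetical, and the result is a number to feed into whatever output limit the team's client exposes, not an API setting in itself.

```python
from decimal import Decimal

OUTPUT_RATE_PER_M = Decimal("9.00")  # USD per 1M output tokens (May 2026 rate from this article)


def max_output_tokens(monthly_output_budget_usd: str, requests_per_month: int) -> int:
    """Largest per-request output size, in tokens, that keeps the month's
    output spend within budget even if every request hits the cap."""
    budget = Decimal(monthly_output_budget_usd)
    return int(budget * 1_000_000 // (OUTPUT_RATE_PER_M * requests_per_month))


# Hypothetical plan: $90 of output spend spread over 10,000 requests.
print(max_output_tokens("90", 10_000))  # 1000
```

Because thinking tokens are billed as output, the cap has to cover both the reasoning and the visible answer.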

Free tier, paid tier, and limits

Gemini 3.5 Flash has a free tier path for testing, but production planning should use paid-tier math. Free access can support prototypes, prompt checks, and early demos. It is not a stable substitute for a paid launch because rate limits, feature access, and data-handling settings can differ.

Google said on May 19, 2026, in its launch availability details, that Gemini 3.5 Flash is available through the Gemini app, AI Mode in Search, Google AI Studio, Android Studio, the Gemini API, and enterprise products, so readers should separate consumer access from developer billing. A person using the Gemini app may see the model without paying API token fees. A software company calling the API pays by metered usage.

Consumer subscription pricing can add confusion. Google’s May 2026 subscription update on consumer AI plan changes says AI Ultra tiers include higher usage limits in the Gemini app and Google Antigravity, with the top tier lowered from $250 to $200 per month and a $100 tier also listed. Those prices are not the same as Gemini API token charges.

Cache, Batch, and grounding extras

The extras range from $0.075 per 1M cached input tokens in Batch to $14 per 1,000 search queries after the shared allowance, with cache storage at $1.00 per 1M token-hours. These are small unit prices, but they add up when a retrieval app keeps large context alive across many users.
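Whether caching pays off depends on how often the context is reused while it is stored. The following sketch compares resending a block of context at the Standard input rate with reading it from cache and paying storage. The function name `compare_cache` is hypothetical, and the model assumes there is no separate charge for creating the cache, which this article does not confirm.

```python
INPUT_RATE = 1.50        # USD per 1M standard input tokens
CACHED_READ_RATE = 0.15  # USD per 1M cached input tokens
STORAGE_RATE = 1.00      # USD per 1M token-hours of cache storage


def compare_cache(context_tokens: int, reuses: int, hours_stored: float) -> tuple[float, float]:
    """Return (resend_cost, cached_cost) in USD for one block of repeated context.

    Assumption: no extra fee for writing the cache in the first place.
    """
    millions = context_tokens / 1_000_000
    resend = millions * reuses * INPUT_RATE
    cached = millions * reuses * CACHED_READ_RATE + millions * hours_stored * STORAGE_RATE
    return round(resend, 2), round(cached, 2)


print(compare_cache(1_000_000, reuses=10, hours_stored=2))  # (15.0, 3.5)
print(compare_cache(1_000_000, reuses=1, hours_stored=24))  # (1.5, 24.15)
```

Under these assumptions the cache wins only when reuses × ($1.50 − $0.15) exceeds hours stored × $1.00, which works out to roughly three reuses every four hours.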

Batch can cut the cost of the same input and output task in half when the job can wait. In the cloud billing rows of the Google Cloud price sheet, Standard Gemini 3.5 Flash is $1.50 input and $9.00 output per 1M tokens, while Flex or Batch is $0.75 input and $4.50 output, so the 10M input and 1M output run drops from $24.00 to $12.00.
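The Standard versus Batch comparison is easy to reproduce with a small rate table. This is a minimal sketch: the `RATES` dictionary and `run_cost` function are illustrative names, and the figures are the May 2026 rates quoted in this article.

```python
# USD per 1M tokens, from the rates quoted above (May 2026).
RATES = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch": {"input": 0.75, "output": 4.50},
}


def run_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one workload under the given pricing mode."""
    rate = RATES[mode]
    cost = input_tokens / 1e6 * rate["input"] + output_tokens / 1e6 * rate["output"]
    return round(cost, 2)


for mode in RATES:
    print(mode, run_cost(mode, 10_000_000, 1_000_000))
# standard 24.0
# batch 12.0
```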

Grounding is a separate decision. It can make sense for an agent that must answer from current web material, but it is a poor default for closed tasks such as formatting a JSON object, classifying a support ticket, or rewriting a short internal note. Teams building speech or media products should also compare usage meters against AI voice cloning prices, where minutes, characters, rights, and API volume can all move the bill.

What small teams might pay

Mini case 1, prototype. A developer tests a help widget with short prompts, short answers, and no grounded search. The first budget line is token volume, not a seat fee. If the app sends small text prompts and returns short summaries, the bill can stay modest because output is limited.

Mini case 2, support agent. A team lets the model read a knowledge base section, call tools, and draft longer customer replies. Output tokens become the pressure point. A support answer that is three times longer than expected can move spend faster than the input side because output has the higher rate.

Mini case 3, research assistant. An app uses repeated long context and grounded search. Cache can lower repeated context cost, but the grounding line can appear once the shared free monthly allowance is passed. That team should track search queries beside tokens, because a single customer request can trigger more than one search query.
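The grounding line in mini case 3 can be estimated once the allowance is known. In the sketch below, `free_allowance` is a placeholder the caller must fill in from the current price sheet, because this article does not state its size. The function name and the example month are hypothetical.

```python
GROUNDING_RATE_PER_1K = 14.00  # USD per 1,000 search queries beyond the allowance


def grounding_cost(requests: int, searches_per_request: float, free_allowance: int) -> float:
    """USD owed for grounded search once the free allowance is used up.

    free_allowance is not given in this article; pass in the value
    from the current official price sheet.
    """
    total_queries = requests * searches_per_request
    billable = max(0.0, total_queries - free_allowance)
    return round(billable / 1000 * GROUNDING_RATE_PER_1K, 2)


# Hypothetical month: 4,000 requests, 3 searches each, 5,000-query allowance.
print(grounding_cost(4_000, 3, free_allowance=5_000))  # 98.0
```

Multiplying by searches per request matters here: the same 4,000 requests with one search each would stay inside the hypothetical allowance and owe nothing for grounding.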

These cases are planning examples, not claims about real customer invoices. The safest first month is a capped pilot with logging for input tokens, output tokens, cache reads, storage hours, and grounded queries. If a team also pays for code assistants, compare this meter beside GitHub Copilot subscriptions so engineering spend is viewed as one stack, not scattered tools.
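A capped pilot needs a running tally of the meters listed above. The class below is a hypothetical sketch, not part of any Google SDK: `PilotMeter` and its field names are invented for illustration, and it only counts grounded queries rather than pricing them, since the free allowance is not stated here.

```python
from dataclasses import dataclass

# USD per 1M units, as quoted in this article for May 2026.
RATES = {
    "input_tokens": 1.50,
    "output_tokens": 9.00,
    "cache_read_tokens": 0.15,
    "cache_token_hours": 1.00,
}


@dataclass
class PilotMeter:
    """Hypothetical running tally for a capped pilot."""
    cap_usd: float
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_token_hours: int = 0
    grounded_queries: int = 0  # counted only; grounding spend is not estimated

    def record(self, **usage: int) -> None:
        """Add one request's usage, e.g. record(input_tokens=1200, output_tokens=300)."""
        for field, amount in usage.items():
            setattr(self, field, getattr(self, field) + amount)

    def spend(self) -> float:
        """Estimated token and cache spend in USD, grounding excluded."""
        return sum(getattr(self, name) / 1_000_000 * rate for name, rate in RATES.items())

    def over_cap(self) -> bool:
        return self.spend() >= self.cap_usd


meter = PilotMeter(cap_usd=50.0)
meter.record(input_tokens=2_000_000, output_tokens=500_000, grounded_queries=40)
print(round(meter.spend(), 2), meter.over_cap())  # 7.5 False
```

A misspelled field name in `record` raises an AttributeError instead of being silently dropped, which helps keep the log honest during a pilot.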

Worked total for a Gemini 3.5 Flash project

Itemized example for a small internal document assistant, assume 20M standard input tokens, 2M standard output tokens, 5M cached input tokens, and 10M token-hours of cache storage in one month. The input line is 20 × $1.50, or $30.00. The output line is 2 × $9.00, or $18.00. The cached input line is 5 × $0.15, or $0.75. The storage line is 10 × $1.00, or $10.00.

The total is $58.75 before taxes, payment-card effects, or any grounded search fees. Output still takes almost a third of that bill, because $18.00 out of $58.75 is about 31 percent. Cache reads are tiny in this case, but cache storage is not zero, so leaving large cached context alive after a job ends can add waste.
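The itemized example can be checked with a short script. This sketch uses Python's `decimal` module to avoid floating-point rounding in currency math; the `LINES` table and `itemize` function are illustrative names, and the quantities are the ones assumed in the example above.

```python
from decimal import Decimal

# (line item, quantity in millions, USD rate per million) from the worked example.
LINES = [
    ("Standard input", Decimal("20"), Decimal("1.50")),
    ("Standard output", Decimal("2"), Decimal("9.00")),
    ("Cached input", Decimal("5"), Decimal("0.15")),
    ("Cache storage (token-hours)", Decimal("10"), Decimal("1.00")),
]


def itemize(lines):
    """Return ({name: cost}, total) for a list of (name, millions, rate) rows."""
    costs = {name: qty * rate for name, qty, rate in lines}
    return costs, sum(costs.values())


costs, total = itemize(LINES)
for name, cost in costs.items():
    share = cost / total * 100
    print(f"{name:<28} ${cost:>6.2f}  {share:4.1f}%")
print(f"{'Total':<28} ${total:>6.2f}")
```

Printing each line's share makes it easy to see which meter dominates before changing prompt length, answer length, or cache lifetime.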

The line to watch is the one the product team controls least. In a document assistant, users may ask for long summaries, rewrites, tables, or draft emails. Setting a shorter answer target can save more than shaving a few words from the prompt, because the output rate is the higher line item.

When it’s worth paying for

Gemini 3.5 Flash fits buyers who need stronger agent and coding behavior than a cheaper model, but do not want to move every task to a larger Pro-style model. Platform choice also matters. Direct Gemini API access can be the simpler route for developers already using Google AI Studio, while Google Cloud can fit teams that need enterprise controls, cloud billing, and managed agents. Google’s May 2026 developer post says the Gemini API now supports managed agents that run in a secure cloud sandbox, define skills, and use versioned agent files through the secure sandbox and skills workflow.

Makes sense if

  • Your app uses multimodal prompts, coding tasks, or tool calls where a cheaper Lite model underperforms.
  • You can cap response length and send slow jobs through Batch.
  • Your repeated context can be cached rather than resent in full.
  • Your team already uses Google AI Studio, Android Studio, or Google Cloud billing.

Doesn’t make sense if

  • Your workload is short classification, extraction, or routing that a lower-cost model can handle.
  • Your app produces long answers without a clear user or revenue reason.
  • You need a fixed monthly bill and cannot tolerate token swings.
  • You only want consumer Gemini chat access rather than API calls.

What we verified

  • Checked that a public third-party router pricing comparison matches the $1.50 input and $9.00 output rates.
  • Confirmed through independent I/O product coverage that Gemini 3.5 Flash became the default model in Gemini and AI Mode.
  • Cross-referenced Google’s statement that Gemini 3.5 Flash is available for building through Gemini against its effective pricing view.

Rates can change after publication. Recheck the API price sheet before opening production traffic or removing budget caps.

Article Highlights

  • Gemini 3.5 Flash API billing is usage-based, not a flat monthly plan.
  • The main paid rates as of May 2026 are $1.50 input and $9.00 output per 1M tokens.
  • Batch can halve input and output rates when delayed processing works for the job.
  • Cache reads are cheap, but cache storage can still add waste when context stays live.
  • Consumer Gemini access and API token billing are separate buying paths.

Answers to Common Questions

How much does Gemini 3.5 Flash cost per month?

There is no single monthly API price. A light prototype may owe little or nothing on a free testing path, while paid production usage is built from tokens, cache, Batch jobs, and grounding.

Is Gemini 3.5 Flash cheaper than Gemini 3 Flash?

No, not by the current public rate sheets we checked. Independent analysis lists Gemini 3.5 Flash at higher input and output rates than Gemini 3 Flash, so the value case rests on capability and speed, not the lowest unit price.

Does Google AI Ultra include Gemini 3.5 Flash API usage?

Google AI Ultra is a consumer subscription path with higher usage limits in Google products. It is separate from Gemini API token billing for developers.

What line item should developers watch first?

Watch output tokens first for chat, coding, and agent apps. Long answers and reasoning tokens can move the bill faster than prompt input.

Disclosure: Educational content, not financial advice. Prices reflect public information as of the dates cited and can change. Confirm current rates, fees, taxes, and terms with official sources before purchasing. See our methodology and corrections policy.
