In-Depth

Beware Project-Wrecking GitHub Copilot Premium SKU Quotas

When using GitHub Copilot Pro in Visual Studio Code, you can bump up against a premium SKU quota limit that stealthily switches AI models on you, resulting in a wrecked project, hours of angst and consternation and an acrimonious turn to your formerly cordial relationship with your Chat buddy -- with ugly cussing involved.

At least for an always-on-deadline journalist working on various frenzied projects who didn't see a little notice in a long, complicated Chat window. Your mileage may vary. You, perhaps being a meticulous, methodical, professional coder, might have spotted this right away amid the chatter: "You have exceeded your premium request allowance. We have automatically switched you to GPT 4.1 which is included with your plan."

Let me tell you, GPT 4.1 sucks -- so much for journalistic objectivity.

This happened in an agentic AI comparison project I was working on in VS Code, where I do all my writing. I've used AI to enhance my writing/editing experience, so naturally took to documenting how agentic AI works in the IDE (it's definitely not a mere code editor anymore) in a series of articles. Advanced AI in VS Code so improved my writing experience that I ponied up for a personal Copilot Pro subscription. However, I wasn't aware of the premium SKU quotas that come with it.

Model Usage Breakdown
[Click on image for larger view.] Model Usage Breakdown (source: GitHub).

My latest project was comparing how my editorial assistant Agent Skill, fairly new to VS Code, compared to Amazon Q, after noticing how Amazon keeps posting updates about how well it works on .NET projects of all kinds. I was wondering why a Microsoft-centric developer would choose a third-party cloud tool for .NET projects when tools like Copilot were available, much more integrated into the whole VS Code and Visual Studio experience.

For the project I tested my Agent Skill that formats articles from freelancers for our CMS. It has turned into quite a beast, as I keep improving it, with the backing SKILL.md file now numbering more than 3,500 words, providing all kinds of spiffy functionality. What used to be an onerous, time-consuming process of manual drudgery to convert a Word doc to a CMS-ready HTML article is now done in minutes with one command that applies all the right formatting according to the freelancer it detects, checks for grammar and spelling and constructs a key takeaway sidebar along with an article summary and social media posts. It's been a game-changer for my workflow, and I love it.

We've had a bit of a one-sided lover's quarrel.

That's thanks to GitHub wanting to make more money and my being inattentive to what passes as the "fine print" in our digital age.

In the middle of my comparison, I duplicated the formatting actions with my Copilot-powered Agent Skill and with Amazon Q (which I have to say right now performed quite admirably), on an identical article, a beast itself, being a nearly 6,000-word code-heavy data science tutorial loaded with all kinds of formatting challenges, including code snippets and even pseudo-code snippets, which my tool recognizes and formats differently.

Somewhere along the line, my Copilot tool just tanked, instantly turning from an experienced senior editor savant into a clueless, child-like intern, and I had no idea why. I was getting all kinds of weird, unhelpful responses that were not even close to what I had been getting before.

I kept asking Copilot what the hell had happened and all it had to offer were vague messages like:

The change in behavior is not due to your workflow, the file, or your actions. It is most likely due to a change in how the coding agent (GitHub Copilot) is currently configured to interact with the environment, or how it is allowed to use tools and provide reasoning steps. This can be affected by:
•	Platform or Extension Updates: Sometimes, updates to the Copilot extension, VS Code, or the underlying platform can silently change how much context I can access, how I prompt for approvals, or how I explain my reasoning.
•	Session or Policy Shifts: The agent's session state or policy may have changed, causing it to skip or abbreviate the explicit reasoning and step-by-step explanations you are used to. This can happen if the system is running in a more "direct action" mode, or if verbosity settings have changed.
•	Toolchain or Security Adjustments: If there are new restrictions on how I can invoke PowerShell, edit files, or prompt for user input, I may be forced to skip or condense those steps, even if the SKILL.md says otherwise.
Nothing about your file, your process, or your usage is at fault. The difference is almost certainly due to a change in the agent's operational environment, which is outside your control.

We went round and round with that for a while, but I never got a better answer. I honestly think the tool didn't know what was going on. I have before noticed a strange ignorance of LLMs about their own operational environment and what affects them, which is ironic given that they are supposed to be the smartest entities in the room.

The actual problem was it switched midstream to GPT 4.1, which is a much less capable model than the one I had been using, GPT-5.2-Codex. Unbeknownst to me, that model itself had sometime earlier been switched to from Claude Sonnet 4.5, which I had been using from the start. I don't remember ever picking that model. Being an expensive more quota-limited model (30 requests per month compared to 270 for GPT-5.2-Codex), it's almost like GitHub chose that as my default model for me so I would bump the quota limit sooner and eventually have to give them more money to finish an ongoing project. (Editor's note: No organization has been proven guilty of any nefarious, underhanded, sneaky, surreptitious, shady, devious, duplicitous, clandestine, stealthy, covert, secretive, backdoor-y behavior, nor accused of such.)

I probably just settled on Auto for the model picker, and I have found out paid Copilot users receive a 10% "request discount" (0.9x multiplier) when using the Auto model picker, a financial incentive that allows Microsoft to dynamically load-balance global traffic by routing requests to the best available high-tier models based on real-time capacity. As usual, it's all about money.

Here some models available to me:

Copilot Models
[Click on image for larger view.] Copilot Models (source: GitHub).

After finding the deprecation notice above, I combed through my Chat history but couldn't find any deprecation notices for going from Claude Sonnet 4.5 to GPT-5.2-Codex. I searched and searched but the Chat responses just stopped showing being generated from the former and started showing the latter, which, again, I didn't notice.

Anyway, I don't remember a degradation in performance whenever that happened, just with the drop to the free GPT 4.1 model.

The downloadable usage report shows the shift happening around Feb. 5-6 (no idea why it switched back and forth for a day):

Usage Report
[Click on image for larger view.] Usage Report (source: GitHub).

After finally identifying the problem, I upped my budget for the premium SKU usage, which I didn't even know existed. I then switched back to Claude Sonnet 4.5 to run my test and it performed well, just like the good old days. As you can see, I had already been charged 16 cents so far earlier in the month and wasn't even aware.

Billed Premium Requests
[Click on image for larger view.] Billed Premium Requests (source: GitHub).

I asked Copilot if the model switch could have caused the formatting fiasco and it replied:

Yes, that message is very likely the cause of the sudden change in behavior. When you exceeded your premium request allowance, the system switched you from GPT-5.2-Codex (a more advanced or specialized model) to GPT-4.1. This switch can significantly affect:

•	Code understanding and formatting accuracy
•	Adherence to custom skills and editorial rules
•	Stepwise reasoning and explicit explanations
•	Tool invocation reliability and context handling

GPT-4.1 is powerful, but it may not match the precision, context window, or specialized capabilities of GPT-5.2-Codex, especially for complex, multi-step editorial tasks. The drop in performance and reliability you observed is consistent with such a model downgrade.
If you want the previous level of performance, you would need to regain access to the higher-tier model (e.g., by increasing your premium allowance or subscription). Otherwise, you will be limited to the capabilities of GPT-4.1.

I then asked the tool if GitHub could start with a pricey model as the default. It replied:

Not officially documented -- but behavior shows that Copilot will often choose a premium model first when available.

There are multiple reasons this happens:

•	Premium models are automatically preferred because they produce better results
•	Copilot (especially in VS Code) tries to route to the highest capability model by default
•	The extension updates often reset the model selector
•	When a conversation starts fresh, it may default to the highest tier model you have access to

So yes:

•	It is absolutely plausible that:
•	You initially used Claude
•	Then at some point switches went to GPT 5.2 Codex
•	Both produced good results
•	You unknowingly consumed all 300 premium requests over time
•	You got downgraded to GPT 4.1 when quota hit zero
•	Formatting quality fell off a cliff
•	There is nothing inconsistent about this -- it matches GitHub's documented system.

Now that I'm aware of the quota problem, I've discovered an easy way to keep track of usage (detailed below by Gemini), that involves simply clicking on the Copilot icon at the bottom right of VS Code, which brings up this:

Copilot Usage Box in VS Code
Copilot Usage Box in VS Code (source: VS Code).

I asked Gemini to sum up the situation and it said:

In 2026, GitHub Copilot has fully transitioned to a consumptive billing model that treats AI usage as a metered resource rather than a flat-fee service. This system relies on "Premium Requests," a currency used to manage the significant computing costs of high-end models like GPT-5 and Claude 4.5.

The following is a breakdown of how the quota, pricing, and automated switching systems function.


1. The Quota Hierarchy: Plans & Allowances

GitHub offers three primary tiers for individual developers. Each includes a monthly "allowance" of premium requests that resets on the first of every month, regardless of your personal billing date.

Plan Price Monthly Premium Requests Best For
Copilot Free $0 50 Casual use / Students
Copilot Pro $10 / month 300 Daily professional development
Copilot Pro+ $39 / month 1,500 Heavy "Agent" usage & power users
  • Standard Usage: Basic code completions and chat using "standard" models (like GPT-4.1) are generally unlimited for paid subscribers and do not count against the premium quota.
  • Premium Usage: Any prompt sent to a high-tier model (Claude 4.5, GPT-5.2-Codex, or Gemini 3 Pro) consumes at least one premium request.

2. The 'Multiplier' System

A single prompt does not always equal one "request" deducted from your balance. GitHub uses multipliers based on the cost of the model:

  • 1x Models: Claude 3.5/4.5 Sonnet, GPT-5.2-Codex, Gemini 2.5 Pro.
  • 3x to 10x Models: Top-tier reasoning models like Claude 4.6 Opus or OpenAI o3/o4 often carry high multipliers. A single complex session with an Opus model could theoretically consume 10 units of your quota.
  • Auto-Selection Discount: GitHub often applies a 10% discount (charging 0.9x) if you let the system automatically choose the model for you.

3. Automated Model Switching (the 'Failover')

This is the most controversial aspect of the current system. When your premium quota hits zero, GitHub implements a Failover Protocol:

  • The Switch: The system automatically downgrades your session to GPT-4.1 (or a similar "standard" model).
  • The Performance Gap: Standard models often lack the context window and reasoning capabilities required for complex tasks.

4. Pricing and Overages

Users who exhaust their quota can continue using premium models by enabling Extra Premium Requests:

  • Cost: Flat rate of $0.04 per request (after the monthly allowance is gone).
  • Hard Caps: By default, GitHub sets a $0 budget for extra requests, meaning the service will simply block premium models once the quota is hit unless you manually increase the limit.

5. Monitoring & Alerts

To prevent "surprise" downgrades, GitHub provides several visibility tools:

  • IDE Status Bar: In VS Code, clicking the Copilot icon shows a percentage of the quota used (e.g., "78% of premium requests used").
  • Billing Dashboard: Learn about this here.
  • Email Notifications: You can opt-in to receive alerts when you hit 75%, 90%, and 100% of your monthly allowance or your additional spending budget.

This has been a public service announcement.

comments powered by Disqus

Featured

  • VS Code 1.123 Adds Agent Session Sync, 1M Context Windows

    Microsoft released Visual Studio Code 1.123 on June 3, adding agent-focused features, larger model context support, integrated browser updates and a new delay for some automatic extension updates.

  • Copilot Billing Shock Hits Developers

    Developer complaints about GitHub Copilot's new usage-based billing model have centered on unexpectedly rapid AI credit consumption, and neither GitHub nor Microsoft has responded directly to the backlash, though they have previously published guidance to lessen model usage costs.

  • Hands On with GitHub Copilot App Technical Preview: Turning a Blazor Issue into a PR

    GitHub's brand-new Copilot desktop app, in technical preview, handled a small Blazor issue from planning through pull request creation, but the hands-on test also showed why developers still need to verify agent work in the running app before merging.

  • At Build 2026, Microsoft Sets Up Windows as an OS for AI Agents

    Microsoft's Build 2026 Windows developer announcements point to a broader platform strategy for agentic AI, spanning terminal workflows, local models, app-building skills, Cloud PCs and operating system-level containment.

Subscribe on YouTube