OpenAI's gpt-oss Powers Hybrid AI Across Azure and Windows -- Visual Studio Magazine

By David Ramel
08/06/2025

OpenAI has released its first set of open-weight models since GPT-2, and Microsoft is accelerating developer adoption by launching these models--gpt-oss-120b and gpt-oss-20b--on both Azure AI Foundry and Windows AI Foundry. For IT professionals and developers, this represents a pivotal shift: high-performing, customizable OpenAI large language models can now be deployed under the organization's own control across cloud, edge, and client devices, with full support for enterprise-tailored use.

"This new era calls for tools that are open, adaptable, and ready to run wherever your ideas live--from cloud to edge, from first experiment to scaled deployment," the company said in an Aug. 5 announcement. "At Microsoft, we're building a full-stack AI app and agent factory that empowers every developer not just to use AI, but to create with it."

Two Models, Broad Reach

gpt-oss-120b:
- 120 billion parameters (with roughly 5.1 billion active at inference), designed for advanced reasoning, code generation, mathematics, and domain-specific Q&A.
- Optimized to run on a single enterprise-class GPU--specifically, the NVIDIA H100--making high-performance AI practical for both on-premises and secure cloud scenarios.
- Delivers "o4-mini level performance at a fraction of the size," according to Microsoft.

gpt-oss-20b:
- 21 billion parameters (3.6 billion active), engineered for agentic workflows, tool use, and code execution.
- Runs efficiently on Windows devices with discrete GPUs (16GB+ VRAM), and support is coming soon to macOS via Foundry Local.
- Ideal for embedding autonomous assistants or robust local inferencing in privacy- and bandwidth-sensitive environments.

Both models are planned to be API-compatible with the now-ubiquitous Responses API, streamlining migration from existing apps and speeding integration.
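
To make the compatibility point concrete, here is a minimal sketch of what a Responses-style request body could look like. The `model`, `input`, and `max_output_tokens` fields follow the publicly documented Responses API request shape. The endpoint URL, the `build_responses_request` helper, and the use of the model name as the deployment identifier are assumptions for illustration only, since compatibility is described as planned rather than shipped.

```python
import json

# Hypothetical endpoint for illustration; a real Azure AI Foundry or
# Foundry Local URL will differ and is not given in the article.
ASSUMED_ENDPOINT = "https://example.invalid/v1/responses"


def build_responses_request(model: str, prompt: str, max_output_tokens: int = 256) -> dict:
    """Build a Responses-style JSON body. No network call is made."""
    return {
        "model": model,                          # assumed to match the deployment name
        "input": prompt,                         # plain-text input
        "max_output_tokens": max_output_tokens,  # cap on generated tokens
    }


if __name__ == "__main__":
    body = build_responses_request("gpt-oss-20b", "Summarize this build log.")
    print(f"POST {ASSUMED_ENDPOINT}")
    print(json.dumps(body, indent=2))
```

If the compatibility lands as described, switching an existing app between a hosted model and a gpt-oss deployment would mostly be a matter of changing the `model` value and the base URL.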

Figure: Model Comparison *(source: Microsoft).*

Open Models, Open Customization

Open-weight access gives developers full transparency and flexibility. They can:
- Fine-tune using parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA.
- Inject proprietary data or adapters, or retrain specific layers to match organizational needs.
- Distill or quantize models for edge deployment, and export to ONNX or deploy with Triton for Kubernetes-based inferencing.
- Inspect model internals for security or compliance audits and build custom checkpoints in hours, not weeks.

Azure AI Foundry: Supports full lifecycle management--fine-tuning, versioning, and low-latency serving in the cloud.
Windows AI Foundry & Foundry Local: Enable secure, on-device inference for privacy, regulatory compliance, and low-latency performance--even offline or in air-gapped networks.
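
To illustrate why a parameter-efficient method such as LoRA is attractive for fine-tuning, the sketch below counts trainable parameters for a single weight matrix. LoRA freezes the dense matrix W (d_out x d_in) and trains two small matrices, A (rank x d_in) and B (d_out x rank). The layer sizes and rank used here are illustrative assumptions, not the actual gpt-oss dimensions, and the helper names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    # Illustrative sizes only; not taken from the gpt-oss architecture.
    d_in, d_out, rank = 4096, 4096, 8
    dense = full_params(d_in, d_out)
    adapter = lora_trainable_params(d_in, d_out, rank)
    print(f"dense: {dense:,}  adapter: {adapter:,}  ratio: {adapter / dense:.4%}")
```

Under these assumed sizes the adapter is well under one percent of the dense matrix, which is the basic reason adapter-based fine-tuning needs far less memory and time than retraining full layers.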

Cloud-Optional Hybrid AI

By integrating Foundry Local with Windows AI Foundry, developers and IT teams can deploy gpt-oss-20b directly on client devices running Windows (and soon macOS), without cloud dependencies. This enables compliance with the most stringent data residency, privacy, or sovereignty requirements, while also supporting low-latency inferencing at the edge. Developers can choose between fast, serverless endpoints on Azure or fully local deployments--mixing and matching to fit their organization's needs.
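
One way to picture the mix-and-match approach is a small routing policy that decides, per request, whether to stay on the device or call a cloud endpoint. The sketch below is purely hypothetical: the `Request` fields, the `choose_target` function, and the policy itself are assumptions for illustration and do not correspond to any Foundry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    prompt: str
    contains_regulated_data: bool  # e.g. data subject to residency rules
    online: bool                   # whether the device can reach the cloud


def choose_target(req: Request) -> str:
    """Return "local" or "cloud" under a simple, assumed policy."""
    if req.contains_regulated_data or not req.online:
        # Keep the data on the device (for example, gpt-oss-20b run locally).
        return "local"
    # Otherwise use a serverless cloud endpoint.
    return "cloud"


if __name__ == "__main__":
    samples = [
        Request("Draft a release note.", contains_regulated_data=False, online=True),
        Request("Summarize this patient record.", contains_regulated_data=True, online=True),
        Request("Explain this stack trace.", contains_regulated_data=False, online=False),
    ]
    for r in samples:
        print(f"{choose_target(r):5}  {r.prompt}")
```

A real policy would likely weigh latency, cost, and model capability as well, but the same shape applies: the decision is made in application code, not imposed by the platform.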

Pricing and Availability

gpt-oss-120b: $0.15 per million input tokens and $0.60 per million output tokens for serverless deployments via Azure.
gpt-oss-20b: Pricing depends on the Azure Machine Learning VM type under managed compute options.
Pricing may vary across providers: for instance, some public cloud partners may list gpt-oss-120b at $0.15 per million input tokens and $0.75 per million output tokens.
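
For a rough sense of what these rates mean in practice, the following sketch estimates a bill from token counts. The per-million-token prices are the ones quoted above. The `estimate_cost` helper and the example token volumes are assumptions for illustration, and actual invoices may include charges not modeled here.

```python
from decimal import Decimal

# USD per one million tokens, as quoted above for gpt-oss-120b serverless on Azure.
INPUT_PRICE_PER_M = Decimal("0.15")
OUTPUT_PRICE_PER_M = Decimal("0.60")


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: Decimal = INPUT_PRICE_PER_M,
    output_price: Decimal = OUTPUT_PRICE_PER_M,
) -> Decimal:
    """Estimated cost in USD for the given token counts."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price) / million


if __name__ == "__main__":
    # Hypothetical monthly volume: 2M input tokens, 500K output tokens.
    azure = estimate_cost(2_000_000, 500_000)
    # Same volume at the $0.75 output rate mentioned for some other providers.
    other = estimate_cost(2_000_000, 500_000, output_price=Decimal("0.75"))
    print(f"Azure serverless: ${azure:.3f}")
    print(f"Alternate listing: ${other:.3f}")
```

`Decimal` is used instead of floats so that small per-token amounts do not accumulate rounding error.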

Why It Matters for IT Pros and Developers

No more black-box AI: Full access to model weights and internals enables transparency for compliance, customization, and security.
Accelerated innovation: Fine-tuning, efficient deployment, and a large catalog of models (over 11,000 available in Azure AI Foundry) boost enterprise and developer agility.
Flexible, hybrid AI: Deploy best-in-class models wherever they're needed--cloud, on-premises, or edge--supporting evolving cloud-optional scenarios and data sovereignty requirements.

At a Glance: Key Benefits of gpt-oss on Azure and Windows AI Foundry

| Feature | gpt-oss-120b | gpt-oss-20b |
| --- | --- | --- |
| Parameter Count | 120B (5.1B active) | 21B (3.6B active) |
| Ideal Workloads | Reasoning, math, code, enterprise Q&A | Agentic, code execution, tool use, local AI assistants |
| Deployable On | Cloud (Azure), single enterprise GPU (NVIDIA H100) | Windows devices with discrete GPUs, soon macOS |
| API Compatibility | Upcoming: Responses API for easy integration | Upcoming: Responses API for easy integration |
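
The parameter counts above lend themselves to some back-of-the-envelope arithmetic. The sketch below computes the share of parameters active per token and a rough weight-storage footprint at different precisions. The bits-per-parameter values are assumptions, since the article does not state the precision the models ship in, and the estimate ignores activations, KV cache, and runtime overhead.

```python
def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of parameters active at inference."""
    return active_billion / total_billion


def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); weights only."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    models = {"gpt-oss-120b": (120.0, 5.1), "gpt-oss-20b": (21.0, 3.6)}
    for name, (total, active) in models.items():
        print(f"{name}: {active_fraction(active, total):.1%} of parameters active")
        # 16-bit and 4-bit are assumed precisions chosen for comparison.
        for bits in (16, 4):
            print(f"  ~{weight_memory_gb(total, bits):.1f} GB of weights at {bits} bits/param")
```

Under these assumptions, the 16GB+ VRAM figure cited for gpt-oss-20b is only plausible with heavily quantized weights, which is consistent with the article's emphasis on quantization for edge deployment.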

Conclusion

OpenAI's gpt-oss launch--delivered through Azure AI Foundry and Windows AI Foundry--signals a turning point in enterprise AI adoption. Developers and organizations are no longer locked out of model internals: now, they can deeply customize, audit, and deploy cutting-edge language models with confidence and sovereignty, whether in the public cloud, private data center, or on individual endpoints.

As Microsoft's AI team puts it, "AI is no longer a layer in the stack--it's becoming the stack." With these new open-weight models, the stack is both transparent and programmable--putting true innovation into the hands of every builder.

About the Author

David Ramel is an editor and writer at Converge 360.
