Azure AI Foundry Gets 'Computer-Using Agent' for Autonomous GUI Interaction
Microsoft is expanding the agentic AI functionality in its Azure AI Foundry platform, furthering one of the hottest areas of AI development right now.
The company this week announced two new features, a Responses API and a Computer-Using Agent (CUA), for Azure AI Foundry (formerly called AI Studio), its all-in-one platform for building transformative AI apps and agents.
Azure AI Foundry (source: Microsoft).
The Responses API simplifies AI application development by providing a unified interface for retrieval, reasoning, and execution, while the CUA autonomously interacts with computer systems to execute tasks, bridging the gap between AI and real-world application control.
The CUA is described as a specialized AI model in Azure OpenAI Service that enables AI to interact with GUIs, navigate applications, and automate multi-step tasks via natural language instructions, a step up from automation tools that rely on predefined scripts or API-based integrations.
The tech is based on OpenAI's Computer-Using Agent announced in January, when the Microsoft partner touted "the flexibility to perform digital tasks without using OS- or web-specific APIs."
Computer-Using Agent (CUA) (source: OpenAI).
Microsoft on Tuesday detailed these unique abilities of the offering:
- Autonomous UI navigation: Can open applications, click buttons, fill out forms, and navigate multi-page workflows.
- Dynamic adaptation: Interprets UI changes and adjusts actions accordingly, reducing reliance on rigid automation scripts.
- Cross-application task execution: Operates across web-based and desktop applications, integrating disparate systems without API dependencies.
- Natural language command interface: Users can describe a task in plain language, and CUA determines the correct UI interactions to execute.
The Responses API fits into the scheme by providing a structured response format that allows AI to interact with multiple tools while maintaining context across interactions, supporting:
- Tool calling in one simple API call: Now, developers can seamlessly integrate AI tools, making execution more efficient.
- Computer use: Use the computer use tool within the Responses API to drive automation and execute software interactions.
- File search: Interact with enterprise data dynamically and extract relevant information.
- Function calling: Develop and invoke custom functions to enhance AI capabilities.
- Chaining responses into conversations: Keep track of interactions by linking responses together using unique response IDs, ensuring continuity in AI-driven dialogues.
- Enterprise-grade data privacy: Built with Azure's trusted security and compliance standards, ensuring data protection for organizations.
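To make the response-chaining idea concrete, here is a minimal Python sketch of how linking responses by ID can reconstruct a conversation. It is an in-memory illustration only and does not call Azure or OpenAI services; the names `StoredResponse`, `ResponseStore`, and `previous_response_id` as used here are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
import uuid


@dataclass(frozen=True)
class StoredResponse:
    """One turn: what the user sent, what came back, and the prior turn's ID."""
    id: str
    user_input: str
    output_text: str
    previous_response_id: Optional[str] = None


class ResponseStore:
    """Hypothetical in-memory store that links responses by unique ID."""

    def __init__(self) -> None:
        self._by_id: Dict[str, StoredResponse] = {}

    def add(self, user_input: str, output_text: str,
            previous_response_id: Optional[str] = None) -> StoredResponse:
        # Refuse to chain onto a response that was never stored.
        if previous_response_id is not None and previous_response_id not in self._by_id:
            raise KeyError(f"unknown response id: {previous_response_id}")
        response = StoredResponse(
            id=f"resp_{uuid.uuid4().hex[:8]}",
            user_input=user_input,
            output_text=output_text,
            previous_response_id=previous_response_id,
        )
        self._by_id[response.id] = response
        return response

    def conversation(self, response_id: str) -> List[StoredResponse]:
        """Walk the chain backward from response_id, then return it oldest-first."""
        chain: List[StoredResponse] = []
        current: Optional[str] = response_id
        while current is not None:
            response = self._by_id[current]
            chain.append(response)
            current = response.previous_response_id
        chain.reverse()
        return chain


if __name__ == "__main__":
    store = ResponseStore()
    first = store.add("Find the shipment PDF.", "Found it.")
    second = store.add("Extract the tracking number.", "Extracted.",
                       previous_response_id=first.id)
    for turn in store.conversation(second.id):
        print(turn.id, "|", turn.user_input, "->", turn.output_text)
```

Because each response stores only the ID of its predecessor, a caller needs to keep just the latest ID to recover the whole dialogue, which is the continuity property the bullet describes.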
An accompanying video from Marco Casalaina, VP of products for CoreAI and an AI Futurist at Microsoft, shows the new tooling in action. He used the two new features to demonstrate the automation of a routine task on a Linux virtual machine, where the AI autonomously navigates a website to download a shipment PDF, extracts and retains key information, inputs it into another site, and prompts for human confirmation before final submission.
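The pattern in the demo, where the agent carries out UI steps on its own but pauses for a person before the final submission, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Microsoft's or OpenAI's implementation: the `Action`, `FakeForm`, and `run_actions` names are hypothetical, the action plan is fixed in advance rather than proposed step by step by a model, and the form is a stand-in for a real GUI.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A single UI step. kind is one of "type", "click", or "submit"."""
    kind: str
    target: str
    text: str = ""


@dataclass
class FakeForm:
    """Stand-in for a real GUI: records field values, clicks, and submission."""
    fields: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)
    submitted: bool = False


def run_actions(actions: List[Action], form: FakeForm,
                confirm: Callable[[Action, FakeForm], bool]) -> bool:
    """Apply actions in order; ask `confirm` before any submit.

    Returns True only if the form ended up submitted.
    """
    for action in actions:
        if action.kind == "type":
            form.fields[action.target] = action.text
        elif action.kind == "click":
            form.log.append(f"clicked {action.target}")
        elif action.kind == "submit":
            # Human-in-the-loop gate: stop here unless the reviewer approves.
            if not confirm(action, form):
                form.log.append("submit declined")
                return False
            form.submitted = True
        else:
            raise ValueError(f"unsupported action kind: {action.kind}")
    return form.submitted


if __name__ == "__main__":
    plan = [
        Action("type", "tracking_number", "ABC123"),
        Action("click", "next"),
        Action("submit", "shipment_form"),
    ]

    def ask_user(action: Action, form: FakeForm) -> bool:
        answer = input(f"Submit {action.target} with {form.fields}? [y/N] ")
        return answer.strip().lower() == "y"

    print("submitted:", run_actions(plan, FakeForm(), ask_user))
```

Passing the confirmation step in as a callback keeps the approval policy separate from the action loop, so the same loop could be paired with an interactive prompt, an approval queue, or an automatic rule.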
"As you can see, these tools offer some amazing possibilities for automating workflows and enhancing productivity across various industries," Casalaina said. "Azure AI Foundry continues to push the boundaries of what's possible with AI-driven automation, and we're excited to see how you'll innovate with these powerful tools."
Microsoft said developers can immediately start building with CUA, while enterprises will soon gain access to the Responses API and CUA in Azure OpenAI Service. The company also plans to integrate CUA automation into Windows 365 and Azure Virtual Desktop for seamless deployment on Cloud PCs and VMs with enterprise-grade security and compliance.
On the security front, Microsoft hinted at the challenges that come with increased AI autonomy, which in popular culture often leads to doomsday, AI-kills-humanity scenarios.
"As AI systems become more autonomous, ensuring security, reliability, and alignment with human intent is critical," the company said. "The CUA model is one of the first agentic AI models capable of directly interacting with software environments, bringing new challenges in misuse prevention, unintended actions, and adversarial risks. To address these, Microsoft and OpenAI have implemented a multi-layered safety approach spanning the model, system, and deployment levels."
About the Author
David Ramel is an editor and writer at Converge 360.