Hands On with VS Code 1.112's New Image Analysis for Agents -- Visual Studio Magazine

By David Ramel
03/19/2026

Visual Studio Code 1.112 introduced native image support for agents.

Vision and imagery are becoming more important in agentic AI because agents increasingly need to interpret screenshots, charts, interfaces, and other visual artifacts that contain the context required to complete real-world tasks.

That functionality also provides a good topic for a hands-on, proof-of-concept article that -- while tailored to my editorial workflow -- can be adapted for developers who read Visual Studio Magazine.

The pitch is straightforward:

Setting: chat.imageCarousel.enabled, imageCarousel.explorerContextMenu.enabled

Agents can now read image files from disk and binary files natively, which allows you to use agents for a wider variety of tasks, such as analyzing screenshots, reading data from binary files, and more. Binary files are presented to the agent in a hexdump format.

When an agent or tool generates an image as output, such as a screenshot from the integrated browser, those images are now selectable in chat responses and can be opened in a dedicated image carousel view. Enable this functionality with the chat.imageCarousel.enabled setting (Experimental).

When imageCarousel.explorerContextMenu.enabled (Experimental) is enabled, you can right-click image files or folders in the Explorer view and select Open Images in Carousel to browse images in the carousel view.

Note: The image carousel is currently experimental.
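
The notes above say binary files reach the agent in a hexdump format but do not spell out the layout. As a rough illustration only, here is a minimal Python sketch of a conventional hexdump (offset, hex bytes, printable ASCII). The function name `hexdump` and the 16-byte row width are my assumptions, not VS Code's actual output format.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of: offset, hex byte values, printable ASCII."""
    pad = width * 3 - 1  # width of a full hex column, used to align short rows
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside the printable ASCII range are shown as '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  |{text_part}|")
    return "\n".join(lines)


# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
print(hexdump(PNG_SIGNATURE))
```

Even this small example shows why the format is workable for an agent: the "PNG" marker is readable in the ASCII column while the non-printable bytes stay visible as hex.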

[Figure: Experimental Enablement (source: Ramel)]
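
For readers who prefer to script the setup, the two experimental settings named above could also be written into a workspace settings file. This is a minimal sketch under several assumptions: that both settings take a boolean `true`, that they are honored at workspace scope in `.vscode/settings.json`, and that any existing file is plain JSON without comments (Python's `json` module rejects comments). The helper name `enable_carousel` is hypothetical.

```python
import json
from pathlib import Path

# Setting names come from the release notes; boolean values are an assumption.
CAROUSEL_SETTINGS = {
    "chat.imageCarousel.enabled": True,
    "imageCarousel.explorerContextMenu.enabled": True,
}


def enable_carousel(workspace: Path) -> dict:
    """Merge the carousel settings into <workspace>/.vscode/settings.json."""
    settings_path = workspace / ".vscode" / "settings.json"
    settings = {}
    if settings_path.exists():
        # Assumes plain JSON; a file containing comments would raise an error here.
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings.update(CAROUSEL_SETTINGS)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling `enable_carousel(Path.cwd())` from the workspace root would apply it to the current project; existing keys in the file are preserved.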

So I tried it on a topic I've previously covered, the Microsoft Foundry AI Model Leaderboard. I took full-page captures of three trade-off charts showing quality versus cost, quality versus safety, and quality versus throughput.

The goal was not to ask for a grand market analysis or a fluffy winner-take-all narrative. I wanted a narrow, factual summary that a developer could actually use.

The prompt I used for Claude Sonnet 4.6 was:

Analyze these three screenshots from the Microsoft AI Foundry leaderboard. They feature trade-off charts based on cost, safety, and throughput relative to quality, summarize the best-performing models for developers.

Specifically:

Identify the 'Quality Leader' that maintains reasonable cost.

Identify the 'Safest' high-quality model.

Identify the 'High-Throughput' winner for production scale.

Format the response with the subhead [Category Name] for each section. Provide only factual data found in the charts without hyperbole.

After that, I asked for a table and a list to go along with the summary.
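
The prompt follows a repeatable pattern: one framing sentence, one "Identify" line per category, and one formatting rule. Developers adapting it to other charts could template it. The sketch below reuses my prompt text verbatim; the names `build_prompt`, `INTRO`, `FORMAT_RULE`, and `CATEGORIES` are hypothetical, and nothing here is a VS Code API.

```python
INTRO = (
    "Analyze these three screenshots from the Microsoft AI Foundry leaderboard. "
    "They feature trade-off charts based on cost, safety, and throughput relative "
    "to quality, summarize the best-performing models for developers."
)
FORMAT_RULE = (
    "Format the response with the subhead [Category Name] for each section. "
    "Provide only factual data found in the charts without hyperbole."
)
# Category label mapped to the qualifier that follows it in the prompt.
CATEGORIES = {
    "Quality Leader": "that maintains reasonable cost",
    "Safest": "high-quality model",
    "High-Throughput": "winner for production scale",
}


def build_prompt(categories: dict[str, str]) -> str:
    """Assemble the prompt: intro, one 'Identify' line per category, format rule."""
    asks = [f"Identify the '{label}' {qualifier}." for label, qualifier in categories.items()]
    return "\n\n".join([INTRO, "Specifically:", *asks, FORMAT_RULE])


print(build_prompt(CATEGORIES))
```

Swapping in a different `CATEGORIES` mapping keeps the framing and the "no hyperbole" rule constant across runs, which makes responses easier to compare.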

There is also a secondary workflow angle here. If you are collecting screenshots during reporting or product testing, having them live in the workspace as analyzable inputs is a lot more practical than manually transcribing every label, every plotted value, and every model name into a prompt. For me, that is the real promise: less copy-paste drudgery, more structured extraction for my journalism. Opportunities for developers are myriad.
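
As a small illustration of that workflow, the following sketch gathers screenshot files from a workspace folder so they can be referenced in a prompt. The extension list is my assumption (the release notes do not say which image formats agents accept), and `find_screenshots` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to what your agent actually accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def find_screenshots(folder: Path) -> list[Path]:
    """Return image files under folder (searched recursively), sorted by path."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `find_screenshots(Path("screenshots"))` would list the captures in a `screenshots` folder, ready to be named in the prompt or attached to the chat.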

The Response
The response was surprisingly usable on first pass. It opened with, "Based on the three screenshots showing the Microsoft AI Foundry Model Leaderboard trade-off charts" and then dutifully generated summaries with the requested subhead formatting for each category. The content was factual and aligned with the charts, though I would still recommend a human review to trim back interpretive language and ensure that approximate values are presented as such.

Quality Leader
gpt-5.3-codex (2026-07-24) holds the top quality score of 0.94 on the leaderboard. In the Quality vs. Cost trade-off chart, it sits at a competitive cost position -- the Cost highlight chart shows the lowest bars clustering around $0.14-$0.17 per 1M tokens, placing it among the most cost-efficient models at peak quality.

Safest
The Safety chart measures attack success rate (lower is better). In the Quality vs. Safety trade-off chart, gpt-5.3-codex and claude-opus-4-8 register near-zero attack success rates, while gpt-5.4, gpt-5.2-codex, gpt-4.2, and kimi-k2.5 range approximately 0.17%-0.35%. claude-opus-4-8 (quality score 0.93, rank #5) is the only non-OpenAI model in the top six and shares the near-zero safety failure rate alongside gpt-5.3-codex, making it the standout high-quality model for safety-sensitive deployments.

High-Throughput
The Throughput chart measures output tokens per second (higher is better). In the Quality vs. Throughput trade-off chart, gpt-5.3-codex leads at approximately 541-550 tokens/second. gpt-5.4 is the next closest at approximately 277-278 tokens/second, with gpt-5.2-codex, gpt-4.2, claude-opus-4-8, and kimi-k2.5 clustered between roughly 101-115 tokens/second.

The table:

Category	Model	Quality Score	Key Metric
Quality Leader	gpt-5.3-codex (2026-07-24)	0.94	~$0.14-$0.17 per 1M tokens
Safest	claude-opus-4-8	0.93	Near-zero attack success rate
High-Throughput	gpt-5.3-codex (2026-07-24)	0.94	~541-550 tokens/second

The list:

Quality Leader: gpt-5.3-codex (2026-07-24) — Quality score 0.94; competitive cost at approximately $0.14–$0.17 per 1M tokens.
Safest: claude-opus-4-8 — Quality score 0.93 (rank #5); near-zero attack success rate in the Quality vs. Safety trade-off chart.
High-Throughput: gpt-5.3-codex (2026-07-24) — Quality score 0.94; approximately 541–550 output tokens per second, roughly 2× faster than the next closest model.
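
The "roughly 2×" claim in that last item is easy to sanity-check against the ranges the response itself reported. A minimal sketch, using only the approximate tokens-per-second figures quoted above (variable names are mine):

```python
# Approximate output tokens/second ranges as reported in the response.
fastest = (541, 550)    # gpt-5.3-codex
runner_up = (277, 278)  # gpt-5.4

# Most conservative and most generous ratios given both ranges.
low = fastest[0] / runner_up[1]
high = fastest[1] / runner_up[0]

print(f"{low:.2f}x to {high:.2f}x")  # prints "1.95x to 1.99x"
```

So the reported figures support "about twice as fast," with the caveat that both inputs are values read off a chart.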

Where the PoC Still Needs Human Review
This is not a "publish straight from chat" feature. The output still needs an editor looking at the screenshots. Approximate values are still approximate values. Phrases like "reasonable cost" are interpretive unless the chart or prompt defines a threshold. And when a response says one model is the "standout," that is the kind of wording I would usually trim back unless the chart makes the comparison unmistakable.
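
Part of that review can be pre-screened mechanically before an editor looks at the screenshots. The sketch below flags interpretive wording and approximation markers in a response so they get a second look. The word lists are my own starting point, not an exhaustive or authoritative set, and `review_flags` is a hypothetical helper.

```python
import re

# Assumed watch lists; extend them for your own house style.
INTERPRETIVE_TERMS = ("standout", "competitive", "reasonable", "winner")
APPROXIMATION = re.compile(r"approximately|roughly|near-zero|~", re.IGNORECASE)


def review_flags(text: str) -> dict[str, list[str]]:
    """List interpretive terms and approximation markers found in text."""
    lowered = text.lower()
    return {
        "interpretive": [term for term in INTERPRETIVE_TERMS if term in lowered],
        "approximate": sorted({m.group(0).lower() for m in APPROXIMATION.finditer(text)}),
    }


sample = "Quality score 0.94; competitive cost at approximately $0.14-$0.17 per 1M tokens."
print(review_flags(sample))
```

A flag is only a prompt for a human decision: "approximately" may be exactly the right hedge, while "competitive" may need a threshold or a trim.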

My takeaway from this PoC is simple: this is not the flashiest feature in VS Code 1.112, but it might be one of the more practical ones for people who work with visual source material.

About the Author

David Ramel is an editor and writer at Converge 360.
