Hands On with VS Code 1.112's New Image Analysis for Agents

Visual Studio Code 1.112 introduced native image support for agents.

Vision and imagery are becoming more important in agentic AI because agents increasingly need to interpret screenshots, charts, interfaces, and other visual artifacts that contain the context required to complete real-world tasks.

That functionality also provides a good topic for a hands-on, proof-of-concept article that -- while tailored to my editorial workflow -- can be adapted for developers who read Visual Studio Magazine.

The pitch is straightforward:

Setting: chat.imageCarousel.enabled, imageCarousel.explorerContextMenu.enabled

Agents can now read image files from disk and binary files natively, which allows you to use agents for a wider variety of tasks, such as analyzing screenshots, reading data from binary files, and more. Binary files are presented to the agent in a hexdump format.
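For readers unfamiliar with the format, a hexdump shows each run of bytes as an offset, the byte values in hexadecimal, and a printable-ASCII column. The release notes do not say which exact layout VS Code presents to the agent, so the Python sketch below assumes the conventional 16-bytes-per-row layout; the `hexdump` function name is illustrative only.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / ASCII rows (conventional layout, assumed)."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{hex_width}}  |{ascii_part}|")
    return "\n".join(rows)


# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
print(hexdump(PNG_SIGNATURE))
```

Run on the PNG signature, the ASCII column reads `.PNG....`, which is the kind of cue an agent could use to recognize a file type from raw bytes.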

When an agent or tool generates an image as output, such as a screenshot from the integrated browser, those images are now selectable in chat responses and can be opened in a dedicated image carousel view. Enable this functionality with the chat.imageCarousel.enabled setting (Experimental).

When imageCarousel.explorerContextMenu.enabled (Experimental) is enabled, you can right-click image files or folders in the Explorer view and select Open Images in Carousel to browse images in the carousel view.

Note: The image carousel is currently experimental.
Figure: Experimental Enablement (source: Ramel).
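As a rough sketch of what enabling both experimental settings amounts to, the Python snippet below adds the two setting IDs quoted in the release notes to a settings mapping and prints the JSON that would go in a settings.json file. It assumes both settings take a boolean `true`; the `enable_carousel` helper and the pre-existing `editor.fontSize` entry are illustrative, not part of the release notes.

```python
import json

# Setting IDs quoted in the release notes above; both are marked Experimental.
CAROUSEL_SETTINGS = (
    "chat.imageCarousel.enabled",
    "imageCarousel.explorerContextMenu.enabled",
)


def enable_carousel(settings: dict) -> dict:
    """Return a copy of a settings mapping with both carousel settings on."""
    updated = dict(settings)
    for key in CAROUSEL_SETTINGS:
        updated[key] = True  # assumption: the settings are plain booleans
    return updated


# Hypothetical existing settings. Real settings.json files may contain
# comments (JSONC), which the standard json module cannot parse.
existing = {"editor.fontSize": 14}
print(json.dumps(enable_carousel(existing), indent=2))
```

The helper works on a copy so the original mapping is left untouched, which makes it easy to compare before and after when experimenting with experimental flags.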

So I tried it on a topic I've previously covered, the Microsoft Foundry AI Model Leaderboard. I took full-page captures of three trade-off charts showing quality versus cost, quality versus safety, and quality versus throughput.

The goal was not to ask for a grand market analysis or a fluffy winner-take-all narrative. I wanted a narrow, factual summary that a developer could actually use.

The prompt I used for Claude Sonnet 4.6 was:

Analyze these three screenshots from the Microsoft AI Foundry leaderboard. They feature trade-off charts based on cost, safety, and throughput relative to quality, summarize the best-performing models for developers.

Specifically:

Identify the 'Quality Leader' that maintains reasonable cost.

Identify the 'Safest' high-quality model.

Identify the 'High-Throughput' winner for production scale.

Format the response with the subhead [Category Name] for each section. Provide only factual data found in the charts without hyperbole.

After that, I asked for a table and a list to go along with the summary.

There is also a secondary workflow angle here. If you are collecting screenshots during reporting or product testing, having them live in the workspace as analyzable inputs is a lot more practical than manually transcribing every label, every plotted value, and every model name into a prompt. For me, that is the real promise: less copy-paste drudgery, more structured extraction for my journalism. Developers can apply the same approach to the screenshots, charts, and interface captures that pile up during their own work.

The Response
The response was surprisingly usable on first pass, starting out with, "Based on the three screenshots showing the Microsoft AI Foundry Model Leaderboard trade-off charts" and then dutifully generating summaries with our custom subhead formatting for each category. The content was factual and aligned with the charts, though I would still recommend a human review to trim back interpretive language and ensure that approximate values were presented as such.

Quality Leader
gpt-5.3-codex (2026-07-24) holds the top quality score of 0.94 on the leaderboard. In the Quality vs. Cost trade-off chart, it sits at a competitive cost position -- the Cost highlight chart shows the lowest bars clustering around $0.14-$0.17 per 1M tokens, placing it among the most cost-efficient models at peak quality.

Figure: Quality vs Cost Trade-Off (source: Microsoft).

Safest
The Safety chart measures attack success rate (lower is better). In the Quality vs. Safety trade-off chart, gpt-5.3-codex and claude-opus-4-8 register near-zero attack success rates, while gpt-5.4, gpt-5.2-codex, gpt-4.2, and kimi-k2.5 range approximately 0.17%-0.35%. claude-opus-4-8 (quality score 0.93, rank #5) is the only non-OpenAI model in the top six and shares the near-zero safety failure rate alongside gpt-5.3-codex, making it the standout high-quality model for safety-sensitive deployments.

Figure: Quality vs Safety Trade-Off (source: Microsoft).

High-Throughput
The Throughput chart measures output tokens per second (higher is better). In the Quality vs. Throughput trade-off chart, gpt-5.3-codex leads at approximately 541-550 tokens/second. gpt-5.4 is the next closest at approximately 277-278 tokens/second, with gpt-5.2-codex, gpt-4.2, claude-opus-4-8, and kimi-k2.5 clustered between roughly 101-115 tokens/second.

Figure: Quality vs Throughput Trade-Off (source: Microsoft).

The table:

| Category | Model | Quality Score | Key Metric |
| --- | --- | --- | --- |
| Quality Leader | gpt-5.3-codex (2026-07-24) | 0.94 | ~$0.14-$0.17 per 1M tokens |
| Safest | claude-opus-4-8 | 0.93 | Near-zero attack success rate |
| High-Throughput | gpt-5.3-codex (2026-07-24) | 0.94 | ~541-550 tokens/second |

The list:

  • Quality Leader: gpt-5.3-codex (2026-07-24) — Quality score 0.94; competitive cost at approximately $0.14–$0.17 per 1M tokens.
  • Safest: claude-opus-4-8 — Quality score 0.93 (rank #5); near-zero attack success rate in the Quality vs. Safety trade-off chart.
  • High-Throughput: gpt-5.3-codex (2026-07-24) — Quality score 0.94; approximately 541–550 output tokens per second, roughly 2× faster than the next closest model.
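The "roughly 2x faster" claim in the last bullet is easy to sanity-check against the approximate ranges the response itself reported. This minimal Python sketch computes the smallest and largest ratio those ranges allow; the numbers are the model's chart readings quoted above, not measured values, and `ratio_bounds` is an illustrative helper name.

```python
# Approximate ranges as reported in the model's response (tokens/second).
leader = (541, 550)     # gpt-5.3-codex
runner_up = (277, 278)  # gpt-5.4


def ratio_bounds(a: tuple, b: tuple) -> tuple:
    """Smallest and largest possible a/b given (low, high) ranges."""
    return a[0] / b[1], a[1] / b[0]


low, high = ratio_bounds(leader, runner_up)
print(f"{low:.2f}x to {high:.2f}x")  # prints "1.95x to 1.99x"
```

Under those readings the leader is a little under twice as fast as the runner-up, so "roughly 2x" holds up as long as it is presented as an approximation.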

Where the PoC Still Needs Human Review
This is not a "publish straight from chat" feature. The output still needs an editor looking at the screenshots. Approximate values are still approximate values. Phrases like "reasonable cost" are interpretive unless the chart or prompt defines a threshold. And when a response says one model is the "standout," that is the kind of wording I would usually trim back unless the chart makes the comparison unmistakable.
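One way to make that review pass a little more systematic is to flag the wording that most often needs trimming. The Python sketch below is only an illustration: the two word lists are my own assumptions drawn from this one response, not an established style rule, and `flag_for_review` is a hypothetical helper name.

```python
# Illustrative word lists (assumptions, not an established style rule).
APPROXIMATE = ("approximately", "roughly", "near-zero", "~")
INTERPRETIVE = ("standout", "competitive", "reasonable")


def flag_for_review(text: str) -> dict:
    """Return the terms from each list that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return {
        "approximate": [term for term in APPROXIMATE if term in lowered],
        "interpretive": [term for term in INTERPRETIVE if term in lowered],
    }


# Sentence taken from the "Safest" section of the response above.
sample = (
    "claude-opus-4-8 (quality score 0.93, rank #5) is the only non-OpenAI "
    "model in the top six and shares the near-zero safety failure rate "
    "alongside gpt-5.3-codex, making it the standout high-quality model "
    "for safety-sensitive deployments."
)
print(flag_for_review(sample))
```

A check like this only points an editor at candidate phrases; deciding whether the chart actually supports the wording still means looking at the screenshot.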

My takeaway from this PoC is simple: this is not the flashiest feature in VS Code 1.112, but it might be one of the more practical ones for people who work with visual source material.

About the Author

David Ramel is an editor and writer at Converge 360.
