Hands On with Copilot Vision: VS Code's Head Start and How the IDE Is Catching Up
Hey devs, when anybody can feed an image to Visual Studio Copilot and create a fully navigation-enabled web app in one fell swoop, what will YOU be doing?
Almost a year ago, I showed how GitHub Copilot in Visual Studio Code Insiders could use an image as the starting point for code generation, turning a provided mockup into actual HTML and CSS code that could serve as the starting point for a working web site.
Now, that vision-driven workflow is more integrated in VS Code, and Visual Studio is starting to offer its own image-aware Copilot experience.
The question now is what has changed since that early Insiders-era demo, and whether Visual Studio has anything comparable. Using the same mockup image as input, I ran a simple hands-on test in both VS Code and Visual Studio, focusing on what each tool can generate from an image prompt today and how much manual iteration is still required.
Revisiting the Same Image in Today's VS Code
To keep the comparison as clean as possible, I reused the same degraded mockup image from that original proof of concept. Last year, Copilot used it to generate HTML and CSS for a single placeholder page -- useful as a starting point, but not a functioning site.
I don't remember the exact prompt I used last year, so I had ChatGPT help me craft something similar. Here's the prompt I used:
Using this image as the source of truth, generate a responsive HTML/CSS layout that matches the structure and layout shown. Focus on semantic markup and reasonable defaults rather than pixel-perfect styling.
Here's the poor-quality AI-generated starter image I used last year and this year:
AI-Generated Mockup Image (source: Ramel).
Here's what the February 2025 Insiders build spit out; I had to add the Visual Studio Magazine image with guidance from Copilot.
Generated Home Page from February 2025 (source: Ramel).
Running that same image through today's VS Code Copilot produced a very different outcome. Instead of stopping at a static layout, Copilot generated a complete, navigation-capable web site, including multiple pages such as About, Services, and Contacts, all wired together with working navigation.
Here's the result from this week in the latest VS Code Insiders build (v1.109.0-insider):
Insiders Build Generated Home Page from January 2026 (source: Ramel).
In the stable VS Code build, this happened quickly, in about a minute, and produced a functional result with minimal back-and-forth. The generated site was clearly more complete than last year's placeholder output, even if the overall design was fairly straightforward.
Here's the result in the latest stable build, VS Code v1.108:
Stable Build Generated Home Page from January 2026 (source: Ramel).
As you can see, the result is sub-optimal: the nav bar placeholders differ from the corresponding content blocks below them. Even if Copilot misread the nav bar items because of the poor quality of the starter image, the labels should have matched in both places.
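A mismatch like this can be caught mechanically before eyeballing the page. The following is a minimal sketch, not something Copilot or VS Code provides: it assumes the generated page puts its menu links inside a `<nav>` element and titles each content block with an `<h2>` inside a `<section>`. That markup shape, the `SAMPLE_PAGE` text, and the helper names are all hypothetical, since the article does not show the generated HTML.

```python
from html.parser import HTMLParser


class NavAndHeadingCollector(HTMLParser):
    """Collect link text inside <nav> and <h2> text inside <section>."""

    def __init__(self):
        super().__init__()
        self.nav_labels = []
        self.section_headings = []
        self._nav_depth = 0
        self._section_depth = 0
        self._capturing = None  # "a" or "h2" while text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self._nav_depth += 1
        elif tag == "section":
            self._section_depth += 1
        elif self._capturing is None:
            if tag == "a" and self._nav_depth > 0:
                self._capturing, self._buffer = "a", []
            elif tag == "h2" and self._section_depth > 0:
                self._capturing, self._buffer = "h2", []

    def handle_data(self, data):
        if self._capturing is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "nav":
            self._nav_depth = max(0, self._nav_depth - 1)
        elif tag == "section":
            self._section_depth = max(0, self._section_depth - 1)
        elif tag == self._capturing:
            text = "".join(self._buffer).strip()
            target = self.nav_labels if tag == "a" else self.section_headings
            if text:
                target.append(text)
            self._capturing = None


def nav_mismatches(html):
    """Return (nav labels with no matching block, blocks with no nav label)."""
    collector = NavAndHeadingCollector()
    collector.feed(html)
    nav = {label.casefold() for label in collector.nav_labels}
    headings = {heading.casefold() for heading in collector.section_headings}
    return sorted(nav - headings), sorted(headings - nav)


# Hypothetical page in which the nav says "Services" but the block says "Products".
SAMPLE_PAGE = """
<nav><a href="#about">About</a><a href="#services">Services</a><a href="#contact">Contact</a></nav>
<section id="about"><h2>About</h2></section>
<section id="products"><h2>Products</h2></section>
<section id="contact"><h2>Contact</h2></section>
"""

print(nav_mismatches(SAMPLE_PAGE))  # -> (['services'], ['products'])
```

Two empty lists would mean the nav bar and the content blocks agree. Anything else is a cue to ask Copilot for a follow-up fix.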
Using the Insiders build with the same image and prompt eventually yielded an even more polished result, with better imagery and a more cohesive visual presentation. In this case, however, the process involved long-running background operations and required significantly more waiting before the final site appeared. It is an Insiders build, after all, where bleeding-edge tech is tested. The spinning circle icons cycled for so long that I thought Copilot had frozen and went to the gym to work off some frustration, but when I came back, the stuck processes had completed. The delay might have been specific to this project. Either way, the end result showed how far Copilot's vision-based workflows have progressed since the original experiment.
How Copilot Vision Evolved in VS Code
Part of what makes this comparison possible is how Copilot's vision capabilities have changed in VS Code over the past year. When I first tested the feature, image-based prompting lived in an experimental Insiders workflow and felt more like a preview than a core development feature. Since then, image input has become more tightly integrated into Copilot Chat, no longer requiring a separate extension (like last year) or special setup.
That integration matters. Instead of treating images as a novelty, Copilot now uses them as first-class input alongside text prompts, allowing it to reason about layout, structure, and navigation in a way that goes well beyond generating a single placeholder page. The result isn't instant or one-click, but it is significantly more capable and cohesive than the early proof-of-concept experience.
This maturation in VS Code sets the bar for what "vision-aware" development looks like today -- which naturally raises the question of how much of that experience has made its way into the Visual Studio IDE.
What Vision Support Looks Like in the Visual Studio IDE
Visual Studio now includes its own form of image-aware Copilot Chat, allowing developers to attach screenshots or mockups as part of a prompt. That said, the experience is not a direct counterpart to what VS Code offers. There is no image-to-site or image-to-UI workflow in the IDE that mirrors VS Code's ability to scaffold a working web site from a visual prompt.
Instead, images in Visual Studio function as additional context for Copilot rather than as the primary driver of generation. Developers still need to describe what they want built, with the image serving as a reference rather than the source of truth. In practice, that means Copilot can help interpret a design and suggest UI code, but it doesn't independently translate an image into a runnable application.
Here's what the documentation says it can do:
- In Visual Studio 17.14 and later, you can attach up to three images per prompt (PNG, JPEG, GIF).
- Copilot will analyze any attached images and use that visual context to inform its suggestions. This includes scenarios such as:
  - UI development guidance (e.g., design interpretation)
  - Debugging (using a screenshot of an error)
  - Better context for code-related prompts
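To make the documented limits concrete, here is a small pre-flight check a developer could run on a set of files before attaching them. It is only a sketch of the constraints listed above (at most three images, PNG, JPEG, or GIF) and is not part of Visual Studio or Copilot. The function name and the mapping of formats to file extensions (`.jpg` and `.jpeg` for JPEG) are assumptions made for illustration.

```python
from pathlib import Path

# Limits taken from the documentation summary above; the extension
# mapping is an assumption, not something the documentation spells out.
MAX_IMAGES = 3
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def check_attachments(paths):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(paths) > MAX_IMAGES:
        problems.append(f"{len(paths)} images attached; limit is {MAX_IMAGES}")
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            problems.append(f"{path}: unsupported type {suffix or '(none)'}")
    return problems


print(check_attachments(["mockup.png"]))   # -> []
print(check_attachments(["design.bmp"]))   # -> ['design.bmp: unsupported type .bmp']
```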
To see how far that approach can go today, I ran a simple hands-on test in the Visual Studio IDE using the same mockup image, focusing on what Copilot can generate when vision is used as contextual input rather than as the starting point for code generation.
Hands-On with Vision in the Visual Studio IDE
For the Visual Studio side of the comparison, I created a blank ASP.NET Core Empty project and attached the same degraded mockup image directly to a Copilot Chat prompt in the IDE.
When the IDE opens, Copilot Chat appears immediately in the right-hand pane. As you can see, it says it can optimize and fix code and write unit tests:
IDE Copilot Side Pane (source: Ramel).
Its model defaulted to Claude Sonnet 4.5, a "Premium" model, with 16 other options including the Gemini and GPT families and even a Grok "Standard" model. The model pick list labels each entry with a "Variable" designation, such as 1x for the default and 0x for the Grok model, which indicates how heavily each request counts against your premium request allowance. Last year in VS Code, the model defaulted to GPT-4o.
As a first step, I asked Copilot to describe the layout shown in the image, which it did accurately, identifying the major regions, navigation elements, and content blocks:
Look at the attached image and describe the layout and main sections of the page.
I then explicitly prompted Copilot to generate a simple single-page HTML file with inline CSS based on that image:
Using the attached image as a reference and your layout description, generate a simple single-page HTML file with inline CSS that approximates this layout. Keep it minimal and runnable.
That produced this:
Visual Studio Community Edition 18.2.0 Generated Home Page (source: Ramel).
The resulting page rendered correctly and broadly matched the structure of the mockup, but it was more simplistic than the Insiders VS Code output. It also included some visual and semantic glitches caused by the degraded starter image, such as headings rendered in an unexpected language, which Copilot helped me fix. Visually, it most closely resembled the VS Code Stable result, which itself tracked more closely with the original starter image. Interestingly, it also showed a mismatch between the nav bar item names and the corresponding blocks below.
I was able to push the IDE workflow further and end up with a navigation-capable site, with sub-pages for About, Services, and Contacts, but only with more manual assembly than in VS Code. Visual Studio did not scaffold the sub-pages on its own, so I had to create each page manually and then paste in Copilot-generated HTML for each one. There were different options for viewing the output in a browser, including running a web server, which would have required me to manually create a wwwroot folder in the project's folder in File Explorer and manually move the index.html file into it. I chose to simply open the index.html file in a browser. The end result worked -- it even had some nice touches, like a professional-looking send-in-comment box with real text-input fields:
Visual Studio Community Edition 18.2.0 Generated Contact Page (source: Ramel).
My demo reinforced a familiar pattern: Copilot in VS Code gets the more advanced vision capabilities first.
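The manual staging described above (create a wwwroot folder, move index.html into it, then serve the folder) is simple enough to script. The sketch below uses Python's built-in `http.server` as a stand-in for the ASP.NET Core web server, so it is a swapped-in technique and not what Visual Studio does or what I ran. The function names, the `MyProject` path in the usage comment, and the port are hypothetical.

```python
import functools
import shutil
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def stage_site(project_dir, page_name="index.html"):
    """Create <project>/wwwroot and move the generated page into it."""
    project = Path(project_dir)
    web_root = project / "wwwroot"
    web_root.mkdir(exist_ok=True)
    source = project / page_name
    if source.exists():
        shutil.move(str(source), str(web_root / page_name))
    return web_root


def make_server(web_root, port=8000):
    """Build (but do not start) a local static-file server for web_root."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(web_root))
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (hypothetical project path); serve_forever() blocks until interrupted:
# make_server(stage_site("MyProject")).serve_forever()
```

Opening http://127.0.0.1:8000/ would then show index.html, because the built-in handler serves that file by default for a directory request.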
That workflow difference underscores the current state of vision support in the Visual Studio IDE. Images can inform Copilot's responses and help guide code generation, but they do not drive the process or produce a fully scaffolded application. In contrast, VS Code's image-driven workflow takes more initiative, creating files and expanding the result with less manual intervention.
Where Vision-Driven Development Stands Today
Taken together, these hands-on tests show how differently image-based prompting is currently applied across Microsoft's development tools. In VS Code, vision has evolved into a more integrated, image-driven workflow that can take initiative, scaffold files, and expand a visual prompt into a working site. In the Visual Studio IDE, vision support is present, but it plays a more constrained role, providing contextual guidance rather than driving full code generation.
That gap is not necessarily a flaw. VS Code has long served as the proving ground for experimental Copilot capabilities, while Visual Studio tends to adopt them more conservatively. For now, developers looking to explore vision-driven generation will find the most complete experience in VS Code, with the IDE offering a useful but more manual alternative. While VS Code currently leads in vision-driven generation, Visual Studio's strengths lie elsewhere -- in deep solution awareness, debugging, and build-integrated assistance that VS Code does not attempt to replicate.
I don't know why Copilot in the Visual Studio IDE can't create files and inject code into them, or create a wwwroot folder, stick the index.html file into it, and kick off a web server. I expect it will continue to evolve in this area and eventually create a fully navigation-enabled working web app in one fell swoop -- the ultimate in vibe coding.
But then, what will you be doing?
About the Author
David Ramel is an editor and writer at Converge 360.