How the New ChatGPT Vision (GPT-4V) Can Help 3D Designers

In late September, OpenAI began the rollout of its first truly multimodal generative AI system. Called ChatGPT Vision (or GPT-4V for short), the new system allows ChatGPT to understand and chat about images, not just text.
The new system is rolling out for users over the next month, but we were one of the first groups to get access!
Of course, we immediately began testing ChatGPT Vision on HDRi Maps , and experimenting with ways that it could help 3D designers improve their workflows.
Here are some of the helpful things ChatGPT Vision can do for the 3D design community, based on our early experiments.

Describing the Light in an HDRi Map

Our high-quality HDRi Maps include both visuals and the lighting data needed for Image Based Lighting. 
As a human looking at an HDRi Map, though, it can be challenging to quickly understand the light sources present in the image. Often, you need to load up the image in a DCC and add a mirror ball or basic model in order to test the lighting data.
ChatGPT Vision simplifies that process. We gave it one of our HDRi Maps, and asked it to describe the light sources in the image. We also asked for specific info related to automotive rendering.
It responded with a detailed breakdown:

Ideas for Automotive Renderings Using a Specific HDRi Map

Next, we asked GPT-4V to help brainstorm ideas for vehicles that would work well for an automotive rendering in the scene.
From the image, the system correctly understood that the HDRi Map depicts a coastal bridge (Bixby Bridge). 
It provided practical and helpful ideas for the kinds of vehicles that would be most appropriate to render in this scene, as well as explanations for its choices.

Background Ideas from a Vehicle Photo

As a final test, we gave ChatGPT Vision a photo of a real-life classic car. We then asked it to come up with ideas for an ideal background, which we could use to showcase the vehicle in an automotive rendering.
Here are some of the ideas it presented. Clearly, ChatGPT Vision understood what it was seeing, and what kinds of environments would show off this stunning vehicle!
We then asked the system for some keywords we could enter into CGI.Backgrounds’ search system in order to find an ideal HDRi Map for its second concept, a Historic City Street.
The system gave us several keywords, including “cobbled streets historic”. We entered them into the CGI.Backgrounds’ search, and found this HDRi Map, which would be ideal for showcasing a digital twin of our target vehicle.
ChatGPT Vision isn't just a tool for ideation--it can also help to guide your search for the ideal background.

Conclusion

Multimodal AI systems like ChatGPT Vision are still in their infancy. But it’s already clear that these systems will be a powerful ideation tool for 3D designers.
When these systems are ultimately combined with generative AI image creator for 3D rendering, they will become even more powerful. Imagine a system that could not only provide ideas for an ideal background, but could actually generate CGI backgrounds using AI!
We’ll keep experimenting with cutting-edge AI and sharing our results with you. Join our newsletter to keep up to speed on all our experiments.

Author

  • Thomas Smith

    Director of Communications

    Thomas Smith is a professional journalist, photographer, and CEO of Gado Images, an AI-driven content agency. Smith uses his degree in Cognitive Science from Johns Hopkins University and 10+ years of photography industry experience to provide insight on industry trends.