New AI tool makes fast, high-quality images

Researchers create a hybrid method to quickly generate realistic images, helping train self-driving cars and robots.

Self-driving cars need realistic images to learn how to avoid dangers on the road. These images must look real to make the cars safer. Generating high-quality images fast is key to this process. Artificial intelligence (AI) helps create these images.

One type of AI, called a diffusion model, makes very realistic images. A diffusion model starts with random noise (messy, unclear pixels) and cleans it up step by step. This process takes time and uses a lot of computer power. Another type, called an autoregressive model, is faster. It predicts parts of an image one by one, like guessing puzzle pieces. Autoregressive models also power chatbots like ChatGPT, but the images they produce are often blurry or contain errors.
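
For readers who like code, the two approaches can be pictured with a toy Python sketch. It is not a real image model. The "image" is just a short list of numbers, and `toy_denoise` and `toy_autoregressive` are made-up stand-ins that only show the shape of each loop. In particular, the toy denoiser is handed the clean answer directly, while a real diffusion model has to learn what noise to remove.

```python
import random


def toy_denoise(clean, steps=30, seed=0):
    """Diffusion-style loop: start from noise, refine the WHOLE image each step.

    Assumption: the clean image is passed in so the toy knows where to go.
    A trained diffusion model would predict this correction instead.
    """
    rng = random.Random(seed)
    image = [rng.random() for _ in clean]  # start from pure random noise
    for step in range(steps):
        remaining = steps - step
        # Remove an equal share of the remaining noise at every step.
        image = [x + (c - x) / remaining for x, c in zip(image, clean)]
    return image


def toy_autoregressive(length, first=0.0, step=0.1):
    """Autoregressive-style loop: produce ONE piece at a time from earlier pieces.

    Assumption: the "prediction rule" is simply previous piece plus a fixed step.
    """
    pieces = [first]
    while len(pieces) < length:
        pieces.append(round(pieces[-1] + step, 10))
    return pieces


clean = [0.1, 0.5, 0.9, 0.3]
denoised = toy_denoise(clean)      # 30 passes over the whole image
pieces = toy_autoregressive(4)     # one pass, one piece per step
```

The diffusion-style loop touches every pixel on each of its many steps, which is why it is slow. The autoregressive-style loop finishes in a single pass but never goes back to fix an earlier guess.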

Researchers from MIT and NVIDIA built a new tool called HART, short for hybrid autoregressive transformer. It combines the speed of autoregressive models with the detail of diffusion models. First, an autoregressive model quickly sketches the big parts of an image. Then, a small diffusion model fixes the tiny details. This teamwork lets HART create clear images quickly. The researchers report that it runs about nine times faster than top diffusion models and needs less computer power. It could run on a laptop or phone, and a user only needs to type a simple sentence describing the image.
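
The two-stage idea described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not HART's actual code: the "image" is a list of numbers, `coarse_pass` stands in for the fast autoregressive stage by snapping each value to one of a few rough levels, and `refine_pass` stands in for the small diffusion stage by shrinking the leftover error over eight steps. The function names, the number of levels, and the halving rule are all invented for illustration, and the toy refiner is given the target directly, whereas a trained model would have to predict the corrections.

```python
def coarse_pass(target, levels=4):
    """Stand-in for the autoregressive stage: one rough value per step."""
    out = []
    for value in target:  # left to right, one element at a time
        # Snap to the nearest of `levels` evenly spaced values in [0, 1].
        out.append(round(value * (levels - 1)) / (levels - 1))
    return out


def refine_pass(coarse, target, steps=8):
    """Stand-in for the small diffusion stage: fix only the fine details.

    Assumption: each step halves the remaining error. A real model would
    learn this correction rather than read it off the target.
    """
    current = list(coarse)
    for _ in range(steps):
        current = [c + 0.5 * (t - c) for c, t in zip(current, target)]
    return current


def max_error(a, b):
    """Largest per-element difference between two images."""
    return max(abs(x - y) for x, y in zip(a, b))


target = [0.05, 0.20, 0.48, 0.61, 0.77, 0.93]
coarse = coarse_pass(target)            # fast sketch of the big picture
refined = refine_pass(coarse, target)   # a few steps to add the details

coarse_error = max_error(coarse, target)
refined_error = max_error(refined, target)
```

Because the rough sketch already lands close to the target, the refinement stage has only a small gap to close, so a handful of steps is enough.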

The best of both worlds

The researchers describe HART in a preprint posted on arXiv. HART might help train robots to do tricky tasks or help game designers make cool scenes. Diffusion models usually take 30 or more steps to finish an image, but HART’s diffusion part needs only eight. That is because it just adds details after the fast model does most of the work. The researchers say it’s like painting a picture: start big, then add small touches.

The team faced challenges combining the two models. In early attempts, mistakes piled up. They fixed this by using the diffusion model only as the final step. HART matches or beats the image quality of bigger models while using less computing power. In the future, it could work with vision-language tools or even make videos and sounds.
