GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

CVPR 2024

Taoran Yi¹, Jiemin Fang²‡, Junjie Wang², Guanjun Wu¹, Lingxi Xie², Xiaopeng Zhang², Wenyu Liu¹, Qi Tian², Xinggang Wang¹‡✉

¹Huazhong University of Science and Technology ²Huawei Inc.

‡Project lead. ✉Corresponding author.


Abstract

Generating 3D assets from text prompts has recently shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects from prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited because 3D training data is expensive and hard to obtain. 2D diffusion models generalize well and generate fine detail, but their 3D consistency is hard to guarantee. This paper bridges the two types of diffusion models through the recent explicit and efficient 3D Gaussian splatting representation. We propose GaussianDreamer, a fast 3D object generation framework in which the 3D diffusion model provides priors for initialization and the 2D diffusion model enriches the geometry and appearance. Noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. GaussianDreamer can generate a high-quality 3D instance or 3D avatar within 15 minutes on one GPU, much faster than previous methods, and the generated instances can be rendered directly in real time.

Framework

Overall framework of GaussianDreamer. First, a 3D diffusion model generates the initial point clouds. After noisy point growing and color perturbation are applied to the point clouds, they are used to initialize the 3D Gaussians. The initialized 3D Gaussians are then optimized with Score Distillation Sampling (SDS) using a 2D diffusion model. Finally, images are rendered from the 3D Gaussians with 3D Gaussian Splatting. Any of various 3D diffusion models can generate the initial point clouds; here, text-to-3D and text-to-motion diffusion models are taken as examples.
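The noisy point growing and color perturbation step can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The function name `grow_and_perturb`, the parameters `num_candidates`, `max_dist` and `color_jitter`, and the specific rules used here are all assumptions made for illustration: candidates are sampled uniformly inside the bounding box, kept only when they lie close to an existing point, and given the nearest point's color plus a small random offset.

```python
import math
import random


def grow_and_perturb(points, colors, num_candidates=2000, max_dist=0.05,
                     color_jitter=0.2, seed=0):
    """Densify a coloured point cloud (illustrative sketch only).

    points: list of (x, y, z) tuples.
    colors: list of (r, g, b) tuples with channels in [0, 1].
    All parameter names and default values are hypothetical.
    """
    rng = random.Random(seed)

    # Axis-aligned bounding box of the initial point cloud.
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]

    new_points, new_colors = [], []
    for _ in range(num_candidates):
        # Sample a candidate uniformly inside the bounding box.
        cand = tuple(rng.uniform(lo[i], hi[i]) for i in range(3))

        # Brute-force nearest neighbour; a KD-tree would be used at scale.
        nearest = min(range(len(points)),
                      key=lambda j: math.dist(cand, points[j]))

        # Keep the candidate only if it stays close to the original surface.
        if math.dist(cand, points[nearest]) <= max_dist:
            # Assumed perturbation rule: nearest colour plus a random
            # offset in [0, color_jitter], clipped back to [0, 1].
            jittered = tuple(
                min(1.0, max(0.0, c + rng.uniform(0.0, color_jitter)))
                for c in colors[nearest]
            )
            new_points.append(cand)
            new_colors.append(jittered)

    # Original points come first, followed by the grown points.
    return list(points) + new_points, list(colors) + new_colors
```

The returned positions and colors would then serve as the centers and initial colors of the 3D Gaussians. The brute-force search is quadratic and only suitable for small inputs.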

Training Process

A 3D instance can be generated within 15 minutes on one GPU, much faster than previous methods, and can be directly rendered in real time.

Video

Comparison Results

Qualitative comparisons between our method and DreamFusion, Magic3D, Fantasia3D and ProlificDreamer.

Generation with Ground

We use the point clouds with an added ground to initialize the 3D Gaussians.
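The page does not say how the ground is constructed, so the following is only a sketch of one plausible approach: append a flat, regularly spaced grid of points at the lowest height of the object before initializing the Gaussians. The function name `add_ground`, the parameters `extent`, `step`, `ground_color` and `up_axis`, and the z-up convention are all assumptions.

```python
def add_ground(points, colors, extent=1.0, step=0.1,
               ground_color=(0.5, 0.5, 0.5), up_axis=2):
    """Append a flat grid of ground points below a point cloud.

    Illustrative sketch only; every parameter here is hypothetical.
    points: list of (x, y, z) tuples; colors: list of (r, g, b) tuples.
    """
    # Place the ground at the lowest point of the object along the up axis.
    floor = min(p[up_axis] for p in points)

    # Centre the grid under the object in the two horizontal axes.
    horiz = [a for a in range(3) if a != up_axis]
    centre = [sum(p[a] for p in points) / len(points) for a in horiz]

    cells = int(round(2 * extent / step))
    ground = []
    for i in range(cells + 1):
        for j in range(cells + 1):
            pt = [0.0, 0.0, 0.0]
            pt[up_axis] = floor
            pt[horiz[0]] = centre[0] - extent + i * step
            pt[horiz[1]] = centre[1] - extent + j * step
            ground.append(tuple(pt))

    # Original points first, then the ground points in a uniform colour.
    return list(points) + ground, list(colors) + [ground_color] * len(ground)
```

With `extent=1.0` and `step=0.5` the grid has 5 × 5 = 25 ground points. The combined cloud can then be passed through the same initialization as an object-only point cloud.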

airplane, fighter, steampunk style, ultra realistic, 4k, HD

a fox

ferrari convertible, trending on artstation, ultra realistic, 4k, HD

More Generated Samples

More samples generated by GaussianDreamer.

ferrari convertible, trending on artstation, ultra realistic, 4k, HD

flamethrower, with fire, scifi, cyberpunk, photorealistic, 8K, HD

a zoomed out DSLR photo of an amigurumi motorcycle

fries and a hamburger

a DSLR photo of a teapot shaped like an elephant head where its snout acts as the spout

magic dagger, mistery, ancient, photorealistic, 8K, HD

a zoomed out DSLR photo of a lion's mane jellyfish

a fox

a freshly baked loaf of sourdough bread on a cutting board

Blue and white porcelain Viking axe

a DSLR photo of a small saguaro cactus planted in a clay pot

a delicious hamburger

an airplane made out of wood

a DSLR photo of a pair of headphones sitting on a desk

Viking axe, fantasy, weapon, blender, 8k, HD

magic gun, game asset, mistery, photorealistic, 8K, HD

airplane, fighter, steampunk style, ultra realistic, 4k, HD

a DSLR photo of a bagel filled with cream cheese and lox

a DSLR photo of a wine bottle and full wine glass on a chessboard

sniper rifle, asset, scifi, cyberpunk, photorealistic, 8K, HD

a golden goblet, low poly

a plate of delicious tacos

an opulent couch from the palace of Versailles

mushroom boss, cute, arms and legs, big eyes, game, character, render, best quality, super detailed, 4K, HD

a DSLR photo of a steaming basket full of dumplings

a panda wearing a necktie and sitting in an office chair

a beautiful dress made out of fruit, on a mannequin. Studio lighting, high quality, high resolution

a DSLR photo of an ice cream sundae

a silver platter piled high with fruits

a spanish galleon sailing on the open sea

Paint the SMPL

Examples generated with SMPL initialization. The SMPL model is generated from a text prompt using MDM.

Someone kicks with his left leg

Iron man kicks with his left leg

Hulk kicks with his left leg

The man jumped down from the sky

Link in Zelda jumped down from the sky

Batman jumped down from the sky

Application

With the help of UnityGaussianSplatting, the generated 3D assets can be imported into the Unity game engine as materials for games and designs.

Generated by GaussianDreamer.

Import the generated 3D assets into the Unity game engine.

More Research

NeuSample, TiNeuVox, GNeuVox, Segment Anything in 3D with NeRFs, 4D-GS, GaussianDreamerPro

BibTeX

@inproceedings{yi2023gaussiandreamer,
  title={GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models},
  author={Yi, Taoran and Fang, Jiemin and Wang, Junjie and Wu, Guanjun and Xie, Lingxi and Zhang, Xiaopeng and Liu, Wenyu and Tian, Qi and Wang, Xinggang},
  booktitle={CVPR},
  year={2024}
}

Website template from DreamFusion. We thank the authors for the open-source code.

A highly detailed stone bust of Carl Friedrich Gauss
