arXiv:2404.04037v1 Announce Type: cross
Abstract: We present InstructHumans, a novel framework for instruction-driven 3D human texture editing. Existing text-based editing methods use Score Distillation Sampling (SDS) to distill guidance from generative models. This work shows that naively using such scores is harmful to editing as they destroy consistency with the source avatar. Instead, we propose an alternate SDS for Editing (SDS-E) that selectively incorporates subterms of SDS across diffusion timesteps. We further enhance SDS-E with spatial smoothness regularization and gradient-based viewpoint sampling to achieve high-quality edits with sharp and high-fidelity detailing. InstructHumans significantly outperforms existing 3D editing methods, consistent with the initial avatar while faithful to the textual instructions. Project page: https://jyzhu.top/instruct-humans.

InstructHumans: Enhancing Instruction-driven 3D Human Texture Editing

In the field of multimedia information systems, instruction-driven 3D human texture editing plays a crucial role in improving the visual quality and realism of virtual characters. This emerging area draws on multiple disciplines, including animation, artificial reality, augmented reality, and virtual reality.

The article introduces InstructHumans, a novel framework that aims to improve instruction-driven 3D human texture editing. It addresses a limitation of existing text-based editing methods that use Score Distillation Sampling (SDS) to distill guidance from generative diffusion models: the authors show that naively applying such scores harms editing, because they destroy consistency with the source avatar.
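As background, the standard SDS gradient (popularized by DreamFusion-style optimization) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the denoiser is a stand-in stub, the noise schedule is a toy, and all names are assumptions for illustration.

```python
import numpy as np

def sds_grad(x, denoiser, t, w, rng):
    """One Score Distillation Sampling gradient step (sketch).

    x        : current rendered image, flattened to a 1-D array
    denoiser : predicts the noise eps_phi(x_t, t) added at timestep t
    t        : diffusion timestep in (0, 1]
    w        : timestep-dependent weighting function w(t)
    """
    eps = rng.standard_normal(x.shape)                   # sample Gaussian noise
    alpha = 1.0 - t                                      # toy noise schedule
    x_t = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * eps  # noised render
    eps_pred = denoiser(x_t, t)                          # model's noise estimate
    # SDS drops the diffusion model's Jacobian, leaving the residual
    # grad = w(t) * (eps_pred - eps), backpropagated into the 3D parameters.
    return w(t) * (eps_pred - eps)

# Toy usage with a stub denoiser (a real system would call a text-conditioned
# diffusion model here).
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
g = sds_grad(x, denoiser=lambda x_t, t: 0.1 * x_t, t=0.5,
             w=lambda t: 1.0, rng=rng)
print(g.shape)  # prints (16,)
```

Because the gradient always pulls the render toward whatever the generative model finds most likely for the prompt, nothing in this residual anchors the result to the source avatar, which is the failure mode the paper identifies.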

To overcome this challenge, the researchers propose an alternative, Score Distillation Sampling for Editing (SDS-E), which selectively incorporates subterms of SDS across diffusion timesteps so that edits stay consistent with the original avatar. SDS-E is further enhanced with spatial smoothness regularization and gradient-based viewpoint sampling to achieve high-quality edits with sharp, high-fidelity detail.
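The idea of gating subterms by timestep can be illustrated with the classifier-free-guidance decomposition of the SDS residual. Note this is only a hedged sketch of the general mechanism: the paper's actual decomposition and timestep schedule may differ, and the `t_switch` threshold and `scale` values below are hypothetical.

```python
import numpy as np

def sds_e_grad(eps_cond, eps_uncond, eps, t, scale=7.5, t_switch=0.5):
    """Illustrative timestep-gated SDS variant (not the paper's exact rule).

    With classifier-free guidance, the SDS residual splits into two subterms:
      guidance = scale * (eps_cond - eps_uncond)  # pulls toward the prompt
      denoise  = eps_uncond - eps                 # pulls toward the data prior
    Plain SDS sums both at every timestep; here, high-noise timesteps keep
    only the guidance subterm, suppressing the prior-seeking term that can
    overwrite source-avatar detail.
    """
    guidance = scale * (eps_cond - eps_uncond)
    denoise = eps_uncond - eps
    if t > t_switch:               # coarse, high-noise steps: edit direction only
        return guidance
    return guidance + denoise      # fine, low-noise steps: full residual

# Toy usage with random stand-ins for the three noise terms.
rng = np.random.default_rng(1)
e_c, e_u, e = (rng.standard_normal(8) for _ in range(3))
g_hi = sds_e_grad(e_c, e_u, e, t=0.9)  # guidance subterm only
g_lo = sds_e_grad(e_c, e_u, e, t=0.2)  # both subterms
```

The design point this sketch captures is that different SDS subterms dominate at different noise levels, so selecting which subterms to apply per timestep trades off prompt faithfulness against preservation of the source.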

The study's results demonstrate that InstructHumans outperforms existing 3D editing methods at preserving consistency with the source avatar while faithfully following the given textual instructions. This advance in instruction-driven 3D human texture editing paves the way for more immersive and realistic virtual experiences.

The significance of this work extends beyond the specific application of 3D human texture editing. By combining insights from animation, artificial reality, augmented reality, and virtual reality, the researchers contribute to the broader field of multimedia information systems, where such interdisciplinary work enables more advanced techniques for creating and manipulating virtual content.

In conclusion, the InstructHumans framework represents a valuable contribution to instruction-driven 3D human texture editing. Its approach addresses the limitations of existing methods and achieves improved consistency and fidelity in edits. The work also underscores the importance of interdisciplinary collaboration in multimedia information systems and its relevance to the wider domains of animation, artificial reality, augmented reality, and virtual reality.

Read the original article