NEH Announces Grant Program for Trump’s National Heroes Garden Statues

Analyzing the Key Points

– The National Endowment for the Humanities (NEH) has launched a grant program for the design and creation of statues for President Trump’s National Garden of American Heroes.
– The sculpture garden is a priority for the 250th anniversary of the Declaration of Independence and will feature life-size statues of 250 individuals who contributed to America’s cultural, scientific, economic, and political heritage.
– The garden’s location is yet to be determined, but it is intended to be a public space where Americans can gather to honor and learn about American heroes.
– Interested applicants, who must be US citizens, can submit a two-dimensional or three-dimensional graphic representation of up to three statues of selected individuals, accompanied by a project description and work plan.
– The application deadline is July 1.
– The list of figures to be depicted includes historical figures like George Washington, Abraham Lincoln, Sacagawea, Alexander Graham Bell, the Rev. Dr. Martin Luther King Jr., and the Wright brothers, as well as figures like Kobe Bryant, Julia Child, Alex Trebek, and Hannah Arendt.
– Selected artists will receive awards of up to $200,000 per statue, and the statues must be made of marble, granite, bronze, copper, or brass.
– The sculptures should be depicted in a realistic manner, with no modernist or abstract designs allowed.
– The NEH and the National Endowment for the Arts have jointly committed a total of $34 million for the sculpture garden.
– The funding for the sculpture garden comes from federal grants that were initially distributed to arts and cultural groups across the United States but were later cancelled by the Trump administration.
– One of the cancelled grants was an NEH Fellowships and Awards for Faculty grant worth $60,000, which affected Dr. Say Burgin, an assistant professor of history at Dickinson College. Burgin had planned to use the grant for research related to the American Civil Rights and Black Power movements.
– Burgin expressed disappointment with the decision to prioritize the sculpture garden over other grants and suggested that the funds could have been better used to support artists like Amos Kennedy Jr. in telling Black history in their own way.

Potential Future Trends

The establishment of President Trump’s National Garden of American Heroes and the associated grant program for statues has the potential to drive several future trends:

1. Increased Public Art Installations: The creation of the sculpture garden will likely inspire other cities and institutions to invest in public art installations. Communities may seek to honor local heroes or historical figures, both through traditional sculptures and more contemporary art mediums.

2. Controversies and Debates: The selection of figures and the restriction on artistic styles in the sculpture garden may spark controversies and debates. The inclusion or exclusion of certain individuals will undoubtedly generate discussions about American history, values, and representation.

3. Revisiting Historical Narratives: As the sculpture garden highlights the contributions of various individuals to American heritage, it may prompt a reassessment of history and encourage further investigation into lesser-known figures and events. Scholars, researchers, and artists may delve deeper into those stories that have been overlooked or marginalized.

4. Emphasis on Realism and Traditional Sculpture: The requirement for realistic statues made from traditional materials may lead to a resurgence in classical sculptural techniques and craftsmanship. Artists could embrace traditional methods and materials while still incorporating contemporary elements to create engaging and thoughtful artworks.

5. Greater Financial Support for the Arts: The commitment of $34 million towards the sculpture garden reflects the importance placed on public art. This may encourage increased funding and support for the arts sector, both from government agencies and private donors, leading to expanded opportunities for artists and cultural organizations.

Predictions and Recommendations for the Industry

Based on the analysis of the key points and potential future trends, the following predictions and recommendations can be made for the industry:

1. Diversify Representation: In response to the controversies surrounding the selection of figures for the sculpture garden, future public art projects should strive for a more inclusive representation. Incorporating diverse voices and perspectives ensures a more comprehensive portrayal of American history, allowing for a richer recognition of the nation’s cultural mosaic.

2. Flexibility in Artistic Styles: While the sculpture garden emphasizes realism, it is crucial to recognize that different artistic styles have the power to engage viewers and convey meaning. Future projects should embrace a range of artistic expressions, encouraging artists to explore alternative approaches that convey their unique interpretations of American history and heroism.

3. Support Research and Education: Alongside funding public art projects, it is essential to allocate resources for research and education. Grants should be made available to scholars, historians, and educators to further investigate and document the histories and contributions of underrepresented groups. This would enrich the narratives surrounding American heroes and ensure a more inclusive understanding of the nation’s past.

4. Collaborate with Local Communities: Public art projects should actively involve local communities in the decision-making process. By partnering with community organizations, artists can gain valuable insights into local histories, cultures, and heroes. This collaboration ensures that the resulting artworks resonate with the people they represent and contribute to community cohesion and pride.

5. Encourage Experimental Art Forms: While the sculpture garden focuses on traditional sculptures, future projects can explore experimental and interactive art forms. Embracing new mediums such as digital installations, augmented reality, or performance art can introduce innovative ways of engaging audiences and fostering a deeper appreciation for American heroes.

By learning from the controversies and challenges associated with the National Garden of American Heroes, the arts industry can evolve and adapt to create more inclusive, thought-provoking, and impactful public art projects.


Title: “Introducing Chinese-LiPS: A Multimodal Dataset for Audio-Visual Speech Recognition”

arXiv:2504.15066v1 Announce Type: new
Abstract: Incorporating visual modalities to assist Automatic Speech Recognition (ASR) tasks has led to significant improvements. However, existing Audio-Visual Speech Recognition (AVSR) datasets and methods typically rely solely on lip-reading information or speaking contextual video, neglecting the potential of combining these different valuable visual cues within the speaking context. In this paper, we release a multimodal Chinese AVSR dataset, Chinese-LiPS, comprising 100 hours of speech, video, and corresponding manual transcription, with the visual modality encompassing both lip-reading information and the presentation slides used by the speaker. Based on Chinese-LiPS, we develop a simple yet effective pipeline, LiPS-AVSR, which leverages both lip-reading and presentation slide information as visual modalities for AVSR tasks. Experiments show that lip-reading and presentation slide information improve ASR performance by approximately 8% and 25%, respectively, with a combined performance improvement of about 35%. The dataset is available at https://kiri0824.github.io/Chinese-LiPS/

Incorporating Multimodal Visual Cues for Audio-Visual Speech Recognition

Automatic Speech Recognition (ASR) tasks have greatly benefited from the inclusion of visual modalities. However, existing Audio-Visual Speech Recognition (AVSR) datasets and methods often focus solely on lip-reading or speaking contextual video, neglecting the potential of combining different valuable visual cues within the speaking context. In this paper, the authors introduce the Chinese-LiPS multimodal AVSR dataset and present the LiPS-AVSR pipeline, which leverages lip-reading and presentation slide information as visual cues for AVSR tasks.

The Chinese-LiPS dataset is a comprehensive collection comprising 100 hours of speech, video, and corresponding manual transcription. What sets this dataset apart is the inclusion of not only lip-reading information but also the presentation slides used by the speaker. This multimodal approach allows for a more holistic understanding of the audio-visual speech data, capturing the subtle nuances and context that improve ASR performance.

The LiPS-AVSR pipeline developed based on the Chinese-LiPS dataset demonstrates the effectiveness of leveraging multiple visual cues. The experiments conducted show that lip-reading information improves ASR performance by approximately 8%, while presentation slide information leads to a significant improvement of about 25%. When combined, the performance improvement reaches approximately 35%. This highlights the synergy of different visual cues and the potential for further enhancement in AVSR tasks.
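
To make the fusion idea more tangible, here is a minimal late-fusion sketch in PyTorch: audio frames, lip-region features, and slide-derived text features are projected into a shared space, concatenated, and passed through a small Transformer encoder. All module choices, dimensions, and the fusion strategy are illustrative assumptions; the paper does not spell out the LiPS-AVSR architecture here, so this should be read as a sketch of the general technique rather than the authors’ implementation.

```python
# Minimal sketch of late fusion over three modalities for AVSR.
# All module names, dimensions, and the fusion strategy are illustrative
# assumptions; this is NOT the actual LiPS-AVSR implementation.
import torch
import torch.nn as nn

class MultimodalFusionEncoder(nn.Module):
    def __init__(self, audio_dim=80, lip_dim=512, slide_dim=768, hidden=256):
        super().__init__()
        # Per-modality projections into a shared space.
        self.audio_proj = nn.Linear(audio_dim, hidden)   # e.g. log-mel frames
        self.lip_proj = nn.Linear(lip_dim, hidden)       # e.g. lip-region video features
        self.slide_proj = nn.Linear(slide_dim, hidden)   # e.g. OCR/text features from slides
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.ctc_head = nn.Linear(hidden, 5000)  # vocabulary size is a placeholder

    def forward(self, audio, lip, slide):
        # audio: (B, T, audio_dim), lip: (B, T, lip_dim), slide: (B, S, slide_dim)
        a = self.audio_proj(audio)
        v = self.lip_proj(lip)
        s = self.slide_proj(slide)
        # Concatenate along time; a real system would align or cross-attend instead.
        fused = self.fusion(torch.cat([a, v, s], dim=1))
        return self.ctc_head(fused)  # per-frame token logits, e.g. for a CTC loss

model = MultimodalFusionEncoder()
logits = model(torch.randn(2, 100, 80), torch.randn(2, 100, 512), torch.randn(2, 10, 768))
print(logits.shape)  # torch.Size([2, 210, 5000])
```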

This research embodies the multi-disciplinary nature of multimedia information systems, incorporating elements from speech recognition, computer vision, and human-computer interaction. By combining the analytical power of machine learning algorithms with visual and textual information, this work pushes the boundaries of AVSR systems and opens up new avenues for research.

Furthermore, the incorporation of visual cues extends beyond AVSR and has implications for other areas such as animations, artificial reality, augmented reality, and virtual realities. These technologies heavily rely on the integration of audio and visual information, and leveraging multimodal cues can greatly enhance the immersive experience and realism. The Chinese-LiPS dataset and the LiPS-AVSR pipeline serve as valuable resources for researchers and industry professionals working in these fields, providing a foundation for developing more advanced and accurate systems.

In conclusion, the release of the Chinese-LiPS multimodal AVSR dataset and the development of the LiPS-AVSR pipeline demonstrate the power of incorporating multiple visual cues for improved ASR performance. This work showcases the multi-disciplinary nature of multimedia information systems and has far-reaching implications for various domains. By combining lip-reading and presentation slide information, the LiPS-AVSR pipeline sets a new standard for AVSR systems and opens up exciting possibilities for further research and development.

Read the original article

kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization

arXiv:2504.05686v1 Announce Type: cross
Abstract: Robustness is critical in zero-shot singing voice conversion (SVC). This paper introduces two novel methods to strengthen the robustness of the kNN-VC framework for SVC. First, kNN-VC’s core representation, WavLM, lacks harmonic emphasis, resulting in dull sounds and ringing artifacts. To address this, we leverage the bijection between WavLM, pitch contours, and spectrograms to perform additive synthesis, integrating the resulting waveform into the model to mitigate these issues. Second, kNN-VC overlooks concatenative smoothness, a key perceptual factor in SVC. To enhance smoothness, we propose a new distance metric that filters out unsuitable kNN candidates and optimize the summing weights of the candidates during inference. Although our techniques are built on the kNN-VC framework for implementation convenience, they are broadly applicable to general concatenative neural synthesis models. Experimental results validate the effectiveness of these modifications in achieving robust SVC. Demo: http://knnsvc.com Code: https://github.com/SmoothKen/knn-svc
The paper introduces two innovative methods to improve the robustness of the kNN-VC framework for singing voice conversion (SVC). The kNN-VC framework’s core representation, WavLM, lacks harmonic emphasis, resulting in dull sounds and ringing artifacts. To address this issue, the authors leverage the relationship between WavLM, pitch contours, and spectrograms to perform additive synthesis, integrating the resulting waveform into the model to mitigate these problems. Furthermore, the kNN-VC framework overlooks concatenative smoothness, a crucial perceptual factor in SVC. To enhance smoothness, the authors propose a new distance metric that filters out inappropriate kNN candidates and optimizes the summing weights of the candidates during inference. Although these techniques are built on the kNN-VC framework, they can be broadly applied to general concatenative neural synthesis models. The effectiveness of these modifications is validated through experimental results, demonstrating their ability to achieve robust SVC. Readers can access a demo of the enhanced framework at http://knnsvc.com and find the code for implementation on GitHub at https://github.com/SmoothKen/knn-svc.

Enhancing Robustness in Zero-Shot Singing Voice Conversion

Zero-shot singing voice conversion (SVC) has gained significant attention in recent years due to its potential applications in the music industry. However, achieving robustness in SVC remains a critical challenge. In this article, we explore the underlying themes and concepts of the kNN-VC framework for SVC and propose two novel methods to strengthen its robustness.

1. Addressing Dull Sounds and Ringing Artifacts

The core representation of the kNN-VC framework, known as WavLM, has been found lacking in harmonic emphasis, resulting in dull sounds and ringing artifacts. To overcome this limitation, we leverage the bijection between WavLM, pitch contours, and spectrograms to perform additive synthesis.

By integrating the resulting waveform into the model, we can mitigate the dull sounds and ringing artifacts, resulting in a more natural and pleasant vocal output. This enhancement not only improves the overall quality of the converted voice but also adds a new layer of realism to the synthesized vocal performance.
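
For readers unfamiliar with additive synthesis, the following NumPy sketch shows the generic construction: a frame-level F0 contour is upsampled and a waveform is built as a sum of sinusoids at harmonics of that contour. The fixed harmonic count, flat 1/k amplitudes, and simple upsampling are simplifying assumptions for illustration only; this is not the kNN-SVC pipeline itself.

```python
# Generic additive-synthesis sketch: build a waveform as a sum of harmonics
# of a frame-level F0 contour. This illustrates the general technique only;
# it is not the kNN-SVC implementation.
import numpy as np

def additive_synthesis(f0_frames, sr=16000, hop=160, n_harmonics=8):
    """f0_frames: per-frame fundamental frequency in Hz (0 = unvoiced)."""
    # Upsample the frame-level F0 to the sample rate by simple repetition.
    f0 = np.repeat(f0_frames, hop).astype(np.float64)
    voiced = f0 > 0
    # Instantaneous phase is the cumulative sum of angular frequency.
    phase = 2 * np.pi * np.cumsum(f0 / sr)
    wave = np.zeros_like(f0)
    for k in range(1, n_harmonics + 1):
        # 1/k amplitudes, zeroed where the frame is unvoiced;
        # a real system would model per-harmonic amplitudes.
        wave += np.sin(k * phase) * voiced / k
    return wave / n_harmonics

f0 = np.concatenate([np.full(50, 220.0), np.zeros(10), np.full(40, 330.0)])
audio = additive_synthesis(f0)
print(audio.shape)  # (16000,)
```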

2. Enhancing Concatenative Smoothness in SVC

Another important aspect of vocal conversion is the perception of smoothness, which is often overlooked in the kNN-VC framework. Concatenative smoothness refers to the seamless transition between different segments of the converted voice, ensuring a coherent and natural flow.

To enhance smoothness, we propose a new distance metric that filters out unsuitable kNN candidates during the inference process. This filtering mechanism helps eliminate potential discontinuities and inconsistencies, contributing to a more coherent and smooth output. Additionally, we optimize the summing weights of the selected candidates, further refining the smoothness of the converted voice.
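
A small sketch helps make this concrete: for each source frame, the nearest reference frames are retrieved, candidates beyond a distance threshold are discarded, and the survivors are combined with distance-derived weights. The cosine distance, threshold value, and softmax-style weighting below are assumptions chosen for illustration; they are not the specific metric or weight optimization proposed in the paper.

```python
# Sketch of frame-wise kNN conversion with candidate filtering and weighted
# summation. The cosine distance, threshold, and weighting are illustrative
# assumptions, not the metric proposed in the paper.
import numpy as np

def knn_convert(query, reference, k=4, max_dist=0.35, temperature=0.1):
    """query: (T, D) source frames; reference: (N, D) target-speaker frames."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    dist = 1.0 - q @ r.T                      # cosine distance, shape (T, N)
    out = np.empty_like(query)
    for t in range(len(query)):
        idx = np.argsort(dist[t])[:k]         # k nearest candidates
        d = dist[t, idx]
        keep = d <= max_dist                  # filter unsuitable candidates
        if not keep.any():
            keep[:] = True                    # fall back to all k if none pass
        idx, d = idx[keep], d[keep]
        w = np.exp(-d / temperature)          # closer candidates get larger weights
        w /= w.sum()
        out[t] = w @ reference[idx]           # weighted sum of candidate frames
    return out

converted = knn_convert(np.random.randn(20, 64), np.random.randn(500, 64))
print(converted.shape)  # (20, 64)
```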

Broad Applicability to Concatenative Neural Synthesis Models

While our techniques are specifically built on the kNN-VC framework for implementation convenience, they have broader applicability to general concatenative neural synthesis models. The principles behind additive synthesis and the emphasis on smoothness can be applied to other frameworks and models to achieve robustness in various singing voice conversion tasks.

Experimental results have validated the effectiveness of these modifications in achieving robust SVC. The proposed methods have significantly improved the quality, realism, and smoothness of the converted voice, enhancing the overall user experience in zero-shot singing voice conversion applications.

To experience a live demonstration of the enhanced SVC, you can visit the demo at http://knnsvc.com. For more technical details, the implementation code can be found on GitHub at https://github.com/SmoothKen/knn-svc.

Enhancing robustness in zero-shot singing voice conversion opens up new possibilities in the music industry. These advancements pave the way for more immersive and realistic vocal synthesis applications, revolutionizing the way we create and enjoy music.

The paper, “kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization,” introduces two innovative methods to improve the robustness of the kNN-VC (k-Nearest Neighbors Voice Conversion) framework for singing voice conversion (SVC). This research is crucial because robustness is a critical factor in SVC systems.

The first method addresses the issue of the core representation of kNN-VC, called WavLM, lacking harmonic emphasis and resulting in dull sounds and ringing artifacts. To overcome this limitation, the authors propose leveraging the relationship between WavLM, pitch contours, and spectrograms to perform additive synthesis. By integrating the resulting waveform into the model, they aim to mitigate the dullness and ringing artifacts, thus improving the overall quality of the converted singing voice.

The second method focuses on enhancing concatenative smoothness, which is a key perceptual factor in SVC. Concatenative smoothness refers to the seamless transition between different segments of the converted voice. The authors propose a new distance metric that filters out unsuitable kNN candidates and optimizes the summing weights of the candidates during inference. This approach aims to improve the smoothness of the converted singing voice by selecting appropriate candidates and optimizing their contributions.

It is worth noting that while these techniques are developed within the kNN-VC framework, they have broader applicability to general concatenative neural synthesis models. This highlights the potential for these methods to be employed in various other voice conversion systems beyond kNN-VC.

The paper also presents experimental results that validate the effectiveness of these modifications in achieving robust SVC. The authors provide a demo of their system, accessible at http://knnsvc.com, allowing users to experience the improvements firsthand. Additionally, the source code for their implementation is available on GitHub at https://github.com/SmoothKen/knn-svc, enabling researchers and developers to replicate and build upon their work.

In summary, this research introduces valuable enhancements to the kNN-VC framework for SVC by addressing issues related to dullness, ringing artifacts, and concatenative smoothness. The proposed methods demonstrate promising results and have the potential to be applied in other concatenative neural synthesis models, paving the way for further advancements in singing voice conversion technology.
Read the original article

“After Dark: Madison Skriver’s Exploration of Nostalgia and Reality”

Potential Future Trends in Art Exhibition: Analyzing Madison Skriver’s After Dark

Enari Gallery is pleased to unveil its latest solo exhibition, entitled “After Dark,” featuring the works of artist Madison Skriver. Skriver’s new series takes a deep dive into the interplay between nostalgia and reality, drawing inspiration from mid-century American culture and cinematic storytelling. This article will analyze the key points of the exhibition and provide insights into the potential future trends that may arise in the art industry, along with my own predictions and recommendations.

Exploring the Tension Between Nostalgia and Reality

Skriver’s artwork in “After Dark” addresses the timeless struggle between our nostalgic yearning for the past and the disquieting truths concealed beneath the idealized facade of mid-century American culture. Drawing influence from renowned filmmaker David Lynch, the artist skillfully highlights the stark contrast between the seemingly perfect American dream and the unsettling realities that lie beneath its glossy surface.

Through her bold use of colors, surreal light techniques, and layered symbolism, Skriver creates a mesmerizing dreamlike atmosphere that effectively encapsulates the haunting sense of the past. The exhibition invites viewers to confront the duality of nostalgia, sparking conversations about our collective desire to romanticize a bygone era while acknowledging the challenging aspects obscured within it.

Potential Future Trends

  • Nostalgia Renaissance: Skriver’s exploration of nostalgia and reality resonates with contemporary audiences. This exhibition reflects a growing trend of embracing nostalgia in art, pop culture, and design. As society becomes increasingly fast-paced and uncertain, people yearn for the comfort and familiarity of the past. Artists who can evoke nostalgia while also challenging its idealized notion are likely to capture the attention of future audiences.
  • Blurring Boundaries: Skriver’s fusion of mid-century American culture with cinematic storytelling showcases the potential for artists to break traditional boundaries in their work. As technology continues to evolve, artists are no longer confined to a specific medium or style. They can experiment with various techniques, combining elements from different eras and art forms to create thought-provoking and visually striking pieces. This trend of boundary-breaking art is likely to gain traction in the coming years.
  • Social Commentary: “After Dark” highlights the power of art to provoke important discussions about societal issues. In the future, we can expect more artists to leverage their work as a platform for social and cultural commentary. By addressing the unsettling truths beneath the surface of nostalgia and the American dream, Skriver inspires viewers to critically engage with the complexities of our society. Artists who use their talent to shed light on pressing matters are likely to make a significant impact on the art world.

Predictions and Recommendations

Based on the analysis of Madison Skriver’s “After Dark” and the potential future trends, the following predictions and recommendations can be made:

  • Embrace Multidisciplinary Approaches: Artists should experiment with various mediums, techniques, and art forms to create innovative and boundary-breaking pieces. By blurring traditional boundaries, artists can capture the interest of a broader audience and make a lasting impact on the art world.
  • Create Thought-Provoking Narratives: Artists should strive to convey deeper messages and provoke critical thinking through their work. By tackling important societal issues, artists can effectively use their creations as a medium for social commentary, engaging viewers and fostering meaningful conversations.
  • Balance Nostalgia with Nuanced Realism: Nostalgia will continue to have a significant influence on art, but it is crucial for artists to strike a balance between nostalgia and a nuanced reflection of reality. By challenging the idealized versions of the past and introducing elements that prompt introspection, artists can evoke a more profound emotional response from their audience.


In conclusion, Madison Skriver’s “After Dark” exhibition serves as a strong indicator of potential future trends in the art industry. By exploring the interplay between nostalgia and reality, the exhibition taps into the growing fascination with nostalgia, the need for boundary-breaking art, and the power of social commentary. Artists who embrace multidisciplinary approaches, create thought-provoking narratives, and strike a balance between nostalgia and nuanced realism are likely to thrive in the ever-evolving art landscape. “After Dark” stands as a testament to the enduring power of art to challenge, inspire, and spark meaningful conversations among viewers.

Enhancing Speech-Driven 3D Facial Animation with StyleSpeaker

arXiv:2503.09852v1 Announce Type: new
Abstract: Speech-driven 3D facial animation is challenging due to the diversity in speaking styles and the limited availability of 3D audio-visual data. Speech predominantly dictates the coarse motion trends of the lip region, while specific styles determine the details of lip motion and the overall facial expressions. Prior works lack fine-grained learning in style modeling and do not adequately consider style biases across varying speech conditions, which reduce the accuracy of style modeling and hamper the adaptation capability to unseen speakers. To address this, we propose a novel framework, StyleSpeaker, which explicitly extracts speaking styles based on speaker characteristics while accounting for style biases caused by different speeches. Specifically, we utilize a style encoder to capture speakers’ styles from facial motions and enhance them according to motion preferences elicited by varying speech conditions. The enhanced styles are then integrated into the coarse motion features via a style infusion module, which employs a set of style primitives to learn fine-grained style representation. Throughout training, we maintain this set of style primitives to comprehensively model the entire style space. Hence, StyleSpeaker possesses robust style modeling capability for seen speakers and can rapidly adapt to unseen speakers without fine-tuning. Additionally, we design a trend loss and a local contrastive loss to improve the synchronization between synthesized motions and speeches. Extensive qualitative and quantitative experiments on three public datasets demonstrate that our method outperforms existing state-of-the-art approaches.

Expert Commentary: Speech-driven 3D Facial Animation and the Multi-disciplinary Nature of the Concepts

The content discussed in this article revolves around the challenging task of speech-driven 3D facial animation. This topic is inherently multi-disciplinary, combining elements from various fields such as multimedia information systems, animations, artificial reality, augmented reality, and virtual realities.

Facial animation is a crucial component of many multimedia systems, including virtual reality applications and animated movies. To create realistic and expressive facial animations, it is important to accurately model the intricate details of lip motion and facial expressions. However, existing approaches often struggle to capture the fine-grained nuances of different speaking styles and lack the ability to adapt to unseen speakers.

The proposed framework, StyleSpeaker, addresses these limitations by explicitly extracting speaking styles based on speaker characteristics while considering the style biases caused by different speeches. By utilizing a style encoder, the framework captures speakers’ styles and enhances them based on motion preferences elicited by varying speech conditions. This integration of styles into the coarse motion features is achieved via a style infusion module that utilizes a set of style primitives to learn fine-grained style representation. The framework also maintains this set of style primitives throughout training to comprehensively model the entire style space.
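
One plausible way to picture the style infusion module is as attention over a small learned bank of style primitives, with the attended result injected into the coarse motion features. The sketch below implements that reading in PyTorch; the layer choices, dimensions, and the additive injection are assumptions rather than the actual StyleSpeaker module.

```python
# Sketch of a "style infusion" idea: attend over a learned bank of style
# primitives and inject the result into coarse motion features. Layer choices
# and dimensions are assumptions, not the actual StyleSpeaker module.
import torch
import torch.nn as nn

class StyleInfusion(nn.Module):
    def __init__(self, style_dim=128, motion_dim=256, n_primitives=32):
        super().__init__()
        # Learned bank of style primitives maintained throughout training.
        self.primitives = nn.Parameter(torch.randn(n_primitives, style_dim))
        self.query = nn.Linear(style_dim, style_dim)
        self.to_motion = nn.Linear(style_dim, motion_dim)

    def forward(self, style, coarse_motion):
        # style: (B, style_dim), coarse_motion: (B, T, motion_dim)
        q = self.query(style)                                 # (B, style_dim)
        attn = torch.softmax(q @ self.primitives.T, dim=-1)   # (B, n_primitives)
        fine_style = attn @ self.primitives                   # (B, style_dim)
        # Broadcast the fine-grained style over time and add it to the motion.
        return coarse_motion + self.to_motion(fine_style).unsqueeze(1)

infuse = StyleInfusion()
motion = infuse(torch.randn(2, 128), torch.randn(2, 50, 256))
print(motion.shape)  # torch.Size([2, 50, 256])
```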

In addition to style modeling, the framework introduces a trend loss and a local contrastive loss to improve the synchronization between synthesized motions and speeches. These additional losses contribute to the overall accuracy of the animation and enhance its realism.
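
The trend loss is described only at a high level. One common way to encourage matching motion trends is to penalize differences in frame-to-frame velocity between synthesized and ground-truth motion, as in the sketch below; this interpretation is an assumption, not the paper’s exact formulation, and the local contrastive loss is omitted here.

```python
# One plausible reading of a "trend loss": match frame-to-frame velocity
# (first differences) of predicted and ground-truth motion. This is an
# assumption about the loss, not the formulation used in StyleSpeaker.
import torch
import torch.nn.functional as F

def trend_loss(pred, target):
    """pred, target: (B, T, D) facial motion sequences."""
    pred_vel = pred[:, 1:] - pred[:, :-1]       # per-frame motion trend
    target_vel = target[:, 1:] - target[:, :-1]
    return F.l1_loss(pred_vel, target_vel)

loss = trend_loss(torch.randn(2, 50, 70), torch.randn(2, 50, 70))
print(loss.item())
```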

The experiments conducted on three public datasets demonstrate that the proposed method outperforms existing state-of-the-art approaches in terms of both qualitative and quantitative measures. The combination of style modeling, motion-speech synchronization, and the adaptability to unseen speakers makes StyleSpeaker a promising framework for speech-driven 3D facial animation.

From a broader perspective, this research showcases the interconnectedness of different domains within multimedia information systems. The concepts of 3D facial animation, style modeling, and motion-speech synchronization are essential not only in the context of multimedia applications but also in fields like virtual reality, augmented reality, and artificial reality. By improving the realism and expressiveness of facial animations, this research contributes to the development of immersive experiences and realistic virtual environments.

Key takeaways:

  • The content focuses on speech-driven 3D facial animation and proposes a novel framework called StyleSpeaker.
  • StyleSpeaker explicitly extracts speaking styles based on speaker characteristics and accounts for style biases caused by different speeches.
  • The framework enhances styles according to motion preferences elicited by varying speech conditions, integrating them into the coarse motion features.
  • StyleSpeaker possesses robust style modeling capability and can rapidly adapt to unseen speakers without the need for fine-tuning.
  • The framework introduces trend loss and local contrastive loss to improve motion-speech synchronization.
  • The method outperforms existing state-of-the-art approaches in both qualitative and quantitative evaluations.
  • The multi-disciplinary nature of the concepts involved showcases their relevance in the wider field of multimedia information systems, animations, artificial reality, augmented reality, and virtual realities.

Read the original article