TransVox™ Technology for Transforming Your Singing Voice to a Particular Person’s Voice

23.06.23

Transforming Your Singing Voice in Real Time

TransVox is a technology that transforms a person’s singing voice into that of a completely different person in real time.

By having the AI learn the characteristics of a particular person’s singing voice, such as singing habits and changes in timbre according to the pitch of the notes, it can transform anyone’s voice, regardless of age or gender, into that person’s singing voice.

TransVox works quite differently from a conventional voice changer, which processes the singing voice and gives it an electronic effect. TransVox instantly analyzes the pronunciation and inflection of a singing voice and re-synthesizes it based on another person’s singing voice that the AI has learned in advance, achieving real-time conversion to a natural singing voice.

Mechanism for Reproducing the Characteristics of a Singing Voice

Conventional Method of Voice Conversion

With voice conversion based on standard audio signal processing, such as the effects units found in audio equipment, predetermined signal processing is applied to the singing voice and the processed voice is output.
While such simple signal processing allows for stable conversion, the characteristics of the original singing voice remain, and the processed voice sounds mechanical and unnatural.
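To make "predetermined signal processing" concrete, here is a minimal sketch of one classic effect, ring modulation, which multiplies every sample by a fixed sine wave. The choice of effect, the function name `ring_modulate`, and the `carrier_hz` parameter are illustrative assumptions and are not taken from any specific product. The point is that the same fixed rule is applied to every input, so whatever distinguishes one singer from another passes straight through.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz=50.0):
    """Multiply each sample by a fixed sine carrier (a simple 'robot voice' effect).

    The carrier depends only on the sample index, never on who is singing,
    which is what makes this a predetermined process.
    """
    out = []
    for n, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        out.append(x * carrier)
    return out


def sine(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Generate a test tone standing in for a sung note (hypothetical input)."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


# Two stand-in "singers" holding the same note at different loudness.
rate = 8000
quiet = sine(220.0, rate, 400, amplitude=0.25)
loud = sine(220.0, rate, 400, amplitude=1.0)

quiet_fx = ring_modulate(quiet, rate)
loud_fx = ring_modulate(loud, rate)

# The loudness difference between the two inputs survives the effect unchanged.
peak_ratio = max(abs(v) for v in loud_fx) / max(abs(v) for v in quiet_fx)
print(f"peak ratio after effect: {peak_ratio:.2f}")
```

Because the effect is a fixed multiplication, it is stable and cheap, but it can only color the input voice; it cannot replace it with someone else's.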

Singing Voice Transformation with TransVox

TransVox uses deep learning to solve these problems.
First, the original singing voice is analyzed to extract the linguistic components of the song. Then, singer-dependent factors are removed from the singing voice information so that only the linguistic content of the song remains. Finally, the target singer’s voice is re-synthesized from these linguistic components.

With re-synthesis of the singing voice, the characteristics of the target singer’s voice – learned in advance by deep learning – are automatically reproduced. Therefore, even if a singer sings a song that the target singer has never sung before, an estimate of what it would sound like in the target singer’s voice will be output. As a result, the re-synthesized singing voice will sound as if it were sung by the target singer, with few of the characteristics of the original singer’s voice remaining.
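The flow described above (analyze, remove singer-dependent factors, re-synthesize in the target singer's voice) can be illustrated with a toy sketch. This is not Yamaha's implementation: the types `Frame` and `ContentFrame`, the functions `extract_content`, `resynthesize`, and `target_timbre`, and the text labels used for timbre are all hypothetical stand-ins for what would really be audio features and a trained neural network. Keeping the pitch alongside the phonemes is also an assumption made here, since the melody has to be carried through somehow.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Frame:
    phoneme: str     # what is being sung (linguistic content)
    pitch_hz: float  # the melody at this moment
    timbre: str      # singer-dependent quality (a label here, for illustration)


@dataclass(frozen=True)
class ContentFrame:
    phoneme: str
    pitch_hz: float  # assumption: pitch is kept together with the lyrics


def extract_content(frames: List[Frame]) -> List[ContentFrame]:
    """Drop the singer-dependent part and keep only what was sung."""
    return [ContentFrame(f.phoneme, f.pitch_hz) for f in frames]


def resynthesize(
    content: List[ContentFrame],
    timbre_model: Callable[[float], str],
) -> List[Frame]:
    """Rebuild each frame using the target singer's timbre for that pitch."""
    return [Frame(c.phoneme, c.pitch_hz, timbre_model(c.pitch_hz)) for c in content]


def target_timbre(pitch_hz: float) -> str:
    """Stand-in for the learned model: the timbre changes with the pitch."""
    return "target-head-voice" if pitch_hz >= 440.0 else "target-chest-voice"


# A phrase the target singer never recorded, sung by someone else.
source = [
    Frame("la", 220.0, "source-voice"),
    Frame("la", 523.25, "source-voice"),
]

converted = resynthesize(extract_content(source), target_timbre)
for frame in converted:
    print(frame)
```

Since the timbre of each output frame is produced by the target model from the content alone, the sketch works on phrases the target singer never sang, and nothing of the source timbre label reaches the output.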

If a sufficient amount of the target singer’s singing data is available, it can be used as training data so that TransVox can reproduce that singer’s voice through deep learning.

Demonstration Study using TransVox

Transformer Microphone featuring ELT’s Kaori Mochida: Special Room

The “Transformer Microphone featuring ELT’s Kaori Mochida: Special Room” event, which used TransVox technology, was held for a limited time at three Big Echo locations from August 25 to October 11, 2022.

The “Transformer Microphone” allowed users to sing in the voice of Kaori Mochida, the singer for Every Little Thing. Because Ms. Mochida’s singing voice was reproduced so well, the event attracted a great deal of attention from various media and social networking sites, and was featured on many TV programs and YouTube channels.

The Transformer Microphone used Ms. Mochida’s vocal data from Every Little Thing’s songs as training data to understand the characteristics of her singing voice through deep learning. In addition, an “octave switch” function that transposes voices up one octave allowed even those with low voices to experience a natural transformation of their singing voice.
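Transposing a voice up one octave corresponds to doubling its fundamental frequency, which is the same as raising it by 12 semitones. The sketch below shows that arithmetic on a per-frame pitch contour. It is only an illustration: the function names, the use of `0.0` to mark unvoiced frames, and the idea that the switch acts on a pitch contour at all are assumptions, since the article does not say how the function was implemented.

```python
import math


def shift_octaves(f0_hz, octaves=1):
    """Scale a pitch contour by 2**octaves.

    Frames with 0.0 are treated as unvoiced (assumption) and left untouched.
    """
    factor = 2.0 ** octaves
    return [f * factor if f > 0.0 else 0.0 for f in f0_hz]


def semitones_between(f_low_hz, f_high_hz):
    """Musical interval between two frequencies, in semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# A low male phrase around A2 (110 Hz) with one unvoiced frame in the middle.
low_voice = [110.0, 0.0, 123.47, 130.81]
raised = shift_octaves(low_voice, octaves=1)

print(raised)
print(semitones_between(low_voice[0], raised[0]))
```

Raising a low voice by an octave in this way moves its notes toward a typical higher vocal range, which may be why the function helped low-voiced singers obtain a natural result.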

Ichiro Ito of Every Little Thing has uploaded a video of his experience with “Transformer Microphone featuring ELT’s Kaori Mochida” to his YouTube channel, “ELT Ichiro Ito Ikkun TV.”

YOXO FESTIVAL 2023 – Experience the Future in Yokohama – TransVox Experience

“TransVox Experience,” where your voice can be transformed into characters performed by professional voice actors, was exhibited at “YOXO FESTIVAL 2023 – Experience the Future in Yokohama,” an event for sharing creative innovations.

TransVox technology can be extended beyond singing voices to conversational speech. This exhibit allowed anyone to play the part of a voice actor by transforming their normal conversational voice into the emotional voices used in animation scenes.
Visitors’ voices were transformed into those performed by professional voice actors, such as “a boy with a calm atmosphere” or “a mature woman who calmly assesses things,” allowing them to become that character.

The booth was very popular during the event and always crowded with people. The participants were all surprised when their own voices were transformed into the emotionally charged voices of professional voice actors, just like in a real animated scene.

Yamaha’s Singing Voice Research and Demonstration Study

Yamaha has been researching singing voices for many years, starting with the VOCALOID™ singing voice synthesis technology, and continues to pursue this challenge while constantly incorporating the latest technologies.

TransVox is a completely new technology born out of Yamaha’s research into singing voices. We see the “Transformer Microphone featuring ELT’s Kaori Mochida: Special Room” and “TransVox Experience” as fun examples of how this technology can be applied.

Through these demonstrations, we have introduced this technology to the public in a form that everyone can actually use. In this way, Yamaha hopes to discover together with you the answer to the question, “How will advanced technology enhance our enjoyment of music in the future?”