In August, Meta introduced SeamlessM4T, an advanced multimodal AI translation model that supports nearly 100 languages for text and 36 for speech. With an upgraded "v2" architecture, the technology giant is refining the tool to enable more spontaneous and expressive conversational translations, a crucial element for genuine cross-language communication.
The first of two notable improvements is "SeamlessExpressive," a feature designed to carry the speaker's expression into the translated speech: pitch, volume, emotional tone (excitement, sadness, or a whisper), speech rate, and pauses. This addresses the historically robotic quality of translated speech and could improve both everyday conversations and content production. The supported languages are English, Spanish, German, French, Italian, and Chinese, although the demo page currently lacks Italian and Chinese.
The second feature, "SeamlessStreaming," begins translating a speech while the speaker is still talking, so listeners receive the translation more promptly, with a latency of just under two seconds instead of a wait for the speaker to finish each sentence. Because different languages order their sentences differently, Meta developed an algorithm that analyzes partial audio input and decides whether there is enough context to start generating translated output or whether it should keep listening.
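To make that decision concrete, here is a minimal, hypothetical sketch of such a read/write loop in Python. It does not reflect Meta's actual SeamlessStreaming implementation; the class, the threshold, and the naive "enough context" heuristic are all illustrative assumptions standing in for a learned policy and a real translation model.

```python
# Illustrative sketch of a simultaneous-translation read/write loop.
# All names and heuristics here are hypothetical, not Meta's API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamingTranslator:
    """Buffers partial audio and emits translation as soon as a
    policy judges the accumulated context sufficient (assumed design)."""
    min_chunks: int = 2  # assumed latency/quality knob: chunks to buffer
    audio_buffer: list = field(default_factory=list)
    emitted: list = field(default_factory=list)

    def has_enough_context(self) -> bool:
        # Placeholder for a learned read/write policy; here, a naive
        # heuristic: wait until a minimum amount of audio has arrived.
        return len(self.audio_buffer) >= self.min_chunks

    def translate_partial(self) -> str:
        # Stand-in for the actual translation model call.
        return f"<translated segment {len(self.emitted) + 1}>"

    def feed(self, audio_chunk: str) -> Optional[str]:
        """READ a new chunk; WRITE a partial translation when ready."""
        self.audio_buffer.append(audio_chunk)
        if self.has_enough_context():
            output = self.translate_partial()
            self.emitted.append(output)
            self.audio_buffer.clear()  # context consumed by this segment
            return output
        return None  # not enough context yet: keep listening


# Usage: stream chunks as the speaker talks; print output as it appears.
translator = StreamingTranslator()
for chunk in ["chunk-1", "chunk-2", "chunk-3", "chunk-4"]:
    out = translator.feed(chunk)
    if out is not None:
        print(out)
```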
Meta's latest additions to its "Seamless Communication" suite are remarkable, surpassing the mobile interpreter tools offered by competitors such as Google and Samsung. While there is no official release date for public use of these features, one can imagine Meta eventually building them into its smart glasses, which would make them considerably more practical.