Noor Al-Sibai
Futurism.com
Originally posted 15 Feb 24
A new Amazon AI model, according to the researchers who built it, is exhibiting language abilities that it wasn't trained on.
In a not-yet-peer-reviewed academic paper, the team at Amazon AGI (which stands for "artificial general intelligence," or human-level AI) say their text-to-speech (TTS) model is exhibiting "state-of-the-art naturalness" when speaking conversational text. Per the examples shared in the paper, the model does seem sophisticated.
As the paper indicates, the model was able to speak all sorts of sentences in ways that, according to criteria crafted with the help of an "expert linguist," showed it was making the types of language leaps that are natural in human language learners but have been difficult to obtain in AI.
Named "Big Adaptive Streamable TTS with Emergent abilities," or BASE TTS, the initial model was trained on 100,000 hours of "public domain speech data," 90 percent in English, to teach it how Americans talk. To test how large a model would need to be to show "emergent abilities," or abilities it was not trained on, the Amazon AGI team trained two smaller models, one on 1,000 hours of speech data and another on 10,000 hours, to see which of the three, if any, exhibited the type of language naturalness they were looking for.
My overall conclusion from the paper linked in the article:
BASE TTS (Text To Speech) represents a significant leap forward in TTS technology, offering superior naturalness, efficiency, and potential for real-world applications like voicing LLM outputs. While limitations exist, the research paves the way for future advancements in multilingual, data-efficient, and context-aware TTS models.