One of the many useful applications of conversational AI, Meta stresses, is the development of tools for people with hearing difficulties or speech impairments.
But speech recognition systems often do not work well in the everyday situations where we need them most: when several people talk at the same time, or when there is a lot of background noise.
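Robustness to background noise is typically measured by mixing clean speech with noise at a controlled signal-to-noise ratio (SNR). As a minimal sketch of that evaluation setup (the function name and the use of synthetic signals are illustrative assumptions, not Meta's code):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that adding it to `speech` yields the target SNR in dB."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scaling factor that brings the noise to the power implied by the target SNR.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s synthetic stand-in for speech
noise = rng.normal(size=sr)                            # white background noise
noisy = mix_at_snr(speech, noise, snr_db=0)            # 0 dB: speech and noise equally loud
```

At 0 dB the noise carries as much energy as the speech, which is roughly the regime where audio-only recognizers degrade and the visual signal becomes valuable.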
One reason people understand speech better than artificial intelligence in these cases is simple: people see the speaker's mouth moving as well as hearing their voice.
That is why Meta AI is working on new conversational AI systems that can recognize the subtle correlations between what you see and what you hear in a conversation, just as humans do.
To build more versatile and robust speech recognition tools, Meta has now announced Audio-Visual Hidden Unit BERT (AV-HuBERT), a state-of-the-art self-supervised framework for speech understanding that learns both by seeing and hearing people speak.
Meta says this is the first system to jointly model speech and lip movements from unlabeled data: raw videos that have not previously been transcribed.
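The core idea behind this kind of joint self-supervised training can be illustrated in miniature: fuse the synchronized audio and visual frame features, cluster them to obtain discrete pseudo-labels ("hidden units"), then mask some frames and train a model to predict the cluster of each masked frame from the visible context. The sketch below shows only the fusion, pseudo-labeling, and masking steps with NumPy; it is a toy approximation under those assumptions, not Meta's AV-HuBERT implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins for per-frame features: 100 synchronized frames of
# 13-dim audio features and 32-dim lip-region (visual) features.
audio = rng.normal(size=(100, 13))
video = rng.normal(size=(100, 32))

# Early fusion: concatenate the audio and visual features frame by frame.
fused = np.concatenate([audio, video], axis=1)  # shape (100, 45)

def kmeans_labels(x, k=5, iters=10):
    """Tiny k-means: returns one discrete pseudo-label per frame."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centroids[j] = x[labels == j].mean(axis=0)
    return labels

pseudo_labels = kmeans_labels(fused)  # the "hidden units" to be predicted

# Mask roughly 20% of the frames; the self-supervised objective would be
# to predict pseudo_labels at the masked positions from the context.
mask = rng.random(100) < 0.2
masked_input = fused.copy()
masked_input[mask] = 0.0
```

Because the pseudo-labels come from the data itself rather than from human transcriptions, a model trained this way never needs labeled examples during pretraining, which is what lets the approach scale to raw, untranscribed video.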
Using the same amount of transcriptions, Meta says, AV-HuBERT is 75% more accurate than the best audio-visual speech recognition systems, i.e. those that use both the sound and the images of the speaker to understand what the person is saying.
In particular, Meta says, the system overcomes an important limitation in training artificial intelligence to perform useful tasks: AV-HuBERT outperforms the previous best audio-visual speech recognition system while using one-tenth of the labeled data.
Since large amounts of labeled data are hard to obtain for most of the world's languages, Meta says AV-HuBERT's self-supervised approach will help the company build automatic speech recognition (ASR) systems that cover far more languages.
According to Meta, AV-HuBERT will do much more than enable conversational AI systems that work in difficult scenarios.
Because it requires much less supervised training data, it will also open up the possibility of developing conversational AI models for the hundreds of millions of people around the world who do not speak languages such as English, Mandarin, or Spanish, and who can therefore benefit from speech technology in their own languages.