Harvard Business Review
Originally posted May 20, 2019
Here is an excerpt:
But what happens when machines start analyzing how we talk? The big tech firms are coy about exactly what they are planning to detect in our voices and why, but Amazon has a patent that lists a range of traits they might collect, including identity (“gender, age, ethnic origin, etc.”), health (“sore throat, sickness, etc.”), and feelings (“happy, sad, tired, sleepy, excited, etc.”).
This worries me — and it should worry you, too — because algorithms are imperfect. And voice is particularly difficult to analyze because the signals we give off are inconsistent and ambiguous. What’s more, the inferences that even humans make are distorted by stereotypes. Let’s use the example of trying to identify sexual orientation. There is a style of speaking with raised pitch and swooping intonations which some people assume signals a gay man. But confusion often arises because some heterosexuals speak this way, and many homosexuals don’t. Science experiments show that human aural “gaydar” is only right about 60% of the time. Studies of machines attempting to detect sexual orientation from facial images have shown a success rate of about 70%. Sound impressive? Not to me, because that means those machines are wrong 30% of the time. And I would anticipate success rates to be even lower for voices, because how we speak changes depending on who we’re talking to. Our vocal anatomy is very flexible, which allows us to be oral chameleons, subconsciously changing our voices to fit in better with the person we’re speaking with.
The info is here.