Virtual personal assistants (VPA), also known as smart assistants like Amazon’s Alexa and Google’s Assistant, are in the spotlight for vulnerabilities to attack. Take, for example, that incident about an Oregon couple’s Echo smart speaker inadvertently recording their conversation and sending it to a random contact. Or that time when the Alexa started laughing out of the blue. Indeed, something has to be done about these hacks, whether they’re by accident or not.

Earlier this month, researchers from Indiana University, the Chinese Academy of Sciences, and the University of Virginia found exploitable weaknesses in the VPAs above. Researchers dubbed the techniques they used to reveal these weaknesses as voice squatting and voice masquerading. Both take advantage of the way smart assistants process voice commands. Unsurprisingly, these also exploit users’ misconceptions about how such devices work.

How smart assistants work

VPA services used in smart speakers can do what they’re created to do with the use of apps called “skills” (by Amazon) or “actions” (by Google). A skill or an action provides a VPA additional features. Users can interact with a smart assistant via a virtual user interface (VUI), allowing them to run a skill or action using their voice.

Entrepreneurs, with the help of developers, are already taking advantage of creating their own voice assistant (VA) apps to cater to client needs, making their services accessible in the voice platform, or merely introducing an enjoyable experience to users.

As of this writing, the smart assistant apps market is booming. Alexa skills alone already has tens of thousands, thanks to the Alexa Skill Kit. Furthermore, Amazon has recently released Alexa Skill Blueprints, making skills creation easy for the person who has little to no knowledge of coding.

Unfortunately, the availability of such a kit to the public has made abuse by potential threat actors possible, making the VPA realm an entirely new attack vector. If an attack is successful—and the study researchers conducted proved that it can be—a significant number of users could be affected. They concluded that remote, large-scale attacks are “indeed realistic.”

Squatters and masqueraders

Voice squatting is a method wherein a threat actor takes advantage or abuses the way a skill or action is invoked. Let’s take an example used from the researchers’ white paper. If a user says, “Alexa, open Capital One” to run the Capital One skill, a threat actor can potentially create a malicious app with a similarly pronounced name, such as Capital Won. The command meant for the Capital One skill is then hijacked to run the malicious Capital Won skill instead. Also, as Amazon is now rewarding kids for saying “please” when commanding Alexa, a similar hijacking can occur if a threat actor uses a paraphrased name like Capital One please or Capital One Police.

“Please” and “police” may mean two totally different things to us, but for current smart assistants, these words are the same, as they cannot correctly recognize one invocation name over another similar-sounding one.

Suffice to say, VPAs are not great at handling homophones.


Read: Out of character: Homograph attacks explained


Voice masquerading, on the other hand, is a method wherein a malicious skill impersonates a legitimate one to either trick users into giving out their personal information and account credentials or eavesdrop on conversations without user awareness.

Researchers identified two ways this attack can be made: in-communication skill switch and faking termination. The former takes advantage of the false assumption that smart assistants readily switch from one skill to another once users invoke a new one. Going back to our previous example, if Capital Won is already running and the user decides to ask “Alexa, what’ll the weather be like today?”, Capital Won then pretends to hand over control to the Weather skill in response to the invocation when, in fact, it is still Capital Won running but this time impersonating the Weather skill.

As for the latter, faking termination abuses volunteer skill termination, a feature wherein skills can self-terminate after delivering a voice response such as “Goodbye!” to users. A malicious skill can be programmed to say “Goodbye!” but remain running and listening in the background for a given length of time.

But…I like my smart assistant!

No need to box up your smart speakers and send them back if these vulnerabilities worry you. But it is essential for users to really get to know how their voice assistant works. We believe that doing so can make a significant difference in maintaining one’s privacy and protection from attack.

“Making devices, such as Alexa, responsible for important systems and controls around the house is concerning, especially when evidence emerges that it’s able to turn a simple mistake into a potentially serious consequence,” our very own Malware Intelligence Analyst Chris Boyd said in an interview with Forbes.

Smart assistants and IoT, in general, are still fairly new tech, so we expect improvements in the AI, and the security and privacy efforts within this sector. Both Amazon and Google have claimed they already have protections against voice squatting and voice masquerading.

While it is true that the researchers had already met with both firms to help them understand these threats further and offer them mitigating steps, they remain skeptical about whether the protections put in place are indeed adequate. Only time will tell.