The opportunity at home – can AI drive innovation in personal assistant devices and sign language?
Advancing tech innovation and combating the data desert around sign language have been areas of focus for the AI for Accessibility program. Towards those goals, in 2019 the team hosted a sign language workshop, soliciting applications from top researchers in the field. Abraham Glasser, a Ph.D. student in Computing and Information Sciences and a native American Sign Language (ASL) signer, supervised by Professor Matt Huenerfauth, was awarded a three-year grant. His work would focus on a very pragmatic need and opportunity: driving inclusion by improving common interactions with home-based smart assistants for people who use sign language as their primary form of communication.
Since then, faculty and students in the Golisano College of Computing and Information Sciences at Rochester Institute of Technology (RIT) have conducted the work at the Center for Accessibility and Inclusion Research (CAIR). CAIR publishes research on computing accessibility, and it includes many Deaf and Hard of Hearing (DHH) students operating bilingually in English and American Sign Language.
To begin this research, the team investigated how DHH users would prefer to interact with their personal assistant devices, be it a smart speaker or another type of household device that responds to spoken commands. Traditionally, these devices have used voice-based interaction, but as the technology has evolved, newer models have incorporated cameras and display screens. Currently, none of the devices on the market understand commands in ASL or other sign languages, so introducing that capability is an important future tech development to address an untapped customer base and drive inclusion. Abraham explored simulated scenarios in which the device would watch a user's signing through its camera, process the request, and display the result on its screen.
Some prior research had focused on the phases of interacting with a personal assistant device, but little of it included DHH users. Examples of available research included studies of device activation, including the concerns around waking up a device, as well as device output modalities in the form of videos, ASL avatars and English captions. From a research perspective, the call to action included collecting more data, the key bottleneck for sign language technologies.
To pave the way forward for technological advancements, it was critical to understand what DHH users would like the interaction with the devices to look like and what types of commands they would like to issue. Abraham and the team set up a Wizard-of-Oz study over videoconference. A “wizard” ASL interpreter had a home personal assistant device in the room with them and joined the call without being seen on camera. The device’s screen and output were viewable in the call’s video window, and each participant was guided by a research moderator. As the Deaf participants signed to the personal home device, they did not know that the ASL interpreter was voicing the commands in spoken English. A team of annotators then watched the recordings, identifying key segments of the videos and transcribing each command into English and ASL gloss.
Abraham was able to identify new ways that users would interact with the device, such as “wake-up” commands, which had not been captured in previous research.
Additionally, a summary of command categories and frequencies showed that the most popular category was “command and control,” where users adjust device settings, navigate through results and answer yes/no-style questions. The next most popular category was entertainment questions, followed by lifestyle and shopping. Furthermore, despite signing to a device, participants made sophisticated use of the space around their bodies, for example to represent and refer to people or things that were the topic of their questions. Another observation was the use of a question-mark sign at the beginning of yes-or-no questions to call the attention of the device, whereas this sign is typically used at the end of such questions. When it came to errors, such as the device not giving the result the users were looking for, users most commonly ignored the error and proceeded with a different command. A close second was repeating the command with the exact same wording and signing style, followed by rewording the command. For instance, on re-attempts some participants reworded their questions to be more English-like or fingerspelled words for emphasis.
A paper with the full details of the research, entitled “Analyzing Deaf and Hard-of-Hearing Users’ Behavior, Usage, and Interaction with a Personal Assistant Device that Understands Sign-Language Input” by Abraham Glasser, Matthew Watkins, Kira Hart, Sooyeon Lee, and Matt Huenerfauth, has been presented and published in the Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems.
The knowledge gained through this research then became the basis for building a video dataset of recordings of DHH people producing commands in ASL and interacting with their personal assistant devices – such as asking about the weather, controlling electronics in their home environment, and more. Surveys and interviews were used to gather preferences and requirements from DHH users, and videos of ASL commands were collected, leading to a publicly available dataset that the research community can use to train ASL recognition technologies. The dataset would also be useful for developers of personal assistant technologies and for developers and researchers investigating sign language technologies more broadly.
While there are still many opportunities ahead to incorporate sign languages in tech, the work Abraham and the team have undertaken over the last three years represents an important milestone in furthering innovation in accessibility and ensuring inclusion for all.