There are more than 300 sign languages in use around the world today. A region’s sign language is linguistically distinct from its spoken language: American Sign Language (ASL), for example, is different from spoken English, and signed English is not ASL.
Furthermore, sign language recognition is not the same as gesture recognition, and there is currently no holistic written representation of sign language from which AI tools could build a language model. Given the growing number of voice-powered experiences in devices and technology, people who use sign language will be excluded if we do not invest in further research in this area.
While this is a complex topic, one major challenge is the systemic lack of data resources for training AI models for sign languages, since modern AI methods often require huge amounts of data to learn statistical properties and perform complex tasks. For sign languages, the problem is exacerbated by the fact that available data is substantially scarcer than for spoken languages, with further imbalance across sign languages. For example, there is more data for ASL than for Indian Sign Language (ISL).
A team led by Pratyush Kumar at AI4Bharat focused their research on exploring this challenge. Having built resources for ISL, they recognized both the obstacles and the value in this type of data collection. Furthermore, advances in large language models (LLMs) have shown an unanticipated ability to learn from unsupervised data (i.e., data without human labeling), motivating the team to apply these techniques to sign languages. Their aim was to collect large amounts of sign language video to train AI models and then fine-tune those models using smaller amounts of labelled data.
To test this hypothesis, the team shortlisted several sign languages from across the world (American, Argentinian, Chinese, German, Greek, Indian, and Turkish) and set out techniques for data collection. By identifying video channels popular with the communities using each sign language, they were able to analyze the content and develop a plan to remove all personally identifiable data. They computed key points from the videos, extracting only the positions of facial and body landmarks such as eyes, lips, or shoulders. Together, this resulted in over four thousand hours of sign language video.
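The anonymization step above (keeping only landmark positions and discarding the raw imagery) can be sketched as follows. This is a minimal illustration, not AI4Bharat's pipeline: the frame structure and landmark names are invented, and a real system would obtain landmarks from a pose-estimation library.

```python
# Hypothetical per-frame record: raw (identifiable) pixel data plus the
# landmark positions a pose-estimation model might produce for it.
frame = {
    "image": b"...raw pixel data (identifiable)...",
    "landmarks": {
        "left_eye":      (0.31, 0.22),
        "right_eye":     (0.44, 0.22),
        "upper_lip":     (0.38, 0.30),
        "left_shoulder": (0.20, 0.55),
    },
}

def anonymize(frame):
    """Keep only named landmark coordinates; discard the raw image."""
    return dict(frame["landmarks"])

keypoints = anonymize(frame)
# keypoints now holds positions only -- no face or body imagery survives.
```

Training on such key points rather than raw video both protects the people in the source videos and greatly reduces the input size the model must handle.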
A large AI model was trained on the collected data and then fine-tuned on small labelled datasets across the seven sign languages to evaluate its performance in detecting signs. For example, if a person signs “cat” in Chinese Sign Language, how often does the model detect that correctly?
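AI4Bharat's actual models are large neural networks; as a toy illustration of the pretrain-then-fine-tune recipe described above, the following pure-Python sketch "pretrains" normalization statistics on a large unlabeled set of feature vectors, then fits a tiny classifier on a handful of labelled examples. All data, feature dimensions, and sign labels here are invented.

```python
import random
import statistics

random.seed(0)

def make_sample(center, spread=0.5):
    """Toy stand-in for one sign video's feature vector (hypothetical)."""
    return [random.gauss(c, spread) for c in center]

# --- Stage 1: learn from a large *unlabeled* corpus -------------------
# A deliberately simple stand-in for self-supervised pretraining:
# estimate per-feature normalization statistics from unlabeled data.
unlabeled = [make_sample([random.uniform(-2, 2)] * 4) for _ in range(500)]
means = [statistics.mean(col) for col in zip(*unlabeled)]
stdevs = [statistics.pstdev(col) or 1.0 for col in zip(*unlabeled)]

def embed(x):
    """Map raw features into the 'pretrained' normalized space."""
    return [(xi - m) / s for xi, m, s in zip(x, means, stdevs)]

# --- Stage 2: fine-tune on a *small* labelled set ---------------------
# A nearest-centroid classifier fit on only five examples per sign.
labeled = {
    "cat":   [make_sample([1.5] * 4) for _ in range(5)],
    "house": [make_sample([-1.5] * 4) for _ in range(5)],
}
centroids = {
    sign: [statistics.mean(col) for col in zip(*(embed(x) for x in xs))]
    for sign, xs in labeled.items()
}

def classify(x):
    """Predict the sign whose centroid is nearest in the embedded space."""
    e = embed(x)
    return min(
        centroids,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(e, centroids[s])),
    )
```

The point of the two-stage split is that Stage 1 needs no labels at all, so it can consume the large scraped corpus, while Stage 2 needs only the small labelled datasets that actually exist for each sign language.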
The results showed that the trained model improved on state-of-the-art (SOTA) results from published studies across all seven sign languages. For isolated sign recognition, the prior SOTA had an average accuracy of 71.1%; with their methods, the team raised it to 77.6%. These results were promising, but they concentrated on signing individual words. AI4Bharat found that signers would often fingerspell a word, such as a name or technical term, letter by letter. Thus, the next focus was evaluating the model’s ability to classify the signs for each language’s alphabet. As no dataset existed providing labelled data across the seven languages, AI4Bharat created a new labelled fingerspelling dataset. On this benchmark, pretraining on the large corpus they had created raised average accuracy from 50.71% to 59.2%, once again demonstrating the higher accuracy of their AI model.
While there is still much to be done to get closer to real-time sign language translation through AI models, AI4Bharat’s research showed the potential of bringing ideas from large language models, especially given their increasing adoption in products such as GitHub Copilot, which helps programmers work more efficiently, into the domain of AI for sign languages. Furthermore, the research shows that multilingual models can be trained so that a single model can simultaneously recognize signs across different sign languages, a point of much interest given the large variety across sign languages. For AI4Bharat this opens an important possibility: building AI techniques that partly address the data scarcity that hampers building AI for sign languages, providing hope for the over 300 recognized sign languages in the world.
For those considering or undertaking a similar body of work, AI4Bharat recommends taking input from and co-developing with the sign language community. Their own research would not have been possible without this community, who shared insights and provided feedback on tools that must improve significantly before they are useful in real-time communication. Hence, AI4Bharat considers it critical to regard their research as work in progress toward building AI systems that are useful for people communicating in sign languages across the world.
Full research paper available on OpenReview.
More stories on sign language: