A UT computer science research lab has developed an artificial intelligence system that makes short-form videos accessible to visually impaired users.
The system, called ShortScribe, creates audio descriptions that narrate the videos' visual content in detail. Short-form videos, such as those on TikTok or Instagram Reels, are typically less accessible to blind or visually impaired users because their short length makes creating audio descriptions harder. Tess Van Daele, a recent computer science graduate, was the first author of the research paper about the system. She said she hopes more people will research how to make content accessible.
“Hopefully these huge platforms that we’re talking about … will see a lot of this research of mine and others, and take the advice … and actually use it towards their platforms to make them better for their customers,” Van Daele said.
According to Amy Pavel, an assistant professor of computer science and co-author of the research paper, the system first segments a video into shots and then uses AI to generate a visual description of each shot.
Next, optical character recognition (OCR) detects any text that appears on screen, and automatic speech recognition (ASR) transcribes everything said in the video. Finally, GPT-4, a large language model from OpenAI, combines this information into summaries of the content, which are then converted into audio for the user.
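To make the flow of data through these steps more concrete, here is a minimal Python sketch. It is not ShortScribe's code. The shot detector is a naive threshold on hypothetical per-frame feature scores, the captioning, OCR and speech-recognition steps are passed in as stub functions, and a simple string join (`naive_summarize`) stands in for GPT-4. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Shot:
    start: int  # index of the first frame in the shot
    end: int    # index one past the last frame in the shot


@dataclass
class ShotInfo:
    shot: Shot
    visual: str  # visual description (a captioning model in a real system)
    text: str    # on-screen text (OCR in a real system)
    speech: str  # spoken words (ASR in a real system)


def segment_shots(frame_features: List[float], threshold: float) -> List[Shot]:
    """Start a new shot wherever consecutive frame scores jump by more than
    `threshold`. A naive, hypothetical stand-in for a real shot detector."""
    shots: List[Shot] = []
    start = 0
    for i in range(1, len(frame_features)):
        if abs(frame_features[i] - frame_features[i - 1]) > threshold:
            shots.append(Shot(start, i))
            start = i
    if frame_features:
        shots.append(Shot(start, len(frame_features)))
    return shots


def build_description(
    shots: List[Shot],
    caption: Callable[[Shot], str],
    ocr: Callable[[Shot], str],
    asr: Callable[[Shot], str],
    summarize: Callable[[List[ShotInfo]], str],
) -> str:
    """Gather visual, text and speech information per shot, then summarize it."""
    infos = [ShotInfo(s, caption(s), ocr(s), asr(s)) for s in shots]
    return summarize(infos)


def naive_summarize(infos: List[ShotInfo]) -> str:
    """Hypothetical placeholder for the language-model step: one line per shot."""
    lines = []
    for info in infos:
        parts = [info.visual]
        if info.text:
            parts.append(f'on-screen text: "{info.text}"')
        if info.speech:
            parts.append(f'speech: "{info.speech}"')
        lines.append("; ".join(parts))
    return "\n".join(lines)


# Stub models used only for this demo.
def stub_caption(shot: Shot) -> str:
    return f"Shot from frame {shot.start} to {shot.end}"


def stub_ocr(shot: Shot) -> str:
    return "Recipe time" if shot.start == 0 else ""


def stub_asr(shot: Shot) -> str:
    return "Let's cook" if shot.start == 0 else ""


features = [0.1, 0.12, 0.11, 0.9, 0.92, 0.3]  # made-up per-frame scores
shots = segment_shots(features, threshold=0.5)
script = build_description(shots, stub_caption, stub_ocr, stub_asr, naive_summarize)
# In a real system, this script would be handed to a text-to-speech engine.
print(script)
```

Passing each model in as a function keeps the pipeline's structure separate from any particular OCR, speech-recognition or language-model service, so the stubs could be swapped for real ones without changing `build_description`.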
“Audio descriptions are really interesting,” Pavel said. “They’re not a transcription, they are a narration of the visual content in a video. They’re made to go in between the audio.”
PhD student Mina Huh worked alongside Pavel. Huh said accessibility tools will always need to adapt because new forms of media are constantly emerging.
“As new mediums come out, like short-form videos, which didn’t exist until five years ago, it’s like a new form of media,” Huh said. “A lot of people love new media, but new media means (more) accessibility challenges because we haven’t considered how this new media should be accessible.”