Linguists model code-switching in bilingual speech

When architectural engineering sophomore Vanessa Velez talks to her friends from high school, they rarely stick to one language. Velez said her Latina friend group switches between Spanish and English.

This phenomenon, known as code-switching, is the subject of research headed by linguistics professor Barbara Bullock. Bullock leads the Bilingual Annotation Tasks Force, a project that harnesses computational methods to study how bilingual speakers switch between languages with ease.

Bullock said that a long-term goal of the project is to determine whether the rules that govern code-switching are universal or vary based on the languages in question.

“That (goal) requires a lot of computational models, because you have to know what languages we’re dealing with in order to tag them properly,” Bullock said. “In order to tag them properly, you need to have context … the tools that we currently use just break down. So we’re working on lots of different computational models.”

Velez said her code-switching is governed by efficiency.

“Most of the time we don’t have fully Spanish conversations,” Velez said. “If I need to say one specific thing that you can only say in Spanish, I’ll use that part in an English conversation. If I’m saying ‘I’m doing my hair,’ I’ll say I’m ‘peinándome’ because that’s one word (in Spanish).”

Bullock said she introduced computation into her research because current linguistic tools are designed for work on single languages.

The tasks force analyzes all language combinations, but code-switching is especially prevalent in some parts of the world, Bullock said.

“India, for instance, is a huge case of language mixing,” Bullock said. “Either between Hindi and English, because English was the colonizer language, or between some of the smaller languages in India.”

The phenomenon of code-switching between Indian state languages is familiar to Sunkulp Ananthanarayan, a linguistics junior volunteering with the project. He said that his father grew up speaking a mixture of Tamil and Malayalam while his mother grew up speaking a combination of Kannada and Marathi. Ananthanarayan said this personal connection is what got him interested in the tasks force.

“My parents are both very much multilingual and I am not, so I wanted to see what’s going on in their speech that I don’t understand,” Ananthanarayan said.

Bullock said the force encompasses four facets: math, coding, data and linguistics.

“The coding group and the math group are working together quite a bit now,” Bullock said. “We are trying to quantify the way people actually switch. We want to know not only how much of one language versus the other language is represented, in (Ananthanarayan’s) mom’s speech for instance, but how often she may switch between them, how regularly she might switch between them.”
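The two quantities Bullock describes, how much of each language is present and how often a speaker switches, can be illustrated with a minimal sketch. This is not the project's actual method or metric. It assumes each word has already been tagged with a language label, and the function names `language_shares` and `switch_rate` and the sample tags are hypothetical.

```python
from collections import Counter


def language_shares(tags):
    """Return the fraction of tokens tagged with each language."""
    counts = Counter(tags)
    total = len(tags)
    # An empty input yields an empty dict, so no division by zero occurs.
    return {lang: n / total for lang, n in counts.items()}


def switch_rate(tags):
    """Return the fraction of adjacent token pairs whose language differs."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)


# Hypothetical per-word language tags for one utterance (illustrative only).
tags = ["en", "en", "en", "es", "en", "en", "es", "es"]

shares = language_shares(tags)  # 5 of 8 tokens "en", 3 of 8 tokens "es"
rate = switch_rate(tags)        # 3 switches across 7 adjacent pairs

print(shares)
print(round(rate, 3))
```

A sketch like this covers only how much and how often. Measuring how regularly the switches occur would need an additional statistic over the spacing between them.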

Ananthanarayan said he anticipates an even better understanding of language as technology improves.

“We’re constantly changing how we’re doing things,” Ananthanarayan said. “Our metrics are getting better as we go on.”