The problem of gender in machine translation

Rina7RS
Posts: 675
Joined: Mon Dec 23, 2024 3:42 am


Post by Rina7RS »

Here we see a machine translation system at work, choosing the translation that is statistically most probable given its training data. But what happens when the MT system has to make a more difficult choice?

Gender is a major choice that MT systems need to consider for some languages. For example, the English word doctor translates into Spanish as doctor or doctora, changing according to the gender of the person being referred to. It can also translate as médico or médica.

Most MT systems default to the male form. Google Translate provides alternative translations for both genders on short inputs, but it too defaults to male for longer sentences.

Meanwhile, nurse translates to enfermero or enfermera. But here, most MT systems will default to the female form.

This is something that you can test yourself—try translating these terms by themselves, then try testing them in a longer sentence like “The doctor/nurse said you should rest as much as necessary.”

All this hinges on the biases that exist in data that machine translation systems are trained on. MT systems don’t know gender per se. But because the available data tends to translate doctor as male and nurse as female, the MT system will also prefer these translations as a matter of statistical probability.
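This frequency-driven behavior can be sketched as a toy model. The corpus counts below are invented purely for illustration—real MT systems use far more sophisticated models—but the principle is the same: the system picks the form it has seen most often, regardless of the referent's actual gender.

```python
from collections import Counter

# Hypothetical parallel-corpus counts: how often each Spanish form
# appeared as the translation of the English source word.
# These numbers are made up for illustration only.
corpus_counts = {
    "doctor": Counter({"doctor": 9120, "doctora": 880}),
    "nurse": Counter({"enfermera": 8700, "enfermero": 1300}),
}

def most_probable_translation(word: str) -> str:
    """Pick the target form with the highest corpus frequency,
    ignoring the gender of the person actually being referred to."""
    counts = corpus_counts[word]
    return counts.most_common(1)[0][0]

print(most_probable_translation("doctor"))  # masculine form wins
print(most_probable_translation("nurse"))   # feminine form wins
```

The model has no notion of gender at all—it simply reproduces whichever form dominates its data, which is exactly how stereotyped defaults emerge.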

As such, machine translation’s technical “objectivity” becomes its own weakness in practice: it risks perpetuating gender stereotypes, and this problem persists even as machine translation’s quality continues to improve.