Translation Portfolio: Machine Translation

Jennifer Sommerfeldt

12 December 2012

Since its inception in the mid-20th century, machine translation has seen a great deal of change and improvement. The Georgetown-IBM Experiment was hugely successful and ushered in an era of excitement about the possibility of quick, easy, “instant” translation. As time went on, however, researchers and developers began to encounter challenges that still face machine translation today. Although the technology has improved drastically since those early experiments, it must overcome three shortcomings before it can become fully automated: the need to understand and interpret a source text in context, the need to use information not contained in the system’s corpus, and the need to recognize meanings that cannot be derived from a bottom-up analysis.

Historically, machine translation systems have struggled with ambiguity that stems primarily from an inability to recognize the context around the translation unit. Human translators rely heavily on context, whether they realize it or not. The human brain has an amazing capacity to recall information, and despite incredible advances in computing, machines still cannot recognize and retrieve information as well as a human can. This is especially true of statistical machine translation programs, which draw their information from a corpus that may or may not differentiate between the contexts of various translation projects. Machines have to be told what the context is, and someone has to limit the corpus accordingly, which creates more work and keeps machine translation from reaching its goal of full automation. Researchers and developers have worked hard to correct this problem, but the only real solution they have found so far is to add to the corpus or to limit it to specific topics or contexts (Marcu & Melby, 2008). Until the context-ambiguity issue is solved, machine translation systems will not be able to compete with human translators in determining context.
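The effect of limiting the corpus can be illustrated with a small Python sketch. It is not how any real statistical system is implemented; it assumes an invented toy corpus (`TOY_CORPUS`) and a hypothetical helper (`most_frequent_translation`) that simply picks the most frequent German translation of an English word, first from the whole corpus and then from a single domain.

```python
from collections import Counter

# Hypothetical aligned examples: (domain, English word, German translation).
# The data is invented purely for illustration.
TOY_CORPUS = [
    ("finance", "bank", "Bank"),
    ("finance", "bank", "Bank"),
    ("finance", "bank", "Bank"),
    ("geography", "bank", "Ufer"),
    ("geography", "bank", "Ufer"),
]


def most_frequent_translation(word, domain=None):
    """Return the most frequent translation of `word`.

    If `domain` is given, only corpus entries from that domain are counted.
    Returns None when the word does not occur in the selected entries.
    """
    counts = Counter(
        target
        for corpus_domain, source, target in TOY_CORPUS
        if source == word and (domain is None or corpus_domain == domain)
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Whole corpus: the more frequent financial sense wins.
print(most_frequent_translation("bank"))
# Corpus limited by a person to one domain: the riverbank sense surfaces.
print(most_frequent_translation("bank", domain="geography"))
```

In this toy setting the system returns the riverbank sense only after someone names the domain, which is the extra manual step described above.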

The inability to use knowledge not contained directly in the corpus is a much larger problem than making translation decisions based on the context of the translation unit. Human translators draw on a wide variety of information not contained in the corpus, including cultural background, current events, and even client preferences and project specifications. In addition, translators use context clues from previous sentences, paragraphs, and even pages, which machine translation systems are less able to do. All these considerations affect the final translation. Developers believe that there are viable solutions to this problem, including placing greater emphasis on local context and using it to determine the global context, but this approach is still under development (Marcu & Melby, 2008). Finding a way for machine translation systems to use knowledge that lies outside the translation unit or the corpus is essential to making machine translation more effective and bringing it closer to full automation.

Machine translation systems rely heavily on adjacent collocations to determine meaning, and they therefore also struggle with segments whose meaning is not clear from the individual words or phrases, such as idiomatic expressions. Machine translation is infamous for mistranslating idioms, and this must be corrected because idioms are used frequently in many languages; according to a study done at the University of Tennessee, idioms appear approximately 4.08 times per minute in English (Pollio et al., 1977). Although researchers point out that data-driven machine translation is improving rapidly (Marcu & Melby, 2008), systems must be able to recognize and correctly translate all text, regardless of its idiomatic or unconventional use.
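The idiom problem can be made concrete with another rough Python sketch. It assumes an invented English-to-Spanish word glossary (`WORD_GLOSSARY`) and a one-entry phrase table (`PHRASE_TABLE`), both hypothetical, and compares a purely word-by-word lookup with a lookup that first tries the longest known multi-word phrase. Real systems are far more sophisticated; the point is only that the idiom is translated correctly when the whole phrase happens to be stored as a unit.

```python
# Hypothetical toy resources, invented for illustration only.
WORD_GLOSSARY = {"kick": "patear", "the": "el", "bucket": "cubo"}
PHRASE_TABLE = {("kick", "the", "bucket"): "estirar la pata"}


def translate_word_by_word(tokens):
    """Translate each token on its own; unknown tokens pass through unchanged."""
    return " ".join(WORD_GLOSSARY.get(token, token) for token in tokens)


def translate_with_phrases(tokens):
    """Prefer the longest multi-word phrase in PHRASE_TABLE, else fall back to single words."""
    max_len = max(len(phrase) for phrase in PHRASE_TABLE)
    output = []
    i = 0
    while i < len(tokens):
        # Try candidate phrases of decreasing length (at least two tokens).
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in PHRASE_TABLE:
                output.append(PHRASE_TABLE[candidate])
                i += n
                break
        else:
            # No stored phrase starts here, so translate the single word.
            output.append(WORD_GLOSSARY.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(output)


idiom = ["kick", "the", "bucket"]
print(translate_word_by_word(idiom))   # literal rendering of each word
print(translate_with_phrases(idiom))   # idiomatic rendering from the phrase table
```

The second function works only because this particular idiom was entered in the table in advance; any idiom missing from the table would still be rendered literally.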

The advances in machine translation over the past few decades have been phenomenal. Although the original timeline for developing a fully automated, high-quality machine translation system was greatly underestimated, the progress made is not to be ignored. Critics argue that machine translation will never catch up to human translation, and perhaps it will not. However, the quality of machine translation has steadily improved, and as the computer age continues to progress, it is reasonable to expect that machine translation will keep developing as well. The outlook is good. One day, machine translation may approach the capabilities of human translators, and perhaps, with enough technological advancement and time, it will match and then exceed the capabilities of even the best human translators.

References

Marcu, D., & Melby, A. (2008). Data-driven machine translation: A conversation with linguistics and translation studies. AMTA. Retrieved from http://www.mt-archive.info/AMTA-2006-Marcu-Melby-expanded.pdf

Pollio, H., Barlow, J., Fine, H., & Pollio, M. (1977). Psychology and the poetics of growth: Figurative language in psychology, psychotherapy, and education. Hillsdale, NJ: Lawrence Erlbaum Associates.
