12 December 2012
The Future of Machine Translation
Since its inception in the mid-20th
century, machine translation has seen a great deal of change and improvement.
The Georgetown-IBM experiment of 1954 was hugely successful and ushered in an
era of excitement about the possibility of quick, easy, “instant” translation.
However, as time went on, researchers and developers began to recognize the
challenges that machine translation faced and still faces today. The technology
has improved drastically since those early experiments, but it must overcome
several shortcomings before it can become fully automated: the need to
understand and interpret a source text in context, the need to use information
not contained in the system’s corpus, and the need to recognize meanings that
cannot be derived from a bottom-up analysis.
Historically, machine translation
systems have struggled with ambiguity issues that stem primarily from an
inability to recognize the context around the translation unit. Human translators
rely heavily on context, whether they realize it or not. The human brain has an
amazing capacity to recall information, and despite incredible advances in
computing, machines still cannot recognize and retrieve information as well as
a human can. Statistical machine translation programs in particular draw their
information from a corpus, which may or may not differentiate between the
contexts of various translation projects. Machines have to be told what the
context is, and someone has to limit the corpus accordingly, which creates more
work and keeps machine translation from reaching its goal of full automation.
Researchers and developers have worked hard to correct this problem, but the
only real solution found so far is to add to the corpus or to limit it to
specific topics or contexts (Marcu & Melby, 2008). Until the context-ambiguity
issue is solved, machine translation systems will not be able to compete with
human translators in determining context.
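The effect of limiting a corpus to a specific context can be shown with a minimal sketch. It assumes a tiny, hypothetical English-to-Spanish corpus in which each entry is tagged with a domain; the data and the function name `most_likely_translation` are invented for illustration and do not come from any real system. Without a domain, the most frequent translation wins; once a person supplies the domain, the choice changes.

```python
from collections import Counter

# Hypothetical aligned corpus: (domain, source word, target word).
CORPUS = [
    ("finance", "bank", "banco"),
    ("finance", "bank", "banco"),
    ("finance", "bank", "banco"),
    ("geography", "bank", "orilla"),
    ("geography", "bank", "orilla"),
]


def most_likely_translation(word, domain=None):
    """Return the most frequent translation of `word`.

    If `domain` is given, only corpus entries from that domain are counted,
    which mimics a person limiting the corpus to one context.
    """
    counts = Counter(
        target
        for entry_domain, source, target in CORPUS
        if source == word and (domain is None or entry_domain == domain)
    )
    if not counts:
        return None  # the word never occurs in the selected part of the corpus
    return counts.most_common(1)[0][0]


print(most_likely_translation("bank"))               # whole corpus
print(most_likely_translation("bank", "geography"))  # corpus limited by domain
```

In this toy setting the whole corpus favors the financial sense, so a text about rivers is translated correctly only when someone tells the system which domain applies.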
The inability to use knowledge not
contained directly in the corpus is a much larger problem than making
translation decisions based on the context of the translation unit. Human
translators draw on a wide variety of outside information, including cultural
background, current events, and even client preferences and project
specifications. In addition, translators use context clues from previous
sentences, paragraphs, and even pages, which machine translation systems are
less able to do. All these considerations affect the final translation.
Developers believe that there are viable solutions to this problem, including
placing greater emphasis on local context and using it to determine the global
context, but this approach is still under development (Marcu & Melby, 2008).
Enabling machine translation systems to use knowledge from outside the
translation unit and the corpus is essential to making machine translation
more effective and bringing it closer to full automation.
Machine translation systems rely
heavily on adjacent collocations to determine meaning, and therefore also
struggle with segments whose meaning is not clear from the individual words or
phrases, such as idiomatic expressions. Machine translation is infamous for
mistranslating idioms, and this must be corrected because idioms are used
frequently in many languages. In English, idioms appear approximately 4.08
times per minute, according to a study done at the University of Tennessee
(Pollio et al., 1977). Although researchers point out that data-driven machine
translation is improving rapidly (Marcu & Melby, 2008), systems must be able
to recognize and correctly translate all text, including idiomatic or
unconventional usage.
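The idiom problem can be illustrated with a minimal sketch. It assumes a toy English-to-Spanish lexicon and a one-entry phrase table, both hypothetical and far smaller than anything a real system would use; the function names are likewise invented for illustration. A word-by-word lookup produces a literal rendering, while a greedy longest-match phrase lookup recovers the idiomatic meaning.

```python
# Toy English-to-Spanish lexicon (hypothetical, for illustration only).
WORD_TABLE = {
    "he": "él",
    "will": "va a",
    "kick": "patear",
    "the": "el",
    "bucket": "cubo",
}

# Multi-word expressions whose meaning is not the sum of their words.
PHRASE_TABLE = {
    ("kick", "the", "bucket"): "estirar la pata",
}


def translate_word_by_word(sentence):
    """Translate each word in isolation; unknown words pass through unchanged."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())


def translate_with_phrases(sentence):
    """Greedy longest-match lookup: try multi-word phrases before single words."""
    words = sentence.lower().split()
    longest = max(len(phrase) for phrase in PHRASE_TABLE)
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(longest, len(words) - i), 1, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase matched at this position; fall back to a single word.
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(translate_word_by_word("He will kick the bucket"))
print(translate_with_phrases("He will kick the bucket"))
```

The first call renders the sentence literally, as someone striking a pail, while the second substitutes a Spanish idiom for dying. The phrase table only helps with idioms that someone has already listed, which is why unconventional usage remains difficult.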
The advances that the world has seen
in machine translation over the past few decades have been phenomenal. Although
the time needed to develop a fully automated, high-quality machine translation
system was greatly underestimated, the developments in machine translation are
not to be ignored. Critics argue that machine translation will never catch up
to human translation, and perhaps it will not. However, the quality of machine
translation has steadily improved, and as the computer age progresses, it is
reasonable to expect that machine translation will keep developing as well.
The outlook is good. One day machine translation may approach the capabilities
of human translators, and perhaps, with enough technological advancement and
enough time, it will match and then exceed the capabilities of even the best
human translators.
References
Marcu, D., & Melby, A. (2008). Data-driven machine translation: A conversation with
linguistics and translation studies. AMTA. Retrieved from http://www.mt-archive.info/AMTA-2006-Marcu-Melby-expanded.pdf
Pollio, H., Barlow, J., Fine, H., & Pollio, M. (1977). Psychology and the poetics
of growth: Figurative language in psychology, psychotherapy, and education. Hillsdale,
NJ: Lawrence Erlbaum Associates.