Multi-lingualizing the EOLSS using UNL
A sustainable environment and sustainable development have become among the greatest concerns of our time. We need a comprehensive and coherent body of knowledge to preserve life on Earth and to ensure sustainable development for all. The EOLSS provides such a body of knowledge. Nevertheless, for this knowledge to have a substantial impact, it must reach all peoples, regardless of their native languages or cultural backgrounds.
The EOLSS deals with the natural environment in connection with economic and social development. It is unique in that it comprehensively examines, from their origins, the threats facing all the systems that support life on Earth, from the climate, oceans, forests, water cycle, and atmosphere to social systems. The result is an integrated system of knowledge on the “web of life”, achieved by bringing different academic disciplines together to explain the complex interactions among natural systems and human societies.
The “Encyclopedia of Life Support Systems” (EOLSS) is the largest on-line encyclopedia and is constantly expanding. It currently contains more than 120,000 web pages, the content of which is edited by nearly 300 subject experts. About 5,000 authors from more than 100 countries have contributed, and continue to contribute, to the development of the EOLSS. This massive volume of knowledge and data is developed under the auspices of UNESCO and is updated and augmented fortnightly at www.eolss.net.
Our Task
Our ultimate goal is to make the entire EOLSS available in Arabic. As a first step, the UNDL Foundation enconverted 25 documents as a sample and sent the resulting UNL expressions to the 15 language centers involved in the project to be deconverted into their respective languages. As the Arabic UNL center, we have succeeded in deconverting the 25 documents into Arabic.
First, a specialized dictionary of 27,100 entries was built. Then, the Arabic counterparts of the incoming UNL expressions were generated automatically by applying the DeConversion’s morphological and syntactic rules.
Because the texts are large and written in highly specialized language, the Arabic DeConversion grammar had to undergo a phase of enhancement on both the syntactic and morphological levels so that it could handle the various linguistic phenomena the texts contain. The following are some of the areas covered in this enhancement:
Syntactic Challenges:
• Semantics-Syntax Mapping
• Generating Nominal Chunks
• Dealing with Superlative Adjectives
• Arranging Nouns and Adjectives
Morphological Challenges:
• Handling Word Forms
• Achieving Number and Gender Agreement
• Dealing with Reference Resolution
• Identifying the Case Ending of Words
• Proper Handling of Pronouns
The next stage was evaluating the DeConversion’s performance. Two methods were adopted to evaluate the DeConversion’s output in general, and the 25 documents in particular. The first is a qualitative evaluation in which the English document “Tsunami” was first translated by a human translator, taking the UNL expressions into consideration. The deconverted text was then compared with the human translation according to linguistic criteria that assess the output on the syntactic, semantic and morphological levels. The output was 70-75% syntactically correct, 85% semantically correct, and 90% morphologically correct. For more details on this evaluation, see A Semantic-Based Approach for Multilingual Translation of Massive Documents.
The second method is statistical. Five hundred sentences were selected randomly from the 25 EOLSS documents and translated by two human translators; a third translator post-edited the machine output, making only the minimal changes necessary. Efforts were focused on devising a metric to measure the resemblance between the Arabic DeConverter output and one or more human translations. Three metrics were used: BLEU, F1 and F mean. The results were compared to those of three English-to-Arabic translation systems: Google, Babylon and Sakhr’s Tarjim. UNL translation achieved the best scores in this evaluation, followed by Google, Sakhr and then Babylon. These results were statistically significant at the 95% confidence level. For the details of this evaluation, see Evaluation of Arabic Machine Translation System based on the Universal Networking Language.
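To give a feel for how such resemblance metrics work, the following is a minimal sketch of a simplified sentence-level BLEU and a unigram F1 score. It is an illustration only, not the scoring setup used in the evaluation: the function names, the whitespace tokenization, the choice of n-grams up to length 2, and the toy English sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu(candidate, references, max_n=2):
    """Simplified BLEU: geometric mean of clipped precisions for n = 1..max_n,
    multiplied by a brevity penalty. max_n=2 is an assumption chosen so that
    short sentences do not score zero; corpus-level BLEU usually uses 4."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((len(ref) for ref in references), key=lambda l: (abs(l - c), l))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


def unigram_f1(candidate, reference):
    """Harmonic mean of unigram precision and recall against one reference."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one machine output and two human references.
machine_output = "the wave reached the coast".split()
human_references = [
    "the wave reached the coast".split(),
    "the wave hit the shore".split(),
]
score_bleu = bleu(machine_output, human_references)
score_f1 = unigram_f1(machine_output, human_references[1])
```

Scoring against several references at once, as `bleu` does here, rewards an output that matches any acceptable human wording, which is why the evaluation collected more than one human translation per sentence.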