Abstract:
Due to globalization, the world is becoming a global village, and human languages increasingly cross national borders. Traditionally, human interpreters have bridged the communication gap between people who speak different languages. However, because human translation is costly and inconvenient, a great deal of research has been devoted to addressing this problem with Machine Translation (MT) techniques. MT is the process of automatically translating text or speech from one human language to another by computer. Neural Machine Translation (NMT) uses artificial neural networks such as the Transformer, a state-of-the-art model that shows promising results over previous MT models. Several ancient manuscripts written in the Ge’ez language, held in Ethiopia and abroad, still need to be translated, and young people and researchers are increasingly interested in learning about and engaging in research on Ge’ez and Amharic manuscripts. This thesis, therefore,
aims to demonstrate the capabilities of deep learning algorithms on MT tasks for these morphologically rich languages. A bidirectional, text-based Ge’ez-Amharic MT system was tested with two different deep learning models, namely Seq2Seq with attention and the Transformer. A total of 20,745 parallel sentences was used for the experiments, of which 13,787 were collected from previous researchers and 6,958 new parallel sentences were prepared. In addition, a Ge’ez-Latin numeric corpus of 3,078 parallel lines was added to handle numeric translation. We conducted four experiments; the Transformer outperformed the other techniques, scoring 22.9 BLEU from Ge’ez to Amharic and 29.7 BLEU from Amharic to Ge’ez using the 20,745 parallel
sentences. The typical Seq2Seq model improves on the BLEU scores of the SMT model obtained by previous researchers by +0.65 and +0.79, corresponding to 2.46% and 4.66% increases from Ge’ez to Amharic and from Amharic to Ge’ez, respectively, using 13,833 parallel sentences. Further
research with a larger and cleaner corpus and with pre-trained models may improve the results reported in this work; however, we faced a scarcity of corpora and pre-trained models for the Amharic and Ge’ez languages, which limited the results we could achieve.