Abstract:
Due to globalization, the world is becoming a global village, and human languages increasingly cross national borders. Traditionally, human interpreters have bridged the communication gap between people who speak different languages. However, because human translation is costly and inconvenient, a great deal of research has been devoted to addressing this problem with Machine Translation (MT) techniques. MT is the process of automatically translating text or speech from one human language to another by computer. Neural Machine Translation (NMT) uses artificial neural networks such as the Transformer, a state-of-the-art model that shows promising results over previous MT models. Several ancient manuscripts written in the Ge’ez language, held in Ethiopia and abroad, still need to be translated, and young people and researchers are increasingly interested in learning about and engaging in research on Ge’ez and Amharic manuscripts. This thesis, therefore,
aims to demonstrate the capabilities of deep learning algorithms on MT tasks for these morphologically rich languages. A bidirectional, text-based Ge’ez-Amharic MT system was tested with two different deep learning models, namely Seq2Seq with attention and the Transformer. A total of 20,745 parallel sentences was used for the experiments, of which 13,787 were collected from previous researchers and 6,958 new parallel sentences were prepared. In addition, a Ge’ez-Latin numeric corpus of 3,078 parallel lines was added to handle numeric translation. We conducted four experiments; the Transformer outperformed the other techniques, scoring 22.9 BLEU from Ge’ez to Amharic and 29.7 BLEU from Amharic to Ge’ez using the 20,745 parallel
sentences. The typical Seq2Seq model improves on the BLEU scores of the SMT model obtained by previous researchers by +0.65 and +0.79, corresponding to 2.46% and 4.66% increases from Ge’ez to Amharic and from Amharic to Ge’ez, respectively, using 13,833 parallel sentences. Further
research with a larger and cleaner corpus and with pre-trained models may improve the results reported in this work; however, we faced a scarcity of corpora and pre-trained models for the Amharic and Ge’ez languages, which limited the results we could achieve.