Abstract:
The availability of a large amount of electronic job vacancy text on the web
makes identifying vacancy announcements relevant to a specific topic a
challenging task. This also holds for Amharic texts. Amharic (ኣማርኛ) is an
Ethiopian Semitic language, used as a first language by the Amhara and as the
working language of the federal government. A large amount of electronic text
in this domain has been generated, so a text categorization mechanism is
required for finding, filtering, and managing the rapidly growing body of online
information. The goal of automatic text categorization is to classify documents
into a fixed number of predefined categories using rule-based or machine
learning methods. The aim of this study is therefore to investigate the
application of machine learning techniques to vacancy text categorization.
A total of 1678 vacancy announcement texts in eight categories were collected:
“ጤና” (health), “ምህንድስና” (engineering), “የኮምፒዉተር ሳይንስ ዘርፎች” (computing),
“ተፈጥሮ ሳይንስ” (natural science), “ማህበራዊ ሳይንስ” (social science), “ህግ” (law),
“ግብርና” (agriculture), and “ቢዝነስ እና ኢኮኖሚክስ” (business and economics).
After preprocessing the text (tokenization, stemming of word variants, removal
of stop words and unwanted characters, and weighting the importance of each
term), 1610 pre-categorized texts were used to train the classifier. Three
supervised machine learning classifiers, namely Support Vector Machine,
k-Nearest Neighbor, and Naïve Bayes, were used to categorize the vacancy text.
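
The preprocessing and classification steps described above can be illustrated
with a minimal sketch. It is not the study's implementation: the toy training
set, the English placeholder words standing in for Amharic text, the stop-word
list, and the function names are all hypothetical. Stemming is omitted, TF-IDF
is assumed as the term-weighting scheme, and only the k-Nearest Neighbor
classifier is shown, using cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy data: English placeholders stand in for Amharic vacancy text.
TRAIN = [
    ("nurse clinic patient care", "health"),
    ("doctor hospital patient", "health"),
    ("software developer database", "computing"),
    ("network administrator software", "computing"),
    ("lawyer court legal contract", "law"),
    ("legal advisor court", "law"),
]
STOP_WORDS = {"the", "a", "and", "for", "of"}  # illustrative list only


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words (no stemming here)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def idf_table(docs):
    """Inverse document frequency: log(N / df) for each term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """Weight each term by term frequency times IDF; unseen terms are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(text, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    docs = [tokenize(t) for t, _ in TRAIN]
    idf = idf_table(docs)
    vectors = [(tfidf(d, idf), label) for d, (_, label) in zip(docs, TRAIN)]
    query = tfidf(tokenize(text), idf)
    nearest = sorted(vectors, key=lambda p: cosine(query, p[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


print(knn_predict("court legal officer"))
```

A term that occurs in several categories (here "software" or "patient" within
one category, but in real data often across categories) receives a lower IDF
weight, which is one simple way term weighting reduces the influence of shared
vocabulary.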
Experimental results show that the Support Vector Machine outperforms the other
two classifiers (k-Nearest Neighbor and Naïve Bayes), with an accuracy of 76.4%.
This is a promising result for designing a vacancy text categorization model for
jobs announced in the Amharic language. The ህግ (law) category achieves the best
classification accuracy in the current study, because it shares the fewest
common terms with the other fields of study considered. However, there are
challenges in designing a job vacancy text categorization model. The main
challenge in this study is that words common to several categories produce
conflicting tags, and it is a challenging task for a machine
to categorize these words. It is therefore recommended to apply semantic-based
Amharic vacancy text categorization.