Computational Linguistics Research on Philippine Languages

Rachel Edita O. ROXAS
Software Technology Department
De La Salle University
2401 Taft Avenue, Manila, Philippines
ccsror@ccs.dlsu.edu.ph

Allan BORRA
Software Technology Department
De La Salle University
2401 Taft Avenue, Manila, Philippines
ccsabb@ccs.dlsu.edu.ph

Abstract

This paper describes computational linguistic activities on Philippine languages. The Philippines is an archipelago with a vast number of islands and numerous languages. The tasks of understanding, representing and implementing these languages require enormous work. An extensive amount of work has been done on understanding at least some of the major Philippine languages, but little has been done on the computational aspect. Most of the latter has been for the purpose of machine translation.

1 Philippine Languages

Within the 7,200 islands of the Philippine archipelago, about one hundred and one (101) languages are spoken. This is according to the nationwide 1995 census conducted by the National Statistics Office of the Philippine Government (NSO, 1997). The languages that are spoken by at least one percent of the total household population include Tagalog, Cebuano, Ilocano, Hiligaynon, Bikol, Waray, Pampanggo or Kapangpangan, Boholano, Pangasinan or Panggalatok, Maranao, Maguindanao, and Tausug. Aside from these major languages, there are other Philippine dialects, which are variants of these major languages. Fortunato (1993) classified these dialects into the top nine major languages as above (except for Boholano, which is similar to Cebuano).

2 Language Representations

Linguistic information on Philippine languages is extensive for the languages mentioned above, except for Maranao, Maguindanao, and Tausug, which are some of the languages spoken in the Southern Philippines. So far, however, extensive research has been done on theoretical linguistics, while little has been done on computational linguistics. In fact, computational linguistics research on Philippine languages is mainly focused on Tagalog.1 There is also notable work on Ilocano.

Kroeger (1993) showed the importance of grammatical relations in Tagalog, such as subject and object relations, and the insufficiency of a surface phrase structure paradigm to represent these relations. This issue was further discussed in LFG98, which dealt with the problem of voice and grammatical functions in Western Austronesian languages. Musgrave (1998) introduced the problem of certain verbs in these languages that can head more than one transitive clause type. Foley (1998) and Kroeger (1998), in particular, discussed long-debated issues such as nouns in Tagalog that can be verbed, the voice system of Tagalog, and Tagalog as a symmetrical voice system. Latrouite (2000) argued that a level of semantic representation is still necessary to explicitly capture a word's meaning. Crawford (1999) contributed to the issue of interrogative sentences and suggested that the restriction on wh-movement reveals the syntactic structure of Tagalog.

Potet (1995) and Trost (2000) provided general materials on computational morphology, though both presented examples from Tagalog.

Rubino (1997, 1996) provided an in-depth analysis of Ilocano. The major contributions of the work include an extensive treatment of the complex morphology of the language, a thorough treatment of the discourse particles, and the reference grammar of the language.

1 Tagalog (or Pilipino) has the largest number of speakers in the country. This may be due to the fact that it was officially declared the national language of the Philippines in 1946.

3 Applications in Machine Translation

Currently, most of the empirical endeavours in computational linguistics are in machine translation.

3.1 Filipino MT Software

Several commercially available translation software packages include Philippine languages, but translation is done word for word. One such package is the Universal Translator 2000, which includes Tagalog among 40 other languages. Although omni-directional, translation involving Tagalog excludes morphological and syntactic aspects of the language. Another package is the Filipino Language Software, which includes the Tagalog, Visayan, Cebuano, and Ilocano languages.

3.2 Machine Translation Research

IsaWika! is an English-to-Filipino machine translator that uses the augmented transition network (ATN) as its computational architecture (Roxas, 1999). It translates simple and compound declarative statements as well as imperative English statements. To date, it is the most serious research undertaking in machine translation in the Philippines.

Borra (1999) presented another translation software that translates simple declarative and imperative statements from English to Filipino. The computational architecture of the system is based on LFG, which differs from IsaWika!'s ATN implementation. Part of the research was describing a possible set of semantic inf