
AU-KBC RESEARCH CENTRE

Tamil WordNet

Overview

Tamil WordNet is an attempt to build a lexical network for the Tamil language along the lines of the English WordNet, so that it can be used as a tool for improving the performance of machine translation (MT) systems involving Tamil. We propose to assign each word the set of all senses it can take. We also aim to capture the relationships between words by linking their senses in a network, with each link labelled by the relation it represents. These word-level relations include synonymy, antonymy, hypernymy, hyponymy, meronymy and holonymy. An MT system with Tamil as the source language can exploit these relations to resolve ambiguities in the text.
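The sense-and-relation model described above can be sketched as a small in-memory graph. This is a minimal illustration under stated assumptions: the class and method names (`LexicalNetwork`, `relate`, `senses_of`) and the WordNet-style sense identifiers such as `river.n.01` are hypothetical and not part of Tamil WordNet. Each word maps to every sense it can take, synonyms share one sense, and each link between senses carries its relation label and is stored together with its inverse.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Relations from the overview. Synonymy is modelled implicitly: synonymous
# words share one Sense (a synset), as in the English WordNet.
# Hypothetical design: every relation is stored together with its inverse.
INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "meronym": "holonym", "holonym": "meronym",
    "antonym": "antonym",
}


@dataclass
class Sense:
    sense_id: str
    gloss: str
    words: set = field(default_factory=set)  # synonymous words for this sense


class LexicalNetwork:
    def __init__(self):
        self.senses = {}                      # sense_id -> Sense
        self.word_senses = defaultdict(list)  # word -> all sense_ids it can take
        self.links = defaultdict(set)         # (sense_id, relation) -> {sense_id}

    def add_sense(self, sense_id, gloss, words):
        self.senses[sense_id] = Sense(sense_id, gloss, set(words))
        for w in words:
            self.word_senses[w].append(sense_id)

    def relate(self, src, relation, dst):
        # Store the labelled link and its inverse, e.g. hypernym <-> hyponym.
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        self.links[(src, relation)].add(dst)
        self.links[(dst, INVERSE[relation])].add(src)

    def senses_of(self, word):
        # .get() avoids creating empty entries for unknown words.
        return [self.senses[s] for s in self.word_senses.get(word, [])]

    def related(self, sense_id, relation):
        return sorted(self.links.get((sense_id, relation), set()))


# Hypothetical entries: ஆறு is ambiguous between "river" and "six".
net = LexicalNetwork()
net.add_sense("waterbody.n.01", "a body of water", ["நீர்நிலை"])
net.add_sense("number.n.01", "a number", ["எண்"])
net.add_sense("river.n.01", "a river", ["ஆறு"])
net.add_sense("six.n.01", "the number six", ["ஆறு"])
net.relate("river.n.01", "hypernym", "waterbody.n.01")
net.relate("six.n.01", "hypernym", "number.n.01")

for sense in net.senses_of("ஆறு"):
    print(sense.sense_id, net.related(sense.sense_id, "hypernym"))
```

Because both senses of ஆறு are listed with distinct hypernyms, an MT system could, for instance, prefer `six.n.01` when the surrounding text is numeric. This is only an illustration of how the relations might be used, not the project's disambiguation method.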

In this project, we intend to capture the relationships for 50,000 root words in Tamil, along with the senses and concepts appropriate to each. The network will be stored in a database back-end, with a front-end user interface for viewing the relationships and senses.
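A database back-end for this network might keep words, senses and labelled relations in separate tables, so that one root word can link to many senses. The sketch below assumes SQLite through Python's standard `sqlite3` module; the project does not name its database, and every table, column and function name here is hypothetical. The two query functions stand in for what a front-end might call to show a word's senses and their hypernyms.

```python
import sqlite3

# Hypothetical schema: words, senses, the word-to-sense mapping, and
# labelled sense-to-sense relations.
SCHEMA = """
CREATE TABLE word  (word_id INTEGER PRIMARY KEY, lemma TEXT UNIQUE NOT NULL);
CREATE TABLE sense (sense_id INTEGER PRIMARY KEY, gloss TEXT NOT NULL);
CREATE TABLE word_sense (
    word_id  INTEGER REFERENCES word(word_id),
    sense_id INTEGER REFERENCES sense(sense_id),
    PRIMARY KEY (word_id, sense_id)
);
CREATE TABLE relation (
    src_sense INTEGER REFERENCES sense(sense_id),
    rel       TEXT CHECK (rel IN ('antonym','hypernym','hyponym','meronym','holonym')),
    dst_sense INTEGER REFERENCES sense(sense_id),
    PRIMARY KEY (src_sense, rel, dst_sense)
);
"""


def add_word_sense(conn, lemma, gloss):
    """Add a new sense for lemma (creating the word if needed); return its id."""
    conn.execute("INSERT OR IGNORE INTO word (lemma) VALUES (?)", (lemma,))
    (word_id,) = conn.execute(
        "SELECT word_id FROM word WHERE lemma = ?", (lemma,)).fetchone()
    cur = conn.execute("INSERT INTO sense (gloss) VALUES (?)", (gloss,))
    conn.execute("INSERT INTO word_sense VALUES (?, ?)", (word_id, cur.lastrowid))
    return cur.lastrowid


def senses_for(conn, lemma):
    """Glosses of every sense a word can take, in insertion order."""
    rows = conn.execute("""
        SELECT s.gloss FROM word w
        JOIN word_sense ws ON ws.word_id = w.word_id
        JOIN sense s ON s.sense_id = ws.sense_id
        WHERE w.lemma = ? ORDER BY s.sense_id""", (lemma,))
    return [gloss for (gloss,) in rows]


def hypernyms(conn, sense_id):
    """Glosses of the senses linked to sense_id by a 'hypernym' relation."""
    rows = conn.execute("""
        SELECT s.gloss FROM relation r
        JOIN sense s ON s.sense_id = r.dst_sense
        WHERE r.src_sense = ? AND r.rel = 'hypernym'
        ORDER BY s.sense_id""", (sense_id,))
    return [gloss for (gloss,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
river = add_word_sense(conn, "ஆறு", "river")
six = add_word_sense(conn, "ஆறு", "six")
water = add_word_sense(conn, "நீர்நிலை", "body of water")
conn.execute("INSERT INTO relation VALUES (?, 'hypernym', ?)", (river, water))

print(senses_for(conn, "ஆறு"))  # ['river', 'six']
print(hypernyms(conn, river))    # ['body of water']
```

Keeping relations in their own table means new relation types or the 50,000 root words can be added without changing the word or sense tables; the `CHECK` constraint is one possible way to restrict labels to the relations listed in the overview.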

Tamil WordNet relies on Rajendran's (2001) Modern Tamil Thesaurus, which is based on Nida's (1975) Componential Analysis of Meaning. This work, which is available in electronic form, represents the ontological structure of Tamil vocabulary. It classifies Tamil vocabulary into four major domains based on part-of-speech categories: entities, abstracts, events and relationals. Besides the Tamil thesaurus, we will also use existing resources for Tamil, such as monolingual and bilingual dictionaries, to increase the size of the database.


S. Arulmozi