AU-KBC RESEARCH CENTRE
Development of Lexical Resources in Tamil
1. TransLexGram
TransLexGram is the name of a set of electronic lexical resources being developed for
use in machine-aided translation from English to Indian languages. It is
an abbreviation for "Transfer Lexicon and Grammar". This is a project of
LERIL (Lexical Resources for Indian Languages), an open-source, collaborative
initiative of several groups (and individuals) to create shareable resources
for Indian languages. This initiative was launched at the "Workshop on
Lexical Resources for Natural Language Processing", 5 - 8 Jan 2001, IIIT
Hyderabad. The purpose behind this effort is to fill the lacuna in such
resources for Indian languages. TransLexGram will help in the development
of machine translation systems from English to Indian languages. TransLexGram
is a collaborative effort among individuals and institutions. We are working
on TransLexGram for Tamil.
2. AnnCorra
The name AnnCorra, short for "Annotated Corpora", refers to an electronic
lexical resource of annotated corpora. It will be an important resource for
the development of Indian language parsers, machine learning of grammars,
lakshancharts (discrimination nets for sense disambiguation) and a host of
other tools.
The AnnCorra effort is being started on the basis of the electronic resources
freely available for various Indian languages. One such resource is the
English-Hindi Electronic Dictionary, developed through a voluntary collaborative
effort coordinated by the Language Technologies Research Centre, Indian Institute
of Information Technology, Hyderabad. Another resource is an electronic
corpus of Hindi developed by the Ministry of Information Technology, Government
of India.
S. Arulmozi