

Supertagger for Tamil


Part-of-speech disambiguation techniques (taggers) are often used to eliminate (or substantially reduce) part-of-speech ambiguity prior to parsing. These taggers are all local in the sense that they use information from a limited context in deciding which tag(s) to choose for each word. As is well known, these taggers are quite successful.
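To make the notion of a "local" tagger concrete, the following minimal sketch picks each word's tag by looking only at the tag chosen for the previous word. The candidate tags, the scores and the function name tag_locally are all hypothetical, chosen purely for illustration; real taggers estimate such scores from annotated corpora and usually search over whole sequences rather than deciding greedily.

```python
# Hypothetical toy data: the tags each word may take.
CANDIDATES = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}

# Assumed scores for (previous tag, current tag) pairs; unseen pairs score 0.
BIGRAM_SCORE = {
    ("<s>", "DET"): 5,
    ("DET", "NOUN"): 9,
    ("DET", "VERB"): 1,
    ("NOUN", "VERB"): 7,
    ("NOUN", "NOUN"): 2,
}


def tag_locally(words):
    """Greedy left-to-right tagging that consults only the previous tag."""
    tags = []
    previous = "<s>"  # sentence-start marker
    for word in words:
        best = max(
            CANDIDATES[word],
            key=lambda tag: BIGRAM_SCORE.get((previous, tag), 0),
        )
        tags.append(best)
        previous = best
    return tags


print(tag_locally(["the", "can", "rusts"]))
```

The decision for "can" depends only on the preceding DET tag, which is the limited-context behaviour described above.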

In a lexicalized Tree-Adjoining Grammar (LTAG), each lexical item is associated with at least one elementary structure (tree). The elementary structures of LTAG localize dependencies, including long-distance dependencies, by requiring that all and only the dependent elements be present within the same structure. As a result of this localization, a lexical item may be (and, in general, almost always is) associated with more than one elementary structure. We call these elementary structures supertags, in order to distinguish them from the standard part-of-speech tags (Joshi & Srinivas, 1994).
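The one-to-many association between lexical items and supertags can be pictured as a lexicon lookup. The sketch below is a toy illustration only: the words, the supertag labels and the function name supertag_ambiguity are assumptions made for this example and are not taken from any actual LTAG grammar. It shows how quickly the number of candidate supertag sequences grows even for a short sentence.

```python
from math import prod

# Hypothetical toy lexicon: each word maps to the elementary trees
# (supertags) it can anchor. Labels are illustrative placeholders.
LEXICON = {
    "the": ["Det"],
    "price": ["NP", "N_modifier"],
    "includes": [
        "Transitive",
        "Transitive_object_extracted",
        "Transitive_relative_clause",
    ],
    "tax": ["NP", "N_modifier"],
}


def supertag_ambiguity(sentence):
    """Return per-word supertag counts and the number of candidate sequences."""
    counts = [len(LEXICON[word]) for word in sentence]
    return counts, prod(counts)


counts, total = supertag_ambiguity(["the", "price", "includes", "tax"])
print(counts, total)
```

A supertagger's task is to reduce this set of candidate sequences, ideally to a single supertag per word, before any full parsing is attempted.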

LTAG has attractive linguistic and computational properties, which derive from lexicalization and the ability to localize dependencies. For these reasons, LTAGs have been found useful for (partial) parsing and for implementing transfer lexicons and rules for translation.


The objective of this project is to investigate the feasibility of the Lexicalized Tree-Adjoining Grammar (LTAG) formalism for expressing Tamil syntax. If LTAG proves feasible, our aim is to build a large-scale LTAG grammar.

Need for the Project:

A large-scale grammar is necessary for applications such as parsing, generation and translation. Computationally attractive techniques such as "almost parsing" (supertagging) become possible once such a grammar is built.

Earlier work:

Some groundwork on supertags for Tamil was done during the Corpus-based Natural Language Processing workshop held at the AU-KBC Research Centre from 17 to 31 December 2001.

Approach of the Project:

Our approach in this project is to examine the work done on English LTAG and to investigate the major grammatical features of Tamil. As a long-term goal, we plan to develop resources such as a Lightweight Dependency Analyser and a Supertagger for Tamil.



S. Arulmozi