AU-KBC RESEARCH CENTRE
Supertagger for Tamil
Introduction:
Parts-of-speech disambiguation techniques (taggers) are often used to eliminate
(or substantially reduce) the parts-of-speech ambiguity prior to parsing.
The taggers are all local in the sense that they use information from a
limited context in deciding which tag(s) to choose for each word. As is
well known, these taggers are quite successful.
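To make "local" concrete, here is a minimal sketch of a tagger that chooses each word's tag by looking only at the previous tag. It is a deliberately simplified greedy procedure, not the statistical models used in practice, and the words, tag names, and pair scores are all invented for illustration.

```python
# Hypothetical candidate tags per word (invented for illustration).
CANDIDATES = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}

# Hypothetical scores for adjacent tag pairs; "<s>" marks the sentence start.
PAIR_SCORE = {
    ("<s>", "DET"): 1.0,
    ("DET", "NOUN"): 0.9,
    ("DET", "VERB"): 0.1,
    ("DET", "AUX"): 0.1,
    ("NOUN", "VERB"): 0.8,
    ("NOUN", "NOUN"): 0.3,
}


def tag_locally(words):
    """Greedy left-to-right tagging using a one-tag window of context."""
    tags = []
    previous = "<s>"
    for word in words:
        # Only the previous tag is consulted: this is the "limited context".
        best = max(
            CANDIDATES[word],
            key=lambda tag: PAIR_SCORE.get((previous, tag), 0.0),
        )
        tags.append(best)
        previous = best
    return tags


print(tag_locally(["the", "can", "rusts"]))  # ['DET', 'NOUN', 'VERB']
```

Because the decision for each word depends only on its immediate neighbour, a tagger of this kind cannot by itself see dependencies that span a longer distance.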
In a Lexicalized Tree-Adjoining Grammar (LTAG), each lexical item is
associated with at least one elementary structure (tree). The elementary
structures of LTAG localize dependencies, including long distance dependencies,
by requiring that all and only the dependent elements be present within
the same structure. As a result of this localization, a lexical item may
be (and, in general, almost always is) associated with more than one elementary
structure. We call these elementary structures supertags, in order to distinguish
them from the standard parts-of-speech tags (Joshi & Srinivas, 1994).
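As a rough illustration of why supertags are more ambiguous than parts-of-speech tags, the sketch below stores a toy lexicon in which each word is associated with several elementary structures and counts how many supertag sequences a sentence admits before disambiguation. The English example words and the supertag labels are hypothetical placeholders, not entries from any actual grammar.

```python
from math import prod

# Hypothetical toy lexicon: each word maps to the elementary structures
# (supertags) it can anchor. The labels are invented for illustration only.
TOY_LEXICON = {
    "the": ["det_adjunct"],
    "price": ["np_initial", "noun_modifier", "transitive_verb"],
    "includes": [
        "transitive_verb",
        "transitive_verb_object_extracted",
        "transitive_verb_relative",
    ],
    "tax": ["np_initial", "noun_modifier", "transitive_verb"],
}


def supertag_candidates(sentence):
    """Return the candidate supertags for each word, in sentence order."""
    return [TOY_LEXICON[word] for word in sentence]


def count_assignments(sentence):
    """Number of distinct supertag sequences before any disambiguation."""
    return prod(len(tags) for tags in supertag_candidates(sentence))


sentence = ["the", "price", "includes", "tax"]
print(count_assignments(sentence))  # 1 * 3 * 3 * 3 = 27
```

Even with only three candidates per content word, the number of possible sequences grows multiplicatively with sentence length, which is why disambiguating supertags before parsing is attractive.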
LTAG has attractive linguistic and computational properties. These
derive from its lexicalization and its ability to localize dependencies. For
these reasons, LTAG has been found to be useful for (partial) parsing
and for implementing transfer lexicons and rules for translation.
Aim:
To investigate the feasibility of the Lexicalized Tree Adjoining Grammar
(LTAG) formalism for expressing Tamil syntax. If LTAG is found feasible,
our aim is to build a large-scale LTAG grammar.
Need for the Project:
A large-scale grammar is necessary for applications such as parsing,
generation and translation. Computationally attractive techniques such as "almost
parsing" (supertagging) can be supported if such a grammar is built.
Earlier work:
Some groundwork on supertags for Tamil was done during the Corpus-based
Natural Language Processing workshop held at the AU-KBC Research Centre
from 17 to 31 December 2001.
Approach of the Project:
Our approach in this project is to examine the work done with regard to
English LTAG and to investigate the major grammatical features in Tamil.
As a long-term goal, we plan to develop resources such as a Lightweight
Dependency Analyser and a Supertagger for Tamil.
References:
- Doran, C., et al. 1994. XTAG Technical Report. Department of Computer and
  Information Sciences, University of Pennsylvania.
- Joshi, A. K. & Srinivas, B. 1994. `Disambiguation of Super Parts of
  Speech (or Supertags): Almost Parsing'. In Proceedings of the 15th International
  Conference on Computational Linguistics (COLING'94), Kyoto, Japan.
- Schabes, Y., Abeille, A., & Joshi, A. K. 1988. `Parsing strategies with
  `lexicalized' grammars: Application to tree adjoining grammars'. In Proceedings
  of the 12th International Conference on Computational Linguistics (COLING'88),
  Budapest, Hungary.
- Srinivas, B. 1998. `Migrating Supertags from English to Spanish'. In Proceedings
  of the Fourth International Workshop on Tree-Adjoining Grammars, Philadelphia.
S. Arulmozi