AU-KBC RESEARCH CENTRE
Parse Representation of Tamil Syntax
The study of Tamil syntax, especially its computational aspects, is still in its formative stage and lags well behind the corresponding study of European languages. Developing computational models for Tamil remains an open problem. This article attempts to identify the syntactic structure of Tamil sentences and to develop appropriate phrase structure rules for them.
The syntactic structure of a sentence indicates how the words in the sentence are related to each other: how they are grouped into phrases, which words modify which other words, and which words are of central importance in the sentence. Most syntactic representations of language are based on phrase structure grammars, which describe sentence structure in terms of which phrases are subparts of other phrases. The phrase structure grammar formalism is used to describe the rules of the language. It is a generative formalism: sentences are generated by repeatedly substituting the symbol on the left of a rule's arrow with the symbols on its right. For example, the rule that a sentence (S) consists of a noun phrase (NP) followed by a verb phrase (VP) is written as:

S -> NP VP
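To make the substitution step concrete, here is a minimal Python sketch. It assumes a rule is stored as a pair of a left-hand symbol and a list of right-hand symbols; the helper name `rewrite` is hypothetical and only illustrates one rewriting step on a sentential form.

```python
from typing import List, Tuple

# Illustrative representation (assumption): (left-hand symbol, right-hand symbols).
Rule = Tuple[str, List[str]]


def rewrite(form: List[str], rule: Rule) -> List[str]:
    """Hypothetical helper: replace the leftmost occurrence of the rule's
    left-hand symbol in `form` with the rule's right-hand symbols."""
    lhs, rhs = rule
    for i, symbol in enumerate(form):
        if symbol == lhs:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError(f"symbol {lhs!r} not found in {form}")


s_rule: Rule = ("S", ["NP", "VP"])
print(rewrite(["S"], s_rule))  # ['NP', 'VP']
```

Applying further rules to the result in the same way continues the derivation until only words remain.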
A syntax analyser should identify the subject and objects of each verb and determine what each modifying word or phrase modifies. A syntax analyser that recognizes a sentence and assigns a syntactic structure to it is also called a parser. The syntactic structure it identifies can be represented as a parse tree. A simple subset of a rule set for Tamil is given below:

S  -> NP VP
NP -> N
VP -> NP V

where S = Sentence, NP = Noun Phrase, VP = Verb Phrase, N = Noun and V = Verb. The VP rule places the object noun phrase before the verb, reflecting the verb-final word order of Tamil.
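Because the rules are generative, the subset can be run to enumerate every sentence it produces. The sketch below is illustrative only: the dictionary `RULES` and the function `generate` are hypothetical names, and the lexical rules for N and V are an assumption, using just the words of the example sentence "paiyan paatam patithaan".

```python
from itertools import product
from typing import Dict, List

# Phrase structure rules from the subset above.
RULES: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["NP", "V"]],
    # Hypothetical lexical rules (assumption): words from the example sentence.
    "N": [["paiyan"], ["paatam"]],
    "V": [["patithaan"]],
}


def generate(symbol: str) -> List[List[str]]:
    """Return every word sequence derivable from `symbol`.

    A symbol with no rule is treated as a word (terminal). This terminates
    only because the rules above contain no recursion.
    """
    if symbol not in RULES:
        return [[symbol]]
    results: List[List[str]] = []
    for rhs in RULES[symbol]:
        # Expand each right-hand symbol, then join the expansions in order.
        for parts in product(*(generate(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


for words in generate("S"):
    print(" ".join(words))
```

This prints four sentences: "paiyan paiyan patithaan", "paiyan paatam patithaan", "paatam paiyan patithaan" and "paatam paatam patithaan". The subset accepts all of them, including odd ones, because these three rules encode only word order and not agreement or meaning.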
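The parse tree for the sentence "paiyan paatam patithaan" (The boy read a lesson) under these rules is shown below, with each word attached to its part-of-speech node:

S
  NP
    N   paiyan
  VP
    NP
      N   paatam
    V   patithaan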
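A parser builds such a tree mechanically. Below is a minimal top-down (recursive-descent, backtracking) parser sketch in Python for the same rule subset; it is one possible technique, not necessarily the one used in this work. The names `parse`, `parse_symbol`, `parse_seq` and `bracketed` are hypothetical, and the lexical rules for N and V are again an assumption limited to the example's words.

```python
from typing import Iterator, List, Optional, Tuple, Union

# A tree is either a word or (label, children).
Tree = Union[str, Tuple[str, list]]

RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["NP", "V"]],
    "N": [["paiyan"], ["paatam"]],  # hypothetical lexicon (assumption)
    "V": [["patithaan"]],           # hypothetical lexicon (assumption)
}


def parse_symbol(symbol: str, words: List[str], pos: int) -> Iterator[Tuple[Tree, int]]:
    """Yield (tree, next_pos) for each way `symbol` matches words from `pos`."""
    if symbol not in RULES:  # a terminal word must match the input exactly
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in RULES[symbol]:
        for children, end in parse_seq(rhs, words, pos):
            yield (symbol, children), end


def parse_seq(symbols: List[str], words: List[str], pos: int) -> Iterator[Tuple[list, int]]:
    """Yield (children, next_pos) for matching a sequence of symbols in order."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse_symbol(symbols[0], words, pos):
        for rest, end in parse_seq(symbols[1:], words, mid):
            yield [first] + rest, end


def parse(sentence: str) -> Optional[Tree]:
    """Return the first S tree that covers the whole sentence, or None."""
    words = sentence.split()
    for tree, end in parse_symbol("S", words, 0):
        if end == len(words):
            return tree
    return None


def bracketed(tree: Tree) -> str:
    """Render a tree in labelled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"


print(bracketed(parse("paiyan paatam patithaan")))
# [S [NP [N paiyan]] [VP [NP [N paatam]] [V patithaan]]]
```

A sentence the rules cannot cover, such as "paiyan patithaan paatam" with the verb not in final position, makes `parse` return None.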
B. Kumara Shanmugam