
AU-KBC RESEARCH CENTRE

Parse Representation of Tamil Syntax

The study of Tamil syntax, especially in its computational aspects, is still in its formative stage and lags well behind the linguistic study of European languages. Developing computational models for the Tamil language remains an open problem. This is an attempt to identify the syntactic structure of Tamil sentences and to develop appropriate phrase structure rules.

The syntactic structure of a sentence indicates the way the words in the sentence are related to each other. This structure indicates how the words are grouped together into phrases, which words modify which other words, and which words are of central importance in the sentence. Most syntactic representations of a language are based on the notion of phrase structure grammars, which represent sentence structure in terms of which phrases are subparts of other phrases. The phrase structure grammar formalism is used to describe the rules of the language. It is a generative grammar formalism: sentences are generated by replacing the symbol on the left of a rule arrow with the symbols on its right. For example, the rule that a sentence (S) consists of a noun phrase (NP) followed by a verb phrase (VP) is represented as below:

S -> NP VP
A syntax analyser should identify the subject and objects of each verb and determine what each modifying word or phrase modifies. A syntax analyser, which performs the task of recognizing a sentence and assigning a syntactic structure to it, is also called a parser. The identified syntactic structure of the sentence can be represented as a parse tree. A simple subset of a rule set for Tamil is as below:

S -> NP VP
NP -> N
VP -> NP V

where S = Sentence, N = Noun, V = Verb, NP = Noun Phrase, VP = Verb Phrase.
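The generative reading of these rules, in which the symbol on the left of an arrow is replaced by the symbols on its right, can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary name RULES and the function name derive are assumptions made for the example, and the sketch relies on the fact that this small subset has exactly one rule per symbol.

```python
# Rule subset from the text: each left-hand symbol maps to its right-hand side.
# RULES and derive are hypothetical names chosen for this sketch.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["N"],
    "VP": ["NP", "V"],
}


def derive(start):
    """Leftmost derivation: repeatedly rewrite the leftmost symbol that has a rule.

    Returns the list of intermediate forms, from the start symbol down to a
    string of word categories (N, V) that no rule can rewrite further.
    """
    form = [start]
    steps = [list(form)]
    while True:
        for i, symbol in enumerate(form):
            if symbol in RULES:
                # Substitute the left-hand symbol by the rule's right-hand side.
                form = form[:i] + RULES[symbol] + form[i + 1:]
                steps.append(list(form))
                break
        else:
            # No symbol in the current form has a rule: derivation is complete.
            return steps


for step in derive("S"):
    print(" ".join(step))
# S
# NP VP
# N VP
# N NP V
# N N V
```

The final form N N V is the category pattern of a noun, a noun and a verb in that order, which individual words of those categories can then fill.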

The parse tree for the sentence "paiyan paatam patithaan" (The boy read a lesson) is shown below:
 

Parse Tree
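As a rough illustration of how a parser could arrive at this tree, the following Python sketch applies the rule subset S -> NP VP, NP -> N, VP -> NP V to the example sentence using simple recursive descent. It is a minimal sketch under stated assumptions, not the parser used in this work: the names RULES, LEXICON, parse, parse_sentence and bracketed are hypothetical, and the word categories in LEXICON (paiyan and paatam as nouns, patithaan as a verb) are inferred from the English gloss.

```python
# Rule subset from the text; one rule per phrase symbol.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["N"],
    "VP": ["NP", "V"],
}

# Assumed word categories for the example sentence (inferred from the gloss).
LEXICON = {
    "paiyan": "N",      # boy
    "paatam": "N",      # lesson
    "patithaan": "V",   # read
}


def parse(symbol, words, pos=0):
    """Try to match `symbol` starting at words[pos].

    Returns (tree, next_pos) on success or None on failure. A tree is a
    (label, children) pair; the children of a word category are the word itself.
    """
    if symbol in RULES:
        children = []
        for child in RULES[symbol]:
            result = parse(child, words, pos)
            if result is None:
                return None
            subtree, pos = result
            children.append(subtree)
        return (symbol, children), pos
    # Word category (N or V): the next word must carry this category.
    if pos < len(words) and LEXICON.get(words[pos]) == symbol:
        return (symbol, [words[pos]]), pos + 1
    return None


def parse_sentence(sentence):
    """Parse a whole sentence as S; return the tree, or None if it does not fit."""
    words = sentence.split()
    result = parse("S", words)
    if result is None or result[1] != len(words):
        return None
    return result[0]


def bracketed(tree):
    """Render a tree in labelled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"


tree = parse_sentence("paiyan paatam patithaan")
print(bracketed(tree))
# [S [NP [N paiyan]] [VP [NP [N paatam]] [V patithaan]]]
```

In the bracketed output, the first NP (paiyan) is the subject, while the NP inside the VP (paatam) is the object of the verb patithaan. A sentence whose words do not fit the pattern N N V, or that contains a word missing from the assumed lexicon, makes parse_sentence return None.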

B. Kumara Shanmugam