
AU-KBC RESEARCH CENTRE

Tamil Morphological Analyser

        The current coverage of the morphological analyser is greater than 95% when tested on the three-million-word CIIL corpus. The analyser follows a paradigm-based approach and is implemented as a finite-state machine. This version can analyse nearly 3.5 million word forms.
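To make the paradigm-based, finite-state idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the analyser's actual implementation. It assumes a hypothetical paradigm named after the model noun "maram" (tree), in which the root-final "am" is replaced by "att-" before a case suffix (maram, marattai, marattukku, marattil), and a tiny hypothetical lexicon that assigns "maram" and "puttakam" (book) to that paradigm. Words use a simplified romanisation, and the class name SuffixFSM and the feature tags are made up for this example. The surface endings of every paradigm are compiled into a deterministic automaton over reversed strings (a suffix trie). Analysis reads the word from its end, and at every accepting state it rebuilds a candidate root and keeps it only if the lexicon lists that root under the same paradigm.

```python
# Hypothetical paradigm table: paradigm name -> rows of
# (ending removed from the root, ending added in the surface form, features).
PARADIGMS = {
    "maram": [
        ("", "", "noun+nom"),            # maram
        ("am", "attai", "noun+acc"),     # marattai
        ("am", "attukku", "noun+dat"),   # marattukku
        ("am", "attil", "noun+loc"),     # marattil
    ],
}

# Hypothetical lexicon: root -> paradigm it inflects by.
LEXICON = {"maram": "maram", "puttakam": "maram"}


class SuffixFSM:
    """Deterministic automaton over reversed surface endings (a suffix trie)."""

    def __init__(self):
        self.trans = [{}]   # state -> {character: next state}
        self.accept = [[]]  # state -> [(root_ending, paradigm, features)]

    def add(self, suffix, output):
        """Add one surface ending, read right to left, with its output."""
        state = 0
        for ch in reversed(suffix):
            nxt = self.trans[state].get(ch)
            if nxt is None:
                nxt = len(self.trans)
                self.trans.append({})
                self.accept.append([])
                self.trans[state][ch] = nxt
            state = nxt
        self.accept[state].append(output)

    def analyse(self, word, lexicon):
        """Return every (root, features) pair licensed by paradigm and lexicon."""
        results = []
        state, i = 0, len(word)  # word[i:] is the ending consumed so far
        while state is not None:
            for root_ending, paradigm, feats in self.accept[state]:
                root = word[:i] + root_ending  # undo the paradigm's root change
                if lexicon.get(root) == paradigm:
                    results.append((root, feats))
            if i == 0:
                break
            i -= 1
            state = self.trans[state].get(word[i])  # None ends the walk
        return results


fsm = SuffixFSM()
for name, rows in PARADIGMS.items():
    for root_ending, surface_ending, feats in rows:
        fsm.add(surface_ending, (root_ending, name, feats))
```

For example, fsm.analyse("puttakattil", LEXICON) strips "attil", restores "am" and returns [("puttakam", "noun+loc")], while a word with no matching ending and root returns an empty list. Because the automaton reads words right to left, endings that finish in the same characters share states, and a new paradigm only adds transitions. A full analyser would need far more paradigms and feature tags than this toy table.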

        The objective of this Tamil morphological analyser API is to retrieve the root of a word from its inflected form.

         Words in Tamil have a strong inflectional component in the form of suffixes that follow the root. For example, verb inflections carry information on the gender, person and number of the subject, and modal and tense information for verbs is also encoded in these inflections. For nouns, inflections mark the case (accusative, dative, etc.). The aim of this morphological analyser is to retrieve the root of a word along with its inflectional information. Documentation for this morph API is available and can also be downloaded in .doc, .pdf and .ps formats.
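Retrieving the root together with its inflectional information can be illustrated for verbs by peeling suffix layers off from the right: first the person-number-gender ending, then the tense marker, then a lexicon check on what remains. The Python sketch below is a simplified, hypothetical illustration, not the API's actual output format. It covers only the verb class of "padi" (study), whose past, present and future markers are -tt-, -kkiR- and -pp- (padittEn 'I studied', padikkiREn 'I study', padippEn 'I will study'). It uses a romanisation in which capital vowels are long and R stands for the Tamil alveolar r, and the function name analyse_verb and its feature dictionary are made up for this example.

```python
# Hypothetical slot tables for a single verb class (the class of "padi").
# Romanisation is simplified: capital vowels are long, R = alveolar r.
TENSE = {"tt": "past", "kkiR": "present", "pp": "future"}

# Person-number-gender endings of the subject: ending -> (person, number, gender).
PNG = {
    "En": ("1", "sg", None),
    "Ay": ("2", "sg", None),
    "An": ("3", "sg", "masc"),
    "AL": ("3", "sg", "fem"),
    "Om": ("1", "pl", None),
}

# Hypothetical verb lexicon: "padi" (study), "kudi" (drink).
VERB_ROOTS = {"padi", "kudi"}


def analyse_verb(word):
    """Peel the PNG ending, then the tense marker, then check the root.

    Returns a feature dictionary, or None if no analysis is found.
    """
    for png_suffix, (person, number, gender) in PNG.items():
        if not word.endswith(png_suffix):
            continue
        rest = word[: -len(png_suffix)]
        for tense_suffix, tense in TENSE.items():
            if not rest.endswith(tense_suffix):
                continue
            root = rest[: -len(tense_suffix)]
            if root in VERB_ROOTS:
                return {
                    "root": root,
                    "pos": "verb",
                    "tense": tense,
                    "person": person,
                    "number": number,
                    "gender": gender,
                }
    return None
```

For example, analyse_verb("kudikkiRAL") peels -AL (third person singular feminine) and -kkiR- (present) to recover the root "kudi". Modal forms, other verb classes and noun case endings would each need further slot tables.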

Screenshots and other relevant documents are listed below.

Verb Flowchart (pdf file)
Noun Flowchart (pdf file)
Screenshot of morph analyzer
Screenshot of morph generator
Note:
        The above documentation is slightly dated and describes version 1.0 of the morph analyser. Documentation for the next version will be available shortly. If you are interested in the finer details of the next version, write to us.


S Ramesh Kumar, S Viswanathan