Project Title

Discourse Integrated Dravidian Language to Dravidian Language Machine Translation (DL-DiscoMT)

Background & Vision

Building a Speech to Speech Machine Translation (SSMT) system across Indian languages is essential for overcoming language barriers in India's multilingual context. This project creates an environment for applying modern language computing technologies to translation by developing Machine Translation (MT) systems capable of translating video or speech transcripts from English to Indian languages.

As India encompasses many language families, it is crucial to connect these families and develop translation systems across them. The Dravidian language to Dravidian language translation project (DL-DiscoMT) links Hindi-Tamil systems with systems that translate from Tamil to the other Dravidian languages. It forms a vital component of the Indian Language to Indian Language System and of the larger Speech to Speech Machine Translation system.

Beyond Sentence-Level Translation

Texts have properties that go beyond individual sentences, manifested in the frequency and distribution of words, word senses, referential forms, and syntactic structures. These properties include:

  • Discourse Coherence: Patterns realized through explicit and/or implicit relations between sentences, clauses, or referring forms
  • Anaphoric and Elliptic Expressions: Speakers exploit previous discourse context to convey subsequent information concisely

Therefore, this project incorporates discourse information at the cross-sentential level to enhance translation quality.
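
One common way to expose cross-sentential context to a sentence-level translation model is to prepend a few preceding source sentences to the sentence being translated. The sketch below illustrates only that general idea. The function name `build_context_inputs`, the `<sep>` separator token, and the window size are assumptions made for illustration and are not details of the DL-DiscoMT systems.

```python
from typing import List

# Hypothetical separator token; a real system would define its own.
SEP = " <sep> "


def build_context_inputs(sentences: List[str], window: int = 2) -> List[str]:
    """Prefix each sentence with up to `window` preceding sentences."""
    inputs = []
    for i, current in enumerate(sentences):
        # Preceding sentences that fall inside the context window.
        context = sentences[max(0, i - window):i]
        inputs.append(SEP.join(context + [current]))
    return inputs


doc = [
    "Ravi bought a new phone.",
    "He likes it a lot.",
    "It has a good camera.",
]

for line in build_context_inputs(doc, window=1):
    print(line)
```

With a window of one sentence, the pronoun "He" in the second input is seen together with its antecedent "Ravi", which is the kind of anaphoric information a sentence-by-sentence system would not have.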

Target Domains

  • Governance: Government policies and administration
  • Science & Technology: Biology, Chemistry, Physics, Computer Science, Engineering
  • Economics: FinTech and financial services
  • Healthcare: Insurance and eMedical charts
  • Agriculture: Farmer-related information

Likely End User(s)

  • Students
  • Industry
  • Translation Services
  • Farmers
  • Health Organizations
  • e-Governance Services
  • Citizens
  • Governments

Project Deliverables

  01. Discourse & Conversation Platform: A comprehensive platform for handling discourse analysis and conversational context
  02. Machine Translation Systems: Text-to-text MT between Hindi and Tamil, and between Tamil and Kannada, Malayalam, and Telugu (bi-directional), incorporating discourse information in NMT and Sampark systems
  03. Evaluation Leaderboard: Platform for continuous evaluation and benchmarking of translation quality
  04. API/Services: Machine Translation solutions as APIs for integration with SSMT systems and end-user applications

Project Objectives

System Development & Deployment

Develop and deploy MT systems as services from Hindi to Tamil and from Tamil to other Dravidian languages; these systems play a critical role in Indian language SSMT and text-to-text MT (TTMT) systems

Discourse Integration

Integrate discourse analysis into MT to improve translation by identifying discourse markers and preserving coherence and cohesion in the translated text
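
As a rough illustration of what identifying discourse markers can involve, the sketch below looks up tokens in a small marker lexicon. It is a toy example under stated assumptions: the English lexicon, the relation labels, and the function name `find_discourse_markers` are hypothetical, and a real system for Dravidian languages would need language-specific resources and morphological analysis.

```python
import re
from typing import List, Tuple

# Illustrative English lexicon mapping markers to coarse relation labels.
# These entries are assumptions for the example, not project resources.
MARKERS = {
    "however": "contrast",
    "but": "contrast",
    "therefore": "result",
    "because": "cause",
}


def find_discourse_markers(sentence: str) -> List[Tuple[str, str]]:
    """Return (marker, relation) pairs in the order they appear."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [(tok, MARKERS[tok]) for tok in tokens if tok in MARKERS]


print(find_discourse_markers(
    "However, the crop failed because the rains were late."
))
# prints [('however', 'contrast'), ('because', 'cause')]
```

A lexicon lookup like this cannot tell when a word is used as a discourse marker and when it is not, so it should be read as a starting point rather than a complete method.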

Ecosystem Development

Create and nurture an ecosystem involving startups and Central/State Government institutions to develop and deploy innovative products and translation services

Content Enhancement

Increase content in Dravidian languages on the Internet across domains of Governance, Science & Technology, Education, Health, and Agriculture

Project Duration & Scope

  • Duration: 3 Years
  • Parallel Corpora: 1 Lakh+ Sentences
  • Language Pairs: 8 Systems
  • Partner Institutions: 7 Organizations