Conference Overview

More information will be updated soon

CONFERENCE DATE DEC 19TH 2024 TO DEC 22ND 2024

Venue: AU-KBC Research Centre, MIT College, Chrompet, Chennai

WORKSHOPS

The following workshops have been accepted for ICON 2024.

December 19, 2024

Workshop Abstract
Open-source Tools for NLP (Half Day)

Dr. Rajeev R R
ICFOSS, Government of Kerala, Thiruvananthapuram

Submission Link | Workshop URL

This half-day workshop will introduce participants to key open-source tools and frameworks used in the field of Natural Language Processing (NLP). The workshop will provide participants with hands-on experience in using these tools to perform fundamental NLP tasks such as text preprocessing, sentiment analysis, machine translation, and speech-to-text processing. The focus will be on practical implementation using freely available open-source tools and libraries, emphasizing ease of use and accessibility for a wide range of NLP applications.

Big Ellipsis (Full Day)

Dr. Rajesh Bhatt, University of Massachusetts, Amherst
Dr. Sobha L, AU-KBC Research Centre, Chennai
Dr. Anushree Mishra, EFLU, Hyderabad

Submission Link | Workshop URL

We propose a full-day workshop on Big Ellipsis. Initial work on ellipsis focused heavily on the phenomenon of VP-ellipsis. While this has been a productive area of study, its relevance to South Asian languages has remained somewhat limited, largely due to the uncertainty regarding whether VP-ellipsis occurs in these languages (see Manetta 2011, 2019, 2021). However, it has long been recognized that ellipsis can also take place at the clausal level, a phenomenon we refer to as 'Big Ellipsis.'

Integrating Natural Language Processing and AI for Enhanced Healthcare Communication:
Addressing Language Barriers in Patient Care (Full Day)


Dr. Hannah Mary Thomas T, Christian Medical College Vellore
Dr. Vandan Mujadia, IIIT Hyderabad

Submission Link | Workshop URL

In multilingual societies such as India, effective communication in healthcare is often obstructed by language barriers, especially in clinical interactions between patients and their healthcare providers and in the patient-facing documents that hospitals provide (such as consent forms for treatment or research participation, information sheets, and discharge summaries).

This workshop will explore the application of Natural Language Processing (NLP), Computational Linguistics (CL), and Artificial Intelligence (AI) to build language translation systems tailored to healthcare use cases. We will identify use cases that medical professionals and patients find challenging and explore available systems that can help with translating medical information into patient-friendly language and into various Indian languages, implementing multilingual speech recognition for clinical documentation, and automating content contextualization through AI. The primary goal is to unite researchers and professionals from linguistics, healthcare, and AI to collaboratively develop solutions that facilitate better communication in multilingual and fast-paced healthcare environments.


December 22, 2024

Workshop Abstract
Teaching of Natural Language Processing in the Era of LLMs (Full Day)

Dr. Vasudeva Varma, IIIT Hyderabad
Dr. Dipti Misra Sharma, IIIT Hyderabad
Dr. Pushpak Bhattacharyya, IIT Bombay
Dr. Sivaji Bandyopadhyay, Jadavpur University, Kolkata
Dr. Sobha Lalitha Devi, AU-KBC, Chennai
Dr. Sudeshna Sarkar, IIT Kharagpur, Kolkata
Dr. Asif Ekbal, IIT Patna/Jodhpur

Submission Link | Workshop URL

The field of Natural Language Processing (NLP) is undergoing a rapid transformation due to the rise of Large Language Models (LLMs) like ChatGPT and the widespread adoption of the Transformer architecture. As educators and researchers, we find ourselves at a crossroads: how should we rethink the way NLP is taught to better align with these developments? Should the focus shift away from traditional methods, and if so, which concepts remain valuable? In an academic context, especially in Indian universities with rich linguistic diversity, how do we ensure that the modern NLP curriculum reflects not just global trends but also the unique challenges of Indian languages?

This workshop proposes to explore these questions and help educators think critically about what the future of NLP education should look like.

Workshop on Tamil Computing (Full Day)

Submission Link | Workshop URL

NLP Tools Development for Gujarati (Full Day)

Dr. C.K. Bhensdadia, Dr. Brijesh Bhatt, Dr. Jatayu Baxi
Dharmsinh Desai University, Nadiad

Submission Link | Workshop URL

The development of NLP technologies has driven advances in language-based applications. However, low-resource languages like Gujarati, despite having a significant number of speakers, suffer from a scarcity of linguistic resources and NLP tools. This workshop aims to address this gap by promoting the development of linguistic resources, models, and tools specific to the Gujarati language. The primary objective of this workshop is to bring together researchers, developers, linguists, and industry stakeholders to discuss the current challenges and potential solutions for building useful NLP tools for Gujarati. We aim to encourage the submission of research papers and tool demonstrations in this area.

TUTORIALS

The following tutorials have been accepted for ICON 2024.

December 19, 2024

Tutorial Abstract
Automating Talent Acquisition - The process of Resume Parsing and Screening.
(9.30 AM - 5.30 PM)


Dr. Keyur Joshi and Dr. Vrunda Gadesha

Ahmedabad University

This tutorial explores the evolving process of resume screening in large corporations, focusing on the shift from traditional methods to Industry 4.0 standards using advanced Natural Language Processing (NLP) techniques. Attendees will gain a deep understanding of how automated resume scanning works in large companies and the importance of key entities in resumes, including fact-based (e.g., education, job title) and competency-based entities (e.g., skills, leadership abilities). Through hands-on experiments, participants will experience different approaches to resume parsing, from conventional methods to those optimized for modern recruitment technologies. The tutorial will also provide practical guidance on how to make resumes compatible with AI-driven screening systems, ensuring they meet Industry 4.0 standards. By the end of the session, attendees will be equipped with actionable insights and skills to navigate and excel in the technology-driven landscape of talent acquisition.

Disfluency Identification and Annotation in Indian Context for NLP Development.
(9.30 AM - 1.00 PM)


Dr. Vandan Mujadia, Dr. Chayan Kochar, Dr. Nikhilesh Bhatnagar, Dr. Parameswari Krishnamurthy and Dr. Pruthwik Mishra*

IIIT-Hyderabad and SVNIT, Surat*

Disfluency identification is a fundamental natural language processing (NLP) task. It improves the accuracy and fluency of spoken language processing applications such as automatic speech recognition (ASR), machine translation, dialog systems, and language understanding. Disfluencies, which are categorized as interruptions, hesitations, or corrections in spoken language, affect the overall performance and usability of such applications.

In this tutorial, we will talk about our work on developing a disfluency-annotated corpus (Mujadia et al., 2024) in Indian English for the technical lecture domain. We will detail the annotation procedure and the guidelines. We will also present a technique for data augmentation that involves contextual embeddings and part-of-speech patterns. This synthetic data positively impacts the performance of the disfluency identification models. We will also shed light on the extension of this work (Kochar et al., 2024) to 6 Indian languages: Hindi, Bengali, Marathi, Telugu, Kannada and Tamil, highlighting the linguistic challenges and adaptations required to handle disfluencies in these diverse languages.

December 22, 2024

Tutorial Abstract
Text Augmentation for Indian Languages.
(2.00 PM - 5.00 PM)


Asha Hegde and H L Shashirekha

Mangalore University, Karnataka

Text Augmentation (TA) is a technique in Natural Language Processing (NLP) that involves artificially increasing the amount of text data. It also helps to enhance model performance by creating diverse variations of the existing data. TA is crucial for low-resource languages to address data scarcity issues. While extensive work has been done on data augmentation for high-resource languages like English, there has been significantly less focus on low-resource languages, more specifically Indian languages, despite the need to overcome their data limitations. To address this issue, we intend to organize a tutorial on "Text Augmentation for Dravidian Languages" that aims to address TA for Indian languages. The tutorial comprises a talk on TA for Indian languages and a hands-on session with demo code.

Transformative Impact of Generative AI on Healthcare: Industry Case Studies.
(9.30 AM - 5.30 PM)


Dr. Manjira Saha

Tata Consultancy Services

This tutorial will explore the various ways Generative AI and LLMs are transforming healthcare, offering practical examples from industry use cases and discussing the ethical, regulatory, and technical considerations. Attendees will leave with a clearer understanding of how AI is not only enhancing current medical practices but also paving the way for future innovations.

Harnessing the Power of Large Language Models for Multilingual and Code-mixed NLP task.
(9.30 AM - 5.30 PM)


Karthika Vijayan and Arindam Chatterjee

Sahaj Software, Bangalore and Pune, India

Multilingual communication is a natural and widespread phenomenon in linguistically diverse societies. Additionally, code-mixing, the blending of two or more languages in informal communication, is common in these settings. To address this linguistic complexity, Natural Language Processing (NLP) systems must be designed to handle both multilingual data and the code-mixed nature of real-world communication. In this tutorial, we will explore the design and development of robust multilingual NLP systems with a focus on leveraging pre-trained Large Language Models (LLMs) as their foundation. We will delve into techniques such as fine-tuning, knowledge distillation, and using pre-trained embeddings from LLMs to train classifiers and other downstream systems. Moreover, we will address key challenges, including the handling of low-resource languages and the generalization of NLP systems across multiple tasks. Through this tutorial, participants will gain insights into the limitations of out-of-the-box solutions and the critical importance of customizing models to effectively solve specific multilingual and code-mixed NLP tasks.

Diffusion Probabilistic Models for Natural Language Processing.
(9.30 AM - 1.00 PM)


Tejomay Kishor Padole, Suyash Awate, Prof. Pushpak Bhattacharyya and Amar Prakash Azad*

Indian Institute of Technology, Bombay and
*Fujitsu Research, Bangalore

In the current era of Natural Language Processing, Large Language Models (LLMs) have risen to be the state-of-the-art generative models. But due to their autoregressive nature (i.e. sequential next-word generation), they suffer from sampling drift, tending to accumulate errors during their sequential generation process. Diffusion Probabilistic Models (DPMs) are a new class of generative models that generate data non-autoregressively through iterative denoising, which allows us to exert more control over the generation. This tutorial aims to present the foundations of DPMs as well as the current state-of-the-art techniques based on DPMs for generating text. The attendees will gain insights into how the DPM framework works and understand the key differences between DPMs and autoregression. Along with the modeling techniques, we also aim to highlight existing challenges with applying DPMs to text and potential research directions in the area.

SHARED TASKS/TOOL/DEMOS

The following shared tasks have been accepted for ICON 2024.

More information will be updated soon

Decoding Fake Narratives in Spreading Hateful Stories (Faux-Hate)
Format: In-person

Submission Link | Shared Task URL