Abstract
Speaker recognition (SR) is the automatic identification of individual speakers from their voices, typically representing a speaker's acoustic traits as a fixed-dimensional vector known as a speaker embedding. A standard speaker recognition system (SRS) consists of three key stages: training, enrollment, and recognition. In each stage, an acoustic feature extraction module derives essential acoustic characteristics from the raw speech signal. Commonly used acoustic features include the speech spectrogram, filter-bank energies, and Mel-frequency cepstral coefficients (MFCCs).
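For illustration, the following minimal sketch extracts these three feature types from a raw waveform using the librosa library; the file path and parameter values are hypothetical choices, not those used in this thesis.

    # Sketch: extracting common acoustic features from a raw speech signal.
    # Assumes librosa is installed; the file path is a placeholder.
    import numpy as np
    import librosa

    y, sr = librosa.load("speech.wav", sr=16000)   # raw waveform at 16 kHz

    # Magnitude spectrogram via the short-time Fourier transform (STFT).
    spec = np.abs(librosa.stft(y, n_fft=512, hop_length=160))

    # Mel filter-bank energies (40 bands is a common choice).
    fbank = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                           hop_length=160, n_mels=40)

    # Mel-frequency cepstral coefficients (MFCCs).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)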
During the training stage, a background model is trained to establish a mapping from training voices to embeddings. The traditional background model employs a Gaussian Mixture Model (GMM) to generate identity-vector (i-vector) embeddings, whereas more recent and promising background models leverage deep neural networks (DNNs) to generate deep embeddings such as the x-vector. In the enrollment stage, a voice spoken by the individual being enrolled is mapped to an enrollment embedding by the trained background model. In the recognition stage, the testing embedding of a given voice is first retrieved from the background model. The scoring module then measures the similarity between the enrollment and testing embeddings, and the decision module compares the resulting score against a decision threshold to determine whether the claimed identity of the speaker is accepted or rejected.
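To make the scoring and decision step concrete, the sketch below computes a cosine similarity between the enrollment and testing embeddings and applies a decision threshold; the embeddings are assumed to come from the trained background model, and the threshold value of 0.7 is purely illustrative.

    # Sketch of the scoring and decision step. The embeddings are assumed to
    # be produced by the trained background model; the threshold is illustrative.
    import numpy as np

    def cosine_score(enroll_emb: np.ndarray, test_emb: np.ndarray) -> float:
        # Cosine similarity between enrollment and testing embeddings.
        return float(np.dot(enroll_emb, test_emb) /
                     (np.linalg.norm(enroll_emb) * np.linalg.norm(test_emb)))

    def decide(enroll_emb: np.ndarray, test_emb: np.ndarray,
               threshold: float = 0.7) -> bool:
        # Accept the claimed identity if the similarity exceeds the threshold.
        return cosine_score(enroll_emb, test_emb) >= threshold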
The concept of the voiceprint is rapidly gaining prominence as an emerging biometric, primarily owing to its seamless integration with natural, human-centered Voice User Interfaces (VUIs). The fast progress of SRSs is intricately linked to the evolution of neural networks (NNs), with a particular emphasis on DNNs. With the strides made in deep learning, SR has also benefited and found extensive applications across hardware and software platforms.
However, NNs have been shown to be vulnerable to adversarial attacks, a challenge that needs to be addressed. Thus, even though users enjoy the convenience of authentication with SR services, these solutions can be deceived by adversarial examples. This vulnerability shows that SR faces genuine security threats and raises significant concerns about user privacy.
Adversarial attacks were first demonstrated on images, where an image classification model was successfully deceived by adversarial examples. Drawing inspiration from this progress in the image domain, there is growing interest in extending these techniques to the audio field. Convolutional neural networks (CNNs) in particular have proved unstable under artificially crafted perturbations that remain imperceptible to the human eye. Virtually every type of model, from CNNs to graph neural networks (GNNs), has shown vulnerability to adversarial examples, particularly in image classification.
Deep learning models typically receive audio input by converting the audio into a spectrogram for further processing. A spectrogram serves as a condensed representation of an audio input. Given its image-like nature, the audio spectrogram is frequently used as input to deep learning models, especially CNNs adapted for audio tasks, since CNN-based architectures were initially designed for image processing.
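A minimal sketch of this input pipeline follows, assuming PyTorch and librosa; the tiny CNN and the 40-way speaker classification head are illustrative, not the architecture evaluated in this thesis.

    # Sketch: feeding a log-Mel spectrogram to a CNN as a one-channel image.
    import numpy as np
    import librosa
    import torch
    import torch.nn as nn

    y, sr = librosa.load("speech.wav", sr=16000)            # placeholder path
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
    log_mel = librosa.power_to_db(mel)                      # image-like 2-D array

    x = torch.from_numpy(log_mel).float()[None, None]       # (batch, ch, H, W)

    cnn = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(16, 40),                                  # 40 speaker classes
    )
    logits = cnn(x)                                         # speaker scores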
This thesis contributes to the assessment of the resilience of CNNs against adversarial attacks, a domain that has yet to be extensively investigated for end-to-end trained CNNs in speaker recognition. This examination is essential for sustaining the integrity and security of speaker recognition systems. Our study fills this gap by exploring variants of the iterative Fast Gradient Sign Method (FGSM) to carry out adversarial attacks. We note that even a vanilla iterative FGSM attack can alter the identity of each speaker sample to any other speaker within the LibriSpeech dataset.
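For reference, a minimal PyTorch sketch of the vanilla iterative FGSM is given below; model, the clean spectrogram x, and the label y are assumed given, and the L-infinity budget eps, step size alpha, and step count are illustrative. For a targeted attack that maps a sample to a chosen speaker, y would be replaced by the target label and the loss descended rather than ascended.

    # Sketch of the vanilla iterative FGSM (basic iterative method).
    # eps, alpha, and steps are illustrative hyperparameters.
    import torch
    import torch.nn.functional as F

    def iterative_fgsm(model, x, y, eps=0.01, alpha=0.002, steps=10):
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            # Ascend the loss, then project back into the eps-ball around x.
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)
        return x_adv.detach()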
Additionally, we introduce adversarial attacks specific to Mel spectrogram features by (a) constraining the number of manipulated pixels, (b) confining alterations to certain frequency bands, (c) limiting changes to particular time segments, and (d) employing a substitute model to generate the adversarial sample. Through comprehensive qualitative and quantitative analyses, we illustrate the vulnerability and counterintuitive behavior of existing CNN-based speaker recognition systems, wherein the predicted speaker identity can be changed to that of any other speaker through imperceptible perturbations.
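One way to realize the frequency-band and time-segment constraints in (b) and (c) is to mask the gradient sign before each FGSM update, as in the sketch below; the mask construction and index arguments are illustrative, not the exact implementation used in this thesis.

    # Sketch: restricting the perturbation to chosen frequency bands or time
    # segments by zeroing the update outside a mask. Indices are illustrative.
    import torch

    def make_mask(shape, freq_band=None, time_span=None):
        # shape = (batch, channel, n_mels, n_frames) of the Mel spectrogram.
        mask = torch.zeros(shape)
        f0, f1 = freq_band if freq_band is not None else (0, shape[2])
        t0, t1 = time_span if time_span is not None else (0, shape[3])
        mask[:, :, f0:f1, t0:t1] = 1.0
        return mask

    # Inside the iterative FGSM loop, the masked update becomes:
    #   x_adv = x_adv.detach() + alpha * mask * grad.sign()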