IIIT

Shortcut Topological Navigation through dense Pixel-Level Loop Closures

Author(s): Sarthak Chittawar
Advisor(s): K Madhava Krishna

Masters

July '26
Report no: IIIT/TH//
Center of RRC

Abs PDF

Shortcut Topological Navigation through dense Pixel-Level Loop Closures

Abstract

Although topological mapping and navigation have been extensively studied, the specific role and
downstream impact of loop closures in purely topological representations remain relatively underexplored. Importantly, loop closures in topological maps are fundamentally different from those in
globally referenced metric SLAM systems, where they primarily serve to correct accumulated drift.
Building on recent advances in dense topologies grounded in pixel-level relative 3D geometry, we
propose PixelLoop, a navigation framework that introduces loop closures directly in pixel space. Unlike sparse image-level edges or pose-graph corrections, our formulation treats loop closures as dense
geometric shortcuts that explicitly modify graph connectivity and cost propagation.
By establishing zero-cost correspondences between matched pixels across revisited views, PixelLoop
enables fine-grained topological connectivity that more accurately reflects the underlying spatial structure of the environment. This dense connectivity allows for stable any-point-to-any-point navigation
and produces costmaps that closely approximate true geometric shortest paths. In particular, we highlight the advantages of applying loop closures at the pixel level over traditional image-level topological
representations.
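
The sketch below shows how zero-cost pixel correspondences can change cost propagation in a pixel-level graph. It is a minimal illustration, not the PixelLoop implementation. It rests on several hypothetical choices. Each node is a pixel `(view, x, y)`. Neighbouring pixels within an image are joined by unit-cost edges, which stand in for costs derived from relative 3D geometry. Consecutive views are linked by a single correspondence. A loop closure adds zero-cost edges between every matched pixel pair. Running Dijkstra's algorithm from the goal yields the costmap. The helper names `link`, `grid_edges`, and `costmap` are illustrative.

```python
import heapq
from collections import defaultdict


def link(graph, a, b, cost):
    """Undirected edge between two pixel nodes."""
    graph[a].append((b, cost))
    graph[b].append((a, cost))


def grid_edges(graph, view, width, height, cost=1.0):
    """Hypothetical intra-image edges: 4-connected pixels with a uniform cost."""
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                link(graph, (view, x, y), (view, x + 1, y), cost)
            if y + 1 < height:
                link(graph, (view, x, y), (view, x, y + 1), cost)


def costmap(graph, goal):
    """Dijkstra from the goal pixel: cost-to-go for every reachable pixel node."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Six 3x3 views along a route; view 5 revisits the place seen in view 0
graph = defaultdict(list)
for v in range(6):
    grid_edges(graph, v, 3, 3)
for v in range(5):
    link(graph, (v, 2, 1), (v + 1, 0, 1), 1.0)  # one correspondence between consecutive views

goal, start = (0, 0, 1), (5, 2, 1)
before = costmap(graph, goal)[start]            # 17.0: the whole route is retraced
for x in range(3):
    for y in range(3):
        link(graph, (5, x, y), (0, x, y), 0.0)  # dense zero-cost pixel loop closures
after = costmap(graph, goal)[start]             # 2.0: shortcut through the revisit
```

The shortcut exists at every matched pixel, so each pixel of the revisited view inherits the cost-to-go of its counterpart in the first view.
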
Across extensive simulated experiments in the Habitat simulator, PixelLoop achieves over 35% absolute improvement in both Success Rate (SR) and Success weighted by Path Length (SPL) compared to
strong image-relative baselines, with the largest gains observed in scenarios requiring shortcut exploitation. These improvements are further validated through real-world mobile robot deployments using a
ROS Noetic middleware stack, demonstrating that dense pixel-level loop closures provide a practical,
scalable, and robust foundation for topological visual navigation.

On the Security of Machine Learning Models: IP Protection and Adversarial Robustness

Author(s): Aaryan Ajay Sharma
Advisor(s): Ankit Gangwal

Masters

July '26
Report no: IIIT/TH//
Center of CSTAR

Abs PDF

On the Security of Machine Learning Models: IP Protection and Adversarial Robustness

Abstract

The rapid adoption of Machine Learning (ML) models in safety-critical and commercial applications has raised fundamental questions about their security. We investigate two complementary facets
of ML model security: Intellectual Property (IP) protection of Graph Neural Networks (GNNs) and
adversarial robustness of merged models.
In the first part, we present GENIE, the first watermarking scheme designed specifically for GNNs
trained on the Link Prediction (LP) task. Existing GNN watermarking techniques address only node or
graph classification, leaving LP (a task central to recommendation systems, social network analysis,
and knowledge graph completion) entirely unprotected. GENIE introduces a novel backdoor-based
watermark that supports both node-representation-based (GCN, GraphSAGE, NeoGNN) and subgraph-based (SEAL) LP methods. It constructs a secret trigger set and watermark vector to embed an ownership signature during training. We further propose Dynamic Watermark Thresholding (DWT), a statistically principled verification procedure that bounds the misclassification probability with high confidence under minimal distributional assumptions. Extensive experiments across 4 model architectures, 7
real-world datasets, and 21 watermark removal attacks demonstrate that GENIE preserves model utility
(less than 2% AUC degradation), achieves near-perfect watermark accuracy, and remains robust against
black-box, white-box, combination, and adaptive attacks.
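
The sketch below illustrates threshold-based ownership verification using a Hoeffding bound. It is not the DWT procedure, whose exact form is not given here. It assumes that a non-watermarked model agrees with each watermark bit on the trigger set independently, with probability at most `p_null` (0.5 here, i.e. chance). Ownership is declared only when the observed watermark accuracy exceeds a threshold that such a model would reach with probability at most `delta`. The function names and the 200-bit watermark are hypothetical.

```python
import math


def hoeffding_threshold(n_triggers, p_null, delta):
    """Accuracy a non-watermarked model exceeds with probability at most delta."""
    return p_null + math.sqrt(math.log(1.0 / delta) / (2.0 * n_triggers))


def verify_ownership(predictions, watermark, p_null=0.5, delta=1e-3):
    """Claim ownership if trigger-set agreement with the watermark beats the threshold."""
    if len(predictions) != len(watermark):
        raise ValueError("one prediction per watermark bit is required")
    acc = sum(p == w for p, w in zip(predictions, watermark)) / len(watermark)
    tau = hoeffding_threshold(len(watermark), p_null, delta)
    return acc >= tau, acc, tau


watermark = [1, 0] * 100                                   # hypothetical 200-bit watermark vector
owner, acc, tau = verify_ownership(watermark, watermark)   # tau is about 0.631 for n = 200
unrelated, _, _ = verify_ownership([1] * 200, watermark)   # 50% agreement, so rejected
```
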
In the second part, we study the adversarial robustness implications of Model Merging (MM), a
popular technique that combines multiple fine-tuned models into a single multi-task model without requiring access to training data. While recent work suggests that MM confers "free" adversarial robustness by mitigating backdoor attacks, we challenge this notion by demonstrating that MM significantly
increases vulnerability to adversarial transfer attacks. Through comprehensive evaluations spanning 8
MM methods, 7 image classification datasets, 6 attack methods, and 336 distinct attack settings, we establish three statistically validated findings: (1) stronger MM methods lead to higher adversarial transfer
rates (exceeding 80% on average); (2) mitigating representation bias through post-hoc surgery increases
vulnerability to transfer attacks from fine-tuned surrogates; and (3) weight averaging, despite being the
weakest MM method, is paradoxically the most vulnerable to transfer attacks. We provide theoretical
justification via gradient alignment analysis and the Cross-Task Linearity property.
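
The sketch below shows weight averaging alongside task arithmetic, one widely used merging rule; the abstract does not say whether task arithmetic is among the eight methods evaluated. It also gives one common definition of the adversarial transfer rate. Parameters are plain NumPy arrays keyed by name, a hypothetical stand-in for framework state dicts. The transfer-rate definition is likewise an assumption and may differ from the normalisation used in the thesis.

```python
import numpy as np


def weight_average(state_dicts):
    """Uniform average of matching parameters across fine-tuned models."""
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in state_dicts[0]}


def task_arithmetic(pretrained, finetuned, lam=0.3):
    """theta = theta_0 + lam * sum_t (theta_t - theta_0)."""
    return {k: pretrained[k] + lam * sum(ft[k] - pretrained[k] for ft in finetuned)
            for k in pretrained}


def transfer_rate(labels, surrogate_pred, target_pred):
    """Share of adversarial examples that fool the surrogate and also fool the target."""
    labels, surrogate_pred, target_pred = map(np.asarray, (labels, surrogate_pred, target_pred))
    fooled_surrogate = surrogate_pred != labels
    if not fooled_surrogate.any():
        return 0.0
    return float(np.mean(target_pred[fooled_surrogate] != labels[fooled_surrogate]))


theta_0 = {"w": np.array([0.0, 0.0])}
models = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]
merged_avg = weight_average(models)                  # w = [2.0, 3.0]
merged_ta = task_arithmetic(theta_0, models, lam=0.5)
rate = transfer_rate([0, 1, 2, 3], [1, 0, 2, 0], [1, 1, 0, 2])  # 2 of 3 transfer
```

With `lam` equal to one over the number of models, task arithmetic reduces to weight averaging, which is why the two merges coincide in this toy example.
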
Together, these contributions advance our understanding of ML model security from both defensive
(IP protection) and offensive (adversarial vulnerability) perspectives, highlighting the complex interplay
between model utility, robustness, and security in modern ML systems.

Advanced Design of Ultra-Low Power Analog Circuits with Optimised Energy and Area Efficiency

Author(s): Anubhab Banerjee
Advisor(s): Zia Abbas

Masters

July '26
Report no: IIIT/TH//
Center of CVEST

Abs PDF

Advanced Design of Ultra-Low Power Analog Circuits with Optimised Energy and Area Efficiency

Abstract

The rapid proliferation of energy-constrained electronic systems, including IoT nodes, biomedical
implants, and portable devices, has intensified the demand for ultra-low-power and area-efficient analog
circuit design. This thesis presents the design and analysis of advanced ultra-low-power analog building
blocks with a focus on current and voltage references, as well as high-speed comparators, optimized for
energy and area efficiency under wide process, voltage, and temperature variations.
A series of resistorless current and voltage reference architectures are explored, leveraging subthreshold operation and the complementary temperature characteristics of CTAT and PTAT voltages to
achieve robust temperature compensation. A sub-1 V current reference generating 593 pA is demonstrated, achieving a temperature coefficient of 378 ppm/°C over a range of −40°C to 100°C, with a line
sensitivity of 0.198%/V across a supply range of 0.8 V to 2.5 V. The design consumes only 3.3 nW at
27°C and occupies 0.074 mm², while maintaining ±0.75% (3σ/µ) process variation, enabling trim-free
operation. Further, an integrated ultra-low-voltage reference operating at 0.5 V generates 90.7 pA and
288 mV current and voltage references, respectively, achieving temperature coefficients of 15.2 ppm/°C
and 36.8 ppm/°C. The circuit operates across a wide supply range of 0.5 V to 2.6 V with line sensitivities
of 0.028%/V and 0.154%/V, consuming only 275.26 pW and occupying 0.087 mm², through the use of
gate-leakage-based techniques.
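
Temperature coefficients and line sensitivities like those quoted above are commonly computed with the box method. The sketch below assumes that definition and normalises by the mean value over the sweep; some works normalise by the value at 27°C instead. The sweep values in the usage lines are hypothetical, not measurements from the thesis.

```python
def temperature_coefficient_ppm(values, temps):
    """Box-method TC in ppm/°C: peak-to-peak spread over mean value and temperature span."""
    x_mean = sum(values) / len(values)
    return (max(values) - min(values)) / (x_mean * (max(temps) - min(temps))) * 1e6


def line_sensitivity_pct(values, supplies):
    """Line sensitivity in %/V between the lowest and highest supply voltage."""
    x_mean = sum(values) / len(values)
    v_lo, v_hi = min(supplies), max(supplies)
    x_lo = values[supplies.index(v_lo)]
    x_hi = values[supplies.index(v_hi)]
    return abs(x_hi - x_lo) / (x_mean * (v_hi - v_lo)) * 100


# Hypothetical sweep data (pA), for illustration only
tc = temperature_coefficient_ppm([100.0, 101.0, 99.0], [-40, 30, 100])  # about 142.9 ppm/°C
ls = line_sensitivity_pct([100.0, 100.5], [0.8, 2.5])                    # about 0.293 %/V
```

Under this definition, the first reference's 378 ppm/°C over a 140°C span corresponds to a peak-to-peak current spread of roughly 593 pA × 378×10⁻⁶ × 140 ≈ 31 pA.
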
To address line sensitivity and enable operation over extreme temperature ranges, a compact voltage
reference architecture employing only four transistors and a dual-loop regulation scheme is proposed.
The design achieves a reference voltage of 451.6 mV while consuming 37.6 pW, with an ultra-low
line sensitivity of 0.009%/V. It operates over a wide temperature range of −30°C to 160°C with a
temperature coefficient of 70.2 ppm/°C, enabled by threshold voltage modulation and leakage control
techniques.
In addition to reference generation, a 0.3 V bulk-driven rail-to-rail comparator with dynamic transient
enhancement is introduced to overcome the speed limitations of subthreshold operation. The proposed
design achieves a 10× improvement in rise time, with a transient response of 7.56 ns and a bandwidth of
1.07 MHz while consuming only 28 nW of power, demonstrating portability across 65 nm and 180 nm
CMOS technologies.
While the previously presented resistorless architectures demonstrate excellent simulated performance in terms of ultra-low power and compactness, their reliance on device-level characteristics makes
them inherently more susceptible to process variations, potentially impacting yield and long-term reliability. To address these limitations, area-efficient resistance-based current references are investigated
and validated through silicon tape-out measurements. A fabricated design in 0.18 µm CMOS delivers
a 10 nA reference current while consuming 40 nW, achieving a temperature coefficient of 136 ppm/°C
over −40°C to 100°C and a line sensitivity of 1.1%/V across a 1.4 V to 1.9 V supply range, with a compact area of 0.03 mm². Furthermore, a low-cost single-point auto-calibration technique is proposed,
enabling compensation of temperature and process variations. The calibrated 1 µA current reference
achieves ±2% accuracy with a temperature coefficient below 150 ppm/°C over −40°C to 100°C, offering performance comparable to conventional trimming methods while significantly reducing calibration
complexity, time, and cost.
Additionally, this thesis presents a fully integrated ultra-low-power and area-efficient RC oscillator based on a resistance amplification technique for compact on-chip timing generation. Implemented
in a 0.18 µm CMOS process, the oscillator synthesizes a large effective resistance using amplified
polysilicon resistor characteristics, significantly reducing silicon area while maintaining frequency stability. Continuous-time offset mitigation techniques are employed to suppress low-frequency noise and
amplifier-induced offsets. The proposed oscillator generates a stable 5 kHz clock from a 1.4 V supply
with ultra-low power consumption, achieving a temperature coefficient of 100 ppm/°C over −40°C to
100°C and a line sensitivity of 1%/V across a 1.4 V to 2 V supply range using single-point calibration.
Overall, the proposed architectures demonstrate significant improvements in power consumption,
area efficiency, temperature stability, and robustness, making them well-suited for next-generation ultra-low-power analog and mixed-signal systems.

EM-based Model for RIS-aided Channel and its Achievable DoF

Author(s): Nobhendu Chowdhury
Advisor(s): Praful Mankar

Masters

July '26
Report no: IIIT/TH//
Center of SPCRC

Abs PDF

EM-based Model for RIS-aided Channel and its Achievable DoF

Abstract

This thesis presents a physically consistent electromagnetic (EM) framework for the design and
modelling of reconfigurable intelligent surfaces (RIS) in 6G wireless networks. Moving beyond simplified
ray-tracing approximations, we establish a wave-based model grounded in the inhomogeneous Helmholtz
equation to characterize the interaction between incident fields and programmable surfaces in the
wavenumber domain. We demonstrate that the RIS acts as a spectral-domain mixer: the reflected
field is governed by a 2D spectral convolution constrained by the free-space propagation disk, and the
RIS is programmed to produce a desired target field at the receiver.
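
The spectral-mixing statement can be checked numerically on a sampled aperture. The NumPy sketch below is a discrete, periodic toy, not the Helmholtz-based model of the thesis. It multiplies an incident field by an RIS phase profile and takes the 2D FFT, so the two spectra are circularly convolved. It then marks which wavenumbers fall inside the propagation disk kx² + ky² ≤ k0². The grid size, quarter-wavelength spacing, and phase profile are illustrative assumptions.

```python
import numpy as np


def reflected_spectrum(incident, phase):
    """Angular spectrum just after the RIS: FFT of incident * exp(j*phase).

    Multiplication in space is a (circular) 2D convolution of the two spectra.
    """
    return np.fft.fftshift(np.fft.fft2(incident * np.exp(1j * phase)))


def propagating_mask(n, dx, wavelength):
    """True where kx^2 + ky^2 <= k0^2, i.e. inside the free-space propagation disk."""
    k0 = 2 * np.pi / wavelength
    k = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dx))
    kx, ky = np.meshgrid(k, k, indexing="ij")
    return kx**2 + ky**2 <= k0**2


n, wavelength = 64, 1.0
dx = wavelength / 4                          # quarter-wavelength element spacing (assumed)
x_idx = np.arange(n)[:, None] * np.ones((1, n))
incident = np.ones((n, n), dtype=complex)    # normally incident plane wave
phase = 2 * np.pi * 8 * x_idx / n            # linear phase gradient: shifts spectrum by 8 kx bins
spec = np.abs(reflected_spectrum(incident, phase))
peak = np.unravel_index(np.argmax(spec), spec.shape)  # (40, 32): moved from DC at (32, 32)
mask = propagating_mask(n, dx, wavelength)
# Bin spacing is k0/16 here: 8 bins gives kx = k0/2 (propagating), 20 bins gives 1.25*k0 (evanescent)
```

A linear phase gradient simply translates the spectrum, which is one way to picture a linear centring step. Components pushed beyond the disk edge become evanescent and do not reach the far field.
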
To quantify the information-carrying capacity in RIS-aided channels, we adopt a set-theoretic
approach based on the space-bandwidth product and the Brunn-Minkowski inequality. We derive universal
mathematical bounds on the spatial degrees of freedom (DoF) and introduce a spectral realizability factor,
η, to evaluate the efficiency of spectral expansion relative to the boundary of the propagation disk. This
theoretical development motivates a two-stage wavefront engineering strategy, involving a linear centring
operation to maximize spectral budget, followed by deterministic diffusion to enrich the channel rank.
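
The space-bandwidth reasoning behind such DoF bounds can be summarised with the standard heuristic count below. It assumes an aperture of area |A| and the propagation disk K, and it is not the exact set of bounds derived in the thesis. The notation is introduced here: û and r̂ denote the incident and RIS modulation spectra, with supports S_u and S_r. The Brunn-Minkowski step shows that spectral convolution can only enlarge the support, while only the part inside K remains realizable.

```latex
N_{\mathrm{DoF}} \approx \frac{|\mathcal{A}|\,|\mathcal{K}|}{(2\pi)^2},
\qquad \mathcal{K} = \{(k_x, k_y) : k_x^2 + k_y^2 \le k_0^2\}, \qquad k_0 = \frac{2\pi}{\lambda}

|\mathcal{K}| = \pi k_0^2 = \frac{4\pi^3}{\lambda^2}
\quad\Longrightarrow\quad
N_{\mathrm{DoF}} \approx \frac{\pi\,|\mathcal{A}|}{\lambda^2}

\operatorname{supp}(\hat{u} * \hat{r}) \subseteq \mathcal{S}_u \oplus \mathcal{S}_r,
\qquad
|\mathcal{S}_u \oplus \mathcal{S}_r|^{1/2} \ge |\mathcal{S}_u|^{1/2} + |\mathcal{S}_r|^{1/2}

N_{\mathrm{eff}} \lesssim \frac{|\mathcal{A}|\,\bigl|(\mathcal{S}_u \oplus \mathcal{S}_r) \cap \mathcal{K}\bigr|}{(2\pi)^2}
```
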
Numerical results validate the framework by analysing the scenarios of isotropic RIS expansion
and multibeam synthesis. For the former, we quantify the quadratic decay in η caused by inefficient
RIS design strategies. For the latter, we identify the discrete spectral thresholds at which synthesized
components fail to travel into the far-field. The observations confirm that the proposed centring and
diffusion paradigm effectively preserves channel dimensionality, enriching shadowed and rank-deficient
environments. This work provides the fundamental analytical tools required to maximize the spatial
multiplexing potential of RIS-aided communication systems.

Tracking the Tug: Time-Frequency Signal Processing for Radial Velocity Exoplanet Detection

Author(s): Ashwini Kulkarni
Advisor(s): Santosh Nannuru

Masters

July '26
Report no: IIIT/TH//
Center of SPCRC

Abs PDF

Tracking the Tug: Time-Frequency Signal Processing for Radial Velocity Exoplanet Detection

Abstract

The detection of exoplanets using the radial velocity (RV) method requires advances in instrumentation,
physical models, and signal processing. Planetary signals are meter/sub-meter-per-second
Doppler shifts embedded in time series corrupted by stellar variability, instrumental systematics, and
the irregular observational schedules inherent to ground-based astronomy. This thesis addresses this
challenge from two directions: a comprehensive review of existing methods, and the development of a
new time-frequency analysis framework, the Non-Uniform Stockwell Transform (NUST), specifically
designed for the non-uniformly sampled, non-stationary signals encountered in precision RV surveys.
The first contribution is a systematic review of fifteen signal processing methods used for RV exoplanet
detection, ranging from the Lomb-Scargle periodogram and its Bayesian extensions, through Gaussian
process regression, Bayesian MCMC inference, wavelet-based approaches, and machine learning classifiers. Each method is analysed in terms of its mathematical foundations, handling of non-uniform
sampling and correlated noise, computational cost, and limitations. A comparative summary table provides a practical reference for method selection.
The second and central contribution is the theoretical development and experimental validation of
the NUST. The NUST extends the Stockwell transform, a time-frequency representation providing
frequency-dependent time resolution through a Gaussian analysis window, to handle non-uniformly
sampled data without interpolation. Its core innovation is a doubly adaptive Gaussian window whose
width scales with both the analysis frequency and the local sample density at each point in time, ensuring
statistically consistent resolution across densely and sparsely observed periods. At each time-frequency
point a localised Generalised Lomb-Scargle fit is performed, connecting the NUST directly to the standard GLS significance framework and producing a real-valued power statistic bounded between zero and
one. The method is validated on synthetic periodic signals and on archival data from the multi-planetary
systems HD 10180, HD 40307, and GJ 581, where it separates the temporally coherent power of confirmed planetary signals from the evolving, season-dependent signature of stellar activity, a diagnostic
capability that the GLS periodogram, operating in the frequency domain alone, cannot provide.
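
A localised GLS fit of the kind described above can be sketched in a few lines of NumPy. This is an illustrative approximation, not the NUST as defined in the thesis. The "doubly adaptive" width rule here is hypothetical: the window spans a fixed number of cycles and is widened until the `k_min` nearest samples lie within one σ. The power is the standard GLS statistic 1 − χ²_model/χ²_const. It lies in [0, 1] because the constant-only fit is nested in the sinusoid-plus-offset fit. The data are synthetic.

```python
import numpy as np


def local_gls_power(t, y, dy, tau, freq, cycles=3.0, k_min=8):
    """Gaussian-windowed Generalised Lomb-Scargle power at time tau and frequency freq."""
    # Width shrinks with frequency (a few periods) but is widened in sparse
    # stretches so that at least k_min samples lie within one sigma of tau.
    kth_dist = np.sort(np.abs(t - tau))[min(k_min, t.size) - 1]
    sigma = max(cycles / freq, kth_dist)
    w = np.exp(-0.5 * ((t - tau) / sigma) ** 2) / dy**2   # window times inverse variance
    omega = 2 * np.pi * freq
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    chi2_model = np.sum(w * (y - X @ coef) ** 2)
    ybar = np.sum(w * y) / np.sum(w)
    chi2_const = np.sum(w * (y - ybar) ** 2)
    return 1.0 - chi2_model / chi2_const if chi2_const > 0 else 0.0


# Synthetic, irregularly sampled RV-like series: one 12-day sinusoid plus white noise
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 200, 150))
y = 3.0 * np.sin(2 * np.pi * t / 12.0) + rng.normal(0, 1.0, t.size)
dy = np.ones_like(t)
freqs = np.linspace(0.01, 0.2, 400)
taus = np.linspace(20, 180, 5)
tfr = np.array([[local_gls_power(t, y, dy, tau, f) for tau in taus] for f in freqs])
best = freqs[np.argmax(tfr.mean(axis=1))]  # expected near 1/12 (about 0.083 per day)
```

A temporally coherent planetary signal shows up as a ridge at the same frequency for every τ. A season-dependent activity signal would instead vary along the τ axis.
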
Current limitations include the absence of an analytical global significance threshold, manual hyperparameter tuning, and an assumption of uncorrelated noise within each window. Extensions proposed include a Keplerian NUST for eccentric-orbit detection, integration with Bayesian pipelines, and
a multi-dimensional variant for joint analysis of RV and stellar activity indicators. The broader applicability of the NUST to other non-uniformly sampled, non-stationary domains (variable star photometry, paleoclimate proxy analysis, long-term biomedical monitoring, and structural health monitoring) is also
discussed.

Developing and Validating a Digital Version of the Indian Council of Medical Research NeuroCognitive Toolbox (ICMR-NCTB)

Author(s): George Paul
Advisor(s): Bhaktee Dongaonkar

Masters

July '26
Report no: IIIT/TH//
Center of CSL

Abs PDF

Developing and Validating a Digital Version of the Indian Council of Medical ResearchNeuroCognitive Toolbox (ICMR-NCTB)

Abstract

The Indian Council of Medical Research NeuroCognitive Toolbox (ICMR-NCTB) was developed
for culturally appropriate cognitive assessment across India's linguistic diversity. Its paper-based format, however, leaves much of the administration workflow dependent on manual timing, handwritten
scoring, later transcription, and physical handling of task artifacts. This thesis presents the design, implementation, and initial validation evidence for a tablet-based digital version of the ICMR-NCTB. The
work centers on three connected questions: whether a usable administrator-facing platform can support
the full battery, how individual cognitive tests can be digitized without losing their clinical intent, and
whether scores collected in the digital medium remain comparable to the original pen-and-paper format.
The resulting Flutter application supports multilingual instructions, structured score capture, local
media storage, spreadsheet export, and selective manual override when expert judgment is still required.
Digital versions were implemented for Picture Naming, Trail Making, Category and Phonemic Fluency,
Verbal Learning, Test des Neuf Images du 93 (TNI-93), Modified Taylor Complex Figure (MTCF),
Frenchay Aphasia Screening Test (FAST), and Line Bisection, along with the associated questionnaires.
Validation was conducted with 83 adults who completed digital and pen-and-paper sessions in counterbalanced order; the full inferential analysis was restricted to 67 younger adults below 45 years, with a
smaller older subgroup reported through baseline descriptive checks. At the composite level in younger
adults, the two media showed broad alignment: there was no overall baseline medium-of-testing effect,
no meaningful order effect, and a clear practice effect across sessions. The test-level picture was more
mixed. Most subtests behaved comparably across media, while Trail Making Part B and key Verbal
Learning outcomes showed more consistent sensitivity to testing medium. The thesis therefore presents
initial validation evidence for the digital battery as a promising administrator-guided platform for structured administration, while identifying the subtests that still need additional norming, calibration, and
patient-cohort validation before clinical or large-scale deployment.
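
The role of counterbalancing in this design can be made concrete with a small worked sketch. It assumes a simple additive model, and it does not reproduce the thesis's inferential analysis. Each participant's digital-minus-paper difference mixes the medium effect with the practice effect. The two orders carry the practice effect with opposite signs, so averaging across orders isolates each effect. The scores are hypothetical.

```python
from statistics import mean


def medium_and_practice(records):
    """Separate medium and practice effects in a two-session counterbalanced design.

    records: (order, digital_score, paper_score) per participant, where order is
    "digital_first" or "paper_first". Assumes additive effects:
    score = baseline + medium * [digital] + practice * [second session].
    """
    # digital - paper = medium - practice when the digital session came first,
    # and medium + practice when the paper session came first.
    diff_df = [d - p for order, d, p in records if order == "digital_first"]
    diff_pf = [d - p for order, d, p in records if order == "paper_first"]
    medium = (mean(diff_df) + mean(diff_pf)) / 2
    practice = (mean(diff_pf) - mean(diff_df)) / 2
    return medium, practice


# Hypothetical scores: no medium effect, a 3-point practice gain
records = [("digital_first", 50, 53), ("digital_first", 40, 43),
           ("paper_first", 53, 50), ("paper_first", 45, 42)]
print(medium_and_practice(records))  # (0.0, 3.0)
```
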

Spectrum Sensing for RIS-aided Communication Systems: Analysis and Design

Author(s): Parihar Nikhilsingh Pradipsingh
Advisor(s): Praful Mankar

Masters

July '26
Report no: IIIT/TH//
Center of SPCRC

Abs PDF

Spectrum Sensing for RIS-aided Communication Systems: Analysis and Design

Abstract

The rapid growth of wireless communication systems and advances in beyond-5G and 6G
networks have increased the demand for utilizing the limited spectrum efficiently. A large portion of the licensed spectrum remains underutilized, and static spectrum allocation prevents
its efficient use. Cognitive Radio (CR) addresses this problem by enabling unlicensed users
to opportunistically access spectrum without causing harmful interference to the licensed users.
A conventional approach for such opportunistic access is through reliable spectrum sensing, in
which the unlicensed or secondary user (SU) determines the state of the spectrum before transmission. In practice, however, spectrum sensing is challenged by low signal-to-noise ratio
(SNR), multipath fading, spatially correlated noise, and uncertain channel state information
(CSI), all of which degrade sensing performance.
Recent advances in Reconfigurable Intelligent Surfaces (RIS) have introduced a new paradigm
by enabling programmable control over an otherwise uncontrolled propagation environment. Unlike
conventional systems, which must accept whatever propagation paths the wireless channel provides, RIS intelligently
directs the reflected signals to improve the quality of the received observations. Inspired by
the concept of RIS, this thesis investigates rigorous spectrum sensing frameworks for RIS-assisted wireless systems, with emphasis on correlated propagation environments and
imperfect or unknown channel knowledge. The first part of the thesis considers spectrum sensing using
Maximum Eigenvalue Detection (MED). Under a complex Gaussian observation model, the received observations are characterized through a sample covariance matrix following a central
complex Wishart distribution. Spectrum sensing is formulated as a binary hypothesis testing
problem based on the maximum eigenvalue of the sample covariance matrix. While existing
approaches typically rely on asymptotic approximations, deriving the exact distribution of the
largest eigenvalue becomes challenging in the presence of correlated Rayleigh fading and correlated Gaussian noise. To address this problem, an exact cumulative distribution function of
the largest eigenvalue is derived under both the null and alternative hypotheses using random
matrix theory. These results enable a closed-form evaluation of the probabilities of false alarm
and detection without relying on asymptotic analysis. Furthermore, RIS phase shifts are optimized to maximize the expected value of the test statistic, which improves
the statistical separability between the two hypotheses. The simulation results validate the analytical expressions and demonstrate significant improvements in the sensing performance with
optimized RIS configurations.
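
A maximum-eigenvalue detector of the kind described above can be sketched with NumPy. The detection threshold is set by Monte Carlo simulation under H0 for a target false-alarm probability. This stands in for the exact CDF derived in the thesis. Several ingredients are hypothetical: the exponential noise-correlation model, the array size, the signal scaling, and the all-ones effective channel, which stands in for a cascaded RIS channel.

```python
import numpy as np


def max_eig_statistic(Y):
    """Largest eigenvalue of the sample covariance of an M x N snapshot matrix."""
    R = Y @ Y.conj().T / Y.shape[1]
    return np.linalg.eigvalsh(R)[-1]          # eigvalsh returns ascending eigenvalues


def complex_gaussian(rng, shape):
    """Circularly symmetric CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def mc_threshold(noise_cov, N, pfa, trials=2000, seed=None):
    """Threshold exceeded with probability pfa under H0 (noise only), by simulation."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(noise_cov)
    M = noise_cov.shape[0]
    stats = [max_eig_statistic(L @ complex_gaussian(rng, (M, N))) for _ in range(trials)]
    return np.quantile(stats, 1 - pfa)


M, N, rho = 4, 50, 0.5
noise_cov = rho ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))  # correlated noise
gamma = mc_threshold(noise_cov, N, pfa=0.05, seed=1)

rng = np.random.default_rng(2)
h = np.ones(M, dtype=complex)                 # hypothetical effective (RIS-cascaded) channel
s = complex_gaussian(rng, N)                  # primary-user signal
noise = np.linalg.cholesky(noise_cov) @ complex_gaussian(rng, (M, N))
Y1 = np.sqrt(2.0) * np.outer(h, s) + noise    # observations under H1
detected = max_eig_statistic(Y1) > gamma
```

In this picture, optimizing RIS phase shifts amounts to increasing the norm of the effective channel `h`. That pushes the H1 statistic further above the H0 threshold.
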
The second part of the thesis addresses practical scenarios in which the channel and the
primary user's transmit power are unknown. A generalized likelihood ratio test (GLRT)-based
sensing framework is developed for RIS-assisted systems operating in correlated noise environments. To facilitate reliable parameter estimation, a group-wise RIS activation strategy
is proposed using a Beyond-Diagonal (BD)-RIS architecture. Maximum likelihood estimation is used to jointly estimate the unknown channel and transmit power, after which the RIS
phase shifts are optimized to maximize the effective sensing SNR. These estimated parameters
are subsequently incorporated into the GLRT, resulting in a robust joint estimation-detection
framework capable of operating under CSI uncertainty. Numerical results demonstrate that
the proposed detector consistently outperforms conventional energy detection, particularly in
correlated and low-SNR environments.
This thesis establishes a theoretical framework for RIS-assisted spectrum sensing by integrating random matrix theory, estimation, hypothesis testing, and programmable wireless propagation through RIS. The proposed methodologies provide exact analytical performance characterization in
correlated environments while addressing practical challenges arising from unknown channel
parameters. These contributions advance the theoretical understanding of RIS-assisted spectrum sensing and provide practical design methodologies for reliable spectrum access in future
cognitive radio and beyond-5G/6G wireless communication systems.

Automating the Lifecycle of Architecture Design Decisions using Generative AI

Author(s): Adyansh Kakran
Advisor(s): Karthik Vaidhyanathan

Masters

July '26
Report no: IIIT/TH//
Center of SERC

Abs PDF

Automating the Lifecycle of Architecture Design Decisions using Generative AI

Abstract

Software architecture is shaped by the design decisions made throughout a system's lifecycle. These
decisions influence the structure and quality of the system, and when the reasoning behind them is not
preserved, future maintenance becomes more difficult and error-prone. Architectural decision records
have proven an effective way of capturing each decision and its reasoning in a structured form.
However, keeping these records up to date requires sustained manual effort, which often leads to incomplete or missing entries as project timelines tighten. Beyond individual decisions, assessing whether
an architecture as a whole meets its quality goals requires organising dedicated review sessions with
multiple stakeholders, a process that is difficult to fit into the pace of most development cycles.
Generative AI has shown practical value in supporting software engineering tasks such as code generation and requirements analysis. Because architecture knowledge management relies heavily on writing
and interpreting text, these models are a reasonable fit for this area. However, current uses of LLMs in
software architecture tend to rely on basic prompting strategies, which often fall short when tasks require project-specific context or consistent, structured reasoning. As a result, the manual effort involved
in managing architecture design decisions has remained largely unaddressed.
This thesis investigates how LLMs and agentic workflows can be applied to reduce the manual effort
across the architecture design decision lifecycle. It proposes a structured approach to help developers
draft complete architectural decision records from brief inputs. To support the ongoing quality of these
records, the work examines automated assessment techniques using different language model configurations. These ideas are then extended to whole-architecture evaluation through a multi-agent framework
designed to simulate established review processes.
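
The sketch below grounds two ideas: drafting ADRs from brief inputs, and flagging incomplete ones. It is deliberately simple. The record layout follows the widely used Nygard-style template (title, status, context, decision, consequences). The prompt format, the word-count rule, and all example records are hypothetical. The rule-based check is only a stand-in for the LLM-based assessment configurations studied in the thesis.

```python
from dataclasses import dataclass, fields


@dataclass
class ADR:
    """Architectural decision record in the common Nygard-style layout."""
    title: str
    status: str
    context: str
    decision: str
    consequences: str


def flag_incomplete(adr, min_words=5):
    """Hypothetical rule-based check: body sections that are empty or very short."""
    return [f.name for f in fields(adr)
            if f.name not in ("title", "status")
            and len(getattr(adr, f.name).split()) < min_words]


def few_shot_prompt(brief, examples):
    """Assemble a drafting prompt from a brief input and past ADRs (hypothetical format)."""
    parts = ["Draft an architectural decision record with Context, Decision and Consequences."]
    for ex in examples:
        parts.append(f"Input: {ex.title}\nContext: {ex.context}\n"
                     f"Decision: {ex.decision}\nConsequences: {ex.consequences}")
    parts.append(f"Input: {brief}\nContext:")
    return "\n\n".join(parts)


past = ADR("Use PostgreSQL for order data", "Accepted",
           "Orders need transactional consistency across several tables.",
           "Store orders in a single PostgreSQL cluster managed by the platform team.",
           "Strong consistency; the team must run schema migrations carefully.")
draft = ADR("Adopt message queue", "Proposed",
            "Services call each other synchronously and fail together.",
            "Use a queue.", "")
print(flag_incomplete(draft))  # ['decision', 'consequences']
prompt = few_shot_prompt("Introduce a cache for product lookups", [past])
```
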
The work is grounded in a dataset of open-source architectural decision records and published architecture case studies. The results show that combining model fine-tuning with historical examples
supports the generation of concise records that align reasonably well with human writing patterns. The
automated assessment configurations offer a practical way to flag unclear or incomplete records without requiring continuous human review. The multi-agent evaluation framework shows some ability to
identify structural risks and tradeoffs, and can reduce the initial effort required for scenario-based architecture reviews.
Overall, the thesis suggests that combining domain-adapted language models with structured agentic
workflows can provide a useful foundation for managing architecture design decisions. By lowering
the effort involved in recording and evaluating decisions, these approaches support more consistent
documentation practices and easier long-term system maintenance.

Molecular Subgraph Extraction for Explainable Drug-Target Interaction Prediction Using Graph Neural Networks

Author(s): Mittapally Nivesh
Advisor(s): Krishna Reddy Polepalli

Masters

July '26
Report no: IIIT/TH//
Center of DSAC

Abs PDF

Molecular Subgraph Extraction for Explainable Drug-Target Interaction Prediction Using Graph Neural Networks

Abstract

The identification of Drug-Target Interactions (DTIs) is a critical step in the drug discovery process,
involving the assessment of binding affinities between small-molecule drug candidates and biological
target proteins. Over the past decade, graph neural network (GNN) based approaches have substantially
improved the accuracy of binding affinity prediction by representing drug molecules as molecular graphs
and learning chemically informative embeddings through message-passing operations. Despite these
advances, a fundamental gap persists in the literature: existing GNN-based DTI models predict whether
a drug candidate is likely to bind to a target, but do not identify which subgraph of the drug molecule
is structurally responsible for that interaction. From a drug discovery perspective, this subgraph, known
as the pharmacophore, is the essential structural and chemical feature that mediates binding, and its
identification is indispensable for lead optimisation, scaffold design, and the rationalisation of structure-activity relationships. There is an opportunity to address this gap by exploiting the knowledge encoded
in a trained GNN model to identify potential pharmacophores from drug candidates.
To address this problem, in this thesis we propose ExplainableDeepGNN, a two-phase GNN-based
framework in which the Monte Carlo Tree Search (MCTS) algorithm is employed to extract potential pharmacophores from drug candidates by leveraging a trained GNN model. In the first phase, we
employ a GNN-based model named GraphNetDTI to compute the binding affinity score for a given
drug-target pair, and potential drug candidates are identified based on this score. In the second phase,
by employing the MCTS algorithm, the trained GraphNetDTI model is leveraged to extract potential
subgraphs representing pharmacophores from the identified drug candidates.
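
The MCTS-based extraction phase can be sketched as a search over connected subgraphs, obtained by pruning one atom at a time in the spirit of MCTS-based GNN explainers. This is not the ExplainableDeepGNN implementation. The pruning action, the UCT constant, and the random rollout policy are assumptions. So is the `score` function, a hypothetical stand-in for the binding affinity predicted by the trained GraphNetDTI model. In the toy molecule (a six-membered ring with a two-atom substituent), the stand-in score simply counts a few hypothetical key atoms.

```python
import math
import random


def is_connected(nodes, adj):
    """Depth-first check that `nodes` induce a connected subgraph."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == set(nodes)


def children(state, adj):
    """Subgraphs reachable by pruning one atom while staying connected."""
    return [state - {a} for a in state if is_connected(state - {a}, adj)]


def mcts_subgraph(adj, score, max_atoms, iters=300, c=1.0, seed=0):
    rng = random.Random(seed)
    root = frozenset(adj)
    visits, value, kids = {}, {}, {}
    best = (float("-inf"), None)

    def uct(parent, child):
        if visits.get(child, 0) == 0:
            return float("inf")
        return value[child] / visits[child] + c * math.sqrt(math.log(visits[parent]) / visits[child])

    for _ in range(iters):
        state, path = root, [root]
        # Selection: follow UCT through already-expanded, non-terminal states
        while state in kids and len(state) > max_atoms:
            parent = state
            state = max(kids[parent], key=lambda ch: uct(parent, ch))
            path.append(state)
        # Expansion
        if len(state) > max_atoms and state not in kids:
            kids[state] = children(state, adj)
        # Rollout: prune random atoms until the size budget is met
        leaf = state
        while len(leaf) > max_atoms:
            leaf = rng.choice(children(leaf, adj))
        reward = score(leaf)
        if reward > best[0]:
            best = (reward, leaf)
        # Backpropagation
        for s in path:
            visits[s] = visits.get(s, 0) + 1
            value[s] = value.get(s, 0.0) + reward
    return best


# Toy molecule: six-membered ring (atoms 0-5) with a substituent 0-6-7
edges = [(i, (i + 1) % 6) for i in range(6)] + [(0, 6), (6, 7)]
adj = {i: set() for i in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
key_atoms = {0, 6, 7}                 # hypothetical binding-relevant atoms


def score(sub):
    """Stand-in for the predicted binding affinity of a candidate subgraph."""
    return len(sub & key_atoms)


reward, fragment = mcts_subgraph(adj, score, max_atoms=3)
```
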
The framework is evaluated on three benchmark datasets: Davis, Kiba, and Allergy. The experimental results show that GraphNetDTI achieves high predictive accuracy on all three datasets. Furthermore,
ExplainableDeepGNN successfully extracts pharmacophores of relatively small size from the identified
drug candidates, with the predicted binding affinities of the extracted subgraphs being very similar to
those of the corresponding drug candidates.
The proposed approach helps explore the potential pharmacophore of a given drug candidate, which
enhances the explainability of drug-target interaction prediction and provides deeper insights into crucial
molecular interactions.

Geometry-Enabled Models for Robust Robot Perception

Author(s): Jayanti Rohit Sreekanth
Advisor(s): K Madhava Krishna

Masters

July '26
Report no: IIIT/TH//
Center of DSAC

Abs PDF

Geometry-Enabled Models for Robust Robot Perception

Abstract

Imagine an embodied agent navigating a complex, unstructured environment. To operate effectively,
whether it is retrieving a specific object or building a comprehensive map of a building, the robot must
reliably recognize and associate the same physical entities across different points in time and space.
Historically, this capability relied on matching sparse, isolated pixels. However, the paradigm of visual
correspondence is shifting toward segment matching: establishing connections between semantically
and geometrically coherent regions. By reasoning over entire objects or structural surfaces rather
than individual points, robots can achieve a higher-level, interpretable, and fundamentally more robust
understanding of their surroundings.
Despite its vast potential, segment matching breaks down under wide-baseline conditions. When an
agent revisits a scene from an opposing viewpoint and experiences extreme perspective distortion, scale
variation, and camera rotations up to 180°, the 2D appearance of objects changes drastically. Current
state-of-the-art vision foundation models and video propagators, trained primarily on 2D priors, struggle
severely in these scenarios. Their routine failure modes include perceptual instance aliasing (confusing
identical but distinct objects) and topological collapse (failing to recognize the exact same object from a
new angle), ultimately leading to fragmented maps and navigational failures.
To resolve this blind spot, this thesis introduces SegMASt3R (Geometry Grounded Segment Matching), a novel architecture that repurposes the strong spatial inductive biases of a 3D Foundation Model
(3DFM) for robust, wide-baseline segment association. The proposed pipeline leverages a frozen MASt3R
backbone to extract dense, 3D-aware patch features. A newly designed segment-feature head then aggregates these dense representations into compact, instance-discriminative descriptors. To gracefully handle
the reality of occlusions and restricted fields-of-view, the assignment process is formulated through a
differentiable optimal transport layer. By employing a Sinkhorn solver equipped with a learnable dustbin
parameter, the network learns to explicitly reject unmatchable segments while enforcing strict geometric
consistency for valid pairs.
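
The dustbin-augmented optimal transport step can be sketched in NumPy, following the widely used SuperGlue-style construction. The score matrix is padded with a dustbin row and column filled with a single score `alpha`. Log-domain Sinkhorn iterations then enforce unit mass for every real segment, while the dustbins absorb the remainder. In this sketch `alpha` is fixed rather than learned, and the scores are made up. The final assignment is a plain row-wise argmax, without the mutual-consistency check or confidence threshold a full pipeline would typically add. None of this is taken from the SegMASt3R code.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_with_dustbin(scores, alpha, iters=100):
    """Soft assignment between m and n segments with unmatched mass sent to dustbins."""
    m, n = scores.shape
    S = np.full((m + 1, n + 1), alpha, dtype=float)
    S[:m, :n] = scores
    # Each real segment carries unit mass; a dustbin can absorb every segment of the other image
    log_mu = np.log(np.concatenate([np.ones(m), [n]]))
    log_nu = np.log(np.concatenate([np.ones(n), [m]]))
    u, v = np.zeros(m + 1), np.zeros(n + 1)
    for _ in range(iters):
        u = log_mu - logsumexp(S + v[None, :], axis=1)
        v = log_nu - logsumexp(S + u[:, None], axis=0)
    return np.exp(S + u[:, None] + v[None, :])


# Made-up similarity scores: A0 <-> B1 and A2 <-> B0 match, A1 has no counterpart
scores = np.array([[-2.0, 4.0],
                   [-2.0, -2.0],
                   [4.0, -2.0]])
P = sinkhorn_with_dustbin(scores, alpha=0.0)
n_cols = scores.shape[1]
matches = [int(j) if j < n_cols else -1 for j in P[:-1].argmax(axis=1)]  # -1 means "dustbin"
```

The segment without a counterpart ends up in the dustbin column rather than being forced onto a wrong match, which is the occlusion-handling behaviour described above.
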
Extensive empirical evaluations demonstrate that SegMASt3R achieves state-of-the-art accuracy
in wide-baseline segment association. It outperforms existing 2D foundation-model baselines and dense
local feature matchers by up to 30% on the AUPRC metric across challenging indoor benchmarks,
including ScanNet++ and Replica. Furthermore, the learned geometry-grounded representations exhibit
strong zero-shot generalization to unconstrained outdoor environments, such as the MapFree dataset.
Finally, this thesis validates the practical utility of SegMASt3R in downstream robotic perception
pipelines, demonstrating significantly reduced identity fragmentation in 3D instance mapping and
enhanced localization success in object-relative topological navigation.