Automatic recognition of visual entities under different learning scenarios.
Our research in visual recognition from images and videos focuses on enabling machines to understand and interpret complex visual scenes under unconstrained conditions. We address key challenges such as the semantic variability of visual categories, temporal dynamics in video data, and the need for context-aware recognition. In video-based reasoning, we emphasize the extraction of temporally coherent representations and scene graphs that can capture object interactions and event semantics. For image-based recognition, we investigate methods to align visual content with semantic concepts in a scalable manner. Our work also explores strategies for handling label noise, sparse supervision, and structured prediction across diverse visual tasks. These efforts are aimed at building perceptual models that can function reliably across varying spatial, temporal, and conceptual granularities in both static and dynamic settings.
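To make the scene-graph idea above concrete, the following minimal sketch shows one way a temporally grounded scene graph can be represented: tracked entities and (subject, predicate, object) relations with a frame extent, queryable by time. All class and field names (Entity, Relation, VideoSceneGraph, active_at) are illustrative assumptions, not a specific system of ours.

```python
# A minimal sketch of a temporally grounded video scene graph, assuming a
# triplet-based representation; all names here are illustrative.
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A tracked visual entity with a category label."""
    track_id: int
    category: str


@dataclass
class Relation:
    """A (subject, predicate, object) triplet grounded in a frame interval."""
    subject: Entity
    predicate: str
    object: Entity
    start_frame: int
    end_frame: int


@dataclass
class VideoSceneGraph:
    """Collects relations and supports simple temporal queries."""
    relations: list = field(default_factory=list)

    def add(self, relation: Relation) -> None:
        self.relations.append(relation)

    def active_at(self, frame: int) -> list:
        """Return relations whose temporal extent covers the given frame."""
        return [r for r in self.relations
                if r.start_frame <= frame <= r.end_frame]


if __name__ == "__main__":
    person = Entity(track_id=0, category="person")
    cup = Entity(track_id=1, category="cup")

    graph = VideoSceneGraph()
    graph.add(Relation(person, "holds", cup, start_frame=10, end_frame=45))

    # Query the object interactions visible at frame 20.
    for r in graph.active_at(20):
        print(f"{r.subject.category} --{r.predicate}--> {r.object.category}")
```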
Techniques to optimize deep learning architectures for different applications.
In parallel, we focus on designing robust deep learning models that can function reliably in the presence of data imperfections and environmental uncertainty. Our research explores principled approaches for improving generalization under limited, noisy, or non-i.i.d. data through strategies such as self-supervised representation learning, contrastive alignment, and noise-aware modeling. We also investigate architectural innovations that enhance robustness, including modular, generative, and hypernetwork-based designs, which facilitate efficient parameter sharing and dynamic task adaptation. Beyond accuracy, we emphasize the interpretability and calibration of deep models, ensuring that confidence estimates align well with predictive performance. These contributions collectively aim to build deep learning systems that are resilient, trustworthy, and capable of adapting to a wide spectrum of real-world applications.
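As one concrete example of the calibration concern above, the sketch below computes expected calibration error (ECE), a standard diagnostic that compares a model's stated confidence against its empirical accuracy over confidence bins. The binning scheme and names (expected_calibration_error, n_bins) are illustrative assumptions rather than a particular method from our work.

```python
# A minimal sketch of expected calibration error (ECE): the gap between
# confidence and accuracy, averaged over equal-width confidence bins.
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |accuracy - confidence| per bin, weighted by bin occupancy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)

    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_acc = correct[in_bin].mean()       # empirical accuracy in the bin
        bin_conf = confidences[in_bin].mean()  # mean predicted confidence
        ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.5, 1.0, size=1000)
    # Simulate a well-calibrated model: P(correct) equals stated confidence.
    hit = rng.uniform(size=1000) < conf
    print(f"ECE ~ {expected_calibration_error(conf, hit):.3f}")  # near 0
```

A low ECE indicates that a prediction made with, say, 80% confidence is in fact correct about 80% of the time, which is the alignment between confidence and performance that we target.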
Research on GeoCV (geospatial computer vision), exploiting different modalities of remote sensing data.
Our remote sensing research is rooted in the design of domain-informed learning frameworks tailored to the structured nature of geospatial data. Unlike conventional visual data, remote sensing imagery spans multiple modalities and resolutions, often under constraints of label scarcity and environmental variability. We address these challenges by developing models that leverage spectral-spatial correlations, sensor fusion, and physics-aware priors. Our work spans hyperspectral classification, SAR interpretation, and multi-modal fusion (e.g., optical-LiDAR), targeting tasks such as land use mapping, vegetation monitoring, and disaster assessment. We also explore scalable learning paradigms—such as meta-learning and few-shot adaptation—for rapid deployment in unseen geographies. With a strong emphasis on data efficiency and transferability, our methodology bridges geoscientific understanding with modern machine learning to deliver reliable and interpretable remote sensing analytics.
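To illustrate the few-shot adaptation theme above, the sketch below implements prototype-based few-shot classification in the spirit of prototypical networks: class prototypes are averaged from a small labeled support set, and queries are assigned to the nearest prototype. The synthetic embeddings stand in for features of remote sensing patches; all names and dimensions are illustrative assumptions, not our deployed pipeline.

```python
# A minimal sketch of prototype-based few-shot classification, one common
# recipe for rapid adaptation to unseen geographies from few labels.
import numpy as np


def class_prototypes(support_feats, support_labels, n_classes):
    """Mean embedding per class from the labeled support set."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])


def classify_by_prototype(query_feats, prototypes):
    """Assign each query to the class with the nearest prototype."""
    # Pairwise squared Euclidean distances: shape (n_query, n_classes).
    dists = ((query_feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n_classes, shots, dim = 3, 5, 16

    # Synthetic support set: 5 labeled embeddings per land-cover class.
    centers = rng.normal(scale=3.0, size=(n_classes, dim))
    support_labels = np.repeat(np.arange(n_classes), shots)
    support_feats = centers[support_labels] + rng.normal(size=(n_classes * shots, dim))

    # Query patches drawn from the same class-conditional distributions.
    query_labels = rng.integers(0, n_classes, size=20)
    query_feats = centers[query_labels] + rng.normal(size=(20, dim))

    protos = class_prototypes(support_feats, support_labels, n_classes)
    preds = classify_by_prototype(query_feats, protos)
    print(f"episode accuracy: {(preds == query_labels).mean():.2f}")
```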
Our research is currently supported by ANRF, DST, ISRO, AWL, Samsung, Adobe, Honda, Wobot, Fractal, MoES, and Power Grid, among others.