Automatic recognition of visual entities under different learning scenarios.
Traditional supervised learning techniques assume that the training and test samples are drawn from the same data distribution. However, this assumption is frequently violated in real-world scenarios. Domain adaptation techniques are used to build supervised learning systems that remain robust under such domain shift. We have been working on deep learning based approaches for end-to-end cross-domain classifier design, covering different types of visual data and different domain adaptation setups such as open-set, closed-set, multi-source, and partial, to name a few. (CVPRw 19, GCPR 19, IEEE J-STARS 17 etc.)
The recent success of deep learning techniques in visual recognition can largely be attributed to the availability of abundant labeled data. However, manually annotating such a large amount of data is a formidable task, especially given the ever-growing number of object categories in the real world. Re-training inference models for every new batch of classes is not cost-effective, which in turn limits the performance of artificial intelligence systems. Humans, on the other hand, possess a remarkable ability to recognize previously unseen object classes without any visual training samples (such as images or videos) of those classes, relying merely on some semantic side information. Zero-shot learning mimics this ability: machine learning models are trained to recognize previously unseen object classes using such side information. We have been working extensively on both discriminative and generative approaches for zero-shot learning. (IEEE ICPR 18, BMVC 18, BMVC 19, ACM ICVGIP 18 etc.)
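The idea of recognizing unseen classes from semantic side information can be illustrated with a minimal sketch. It assumes that a trained model has already projected an image into an attribute space, and it simply picks the unseen class whose attribute vector is most similar. The attribute names, class names, and vectors below are hypothetical and chosen only for illustration; this is a generic nearest-prototype scheme, not the specific discriminative or generative methods from the cited papers.

```python
import math

# Hypothetical attribute space: [has_stripes, has_hooves, lives_in_water].
# These classes have semantic descriptions but no training images.
UNSEEN_CLASSES = {
    "zebra": [1.0, 1.0, 0.0],
    "dolphin": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_unseen(projected, class_attributes):
    """Return the class whose attribute vector best matches the projection.

    `projected` stands in for the output of a visual-to-semantic mapping
    learned on seen classes (assumed here, not implemented).
    """
    return max(
        class_attributes,
        key=lambda name: cosine(projected, class_attributes[name]),
    )


if __name__ == "__main__":
    # An image whose projection suggests stripes and hooves.
    print(predict_unseen([0.9, 0.8, 0.1], UNSEEN_CLASSES))
```

In practice the projection would come from a network trained on seen classes, and the class descriptions could be attributes or word embeddings.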
Lately, we have started working on meta-learning approaches for scenarios such as few-shot learning, incremental learning, and incremental few-shot learning.
We have worked extensively on human activity recognition from videos. In this regard, we are interested in extracting fine-grained video features by applying image-based ConvNets to video data. (JVRIU 18, ICASSP 17)
Techniques to optimize the deep learning structure for different applications.
Over-parameterized deep learning models are difficult to deploy on low-latency devices because of their memory requirements. Knowledge distillation techniques transfer the properties of a large model (the Teacher) to a comparatively shallow model (the Student). We have been working on robust distillation strategies for different data domains, such as remote sensing, and for different model families, such as spiking neural networks. (ICCVw 19)
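As a rough illustration of the Teacher-Student setup, the sketch below computes a commonly used distillation objective: a temperature-softened KL divergence between teacher and student predictions, mixed with the usual cross-entropy on the true label. The function names, the temperature of 4.0, and the mixing weight `alpha` are assumptions made for this example; it shows the generic soft-target formulation, not the robust strategies developed in the cited work.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    soft term: T^2 * KL(teacher || student) on temperature-softened outputs
    hard term: cross-entropy of the student on the ground-truth label
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 0.8, -0.2]
    print(distillation_loss(student, teacher, label=0))
```

The factor `temperature ** 2` keeps the gradient scale of the soft term comparable to the hard term when the temperature changes.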
We work on different aspects of graph convolutional networks, especially for multi-label image classification and cross-modal image retrieval. Since images can be represented as region adjacency graphs, graph CNNs offer a more fine-grained analysis of image data by exploiting the mutual dependence among regions. (Neurocomputing 19, CVIU 19)
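To make the region adjacency graph idea concrete, here is a minimal sketch of a single graph convolution step over image regions. It assumes the widely used symmetric normalization with self-loops, followed by a linear map and ReLU; the three-region graph, feature values, and weights are hypothetical, and the architectures in the cited papers may differ.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(normalized_adj @ features @ weights)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


if __name__ == "__main__":
    # Three image regions in a chain: region 1 touches regions 0 and 2.
    adjacency = [[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]]
    region_features = [[1.0, 0.0], [0.0, 0.0], [0.0, 5.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for row in gcn_layer(adjacency, region_features, identity):
        print(row)
```

Each region's new feature vector mixes its own features with those of adjacent regions only, which is how the layer captures the mutual dependence between neighbouring regions.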
For cost-effective analysis, we have worked on multi-task learning. In particular, we aim to develop efficient multi-task models for visual inference tasks such as semantic segmentation, surface-normal prediction, and depth estimation from monocular images. (CVPRw 20)
Other areas of interest in computer vision.
Our research is currently supported by SERB, DST, ISRO, and AWL INC. Japan.