Abstract
Semantic segmentation of medical images is an essential first step in computer-aided diagnosis systems for many applications. However, given the many disparate imaging modalities and the inherent variations in patient data, it is difficult to consistently achieve high accuracy using modern deep neural networks (DNNs). This has led researchers to propose interactive image segmentation techniques in which a medical expert can interactively correct the output of a DNN to the desired accuracy. However, these techniques often require separate training data with the associated human interactions, and do not generalize across diseases and types of medical images. In this paper, we propose a novel conditional inference technique for DNNs that treats interventions by a medical expert as test-time constraints and performs inference conditioned on these constraints. Our technique is generic and can be used for medical images from any modality. Unlike other methods, our approach can correct multiple structures simultaneously and add structures missed in the initial segmentation. We report reductions in user annotation time of 13.3, 12.5, 17.8, 10.2, and 12.4 times compared to full human annotation for nucleus, multiple-cell, liver-and-tumor, organ, and brain segmentation, respectively, and time savings of 2.8-, 3.0-, 1.9-, 4.4-, and 8.6-fold compared to other interactive segmentation techniques. Our method can be useful to clinicians for diagnosis and post-surgical follow-up with minimal intervention from the medical expert. The source code and detailed results are available at [1].