Abstract
This paper introduces PMIndiaSum, a new multilingual and massively parallel headline summarization corpus focused on languages in India. Our corpus covers four language families, 14 languages, and 196 language pairs, the largest number to date. It provides a testing ground for all cross-lingual pairs. We detail our workflow for constructing the corpus, including data acquisition, processing, and quality assurance. Furthermore, we publish benchmarks for monolingual, cross-lingual, and multilingual summarization using fine-tuning, prompting, and translate-and-summarize approaches. Experimental results confirm the crucial role of our data in aiding the summarization of texts in Indian languages. Our dataset is publicly available and can be freely modified and redistributed.