Abstract
Popularity prediction is a well-studied machine learning task with wide-ranging business applications. Its primary goal is to estimate the future popularity of a piece of content published on the Internet. For example, a popularity prediction model might estimate the number of page views that an online news article will accrue over a given time period, and a social media popularity prediction system might forecast the number of comments or shares that a post will receive over a week. Predicting the prospective popularity of content shortly after its publication has multiple downstream applications, including article recommendation, advertising, and document retrieval. Additionally, proactively forecasting a document's popularity allows an editor to revise the document to make it more engaging. Although there has been much work on document-level prediction of prospective popularity for news articles, online petitions, social media posts, etc., there have been no attempts to forecast information popularity at a more granular level. Sentence-level popularity forecasting is of interest not only for the insights it offers into how informative sentences contribute to a document's overall popularity, but also for its potential applications in tasks such as popularity-guided text summarization, pull quote selection, catchphrase extraction, and search engine optimization of webpages. In this thesis, we address the challenges of capturing information popularity at the level of individual sentences, and we present neural models that effectively forecast sentence-specific information popularity.
Formally, we introduce the task of proactively forecasting the relative popularities of sentences within online news documents using only their natural language content. Instead of taking the simplistic route of predicting binary popular-or-not labels for individual sentences, we frame sentence popularity forecasting as a regression task in which we assign each sentence a normalized score ∈ [0, 1] indicating its information popularity relative to all other sentences in the same document. To address the lack of sentence-level labeled data for information popularity, we curate INFOPOP, a dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents sourced from 26 reputable news websites. To the best of our knowledge, INFOPOP is the first dataset whose sentence-level popularity annotations are automatically generated from streams of incoming search engine queries. We observe that certain aspects of information popularity are related to sentence salience, and we therefore propose auxiliary sentence salience prediction subtasks as a form of transfer learning for pretraining our neural sentence popularity forecasting models. Sentence salience is well studied in the context of automatic text summarization. In this thesis, we provide an overview of document summarization approaches and present our findings from two scholarly document summarization tasks. In particular, we describe an abstractive and an extractive approach for summarizing scientific research papers, which achieved state-of-the-art results on two shared tasks, LaySumm and LongSumm, at the Scholarly Document Processing (SDP) workshop at EMNLP 2020. We further discuss the distinction between a sentence's salience and its worthiness for inclusion in a summary in an extractive summarization setting.
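The abstract does not state how raw popularity signals are converted into normalized intra-document scores. As a minimal sketch, assuming each sentence has a hypothetical raw signal (for instance, a count of incoming search queries matched to it) and that min-max scaling within each document is used, the mapping to [0, 1] could look as follows. The function name `normalize_within_document` and the choice of min-max scaling are illustrative assumptions, not the INFOPOP labeling procedure.

```python
from typing import List


def normalize_within_document(raw_counts: List[float]) -> List[float]:
    """Min-max scale one document's raw sentence popularity signals into [0, 1].

    ASSUMPTION: raw_counts are hypothetical per-sentence signals (e.g. how many
    search queries matched each sentence); this normalization is illustrative.
    """
    if not raw_counts:
        return []
    lo, hi = min(raw_counts), max(raw_counts)
    if hi == lo:
        # All sentences are equally popular, so there is no relative ordering.
        return [0.0 for _ in raw_counts]
    # Scores are relative to the least and most popular sentence in this document.
    return [(c - lo) / (hi - lo) for c in raw_counts]


if __name__ == "__main__":
    print(normalize_within_document([3, 12, 0, 6]))  # [0.25, 1.0, 0.0, 0.5]
```

Under this assumption, the most popular sentence in a document receives 1 and the least popular receives 0, so scores are comparable between sentences of the same article but not across articles.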
For sentence-specific popularity forecasting, we experiment with unsupervised baselines, a simple recurrent neural network-based model, and a robust sentence-sequence regression architecture based on BERT (Bidirectional Encoder Representations from Transformers). Our results show that our proposed transfer learning approach yields significant performance gains on all metrics, reaching nDCG values above 0.8. We conduct extensive evaluations to highlight distinctions between our primary and auxiliary tasks. Notably, our findings show a non-trivial result: although information popularity and information salience are distinct concepts, transfer learning from salience prediction significantly improves sentence popularity forecasting.
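nDCG rewards a model for placing the sentences with the highest gold popularity scores near the top of its predicted ranking. The sketch below shows one common way to compute it for a single document, assuming the gold normalized scores serve directly as graded relevance with a linear gain and a log2 rank discount. The gain function, cutoff k, and averaging over documents used in the thesis are not stated here, and the names `dcg` and `ndcg` are hypothetical.

```python
import math
from typing import List, Optional


def dcg(relevances: List[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    # Position 0 is rank 1, discounted by log2(2) = 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(predicted: List[float], gold: List[float], k: Optional[int] = None) -> float:
    """nDCG@k for one document: rank sentences by predicted score, gain = gold score.

    ASSUMPTION: gold scores are used directly as graded relevance (linear gain).
    k=None evaluates the full ranking.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # Sentence indices sorted by predicted popularity, most popular first.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ranked_gold = [gold[i] for i in order][:k]
    # Ideal ranking: gold scores in descending order.
    ideal_dcg = dcg(sorted(gold, reverse=True)[:k])
    if ideal_dcg == 0.0:
        return 0.0  # no relevant sentences; nDCG is undefined, so report 0
    return dcg(ranked_gold) / ideal_dcg


if __name__ == "__main__":
    gold = [0.25, 1.0, 0.0, 0.5]
    print(ndcg([0.3, 0.9, 0.1, 0.6], gold))  # 1.0 (same ordering as gold)
    print(ndcg([0.6, 0.1, 0.9, 0.3], gold, k=2))
```

A value of 1.0 means the predicted ordering of sentences matches the gold ordering; in practice, per-document values would typically be averaged over a test set.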