Abstract
Knowledge graphs (KGs) play a crucial role in leveraging information on the web for several downstream tasks, which makes it vital to construct and maintain them. Despite previous efforts in populating KGs, existing methods typically rely on a fixed collection of documents rather than analyzing entity-specific content exclusively. We define an approach to populate such KGs that utilizes entity-specific content on the web to generate entity embeddings and establish entity-category interconnections. We empirically demonstrate our approach’s effectiveness by applying it to the downstream task of notability detection on Wikipedia, one of the most popular and important knowledge graphs. To moderate the content uploaded to Wikipedia, its editors define “Notability” guidelines to identify named entities that warrant their own article on Wikipedia. So far, notability has been enforced by humans, which makes scalability an issue, and there has been no significant work on automating this process. In this paper, we define a multipronged, category-agnostic approach based on web-based entity features and their text-based salience encodings to construct entity embeddings for determining an entity’s notability. We distinguish entities based on their categories and utilize neural networks to perform classification. For validation, we measure accuracy and prediction confidence on popular Wikipedia pages. Our system outperforms machine learning-based classifier approaches and handcrafted entity salience detection algorithms, achieving an accuracy of around 88%. Our system provides an efficient and scalable alternative to manual decision-making about the importance of a topic, and could be extended to other such KG-based tasks.