Abstract
Science journalism is the art of conveying a detailed scientific research paper in a form that nonscientists can understand and appreciate while ensuring that its underlying information is conveyed accurately. It transforms jargon-laden scientific articles into text that a common reader can comprehend, and so plays a crucial role in making scientific content suitable for consumption by the public at large. Recent advances in deep learning research and its applications in natural language processing have led to impressive results in natural language generation. We leverage these advances to explore their use in journalism, and in science journalism in particular, since comprehending scientific content is a much harder challenge than comprehending most other forms of content, like shorthand, that journalists use while writing articles. In this work, we introduce the problem of automated science journalism and present a way to automate part of the workflow: automatically generating the title of a blog version of a scientific paper. We have built a corpus of 87,328 pairs of research papers and their corresponding blogs from two science news aggregators and have used it to build ScienceBlogger, a pipeline-based architecture that uses a two-stage mechanism to generate the blog titles. To demonstrate the models, we built an interactive tool in which a user provides the abstract and title of a research paper; our APIs process this input to produce a blog title, along with some relevant information about the model used for the generation. Evaluation using standard metrics indicates the viability of the proposed system.