Abstract
Community-driven Question Answering (CQA) platforms such as Stack Overflow, Quora, Yammer, and AnswerHub have gained immense popularity in recent years among Internet users seeking, learning, and sharing information. These platforms not only provide users with the means to discuss their queries with the community and fulfill their information needs, but also serve the purpose of knowledge-base creation. Each CQA platform encourages rich content by recognizing the contributions of its users through different reputation metrics. Peers on these platforms thus dissuade other users from posting low-quality content by closing, disliking, or not answering their questions. This is expected to keep the platform focused; however, the community's aggressive attitude towards marking posts as low quality negatively impacts users' experience of the platform (especially for new users), reducing their interest in it. Improving the user experience and maintaining content quality on CQA platforms therefore involves addressing issues such as the increasing count of low-quality questions, aggressive communities, and the editing process for low-quality questions.
In this thesis, we address two issues related to content quality management on CQA platforms. (1) We note that editing closed questions can be time consuming, especially for new users: often, the edited versions of closed questions are still found unsuitable by moderators and remain closed for a long time. This motivates the need for automated mechanisms that can help users with appropriate feedback on their closed, disliked, or unanswered questions, so that they can obtain relevant answers in a timely fashion without getting unnecessarily stuck waiting for moderators' feedback. (2) We also note that another reason questions get closed is duplicity: the community marks them as duplicates considering their semantic similarity with related questions already available on the platform. Although users can search for related questions using the search & retrieval systems of these platforms before posting, most of these systems capture and return questions primarily based on syntactic similarity, which may not meet the exact need of the user.
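The gap between syntactic and semantic matching noted above can be illustrated with a minimal sketch. The question strings and the Jaccard token-overlap measure here are illustrative assumptions, not taken from any platform's actual retrieval system:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap (syntactic) similarity between two question strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Two questions a human moderator would likely mark as duplicates...
q1 = "how do i reverse a string in python"
q2 = "what is the way to invert the order of characters of a python str"

# ...share almost no tokens, so a purely syntactic system ranks them far apart.
print(round(jaccard(q1, q2), 3))  # -> 0.111
```

Despite asking essentially the same thing, the pair scores near zero under token overlap, which is why semantically similar duplicates can slip past syntactic retrieval and only get caught (and closed) after posting.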
To address the above-mentioned issues, we first present a framework to assist users in the re-opening process of closed questions on the platform. We build a predictive modeling framework that suggests to a user whether the edited version of his/her closed question will get successfully re-opened or not. This can assist users at large by deterring them from entering the review process with improper edits. To learn these models effectively, we analyze the closed questions of established and non-established users (determined by their reputation score). We note that established users have higher odds of getting their closed questions reopened than non-established users. Thus, we present an approach that leverages the better editing skills of established users to train a binary classifier that determines whether an edited closed question will get reopened or not. To train such a classifier, we identify a rich set of relevant and effective features based on an exploratory analysis of a popular CQA platform, namely Stack Overflow. Through empirical evaluation of the proposed approach using publicly available data from the Stack Overflow CQA platform, (i) we demonstrate that the proposed predictive modeling framework can be used to estimate whether an edited version of a closed question will get reopened or not, and (ii) we show that by leveraging such a framework, users can potentially save a substantial amount of time that currently goes to waste while waiting for a response from the moderators. Second, to reduce the number of duplicate questions, we present a mechanism to generate a set of
factual questions from the accepted answers of the top-rated questions on the Stack Overflow platform. The mechanism is based on the Semantic Role Labeling (SRL) parse of the sentences in the accepted answers of these questions. We demonstrate that the generated set of factual questions can potentially increase the syntactic coverage of the questions on the platform, thus yielding better results from the platform's search & retrieval system. Eventually, this can lead to fewer duplicate closed questions and improved content quality on the Stack Overflow platform.
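At a high level, the SRL-based generation step can be sketched as follows. The frame format, example sentence, and question template below are illustrative assumptions standing in for the output of a real SRL parser over answer sentences; the thesis mechanism is only summarized here:

```python
# Illustrative sketch: template-based factual question generation from
# Semantic Role Labeling (SRL) frames. The frame dictionary is a hand-made
# stand-in for what an SRL parser would produce for an answer sentence.

def generate_question(frame: dict) -> str:
    """Turn one SRL frame into a factual question.

    Expected keys (illustrative): 'ARG0' (agent), 'V' (predicate lemma),
    'ARG1' (patient/theme). Asking about ARG1 yields a 'what' question.
    """
    return f"What does {frame['ARG0']} {frame['V']}?"

# Frame one might (hypothetically) extract from an accepted-answer sentence
# such as "The garbage collector reclaims unreachable objects."
frame = {"ARG0": "the garbage collector", "V": "reclaim", "ARG1": "unreachable objects"}

print(generate_question(frame))  # -> "What does the garbage collector reclaim?"
```

Generated questions of this kind add syntactically distinct phrasings that point back to existing answers, which is how they can widen the coverage of the platform's retrieval system.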