Abstract
A central step in the work of data scientists is to implement an appropriate transformation that restructures the originally given data into a new, more revealing form. Although specific domain knowledge can be used to help design representations, data-driven learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms that implement such priors. To make machine learning applicable across different domains, it is important to make learning algorithms less dependent on manual feature engineering, so that novel AI-based applications can be developed faster.
The work in this thesis focuses not only on the mapping from representation to output but also on the representation itself, mining hidden patterns from the available data. With this in mind, we propose a learning model that alleviates the difficulties of feature engineering through automation. The model, based on regression between feature pairs, discovers underlying patterns and their variations in the data through the way features relate to each other, and selects a very small number of new features that yield a significant improvement in predictive performance. Because this model takes inherent feature structure into account, we take our next motivating example from bioinformatics. Features in this domain have a natural spatial order (for instance, in the medical domain, genes are often associated with different types of clinical features), so incorporating such structure can help select more important features and achieve higher classification accuracy. It also makes the models more readable and interpretable.
The final component of the work focuses on the problem of fake news detection. The low cost, easy access, and rapid circulation of information over the internet have encouraged more people than ever to seek out and consume news from online sources or social media rather than from traditional news organizations. While traditional (count-based) feature engineering strategies are effective for extracting features from text, because such a model is merely a bag of unstructured words, we lose additional information such as the semantics, structure, sequence, and context of nearby words in each document. This motivated us to explore more sophisticated models that can capture this information and produce features that are vector representations of words. Since fake news is intentionally written to mislead readers into believing false information, it can be difficult to detect on the basis of news content alone. We therefore need to include auxiliary information, such as user social engagements on social media, to help make a determination. To this end, we design a novel attention-based hybrid network that integrates the article information with the metadata and claims of the news articles.