Abstract
A wide variety of information pertaining to human behaviour, ranging from connections on social
networks to accounts of discrimination and abuse, is being produced on the internet. Predictive models
built using this data can be valuable for various domains including computational social science, personalization, recommendation, and marketing. Creating such predictive models can be challenging because the data available on online media relevant to human behaviour may be unstructured,
inadequate and/or highly imbalanced. In this thesis, we consider two contrasting instances of human
behaviour: trust and discrimination, especially gender-based discrimination, i.e., sexism. We formulate
a novel approach based on matrix factorization for predicting user-user trust by capitalizing on community memberships and devising a new way to model homophily. With regard to sexism, we
introduce the problem of multi-label categorization of accounts describing sexism of any kind(s) and
propose neural network based solutions for it.
The prediction of trust relations between users of social networks is critical for finding credible
information. Inferring trust is challenging, since user-specified trust relations are highly sparse and
power-law distributed. In this thesis, we explore utilizing community memberships for trust prediction
in a principled manner. We also propose a novel method for modelling homophily that complements existing work, as well as a way of capitalizing on the helpfulness scores of users' item ratings.
To the best of our knowledge, this is the first work that mathematically formulates a hypothesis
pertaining to community memberships for unsupervised trust prediction. We propose and model the
hypothesis that a user is at least as likely to develop a trust relation within the user's community as outside it. Unlike existing work, our approach for encoding homophily directly links user-user
similarities with the pair-wise trust model. Our formulation for tapping rating helpfulness scores relates
to the notion that users receiving low rating helpfulness scores should not receive high levels of trust on
average. We derive mathematical factors that model our hypothesis relating community memberships
to trust relations, the homophily effect, and the premise involving helpfulness scores. Along with an
existing method, they are combined into chTrust, the proposed multi-faceted optimization framework.
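To make the overall shape of such an objective concrete, the following is an illustrative sketch of how the three factors could augment a matrix factorization model of trust; the notation (trust matrix \(T\), user latent factors \(U\), correlation matrix \(\Lambda\), similarities \(s_{ij}\), helpfulness scores \(h_j\), threshold \(\tau\), and weights \(\alpha, \beta, \gamma, \lambda\)) and the functional forms are assumptions made for exposition, not the exact chTrust formulation derived in the thesis.

\[
\min_{U,\Lambda}\; \bigl\lVert T - U \Lambda U^{\top} \bigr\rVert_F^2
\;+\; \alpha \sum_{i} \max\bigl(0,\; \bar{t}^{\,\mathrm{out}}_i - \bar{t}^{\,\mathrm{in}}_i\bigr)
\;+\; \beta \sum_{i,j} s_{ij}\bigl(1 - \mathbf{u}_i^{\top} \Lambda\, \mathbf{u}_j\bigr)^2
\;+\; \gamma \sum_{j:\, h_j < \tau} \bar{t}_{\cdot j}
\;+\; \lambda\, \mathcal{R}(U, \Lambda)
\]

Here \(\mathbf{u}_i^{\top}\Lambda\,\mathbf{u}_j\) is the predicted pairwise trust; the \(\alpha\)-term penalizes users whose average predicted trust outside their communities, \(\bar{t}^{\,\mathrm{out}}_i\), exceeds that within them, \(\bar{t}^{\,\mathrm{in}}_i\); the \(\beta\)-term ties user-user similarities directly to the pairwise trust model (homophily); the \(\gamma\)-term discourages high average incoming trust \(\bar{t}_{\cdot j}\) for users with low rating helpfulness; and \(\mathcal{R}\) is a regularizer.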
Our experiments on the standard Ciao and Epinions datasets show that, in the majority of settings (varying the fraction of the trust relations used for training, whether all user pairs or only the low-degree ones are considered in the evaluation, and the dataset), the proposed framework outperforms all but one of the unsupervised trust prediction baselines considered; no proposed or baseline method performs best across all settings.

We also develop a supervised method for trust prediction that avoids any dependence on auxiliary
information such as ratings or reviews. We propose combining representation learning with the inference of trust relations. Using only a small number of binary user-user trust relations, our approach
simultaneously learns embeddings for the users and a trust prediction model.
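As a rough illustration of this idea, the sketch below jointly learns user embeddings and a pairwise trust scorer from binary trust pairs alone, using negative sampling; the class and function names, the bilinear scorer, and all hyperparameters are illustrative assumptions rather than the thesis's exact model.

```python
# A minimal PyTorch sketch (not the thesis's exact model) of jointly learning
# user embeddings and a trust predictor from binary trust relations alone,
# with no auxiliary information such as ratings or reviews.
import torch
import torch.nn as nn

class JointTrustModel(nn.Module):
    def __init__(self, num_users: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(num_users, dim)  # user representations, learned jointly
        self.scorer = nn.Bilinear(dim, dim, 1)   # directed pairwise trust scorer

    def forward(self, trustors, trustees):
        # Returns a trust logit for each (trustor, trustee) pair.
        return self.scorer(self.emb(trustors), self.emb(trustees)).squeeze(-1)

def train_step(model, optim, pos_pairs, num_users):
    """One update on observed trust pairs plus randomly sampled negatives."""
    trustors, trustees = pos_pairs[:, 0], pos_pairs[:, 1]
    neg_trustees = torch.randint(0, num_users, trustees.shape)  # negative sampling
    logits = torch.cat([model(trustors, trustees), model(trustors, neg_trustees)])
    labels = torch.cat([torch.ones(len(trustees)), torch.zeros(len(neg_trustees))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```

A bilinear rather than a symmetric dot-product scorer is used in this sketch because trust relations are directed: the score for (i, j) need not equal the score for (j, i).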
Another human behaviour that we endeavour to analyze through predictive models is discrimination,
especially discrimination perpetrated on the basis of sex. Sexism, an injustice that
gives rise to enormous suffering, manifests in blatant as well as subtle ways. In the wake of growing
documentation of experiences of sexism on the web, the automatic categorization of accounts of sexism
has the potential to assist social scientists and policy makers in studying and thereby countering sexism.
The existing work on sexism classification has certain limitations in terms of the categories of sexism
used and/or whether they can co-occur. To the best of our knowledge, this is the first work on the
multi-label classification of sexism of any kind(s), and we contribute the largest dataset for sexism
categorization.
We also consider the related task of the classification of misogyny. While sexism classification
is performed on textual accounts describing sexism suffered or observed, misogyny classification is
carried out on tweets perpetrating misogyny.
We devise a novel neural framework for classifying sexism and misogyny that can combine text representations obtained using models such as BERT with distributional and linguistic word embeddings
using a flexible architecture involving recurrent components and optional convolutional ones. Further,
we leverage unlabeled accounts of sexism to infuse domain-specific elements into our framework. In
order to evaluate the versatility of our neural approach for tasks pertaining to sexism and misogyny,
we experiment with adapting it for misogyny identification. For categorizing sexism, we investigate
m
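The following is a minimal sketch of the kind of architecture described above, assuming the Hugging Face transformers BertModel interface; the fusion by concatenation, the BiLSTM, the placement of the optional convolutional branch, the max-pooling, and all names and dimensions are illustrative assumptions, not the exact proposed framework.

```python
# An illustrative sketch (not the exact proposed framework) of combining BERT
# token representations with distributional/linguistic word embeddings through
# a BiLSTM, with an optional convolutional branch, for multi-label prediction.
import torch
import torch.nn as nn

class MultiLabelSexismClassifier(nn.Module):
    def __init__(self, bert, num_labels, word_emb_dim=300, hidden=128, use_cnn=True):
        super().__init__()
        self.bert = bert                               # e.g., BertModel.from_pretrained(...)
        fused_dim = bert.config.hidden_size + word_emb_dim
        self.lstm = nn.LSTM(fused_dim, hidden, batch_first=True, bidirectional=True)
        self.use_cnn = use_cnn
        self.cnn = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(hidden if use_cnn else 2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask, word_embs):
        # word_embs: precomputed word embeddings aligned to the BERT tokens,
        # shape (batch, seq_len, word_emb_dim); the alignment step is assumed.
        tokens = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([tokens, word_embs], dim=-1)     # concatenate representations
        seq, _ = self.lstm(fused)                          # (batch, seq_len, 2*hidden)
        if self.use_cnn:
            seq = torch.relu(self.cnn(seq.transpose(1, 2))).transpose(1, 2)
        pooled, _ = seq.max(dim=1)                         # max-pool over the sequence
        return self.head(pooled)                           # one logit per category
```

The multi-label nature of the task is reflected in the output head: it produces one logit per sexism category, intended to be passed through a sigmoid and thresholded independently, rather than a softmax over mutually exclusive classes.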