Abstract
Data mining is the process of discovering significant and potentially useful knowledge, in the form of patterns, from data. The notion of interestingness is therefore central to extracting useful knowledge patterns. Numerous interestingness measures have been discussed in the literature to assess the interestingness of a knowledge pattern. In this thesis, we focus on selecting an appropriate interestingness measure for mining association rules, in particular rare association rules.
Association rule mining is an important knowledge discovery technique in the field of data mining. It involves finding interesting associations between sets of objects in a transactional database. A rare association rule is an association rule whose items have low support.
In many real-world applications, rare association rules can provide useful information to the users. Typically, association rules are extracted with the support and confidence measures. Several other interestingness measures, such as lift and all-confidence, have also been used to extract association rules. Each interestingness measure has its own selection bias that justifies the significance of one association rule over others. Thus, no single interestingness measure is better than the others in all application domains. Each interestingness measure has a set of properties. A framework exists in the literature which suggests selecting a measure based on the properties of interest to the user. However, it is unclear which properties a user should consider for mining rare association rules. In this thesis, we analyze the properties of different interestingness measures and suggest the properties which the user should consider for extracting rare association rules. The experimental results on real-world datasets show that the measures satisfying the prescribed properties can efficiently extract rare association rules.
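To make the differing selection bias of these measures concrete, the following is a minimal sketch that computes support, confidence, lift, and all-confidence using their commonly cited textbook definitions. The toy database `TOY_DB`, its item names, and all function names are hypothetical and chosen only for illustration; they are not taken from the thesis or its experiments.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               db: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent).

    Assumes the antecedent occurs at least once in `db`.
    """
    return support(antecedent | consequent, db) / support(antecedent, db)


def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str],
         db: List[Transaction]) -> float:
    """Confidence divided by the support of the consequent.

    Values above 1 indicate positive correlation, values below 1 negative.
    """
    return confidence(antecedent, consequent, db) / support(consequent, db)


def all_confidence(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Support of the itemset divided by the largest single-item support."""
    max_item_support = max(support(frozenset([item]), db) for item in itemset)
    return support(itemset, db) / max_item_support


# Hypothetical toy database: bread and milk are frequent items,
# caviar and vodka are rare items that always occur together.
TOY_DB: List[Transaction] = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk"}, {"bread"},
    {"milk"}, {"bread", "milk"}, {"bread"}, {"milk", "caviar", "vodka"},
    {"bread", "milk"}, {"bread", "caviar", "vodka"},
)]

if __name__ == "__main__":
    rules = [(frozenset({"bread"}), frozenset({"milk"})),
             (frozenset({"caviar"}), frozenset({"vodka"}))]
    for a, c in rules:
        print(sorted(a), "->", sorted(c),
              "sup=%.2f" % support(a | c, TOY_DB),
              "conf=%.3f" % confidence(a, c, TOY_DB),
              "lift=%.3f" % lift(a, c, TOY_DB),
              "all-conf=%.3f" % all_confidence(a | c, TOY_DB))
```

On this toy data, the frequent rule bread -> milk has support 0.5 but a lift below 1, whereas the rare rule caviar -> vodka has support of only 0.2 yet a lift of 5 and an all-confidence of 1. A high support threshold alone would therefore discard the rare rule, which is the kind of bias the choice of measure has to account for.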
In addition, we propose a new model to extract periodic-frequent patterns. Periodic-frequent patterns are a class of user-interest-based frequent patterns that use temporal periodicity as the interestingness criterion. Informally, a pattern is said to be periodic-frequent if it occurs at regular intervals specified by the user throughout the database. We have analyzed the existing approaches to mine periodic-frequent patterns. The basic model of periodic-frequent pattern mining uses “single constraints” and suffers from the “rare item problem.”
To address this problem, an alternative model based on the notion of “multiple constraints” exists in the literature. However, this model is computationally expensive to implement, because the periodic-frequent patterns discovered with it do not satisfy the downward closure property. Furthermore, it has been observed that this model still generates some uninteresting patterns as periodic-frequent patterns. In this thesis, we propose an alternative model for extracting periodic-frequent patterns to address the issues of the existing approaches. We have performed experiments on both synthetic and real-world datasets. The results from the experiments show that the proposed approaches can efficiently extract periodic-frequent patterns with low support.
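The informal definition of a periodic-frequent pattern can be illustrated with a minimal sketch. It assumes the formulation commonly used for the basic single-constraint model: transactions are timestamped by their 1-based position, the periodicity of a pattern is its largest gap between consecutive occurrences (including the gaps to the start and end of the database), and a pattern qualifies if its occurrence count is at least `min_sup` and its periodicity is at most `max_per`. The names `periods`, `is_periodic_frequent`, `min_sup`, `max_per`, and the toy data are hypothetical; this is not the model proposed in the thesis.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def periods(itemset: FrozenSet[str], db: List[Transaction]) -> List[int]:
    """Gaps between consecutive occurrences of `itemset` in `db`.

    The first gap is measured from timestamp 0 and the last gap runs
    to the final timestamp len(db).
    """
    gaps = []
    last_seen = 0
    for ts, transaction in enumerate(db, start=1):
        if itemset <= transaction:
            gaps.append(ts - last_seen)
            last_seen = ts
    gaps.append(len(db) - last_seen)
    return gaps


def is_periodic_frequent(itemset: FrozenSet[str], db: List[Transaction],
                         min_sup: int, max_per: int) -> bool:
    """Check one global support threshold and one global periodicity threshold."""
    gaps = periods(itemset, db)
    occurrences = len(gaps) - 1  # the trailing gap is not an occurrence
    return occurrences >= min_sup and max(gaps) <= max_per


# Hypothetical toy database of ten timestamped transactions.
TOY_SEQ: List[Transaction] = [frozenset(t) for t in (
    {"a", "b"}, {"a"}, {"a", "b"}, {"c"}, {"a", "b"},
    {"a"}, {"a", "b", "c"}, {"a"}, {"a", "b"}, {"c"},
)]

if __name__ == "__main__":
    for items in ({"a", "b"}, {"c"}):
        pattern = frozenset(items)
        print(sorted(pattern), periods(pattern, TOY_SEQ),
              is_periodic_frequent(pattern, TOY_SEQ, min_sup=3, max_per=2))
```

In this toy data the pattern {a, b} recurs every two transactions and passes with `max_per=2`, while the less frequent item c recurs every three to four transactions and is rejected unless `max_per` is raised to 4. A single pair of global thresholds thus has to be loosened for every pattern in order to admit the less frequent ones.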