Abstract
The field of data mining has emerged to extract knowledge hidden in large databases for better decision making. Frequent pattern mining, in which a set of items is called a pattern (or itemset), finds interesting information about the associations among the items in a transactional database. The notion of support is employed to extract frequent patterns: a pattern is called a frequent pattern if its support satisfies the user-defined minimum support threshold. An important criterion for assessing the interestingness of a frequent pattern is its temporal occurrence in a database, that is, whether the pattern occurs periodically, irregularly, or mostly within specific time intervals. Frequent patterns that occur periodically within a database are known as periodic-frequent patterns. Finding these patterns is a significant task with many real-world applications, such as improving the performance of recommender systems, detecting intrusions in computer networks, and discovering events in Twitter.
Current periodic-frequent pattern models cannot handle datasets in which multiple transactions share a common timestamp or in which transactions occur at irregular time intervals. This limits the applicability of these models, because in many real-world databases, such as e-commerce and Twitter data, transactions share common timestamps and uneven time gaps exist between consecutive transactions. Most previous models of periodic-frequent pattern mining have focused on finding all patterns in a transactional database that satisfy the user-specified minimum support (minSup) and maximum periodicity (maxPer) constraints. The minSup constraint controls the minimum number of transactions that a pattern must cover in a database. The maxPer constraint controls the maximum time interval within which a pattern must reoccur, i.e., the largest allowed gap between two consecutive transactions containing the pattern. Using a single minSup and maxPer for an entire database leads to the rare item problem, because real-world databases have a non-uniform item distribution, i.e., items have widely different support and periodicity values: thresholds strict enough to suit frequent items miss patterns involving rare items, while thresholds loose enough to admit rare items produce a flood of uninteresting patterns involving frequent items. In addition, current periodic-frequent pattern models have focused on discovering full periodic-frequent patterns, i.e., patterns that exhibit complete cyclic repetitions throughout the entire database. These models evaluate the periodic interestingness of a frequent pattern by determining whether all of its inter-arrival times are within the user-specified maxPer threshold. Therefore, they cannot assess the partial periodic behavior of a frequent pattern. However, because real-world databases are imperfect, partial periodic-frequent patterns are more common in practice.
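To make the minSup/maxPer check and the full-versus-partial distinction concrete, the following is a minimal sketch, not the thesis's implementation. It assumes that support counts transactions, that inter-arrival times are gaps between consecutive distinct timestamps at which the pattern occurs, and (for simplicity) that no boundary gaps to the database's first or last timestamp are counted. All function names (`support`, `inter_arrival_times`, `periodicity`, `is_periodic_frequent`, `periodic_gap_count`) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A temporal database as (timestamp, items) pairs. Several transactions may share
# a timestamp, and the gaps between consecutive timestamps may be uneven.
TemporalDB = List[Tuple[int, FrozenSet[str]]]


def support(db: TemporalDB, pattern: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for _, items in db if pattern <= items)


def inter_arrival_times(db: TemporalDB, pattern: FrozenSet[str]) -> List[int]:
    """Gaps between consecutive distinct timestamps at which the pattern occurs."""
    ts = sorted({t for t, items in db if pattern <= items})
    return [b - a for a, b in zip(ts, ts[1:])]


def periodicity(db: TemporalDB, pattern: FrozenSet[str]) -> int:
    """Largest inter-arrival time (0 if the pattern occurs at fewer than two timestamps)."""
    return max(inter_arrival_times(db, pattern), default=0)


def is_periodic_frequent(db: TemporalDB, pattern: FrozenSet[str],
                         min_sup: int, max_per: int) -> bool:
    """Full periodicity: enough support and *every* inter-arrival time within max_per."""
    return support(db, pattern) >= min_sup and periodicity(db, pattern) <= max_per


def periodic_gap_count(db: TemporalDB, pattern: FrozenSet[str], max_per: int) -> int:
    """Partial view: how many inter-arrival times fall within max_per."""
    return sum(1 for gap in inter_arrival_times(db, pattern) if gap <= max_per)


# Toy database: timestamp 1 is shared by two transactions; timestamps 3 and 6-8 are absent.
db: TemporalDB = [
    (1, frozenset({"a", "b"})),
    (1, frozenset({"a", "c"})),
    (2, frozenset({"a", "b", "d"})),
    (4, frozenset({"a", "b"})),
    (5, frozenset({"c", "d"})),
    (9, frozenset({"a", "b"})),
]

ab = frozenset({"a", "b"})
print(support(db, ab))                     # 4
print(inter_arrival_times(db, ab))         # [1, 2, 5]
print(is_periodic_frequent(db, ab, 3, 3))  # False: the gap of 5 exceeds max_per
print(periodic_gap_count(db, ab, 3))       # 2 of the 3 gaps are periodic
```

With minSup = 3 and maxPer = 3, the pattern {a, b} is frequent but fails the full-periodicity test because of a single long gap, even though two of its three inter-arrival times are periodic. This is the kind of partial periodic behavior that a full-periodicity model cannot express.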
To address these issues, this thesis proposes two improved approaches that discover, respectively, periodic-correlated patterns and partial periodic-frequent patterns in non-uniform temporal databases. In the first approach, we tackle the rare item problem by proposing an improved model that discovers periodic-correlated patterns in a non-uniform temporal database. In this thesis, we consider a temporal database to be a collection of transactions ordered by their timestamps. A temporal database allows multiple transactions to share a common timestamp and allows time gaps between consecutive transactions. A temporal database is said to be non-uniform if it contains items with dissimilar support and periodicity. To tackle the rare item problem in non-uniform temporal databases, the proposed model considers a pattern interesting if its support and periodicity are close to those of its individual items. The existing all-confidence measure is used to determine how close the support of a pattern is to the support of its individual items. A new interestingness measure, called periodic-all-confidence, is proposed to determine how close the periodicity of a pattern is to the periodicity of its individual items. A pattern-growth algorithm is also presented to find periodic-correlated patterns. Experimental results show that the proposed model is efficient and tackles the rare item problem. We demonstrate the usefulness of periodic-correlated patterns with a real-world case study on the FAA-Accidents database and show that the proposed model can effectively discover interesting periodic-correlated patterns involving both frequent and rare items.
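The following sketch illustrates the "closeness" idea behind the first approach. `all_confidence` implements the standard all-confidence measure, sup(X) / max over items i in X of sup({i}). The abstract does not state the formula for periodic-all-confidence, so `periodic_all_confidence` below is a **hypothetical** analogue, min over items i of per({i}) / per(X), chosen only for illustration and not taken from the thesis. Periodicity here also counts the gap from the database's first timestamp and the gap to its last one (an assumed convention). This guarantees that a pattern's periodicity is never smaller than that of any of its items, so the hypothetical ratio stays in (0, 1]. All names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

TemporalDB = List[Tuple[int, FrozenSet[str]]]


def support(db: TemporalDB, pattern: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for _, items in db if pattern <= items)


def periodicity(db: TemporalDB, pattern: FrozenSet[str]) -> int:
    """Largest gap between consecutive occurrence timestamps, also counting the gap
    from the database's first timestamp and the gap to its last one (assumed convention)."""
    first = min(t for t, _ in db)
    last = max(t for t, _ in db)
    ts = sorted({t for t, items in db if pattern <= items})
    points = [first] + ts + [last]
    return max(b - a for a, b in zip(points, points[1:]))


def all_confidence(db: TemporalDB, pattern: FrozenSet[str]) -> float:
    """sup(X) / max_{i in X} sup({i}); equals 1.0 only when X occurs in every
    transaction that contains its most frequent item."""
    max_item_sup = max(support(db, frozenset({i})) for i in pattern)
    return support(db, pattern) / max_item_sup if max_item_sup else 0.0


def periodic_all_confidence(db: TemporalDB, pattern: FrozenSet[str]) -> float:
    """HYPOTHETICAL form, not the thesis's definition:
    min_{i in X} per({i}) / per(X), which lies in (0, 1] under the periodicity above."""
    per_x = periodicity(db, pattern)
    min_item_per = min(periodicity(db, frozenset({i})) for i in pattern)
    return min_item_per / per_x if per_x else 1.0


db: TemporalDB = [
    (1, frozenset({"a", "b"})),
    (1, frozenset({"a", "c"})),
    (2, frozenset({"a", "b", "d"})),
    (4, frozenset({"a", "b"})),
    (5, frozenset({"c", "d"})),
    (9, frozenset({"a", "b"})),
]

print(all_confidence(db, frozenset({"a", "b"})))                     # 0.8
print(periodic_all_confidence(db, frozenset({"a", "b"})))            # 1.0
print(all_confidence(db, frozenset({"a", "d"})))                     # 0.2
print(round(periodic_all_confidence(db, frozenset({"a", "d"})), 3))  # 0.571
```

Both ratios compare a pattern with its items rather than with a single global threshold. This is why such measures can accept a pattern built from rare items (low absolute support) as long as it co-occurs, and recurs, about as often as those items do.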
In the second approach, we introduce an improved model to discover partial periodic-frequent patterns in non-uniform temporal databases. The proposed model lets the user specify a different maximum inter-arrival time (MIAT) for each item. An inter-arrival time of a pattern is considered periodic (or cyclic) if it is no more than the pattern's period. Thus, different patterns may be subject to different periods depending on their items' MIAT values. This solves the rare item problem in partial periodic-frequent pat