Abstract
The world is becoming increasingly information based. Data is now generated and stored at a rate that exceeds the human capacity to comprehend it, and databases hold large amounts of potentially important information that has yet to be discovered. Data mining is the process of extracting previously unknown and potentially useful information from large data sets. It is an increasingly important tool for transforming data into useful information that provides an informational advantage.
Data that is associated with the time at which it is generated is known as time series data. It captures the characteristics of events with respect to time. Analyzing time series data has recently gained prominence because many real-world phenomena produce such data. In addition, time series data is a valuable resource for many business organizations and financial markets. Furthermore, several emerging applications in information services, such as online services and the World Wide Web, call for data mining techniques that help to understand user behavior and thereby increase business opportunities. Data mining is now used in several domains, including the natural sciences, environmental studies, network monitoring, genomics, and the study of financial markets. In this thesis, we make two contributions. First, we propose an approach for the efficient placement of banner advertisements on a web site by analyzing the click stream data generated by its visitors. Second, we propose a framework to identify changes in the suitable crop cultivation period over the years by analyzing the past temperature values of a region.
E-commerce, web services, and online advertising can benefit greatly from the insight gained through knowledge discovery on transactional data, which includes click streams. In online banner advertising, an advertiser expects a banner advertisement to be displayed to a certain percentage of web site visitors. To generate more revenue for a given web site, the publisher has to meet the demands of several advertisers by providing each of them with an appropriate set of web pages. To help publishers and advertisers, we propose in this thesis a model of coverage patterns and a methodology to extract potential coverage patterns by analyzing click stream data. Given the web pages of a web site, a coverage pattern is a set of web pages visited by a certain percentage of visitors. The proposed approach has the potential to enable the publisher to meet the demands of several advertisers. The experimental results indicate that the proposed approach is efficient and practical from the perspective of both the publisher and the advertiser.
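As an illustration of the coverage pattern idea, the following minimal Python sketch computes the fraction of visitor sessions that reach a given set of pages. It is not the extraction methodology of the thesis. It assumes that a session "visits" a set of pages when it contains at least one of them, and the names `coverage`, `is_coverage_pattern`, `min_coverage`, and the sample sessions are hypothetical.

```python
from typing import List, Set


def coverage(pages: Set[str], sessions: List[Set[str]]) -> float:
    """Fraction of sessions that contain at least one page from `pages`.

    Assumption: "visited by a visitor" is read as "the session includes
    at least one page of the set".
    """
    if not sessions:
        return 0.0
    hits = sum(1 for session in sessions if session & pages)
    return hits / len(sessions)


def is_coverage_pattern(pages: Set[str], sessions: List[Set[str]],
                        min_coverage: float) -> bool:
    """True if `pages` reaches at least `min_coverage` of all sessions."""
    return coverage(pages, sessions) >= min_coverage


# Hypothetical click stream: each set holds the pages seen in one session.
sessions = [
    {"home", "sports"},
    {"news"},
    {"home", "news"},
    {"weather"},
]

print(coverage({"home", "news"}, sessions))
print(is_coverage_pattern({"weather"}, sessions, 0.5))
```

Under this reading, an advertiser who asks for a given share of visitors could be offered any page set whose coverage meets that share.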
In crop production, a crop is cultivated during a certain period of the year, chosen according to how favorable the weather conditions are. In recent years, however, there has been an ongoing debate about climate change and its impact on crop productivity. In general, crop productivity is influenced by weather factors such as rainfall, humidity, wind velocity, solar radiation, and temperature. Temperature is one of the important factors: for any crop, productivity is influenced by fluctuations in temperature during the cultivation period. In this thesis, we propose a framework to understand changes in the suitable time period for crop cultivation by analyzing temperature data. For this purpose, we introduce the notion of a penalty matrix, which contains penalty values. A penalty value indicates the negative effect of a given temperature deviation on crop productivity. Using the proposed framework, we report observations obtained by computing the suitability score of the rice crop for two regions under assumed penalty values. The proposed framework is generic and can be used for any crop and region.
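The penalty idea can be sketched in a few lines of Python. This is a simplified illustration, not the framework of the thesis. It assumes a single row of penalty values indexed by the absolute temperature deviation, a suitability score defined as the negative total penalty over a period, and invented values for the bin bounds, penalties, ideal temperature, and temperature series.

```python
import bisect
from typing import Sequence

# Hypothetical penalty row: deviation bins (degrees Celsius, assumed unit)
# of <=2, (2, 5], (5, 8], and >8, with one penalty value per bin.
DEVIATION_BOUNDS = [2.0, 5.0, 8.0]
PENALTIES = [0.0, 1.0, 3.0, 6.0]


def penalty(deviation: float) -> float:
    """Look up the penalty for a deviation from the ideal temperature."""
    index = bisect.bisect_left(DEVIATION_BOUNDS, abs(deviation))
    return PENALTIES[index]


def suitability_score(temps: Sequence[float], ideal: float) -> float:
    """Negative total penalty over a period (assumed definition).

    Higher is better; zero means no penalised deviation.
    """
    return -sum(penalty(t - ideal) for t in temps)


def best_start(temps: Sequence[float], ideal: float, length: int) -> int:
    """Index of the first window of `length` readings with the best score."""
    starts = range(len(temps) - length + 1)
    return max(starts,
               key=lambda s: suitability_score(temps[s:s + length], ideal))


# Hypothetical temperature readings and ideal temperature.
temps = [20.0, 24.0, 27.0, 29.0, 30.0, 36.0, 40.0]
print(best_start(temps, ideal=28.0, length=3))
```

Running `best_start` on the temperature records of different years would show whether the best-scoring window shifts over time, which is the kind of change the framework sets out to identify.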