Abstract
Real-world phenomena in various domains, both natural (climatology, ecology, environment) and man-made (diseases, theft, urbanization), are largely dynamic, and their evolution over space and time is central to understanding the science and processes behind them. Such phenomena have a great impact on the regions where they occur, raising a need to assess their behavior (the regions where they occur, the duration of occurrence, the extent of influence, etc.) so that they can be managed suitably. Current approaches focus on collecting snapshots of data at certain intervals of time. Though this approach provides insight into the state of the event at given times and into how changes occur from one snapshot to the next, it is inadequate to explain and analyze the dynamic processes.
A significant amount of work has been done in spatio-temporal change analysis, such as detecting land use change as part of global and regional environmental change. While trying to find major changes, we tend to ignore places with minimal changes, whose invariant behavior might give important clues about the process under study. For instance, the localized behavior of climate, especially rainfall, has been well investigated in recent years to understand the dynamics of such a phenomenon, with varied outcomes. To better understand the controlling and regulating factors, it is necessary not only to locate the change and quantify it but also to understand the invariant behavior that the phenomenon exhibits. Therefore, this study focuses on the detection of these invariant regions for a given phenomenon. A region can be a single representative point or a set of representative points with a defined property. Given the dynamic nature of the phenomenon, this information is useful not only for understanding the phenomenon better but also as a source of reference points for sampling and interpolation.
In this thesis, we propose MiSTIC, a data-format-generic method based on watershed delineation, neighborhood analysis and frequent item mining to identify the set of spatially distributed regions highly influenced by a phenomenon over a period of time. The method involves both a spatial analysis step to detect focal points (representative points) and a spatio-temporal analysis over the entire data time period to identify core regions, or Cores. As the nature of a core is affected by the neighborhood in which it occurs, cores are classified, to understand their behavior, as Cores with Contiguous points (CC) and Cores with defined Radius (CR). Further, the cores are sub-classified to determine the level of spatio-temporal invariance (confinement in space over time) by considering the frequency of occurrence of the focal points that constitute them. The sub-classes are Cores with Highly Dominating points (CHD), Cores with Less Dominating points (CLD) and Cores with No Dominating points (CND).
Thus, the frequently or predominantly occurring focal points capture the localized invariant behavior of a dynamic phenomenon, whereas the neighborhood constraints capture the extent of its dynamic nature and the direction of its influence, if any.
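The idea of counting how often a focal point recurs across snapshots can be illustrated with a minimal sketch. It is not the MiSTIC implementation: the focal-point rule (a cell strictly greater than its eight neighbours, standing in for watershed delineation), the function names, the thresholds 0.6 and 0.3, and the rule that a core takes the label of its most dominant point are all assumptions made purely for illustration.

```python
from collections import Counter


def focal_points(grid):
    """Cells strictly greater than all 8-neighbours (assumed stand-in rule)."""
    rows, cols = len(grid), len(grid[0])
    peaks = set()
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(grid[r][c] > v for v in neighbours):
                peaks.add((r, c))
    return peaks


def focal_frequency(snapshots):
    """Fraction of snapshots in which each cell appears as a focal point."""
    counts = Counter()
    for grid in snapshots:
        counts.update(focal_points(grid))
    return {cell: n / len(snapshots) for cell, n in counts.items()}


def dominance_label(freq, high=0.6, low=0.3):
    """Label one focal point by frequency; thresholds are hypothetical."""
    if freq >= high:
        return "highly dominating"
    if freq >= low:
        return "less dominating"
    return "not dominating"


def classify_core(core_cells, freq, high=0.6, low=0.3):
    """Assumed rule: a core inherits the label of its most dominant point."""
    labels = {dominance_label(freq.get(cell, 0.0), high, low) for cell in core_cells}
    if "highly dominating" in labels:
        return "CHD"
    if "less dominating" in labels:
        return "CLD"
    return "CND"
```

In this sketch, a core whose points recur in most snapshots is labelled CHD, which mirrors the notion of confinement in space over time. The grouping of focal points into cores (CC or CR) is left to the caller.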
The method proposed in this thesis aims to detect such regions in any domain, for example climatology, environment, theft and diseases. The results are extracted using location, time and attribute value (that is, the method investigates where, when and what). In every domain, these invariant regions signify that there exist places where the dynamic phenomenon is highly concentrated. Detecting them can help in taking appropriate measures to understand and handle the phenomenon. For instance, for frequently occurring events with negative impacts, such as storms, theft and diseases, detecting such localized occurrences can help manage them better. In addition, this study provides insight into the area of influence of the phenomenon for a given time period.
To demonstrate the generality and applicability of the proposed method, two case studies have been chosen: one on Indian Monsoon rainfall and another on Theft in NSW, Australia. The datasets used in the two case studies have different time periods, different sets of points for analysis and different data formats. The analysis in prior work on spatial data mining is restricted to a particular data format, either raster or vector. Though many authors suggest that their proposed approach will work for both cases, this has not been well demonstrated. In our study, by contrast, the method is shown to apply to a wide range of domains irrespective of the data format: Monsoon rainfall is captured in raster format, whereas the Theft pattern is analyzed for NSW LGAs, which are polygonal data. In addition, our study highlights the use of topological distances and directionality while extracting relationships among entities based on an attribute varying across space and time. The use of topological relationships overcomes the need for data to be