Abstract
Formerly, locational information and multi-domain geographic data were stored mostly
in the form of maps and merged at the desktop level. Such data has been exchanged
over the internet for less than a decade. Data transferability and interoperability
are required for visualizing, analyzing and modelling the data. Spatial
data infrastructure (SDI) is a framework which mainly focuses on bringing together
geospatial data providers and agencies to share data and services, thus reducing
redundancy in data collection and improving the planning and decision-making process.
Although there are multiple formats for spatial data, Geography Markup
Language (GML), an XML-based encoding standard for geographic data, has
become the standard for these SDIs. GML is mainly used for storage, exchange
and querying of geographic data. GML is stored as plain text, which leads
to large data sizes. In addition, GIS uses multiple thematic layers for
decision making and visualization. For example, a single layer of spatial data showing
the district boundaries of India, with 600 districts, is around 16 MB. With such
large data sizes, transferring data becomes cumbersome and consumes a lot of
bandwidth and storage. Client management also becomes difficult. Thus, if the data
is compressed, both storage and its management improve.
GML data compression also allows more information to be transferred and gives
users faster access and quicker visualization.
Recent methods for GML compression have largely focused on the structure of the
data and used standard text-based compression algorithms such as gzip, which ignore
topological properties such as neighbourhood. As a result, there is a high chance
of repeated data, and these methods fail to provide effective compression.
The compressed data also cannot be operated on directly: the user has to decompress
it in order to get any information.
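
The limitation described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the sample fragment, the coordinates and the helper name `read_coordinates` are assumptions made for illustration. It shows that with a generic text compressor such as gzip, even a simple read of the coordinates requires a full decompression and XML parse first.

```python
import gzip
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Hypothetical GML fragment; the coordinates are made up for illustration.
SAMPLE_GML = (
    f'<gml:Polygon xmlns:gml="{GML_NS}">'
    "<gml:outerBoundaryIs><gml:LinearRing>"
    "<gml:coordinates>77.10,28.50 77.20,28.50 77.20,28.60 77.10,28.50</gml:coordinates>"
    "</gml:LinearRing></gml:outerBoundaryIs>"
    "</gml:Polygon>"
).encode("utf-8")


def read_coordinates(gz_bytes):
    """Return (x, y) tuples from gzip-compressed GML.

    The whole payload is decompressed and parsed before any value can be read.
    """
    xml_bytes = gzip.decompress(gz_bytes)  # full decompression step
    root = ET.fromstring(xml_bytes)        # full parse step
    points = []
    for node in root.iter(f"{{{GML_NS}}}coordinates"):
        for pair in node.text.split():
            x, y = pair.split(",")
            points.append((float(x), float(y)))
    return points


compressed = gzip.compress(SAMPLE_GML)
print(len(SAMPLE_GML), "bytes of text,", len(compressed), "bytes after gzip")
print(read_coordinates(compressed))
```

On a fragment this small gzip may not reduce the size at all; the point of the sketch is the access pattern, not the ratio.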
This thesis proposes a topology-based compression method called the GTrees
approach (multiple and single tree), which exploits both the topological and structural
characteristics of geospatial data. In addition, to achieve lossless compression
and easy storage and retrieval of the complete data, a tree-based data structure
is adapted to represent the coordinates.
The experimental results show that these methods applied together achieve
a compression of 50% to 70% for data sets ranging from 244 to 16,000 KB.
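
As a rough illustration of how a tree can hold coordinates without losing information, the following Python sketch stores coordinate strings in a character prefix tree. This is an assumption made for illustration, not the GTrees structure itself, and the function names and sample values are hypothetical. Values that share leading digits share nodes, and each terminal records the positions at which the value occurred, so the original sequence can be restored exactly.

```python
def build_trie(values):
    """Insert each coordinate string character by character.

    Shared prefixes reuse the same nodes; the "$" entry lists the
    positions in the input at which the completed string occurred.
    """
    root = {}
    for position, value in enumerate(values):
        node = root
        for ch in value:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(position)
    return root


def restore(root, count):
    """Rebuild the original ordered list from the tree (lossless round trip)."""
    out = [None] * count
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for key, child in node.items():
            if key == "$":
                for position in child:
                    out[position] = prefix
            else:
                stack.append((child, prefix + key))
    return out


def count_char_nodes(node):
    """Count character nodes, ignoring the "$" position lists."""
    return sum(1 + count_char_nodes(c) for k, c in node.items() if k != "$")


# Hypothetical longitudes of nearby points; note the repeated first value.
xs = ["77.1025", "77.1031", "77.1048", "77.1025"]
trie = build_trie(xs)
print(sum(len(x) for x in xs), "characters as text")
print(count_char_nodes(trie), "character nodes in the tree")
print(restore(trie, len(xs)) == xs)
```

Whether such a tree is smaller on disk depends on how it is serialized, which this sketch does not address.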
Another compression method based on the tree structure is also proposed, in which a
single two-way tree structure is created to compress the whole file. This
method compresses the coordinate data as well as the attribute data. An overall
compression of 60% to 75% is achieved for the same data sets.
Further, a query system called GTQueries is developed for querying the compressed
data. Most spatial queries are based on topological relationships.
Given that the GTrees approach embeds the inherent topological relationships, such queries
can be answered without decompressing the data.
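
To make the idea concrete, the Python sketch below shows one way a neighbourhood query could be answered from a structure in which shared boundary points are stored once and referenced by the features that use them. It is a hypothetical illustration, not the GTQueries implementation; the feature names, coordinates and function names are assumptions. Adjacency falls out of the shared references, without reconstructing the original GML text.

```python
from collections import defaultdict


def build_point_index(features):
    """Map each boundary point to the set of features that use it."""
    index = defaultdict(set)
    for name, ring in features.items():
        for point in ring:
            index[point].add(name)
    return index


def neighbours(index, name):
    """Return the features sharing at least one boundary point with `name`."""
    result = set()
    for owners in index.values():
        if name in owners:
            result |= owners
    result.discard(name)
    return result


# Hypothetical features: A and B share an edge, C is isolated.
features = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(5, 5), (6, 5), (6, 6)],
}
index = build_point_index(features)
print(neighbours(index, "A"))
print(neighbours(index, "C"))
```

Here "neighbour" means sharing at least one stored point; a stricter definition, such as sharing a full edge, would need an additional check.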
For testing and evaluation of the proposed system, a desktop-based user
interface application has also been developed. The user interface is used for
compression, decompression and viewing of the GML data. For viewing, the
compressed or decompressed files are converted into SVG. This makes it easier for
users to perform all of these operations, and transferring and viewing geographic
data becomes simpler.
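
The conversion to SVG mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the application's actual converter: the function name `coordinates_to_svg`, the canvas size and the styling are hypothetical, and only a single ring given as GML-style "x,y x,y ..." text is handled. The points are scaled to the canvas and the y axis is flipped, because SVG's y axis points downward.

```python
def coordinates_to_svg(coord_text, width=200, height=200):
    """Turn 'x,y x,y ...' coordinate text into an SVG document with one polygon."""
    points = [tuple(float(v) for v in pair.split(",")) for pair in coord_text.split()]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for degenerate input
    span_y = (max(ys) - min_y) or 1.0

    scaled = []
    for x, y in points:
        sx = (x - min_x) / span_x * width
        sy = height - (y - min_y) / span_y * height  # flip: SVG y grows downward
        scaled.append(f"{sx:.1f},{sy:.1f}")
    points_attr = " ".join(scaled)

    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polygon points="{points_attr}" fill="none" stroke="black"/>'
        "</svg>"
    )


print(coordinates_to_svg("0,0 10,0 10,10 0,10", width=100, height=100))
```

A real viewer would also need to preserve the aspect ratio and handle multiple features and map projections, which are left out here.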
The proposed work enables faster data transfer and reduces storage space through
smaller data sizes, and thus helps an SDI access more data and use it appropriately.
The data can be decompressed and viewed, and spatial queries can be answered directly
on the compressed data, providing the required information. Thus, the proposed work is
an effort to help improve SDI and its usage as new functionalities are demanded
of it.