Abstract
With the daily expansion of computer networks and internet applications, the risk of
network-related crime is also increasing [1]. Most present-day intrusion protection systems,
security applications and antivirus software aim at detecting, and in a few cases preventing,
the damage done to end systems at the premises where they are installed. In today’s fast-paced
networks, we need efficient techniques to identify the source of attacks, so that the malicious
systems and the attacks they cause can be eliminated completely, which benefits the entire
network. Network forensics is the field of study that deals with techniques for identifying
the source of an attack. The most important requirement in cybercrime investigation is to
capture and store data that may serve as legal evidence while at the same time safeguarding
the privacy of the data passing through the network.
Packet attribution systems have been used to trace a packet on the internet back to its
source. At the packet header level, determining the source machine from the source IP
address is not an effective solution, as the IP address can be altered. To perform trace
back based on packet payload, payload attribution has to be performed at every network device
that the packet traverses. This concept of payload attribution was introduced by
Shanmugasundaram et al. [2, 3]. From here on, we refer to the packet content that is queried
for trace back as the “excerpt”. During trace back, the excerpt has to be matched against the
stored packet payload at every network device to determine the path taken by the packets. But
storing the entire payload at every network device is not a practical solution due to storage
and privacy concerns. Various efficient techniques based on the Bloom filter and its
variations have been proposed to address this [4].
Source Path Isolation Engine (SPIE) [5] is a payload attribution system which uses
space-efficient Bloom filters for hashing and storing packet payload. SPIE repeatedly hashes
packets, covering the invariant portion of the packet header and a part of the packet payload,
and inserts the results into Bloom filters. These compact filters are then stored at the
routers. When all routers implement SPIE, it is possible to trace a packet back to its source.
The problem with SPIE is that the packet, with its non-mutable headers and a part of the
payload, has to be stored and reconstructed for trace back querying [2]. This issue is overcome
by the concept of Block Bloom Filters (BBF), where the packet payload is split into blocks of a
certain size and hashes computed over these blocks are stored in the Bloom filters. This
technique has to consider all possible offset values when an excerpt is queried, and hence the
false positive probability (FPP) increases [2]. The FPP can be reduced by performing
attribution on hierarchies of blocks, a concept named Hierarchical Bloom Filters (HBF) [2].
But the practical implementation of these concepts on all the routers installed in different
networks and at different network levels increases the burden on these routers in terms of
processing and storage size.
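The block-based attribution idea described above can be illustrated with a minimal sketch.
It is not the implementation from [2] or from this thesis: the class BloomFilter, the helpers
block_key, insert_payload and query_excerpt, the use of salted SHA-256 as the hash family, and
the simplification that an excerpt consists of whole blocks starting on a block boundary are
all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings (illustrative, not tuned)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Assumption: k hash functions are derived by salting SHA-256
        # with the hash index; any independent hash family would do.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def block_key(block: bytes, offset: int) -> bytes:
    # Each block is hashed together with its offset in the payload.
    return offset.to_bytes(4, "big") + block


def insert_payload(bf: BloomFilter, payload: bytes, block_size: int) -> None:
    """Split the payload into fixed-size blocks and store each one."""
    for offset in range(0, len(payload), block_size):
        bf.add(block_key(payload[offset:offset + block_size], offset))


def query_excerpt(bf: BloomFilter, excerpt: bytes, block_size: int,
                  max_offset: int) -> list:
    """Return candidate start offsets at which the excerpt may have occurred.

    The offset of the excerpt is unknown at query time, so every
    block-aligned offset up to max_offset is tried. Each extra offset
    tested is another chance for a false positive.
    """
    blocks = [excerpt[i:i + block_size]
              for i in range(0, len(excerpt), block_size)]
    candidates = []
    for start in range(0, max_offset + 1, block_size):
        if all(block_key(b, start + j * block_size) in bf
               for j, b in enumerate(blocks)):
            candidates.append(start)
    return candidates
```

A router would call insert_payload for every forwarded packet and answer trace back queries
with query_excerpt. Only hashed bits are kept, so the payload itself is never stored, at the
cost of occasional false positives.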
In this thesis, we propose a flexible environment for the implementation of these concepts
that increases their practicality in present-day network devices. We present an analysis of
the parameters on which system administrators can base the final decision to implement these
techniques on their network devices. To trace a packet back from the destination to the
source, all routers on the network path have to implement these payload attribution
mechanisms. Imposing the same implementation rules and constraints on all these routers is
not practical owing to differences in their processing and storage capabilities. Depending on
the capabilities of the network device, parameters such as block size and false positive
probability can be adjusted to suit the situation. We have prototyped these concepts as
software and provide experimental results for the processing and storage requirements
obtained by varying these parameters.
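To make the trade-off between these parameters concrete, the sketch below applies the
standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash
functions, for n inserted items and a target false positive probability p. The function names
and the example figures in the usage note are assumptions for illustration and are not taken
from the experiments in this thesis.

```python
import math


def bloom_parameters(num_items: int, fpp: float):
    """Return (bits, hash_count) for a standard Bloom filter.

    Uses the textbook optimum: m = -n * ln(p) / (ln 2)^2 and
    k = (m / n) * ln 2, rounded to usable integers.
    """
    if num_items <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("need num_items > 0 and 0 < fpp < 1")
    bits = math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))
    hashes = max(1, round(bits / num_items * math.log(2)))
    return bits, hashes


def blocks_for_traffic(total_payload_bytes: int, block_size: int) -> int:
    """Rough number of blocks inserted for a given traffic volume.

    Hypothetical helper: it ignores per-packet rounding and simply
    divides the total payload volume by the block size.
    """
    return math.ceil(total_payload_bytes / block_size)
```

For example, a device with limited memory could evaluate
bloom_parameters(blocks_for_traffic(volume, block_size), fpp) for several block sizes and FPP
targets and pick the combination that fits its storage budget. A larger block size or a
higher tolerated FPP both reduce the number of bits required.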
In the later part of the thesis, we propose applying attribution techniques to only the
control packets that travel over the network. This makes it possible to trace back vital
network control information exchanged among routers, such as network access, session control,
trafficking, etc. Our experimental results show that this approach is extremely efficient in
terms of storage size and processing speed. We then analyze possible heterogeneous approaches,
which combine the idea of attributing control packets and data packets individually at every
router with variable parameters for configuring the Bloom filter used for forensic analysis.