Abstract
Robot localization is the problem of estimating a robot’s coordinates relative to an external reference
frame, such as a map. Given a map of its environment, the robot has to localize itself relative to this map by
consulting its sensor data; its task is to find out where it is through sensing and motion.
Localization has proved to be the most critical aspect of mobile robot navigation. There are different
forms of the localization problem: position tracking, which seeks to compensate for small dead-reckoning
(odometry) errors under the assumption that the initial robot pose is known, and global self-localization,
which addresses localization with no a priori information, i.e., the robot is not told its initial pose but
instead has to determine it from scratch.
In environments that possess relatively few features enabling a robot to unambiguously determine its
location, global localization algorithms can result in multiple hypotheses about the robot’s location.
This is inevitable, as the local environment seen by the robot is repeated in several parts of the map.
Such situations occur commonly in indoor navigation, for instance in corridors and office rooms. For
effective localization the robot has to be actively guided to those locations where the chance of
eliminating most of the ambiguous states is highest. This task is often referred to as ‘active localization’.
Active localization amounts to deciding ‘where to move’ and ‘where to look’ so as to best localize the
robot. These actions mainly comprise moving towards unique features in the map (e.g., distinct obstacles
in a static environment), so that the sensor readings become unique and the robot localizes.
This thesis presents a learning framework for implementing active localization. In a given map, the
framework should be able to determine what actions to execute in a state such that localization is
accelerated. Since global localization poses the problem of perceptual aliasing, a belief state, rather
than a single state, is used to describe the robot’s position in the map. A belief-state MDP (Markov
Decision Process) is used to capture the mapping between the robot’s belief states and the actions that
maximize the expected discounted reward. The novelty of this work lies in proposing a hierarchical
belief-state framework for the MDP to perform the active localization task. Markov localization is
typically used to estimate a belief state from a given set of the robot’s sensor readings. Since this
belief-state space is extensive, it is in turn represented by a hierarchical belief-state space, which
makes the MDP more tractable. Hence the framework enables localization at a fine grid granularity (as in
Markov localization) without having to handle MDPs with large state spaces. A detailed mathematical
formulation of the framework is provided.
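As a minimal sketch of the quantities involved (the notation here is illustrative rather than taken verbatim from the thesis), Markov localization maintains a belief over grid cells $s$,
\[
Bel_t(s) \;=\; \eta \, P(o_t \mid s) \sum_{s'} P(s \mid a_{t-1}, s')\, Bel_{t-1}(s'),
\]
where $o_t$ is the current observation, $a_{t-1}$ the previous action and $\eta$ a normalizing constant, while the belief-state MDP seeks a policy
\[
\pi^{*} \;=\; \arg\max_{\pi} \; \mathbb{E}\Big[\textstyle\sum_{t} \gamma^{t} R\big(b_t, \pi(b_t)\big)\Big],
\]
defined over the (hierarchically represented) beliefs $b_t$, with discount factor $\gamma$ and a reward $R$ that rewards actions which disambiguate the robot’s position.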
The feasibility of this framework is shown by testing it on different aspects of the active localization
task. It is used to perform multi-robot active localization, where the framework not only learns to seek out
unique features in the map, but also learns to detect other robots, thereby facilitating localization even
in the absence of map features. It is observed that the multi-robot localization algorithm proves to be an
intuitive, semi-distributed algorithm that attains autonomy with respect to the number of robots. That is
to say, unlike a centralized algorithm, which requires recalculation from scratch whenever the number of
agents involved changes, this algorithm can be reused without retraining for a range of robot-team sizes.
Localization performance is analyzed by comparing it with a learning framework that uses a random walk.
The framework is also used to transfer the learned localization policy from a robot with higher sensor
capabilities to one with lower sensor capabilities, thereby enabling the possibility of accelerating
localization in a setup with heterogeneous robots. The change in sensor range is treated as a change in
the observation function in the MDP context, and a transfer policy is designed to transfer the knowledge
between the two belief-state MDPs.
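As a rough illustration (the symbols below are indicative only, not the notation of the thesis), if $O^{h}(o \mid s)$ and $O^{l}(o \mid s)$ denote the observation functions corresponding to the higher and lower sensor ranges, the two belief-state MDPs differ only in the likelihood term of their belief updates,
\[
b^{h}(s) \;\propto\; O^{h}(o \mid s)\,\bar{b}(s), \qquad b^{l}(s) \;\propto\; O^{l}(o \mid s)\,\bar{b}(s),
\]
where $\bar{b}$ is the belief predicted from the previous action; the transfer policy reuses what was learned under $O^{h}$ to speed up learning under $O^{l}$ (for instance, by initializing its value estimates) instead of starting from scratch.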
Localization performance with transfer is analyzed by comparing it with the performance obtained when
learning is done from scratch, without any transfer. It is observed that transfer leads to a better
learning rate, as the performance reaches an optimum in a smaller number of training iterations. One of
the observations also shows the effect of map type on transfer and suggests that the advantages of
transfer are more predominant for sparse maps than for dense maps. The contribution here lies in
attempting transfer across belief-state MDPs, since transfer has so far been limited to plain MDPs.
Also, the transfer aspect is