Abstract
Accurate localization of sound sources is essential in many acoustic sensing and monitoring applications. Without a model of temporal continuity, many methods produce unrealistic direction of arrival (DOA) estimates that change abruptly from one frame to the next. To address this, we propose an approach that trains a neural network to predict DOA derivatives in Cartesian coordinates (x′, y′, z′), which capture the rate of change of the DOA (x, y, z) over time. We introduce an update rule that combines the predicted DOAs with the predicted derivatives to obtain the final DOAs, which suppresses sudden DOA changes and yields smooth motion trajectories. We validate our approach on the TAU-NIGENS Spatial Sound Events (TNSSE) 2021 dataset. Our results demonstrate that incorporating DOA derivatives improves the accuracy of DOA estimation, particularly in low signal-to-noise ratio scenarios.
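To make the idea of combining DOAs with their derivatives concrete, the following is a minimal sketch of one possible update rule. The abstract does not specify the actual rule, so the blending form, the function names `normalize` and `smooth_doas`, and the parameters `dt` and `alpha` are all assumptions made for illustration. Each final DOA is a weighted mix of the network's direct prediction and the previous final DOA extrapolated along the predicted derivative, projected back onto the unit sphere.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length, since a DOA is a direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in v)


def smooth_doas(doas, derivs, dt=1.0, alpha=0.5):
    """Blend per-frame DOA predictions with derivative-based extrapolation.

    doas   -- predicted (x, y, z) for each frame
    derivs -- predicted (x', y', z') for each frame
    dt     -- time between frames (hypothetical parameter)
    alpha  -- weight on the direct prediction (hypothetical parameter);
              alpha=1.0 ignores the derivatives entirely
    """
    if len(doas) != len(derivs):
        raise ValueError("doas and derivs must have the same length")
    if not doas:
        return []

    final = [normalize(doas[0])]  # no history for the first frame
    for t in range(1, len(doas)):
        prev = final[-1]
        # Extrapolate the previous final DOA using the previous derivative.
        extrap = tuple(p + dt * d for p, d in zip(prev, derivs[t - 1]))
        # Weighted mix of the direct prediction and the extrapolation.
        blended = tuple(
            alpha * a + (1.0 - alpha) * e for a, e in zip(doas[t], extrap)
        )
        final.append(normalize(blended))
    return final
```

With `alpha` below 1, an isolated jump in the predicted DOA is pulled back toward the trajectory implied by the derivatives, which is the smoothing effect described above. The weighting actually used by the method may differ.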