Edinburgh Research Archive

Bayesian framework for multiple acoustic source tracking

dc.contributor.advisor
Hopgood, James R.
en
dc.contributor.advisor
Mulgrew, Bernard
en
dc.contributor.author
Zhong, Xionghu
en
dc.contributor.sponsor
Wing-Yip bursary
en
dc.date.accessioned
2011-02-02T11:24:31Z
dc.date.available
2011-02-02T11:24:31Z
dc.date.issued
2010
dc.description.abstract
Acoustic source (speaker) tracking in room environments plays an important role in many speech and audio applications, such as multimedia, hearing aids, hands-free speech communication and teleconferencing systems. The position information can be fed into a higher processing stage for high-quality speech acquisition, for enhancement of a specific speech signal in the presence of other competing talkers, or for keeping a camera focused on the speaker in a video-conferencing scenario. Most existing systems focus on the single-source tracking problem, which assumes that one and only one source is active at all times, so the state to be estimated is simply the source position. In practical scenarios, however, multiple speakers may be active simultaneously, and the tracking algorithm should be able to localise each individual source and estimate the number of sources. This thesis makes three contributions towards multiple acoustic source tracking in moderately noisy and reverberant environments. The first contribution is a time-delay of arrival (TDOA) estimation approach for multiple sources. Although the phase transform (PHAT) weighted generalised cross-correlation (GCC) method has been employed to extract the TDOAs of multiple sources, it is primarily designed for single-source scenarios, and its performance for multiple TDOA estimation has not been comprehensively studied. The proposed approach combines the degenerate unmixing estimation technique (DUET) with the GCC method. Since the speech mixtures are assumed to be window-disjoint orthogonal (WDO) in the time-frequency domain, the spectrograms of the individual sources can be separated using DUET, and the GCC method can then be applied to the spectrogram of each source. The probabilities of detection and false alarm are also proposed as measures of TDOA estimation performance and are evaluated over a range of experimental parameters.
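As a rough illustration of the DUET-GCC idea summarised above, the following minimal sketch computes a PHAT-weighted GCC from short-time spectra and can optionally restrict it to the time-frequency bins selected by a binary mask. It assumes NumPy, a Hann-windowed STFT with hypothetical parameters `frame_len` and `hop`, and a `mask` array standing in for a DUET mask. Estimating that mask (DUET's attenuation/delay histogram clustering) is not shown, and this is not the thesis implementation.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Hann-windowed STFT. Frames are zero-padded to 2*frame_len so that the
    inverse FFT of a cross-spectrum is a linear, not circular, correlation."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, n=2 * frame_len, axis=1)


def gcc_phat_tdoa(x1, x2, fs, frame_len=512, hop=256, max_tau=None, mask=None):
    """Delay of x2 relative to x1 (seconds) from a frame-summed GCC-PHAT.

    mask: optional 0/1 array with the STFT's shape (frames x bins); a
    hypothetical stand-in for a DUET mask selecting one source's bins.
    """
    X1 = stft(x1, frame_len, hop)
    X2 = stft(x2, frame_len, hop)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12          # PHAT: keep phase, drop magnitude
    if mask is not None:
        cross = cross * mask                # discard bins of other sources
    cc = np.fft.irfft(cross.sum(axis=0), n=2 * frame_len)
    max_shift = frame_len - 1
    if max_tau is not None:                 # physically admissible lags only
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index k corresponds to a lag of (k - max_shift) samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs


# Usage: white noise observed at two sensors, x2 delayed by d samples.
rng = np.random.default_rng(0)
fs, d = 16000, 5
s = rng.standard_normal(fs + d)
x1, x2 = s[d:], s[:fs]
tau = gcc_phat_tdoa(x1, x2, fs, max_tau=1e-3)
print(f"estimated delay: {tau * fs:.0f} samples")
```

With several talkers, one would call `gcc_phat_tdoa` once per source, each time with that source's mask, so that each call yields one TDOA rather than a single correlation with multiple competing peaks.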
Next, considering that multiple acoustic sources may appear nonconcurrently, an extended Kalman particle filter (EKPF) is developed for a special multiple acoustic source tracking problem, namely “nonconcurrent multiple acoustic tracking (NMAT)”. The extended Kalman filter (EKF) is used to approximate the optimum weights, and the subsequent particle filtering (PF) step naturally takes both the previous position estimates and the current TDOA measurements into account. The proposed approach is thus able to lock onto sharp changes in source position quickly and to avoid the tracking lag of the standard sequential importance resampling (SIR) PF. Finally, these investigations are extended into an approach for tracking an unknown and time-varying number of acoustic sources. The DUET-GCC method is used to obtain TDOA measurements for multiple sources, and a random finite set (RFS) based Rao-Blackwellised PF is employed and modified to track the sources. Each particle takes the form of an RFS encapsulating the states of all sources and is capable of addressing the source dynamics: source survival, new source appearance and source deactivation. A data association variable is defined to describe the source dynamics and their relation to the measurements. The Rao-Blackwellisation step decomposes the state: the source positions are marginalised using an EKF, so only the data association variable needs to be handled by the PF. The performance of all the proposed approaches is extensively studied under different noisy and reverberant conditions and compares favourably with existing tracking techniques.
en
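To illustrate how an EKF can shape the importance sampling step of a particle filter, as in the EKPF summarised in the abstract, the following is a minimal single-source sketch. In this common EKPF formulation, each particle runs an EKF update against the current TDOAs and samples from the resulting Gaussian. The sketch assumes a 2-D source with random-walk dynamics of covariance `Q`, Gaussian TDOA noise of covariance `R`, known microphone positions, a fixed speed of sound `C = 343` m/s and systematic resampling. All names (`tdoa`, `ekpf_step`, etc.) are hypothetical, and the sketch covers neither the NMAT scenario nor the RFS-based Rao-Blackwellised multi-source filter.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def tdoa(x, pairs):
    """Noise-free TDOAs (s) of a 2-D source at x for microphone pairs (a, b)."""
    return np.array([(np.linalg.norm(x - a) - np.linalg.norm(x - b)) / C
                     for a, b in pairs])


def tdoa_jacobian(x, pairs):
    """Jacobian of tdoa() with respect to x, one row per microphone pair."""
    return np.array([((x - a) / np.linalg.norm(x - a)
                      - (x - b) / np.linalg.norm(x - b)) / C
                     for a, b in pairs])


def gauss_logpdf(v, mean, cov):
    d = v - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet
                   + len(d) * np.log(2 * np.pi))


def ekpf_step(particles, weights, z, pairs, Q, R, rng):
    """One EKPF step with random-walk dynamics x_t = x_{t-1} + N(0, Q).

    Each particle's EKF update against the current TDOAs z gives the Gaussian
    it is sampled from. Returns (resampled particles, uniform weights,
    weighted-mean position estimate).
    """
    n, dim = particles.shape
    proposed = np.empty_like(particles)
    logw = np.log(weights)
    for i, prev in enumerate(particles):
        H = tdoa_jacobian(prev, pairs)          # linearise at predicted mean
        S = H @ Q @ H.T + R
        K = Q @ H.T @ np.linalg.inv(S)
        m = prev + K @ (z - tdoa(prev, pairs))
        P = (np.eye(dim) - K @ H) @ Q
        P = 0.5 * (P + P.T)                     # keep covariance symmetric
        x = rng.multivariate_normal(m, P)
        proposed[i] = x
        # weight update: likelihood * transition prior / proposal density
        logw[i] += (gauss_logpdf(z, tdoa(x, pairs), R)
                    + gauss_logpdf(x, prev, Q)
                    - gauss_logpdf(x, m, P))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    estimate = w @ proposed
    # Systematic resampling to counter weight degeneracy.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return proposed[idx], np.full(n, 1.0 / n), estimate


# Usage: four microphones at the corners of a 4 m x 4 m room, static source.
rng = np.random.default_rng(1)
mics = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
pairs = [(mics[k], mics[(k + 1) % 4]) for k in range(4)]
truth = np.array([1.2, 2.5])
Q, R = 0.1 ** 2 * np.eye(2), (2e-5) ** 2 * np.eye(4)
particles = truth + 0.3 * rng.standard_normal((100, 2))
weights = np.full(100, 0.01)
for _ in range(10):
    z = tdoa(truth, pairs) + 2e-5 * rng.standard_normal(4)
    particles, weights, est = ekpf_step(particles, weights, z, pairs, Q, R, rng)
print("position error (m):", np.linalg.norm(est - truth))
```

Because each sample is drawn from a density that already includes the current measurement, and the weight divides by that density, the particles can follow an abrupt position change instead of relying on the transition prior alone. This is a plausible reading of why the EKPF avoids the tracking lag of a prior-proposal SIR filter.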
dc.identifier.uri
http://hdl.handle.net/1842/4752
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
X. Zhong and J. R. Hopgood, “Time-frequency masking based multiple acoustic sources tracking applying Rao-Blackwellised Monte Carlo data association,” in Proc. IEEE 15th Workshop on Statistical Signal Processing, pp. 253–256, Aug. 2009.
en
dc.relation.hasversion
X. Zhong and J. R. Hopgood, “Nonconcurrent multiple speakers tracking based on extended Kalman particle filter,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 293–296, 2008.
en
dc.subject
Bayesian filter
en
dc.subject
particle filtering
en
dc.subject
tracking
en
dc.title
Bayesian framework for multiple acoustic source tracking
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Zhong2010.pdf
Size:
15.28 MB
Format:
Adobe Portable Document Format