Bayesian framework for multiple acoustic source tracking
dc.contributor.advisor
Hopgood, James R.
en
dc.contributor.advisor
Mulgrew, Bernard
en
dc.contributor.author
Zhong, Xionghu
en
dc.contributor.sponsor
Wing-Yip bursary
en
dc.date.accessioned
2011-02-02T11:24:31Z
dc.date.available
2011-02-02T11:24:31Z
dc.date.issued
2010
dc.description.abstract
Acoustic source (speaker) tracking in a room environment plays an important role in many
speech and audio applications such as multimedia, hearing aids, hands-free speech communication
and teleconferencing systems; the position information can be fed into a higher
processing stage for high-quality speech acquisition, enhancement of a specific speech signal
in the presence of other competing talkers, or keeping a camera focused on the speaker in
a video-conferencing scenario. Most existing systems focus on the single-source tracking
problem, which assumes that one and only one source is active at all times, so that the state to be estimated
is simply the source position. However, in practical scenarios, multiple speakers may
be simultaneously active, and the tracking algorithm should be able to localise each individual
source and estimate the number of sources. This thesis contains three contributions towards
solutions to multiple acoustic source tracking in a moderately noisy and reverberant environment.
The first contribution of this thesis is a time delay of arrival (TDOA) estimation
approach for multiple sources. Although the phase transform (PHAT) weighted generalised
cross-correlation (GCC) method has been employed to extract the TDOAs of multiple sources,
it is primarily used for a single source scenario and its performance for multiple TDOA estimation
has not been comprehensively studied. The proposed approach combines the degenerate
unmixing estimation technique (DUET) and the GCC method. Since the speech mixtures are assumed
to be window-disjoint orthogonal (WDO) in the time-frequency domain, the spectrograms can
be separated by employing DUET, and the GCC method can then be applied to the spectrogram
of each individual source. The probabilities of detection and false alarm are also proposed as measures to
evaluate the TDOA estimation performance over a range of experimental parameters.
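As an illustration of the PHAT-weighted GCC building block mentioned above, the following is a minimal sketch for a single microphone pair and a single source; it is not the thesis's DUET-GCC implementation. The function name `gcc_phat`, its parameters and the small regularisation constant are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 with PHAT-weighted GCC.

    Hypothetical helper for illustration only; a positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)                 # zero-pad so the correlation is not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)         # generalised cross-correlation

    max_shift = n // 2
    if max_tau is not None:               # optionally restrict to physically possible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

A positive return value means that `x2` lags `x1`. In the approach described above, this step would be applied to the spectrogram of each source after DUET separation rather than to the raw mixture.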
Next, considering that multiple acoustic sources may be active nonconcurrently, an extended Kalman
particle filter (EKPF) is developed for a special multiple acoustic source tracking problem,
namely “nonconcurrent multiple acoustic tracking (NMAT)”. The extended Kalman filter
(EKF) is used to approximate the optimum weights, and the subsequent particle filtering (PF)
naturally takes the previous position estimates as well as the current TDOA measurements into
account. The proposed approach is thus able to lock onto sharp changes in the source position
quickly and avoid the tracking lag of the generic sequential importance resampling (SIR) PF.
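For reference, one iteration of a generic SIR particle filter driven by TDOA measurements might look like the sketch below. This is the generic SIR PF that the abstract contrasts the EKPF with, not the EKPF itself; the random-walk motion model, the Gaussian TDOA likelihood, the speed of sound and all names (`tdoa`, `sir_step`, `q`, `sigma`) are assumptions for illustration.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def tdoa(pos, mic_a, mic_b):
    """TDOA in seconds at a microphone pair for position(s) pos of shape (..., 2)."""
    ra = np.linalg.norm(pos - mic_a, axis=-1)
    rb = np.linalg.norm(pos - mic_b, axis=-1)
    return (ra - rb) / C

def sir_step(particles, weights, z, mic_pairs, rng, q=0.05, sigma=5e-5):
    """One predict/update/resample cycle of a bootstrap SIR particle filter.

    particles: (N, 2) positions, weights: (N,), z: one measured TDOA per pair.
    q (m) and sigma (s) are assumed process and measurement noise levels.
    """
    # Predict: random-walk motion model (an assumption, not the thesis's model).
    particles = particles + q * rng.standard_normal(particles.shape)

    # Update: Gaussian TDOA likelihood, accumulated in the log domain for stability.
    log_w = np.log(weights + 1e-300)
    for (mic_a, mic_b), z_k in zip(mic_pairs, z):
        log_w += -0.5 * ((z_k - tdoa(particles, mic_a, mic_b)) / sigma) ** 2
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()

    # Point estimate: weighted mean of the particles before resampling.
    estimate = weights @ particles

    # Resample: systematic resampling, then reset to uniform weights.
    n = len(weights)
    u = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), u), n - 1)
    return particles[idx], np.full(n, 1.0 / n), estimate
```

Because the prediction step here ignores the current measurement, the particle cloud can lag behind an abrupt change in source position; this is the tracking lag that the EKPF is designed to avoid.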
Finally, these investigations are extended into an approach for tracking an unknown and
time-varying number of acoustic sources. The DUET-GCC method is used to obtain the TDOA
measurements for multiple sources, and a random finite set (RFS) based Rao-Blackwellised PF
is employed and modified to track the sources. Each particle has an RFS form encapsulating
the states of all sources and is capable of addressing source dynamics: source survival, new
source appearance and source deactivation. A data association variable is defined to describe the
source dynamics and their relation to the measurements. The Rao-Blackwellisation step is used
to decompose the state: the source positions are marginalised by using an EKF, and only the
data association variable needs to be handled by a PF. The performance of all the proposed
approaches is studied extensively under different noisy and reverberant environments and
compares favourably with existing tracking techniques.
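The EKF step used to marginalise a source position can be illustrated with a single measurement update for one TDOA observation. This is a sketch under assumed conventions (2-D position state, one microphone pair, speed of sound 343 m/s); the names `tdoa_and_jacobian` and `ekf_update` are hypothetical and do not come from the thesis.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def tdoa_and_jacobian(x, mic_a, mic_b):
    """Return the predicted TDOA h(x) and its 1x2 Jacobian for a 2-D position x."""
    da, db = x - mic_a, x - mic_b
    ra, rb = np.linalg.norm(da), np.linalg.norm(db)
    h = (ra - rb) / C
    H = (da / ra - db / rb) / C           # gradient of the range difference, scaled by 1/C
    return h, H[np.newaxis, :]

def ekf_update(x, P, z, mic_a, mic_b, r_var):
    """EKF measurement update of mean x and covariance P with one TDOA z.

    r_var is the assumed TDOA measurement noise variance in s^2.
    """
    h, H = tdoa_and_jacobian(x, mic_a, mic_b)
    S = H @ P @ H.T + r_var               # innovation variance, shape (1, 1)
    K = P @ H.T / S                       # Kalman gain, shape (2, 1)
    x_new = x + (K * (z - h)).ravel()     # correct the mean with the innovation
    P_new = (np.eye(len(x)) - K @ H) @ P  # shrink the covariance along the observed direction
    return x_new, P_new
```

In a Rao-Blackwellised scheme of the kind described above, each particle would carry its own mean and covariance per source and apply an update like this once the data association variable has assigned a measurement to that source.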
en
dc.identifier.uri
http://hdl.handle.net/1842/4752
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
X. Zhong and J. R. Hopgood, “Time-frequency masking based multiple acoustic sources tracking applying Rao-Blackwellised Monte Carlo data association,” in Proc. IEEE 15th Workshop on Statistical Signal Processing, pp. 253–256, Aug. 2009.
en
dc.relation.hasversion
X. Zhong and J. Hopgood, “Nonconcurrent multiple speakers tracking based on extended Kalman particle filter,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 293–296, 2008.
en
dc.subject
Bayesian filter
en
dc.subject
particle filtering
en
dc.subject
tracking
en
dc.title
Bayesian framework for multiple acoustic source tracking
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Zhong2010.pdf
- Size: 15.28 MB
- Format: Adobe Portable Document Format