Language of music: a computational model of music interpretation

McLeod, Andrew Philip

dc.contributor.advisor

Steedman, Mark

en

dc.contributor.advisor

King, Simon

en

dc.contributor.author

McLeod, Andrew Philip

en

dc.date.accessioned

2018-07-19T12:04:52Z

dc.date.available

2018-07-19T12:04:52Z

dc.date.issued

2018-07-02

dc.description.abstract

Automatic music transcription (AMT) is commonly defined as the process of converting an acoustic musical signal into some form of musical notation, and can be split into two separate phases: (1) multi-pitch detection, the conversion of an audio signal into a time-frequency representation similar to a MIDI file; and (2) converting from this time-frequency representation into a musical score. A substantial amount of AMT research in recent years has concentrated on multi-pitch detection, and yet, in the case of the transcription of polyphonic music, there has been little progress. There are many potential reasons for this slow progress, but this thesis concentrates on the (lack of) use of music language models during the transcription process. In particular, a music language model would impart to a transcription system the background knowledge of music theory upon which a human transcriber relies. In the related field of automatic speech recognition, it has been shown that the use of a language model drawn from the field of natural language processing (NLP) is an essential component of a system for transcribing spoken word into text, and there is no reason to believe that music should be any different. This thesis will show that a music language model inspired by NLP techniques can be used successfully for transcription. In fact, this thesis will create the blueprint for such a music language model. We begin with a brief overview of existing multi-pitch detection systems, in particular noting four key properties which any music language model should have to be useful for integration into a joint system for AMT: it should (1) be probabilistic, (2) not use any data a priori, (3) be able to run on live performance data, and (4) be incremental. We then investigate voice separation, creating a model which achieves state-of-the-art performance on the task, and show that, used as a simple music language model, it improves multi-pitch detection performance significantly. 
This is followed by an investigation of metrical detection and alignment, where we introduce a grammar crafted for the task which, combined with a beat-tracking model, achieves state-of-the-art results on metrical alignment. This system’s success adds more evidence to the long-standing hypothesis that music and language consist of extremely similar structures. We end by investigating the joint analysis of music, in particular showing that a combination of our two models running jointly outperforms each running independently. We also introduce a new joint, automatic, quantitative metric for the complete transcription of an audio recording into an annotated musical score, something which the field currently lacks.

en
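The abstract asks that a music language model be probabilistic and incremental, and reports a voice separation model used as a simple music language model. As a toy illustration of those two properties only, the sketch below assigns MIDI-like notes (onset, offset, pitch) to monophonic voices one note at a time, keeping a small beam of the best-scoring hypotheses. It is not the thesis's model: the Gaussian-shaped pitch-proximity score, the fixed new-voice penalty, and the names `Note`, `separate_voices`, `sigma`, `beam` and `new_voice_log_score` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Note:
    onset: float   # seconds
    offset: float  # seconds
    pitch: int     # MIDI note number


# A hypothesis is (log-score, voices); each voice is a tuple of notes in time order.
Hypothesis = Tuple[float, Tuple[Tuple[Note, ...], ...]]


def pitch_log_score(prev: Note, note: Note, sigma: float) -> float:
    """Hypothetical unnormalised Gaussian log-score that favours small pitch leaps."""
    leap = note.pitch - prev.pitch
    return -(leap * leap) / (2.0 * sigma * sigma)


def separate_voices(notes: List[Note], max_voices: int = 4, beam: int = 10,
                    sigma: float = 4.0, new_voice_log_score: float = -5.0):
    """Toy incremental voice separation: process notes in onset order, beam-prune."""
    hyps: List[Hypothesis] = [(0.0, ())]
    for note in sorted(notes, key=lambda n: (n.onset, n.pitch)):
        candidates: List[Hypothesis] = []
        for score, voices in hyps:
            # Option 1: append the note to an existing voice that has already ended,
            # so that every voice stays monophonic.
            for i, voice in enumerate(voices):
                last = voice[-1]
                if last.offset <= note.onset:
                    new_voices = voices[:i] + (voice + (note,),) + voices[i + 1:]
                    candidates.append(
                        (score + pitch_log_score(last, note, sigma), new_voices))
            # Option 2: start a new voice, at a fixed (hypothetical) penalty.
            if len(voices) < max_voices:
                candidates.append((score + new_voice_log_score, voices + ((note,),)))
        if not candidates:
            raise ValueError(f"no valid voice assignment for {note}")
        # Keep only the best `beam` hypotheses, so memory stays bounded per note.
        candidates.sort(key=lambda h: h[0], reverse=True)
        hyps = candidates[:beam]
    return hyps[0][1]


if __name__ == "__main__":
    # Two interleaved lines, one note per second: C5 D5 E5 over C4 B3 A3.
    pairs = [(0.0, 72), (1.0, 74), (2.0, 76), (0.0, 60), (1.0, 59), (2.0, 57)]
    notes = [Note(t, t + 1.0, p) for t, p in pairs]
    voices = separate_voices(notes)
    print([[n.pitch for n in v] for v in voices])  # [[60, 59, 57], [72, 74, 76]]
```

Because each note is scored and pruned as soon as it arrives, the best hypothesis so far is always available, which is the sense in which such a model could run on live performance data. A realistic model would normalise its scores into proper probabilities and condition on more than pitch distance (for example, the time gap between notes).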

dc.identifier.uri

http://hdl.handle.net/1842/31371

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

McLeod, A., Schramm, R., Steedman, M., & Benetos, E. (2017). Automatic Transcription of Polyphonic Vocal Music. Applied Sciences, 7(12).

en

dc.relation.hasversion

McLeod, A., & Steedman, M. (2016). HMM-based voice separation of MIDI performance. Journal of New Music Research, 45(1), 17–26.

en

dc.relation.hasversion

McLeod, A., & Steedman, M. (2017). Meter detection in symbolic music using a lexicalized PCFG. In SMC (pp. 373–379).

en

dc.relation.hasversion

McLeod, A., & Steedman, M. (2018). Evaluating automatic polyphonic music transcription. In ISMIR.

en

dc.relation.hasversion

McLeod, A., & Steedman, M. (2018). Meter detection and alignment of MIDI performance. In ISMIR.

en

dc.relation.hasversion

Schramm, R., McLeod, A., Steedman, M., & Benetos, E. (2017). Multi-pitch detection and voice assignment for a cappella recordings of multiple singers. In ISMIR (pp. 552–559).

en

dc.subject

music information retrieval

en

dc.subject

automatic music transcription

en

dc.subject

music language modelling

en

dc.title

Language of music: a computational model of music interpretation

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Name:: McLeod2018.pdf
Size:: 1.41 MB
Format:: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection