Audio-visual deep learning methods for musical instrument classification and separation

  1. SLIZOVSKAIA, OLGA
Supervised by:
  1. Glòria Haro Ortega (Thesis supervisor)

University of defense: Universitat Pompeu Fabra

Date of defense: 21 October 2020

Committee:
  1. Xavier Giro Nieto (Chair)
  2. Xavier Serra Casals (Secretary)
  3. Estefanía Cano López (Member)

Type: Dissertation

Teseo: 634820

Abstract

In music perception, the information we receive from the visual and auditory systems is often complementary. Moreover, visual perception plays an important role in the overall experience of attending a music performance. This motivates machine learning methods that combine audio and visual information for automatic music analysis. This thesis addresses two research problems in the context of music performance videos: instrument classification and source separation. For each task, a multimodal approach is developed using deep learning techniques to train an encoded representation for each modality. For source separation, we also study two approaches conditioned on instrument labels and examine the influence that these two extra sources of information have on separation performance compared with a conventional model. Another important aspect of this work is the exploration of different fusion methods that allow for better multimodal integration of information sources from associated domains.
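As a rough illustration of the ideas named in the abstract (per-modality encoders, label conditioning, and a fusion step), the sketch below combines an audio embedding and a visual embedding and modulates the fused features with an instrument label. It is a minimal, assumed example: the module names, dimensions, and the FiLM-style conditioning are illustrative choices, not details taken from the dissertation.

```python
# Minimal sketch (not the thesis code): late fusion of audio and visual
# embeddings with FiLM-style conditioning on an instrument label.
import torch
import torch.nn as nn

class ConditionedFusion(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=128, n_instruments=13, hidden=256):
        super().__init__()
        # Each modality is projected into a shared hidden space.
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        # The instrument label is mapped to a per-channel scale and shift (FiLM).
        self.film = nn.Embedding(n_instruments, 2 * hidden)
        self.classifier = nn.Linear(hidden, n_instruments)

    def forward(self, audio_emb, visual_emb, instrument_id):
        # Simple additive fusion of the two modality embeddings.
        fused = torch.relu(self.audio_proj(audio_emb) + self.visual_proj(visual_emb))
        gamma, beta = self.film(instrument_id).chunk(2, dim=-1)
        conditioned = gamma * fused + beta  # label-conditioned features
        return self.classifier(conditioned)

# Usage with dummy tensors: a batch of 4 examples.
model = ConditionedFusion()
audio = torch.randn(4, 128)
video = torch.randn(4, 128)
labels = torch.randint(0, 13, (4,))
logits = model(audio, video, labels)
print(logits.shape)  # torch.Size([4, 13])
```

In practice the same conditioning idea can be applied inside a separation network rather than a classifier; the example only shows how label information and two modality streams can be merged into a single feature representation.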