
Downbeat Detection with Conditional Random Fields and Deep Learned Features

Simon Durand #1, Slim Essid #1
#1 Laboratoire Traitement et Communication de l'Information [Paris] (LTCI)
  • Télécom ParisTech
  • CNRS : UMR5141
References
International Society for Music Information Retrieval (ISMIR), New York City, USA, August 2016, pp. 386-392
Abstract

In this paper, we introduce a novel Conditional Random Field (CRF) system that detects the downbeat sequence of musical audio signals. Feature functions are computed from four deep learned representations based on harmony, rhythm, melody and bass content to take advantage of the high-level and multi-faceted aspect of this task. Downbeats being dynamic, the powerful CRF classification system allows us to combine our features with an adapted temporal model in a fully data-driven fashion. Some meters being under-represented in our training set, we show that data augmentation enables a statistically significant improvement of the results by taking into account class imbalance. An evaluation of different configurations of our system on nine datasets shows its efficiency and potential over a heuristic-based approach and four downbeat tracking algorithms.
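
As a rough illustration of how a linear-chain model can combine per-beat observations with a temporal model, the sketch below runs Viterbi decoding over bar positions. It is not the authors' implementation: the function name `viterbi_downbeats`, the parameters `beats_per_bar` and `jump_penalty`, the fixed meter, and the hand-set potentials are all assumptions made for this example, whereas the paper learns its feature functions and temporal model from data. The input is assumed to be one downbeat likelihood per beat, such as a network might output.

```python
import math


def viterbi_downbeats(downbeat_probs, beats_per_bar=4, jump_penalty=-5.0):
    """Return the indices of beats decoded as downbeats.

    Hypothetical sketch: states are positions within a bar (0 = downbeat),
    unary scores come from per-beat downbeat likelihoods, and pairwise scores
    favour advancing by exactly one bar position per beat.
    """
    if not downbeat_probs:
        return []

    n_states = beats_per_bar
    eps = 1e-9  # keeps log() finite for probabilities of exactly 0 or 1

    def unary(p, state):
        # Log-score of observing likelihood p while in the given bar position.
        p = min(max(p, eps), 1.0 - eps)
        return math.log(p) if state == 0 else math.log(1.0 - p)

    def pairwise(prev, cur):
        # Regular progression is free; any other transition is penalised.
        return 0.0 if cur == (prev + 1) % n_states else jump_penalty

    # Forward pass: best score of any path ending in each state.
    score = [unary(downbeat_probs[0], s) for s in range(n_states)]
    backpointers = []
    for p in downbeat_probs[1:]:
        new_score, pointers = [], []
        for cur in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda prev: score[prev] + pairwise(prev, cur),
            )
            new_score.append(
                score[best_prev] + pairwise(best_prev, cur) + unary(p, cur)
            )
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Backward pass: follow the stored pointers from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    return [i for i, s in enumerate(path) if s == 0]


# The spurious peak at beat 2 is rejected because it breaks the 4-beat cycle.
noisy = [0.9, 0.1, 0.6, 0.1, 0.8, 0.1, 0.1, 0.1]
downbeats = viterbi_downbeats(noisy)
```

Thresholding each beat independently at 0.5 would mark beat 2 as a downbeat here. The sequence model avoids that because the transition penalty outweighs the local evidence.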

Category
Paper in proceedings
Research Area(s)
Statistics/Applications
Computer Science/Machine Learning
Computer Science/Neural and Evolutionary Computing
Computer Science/Sound
Computer Science/Signal and Image Processing
Engineering Sciences/Signal and Image processing
Statistics/Machine Learning
Identifier(s)
Bibliographic key SD:ISMIR-16
Last update
on January 19, 2017 by Simon Durand

