1 - 10 of 10
  • 1.
    Chettri, Bhusan
    et al.
    Queen Mary Univ London, Sch EECS, London, England.
    Mishra, Saumitra
    Queen Mary Univ London, Sch EECS, London, England.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. Queen Mary Univ London, Sch EECS, London, England.
    Analysing the predictions of a CNN-based replay spoofing detection system. 2018. In: 2018 IEEE Workshop on Spoken Language Technology (SLT 2018), IEEE, 2018, p. 92-97. Conference paper (Refereed)
    Abstract [en]

    Playing recorded speech samples of an enrolled speaker – “replay attack” – is a simple approach to bypass an automatic speaker verification (ASV) system. The vulnerability of ASV systems to such attacks has been acknowledged and studied, but there has been no research into what spoofing detection systems are actually learning to discriminate. In this paper, we analyse the local behaviour of a replay spoofing detection system based on convolutional neural networks (CNNs) adapted from a state-of-the-art CNN (LCNN-FFT) submitted to the ASVspoof 2017 challenge. We generate temporal and spectral explanations for predictions of the model using the SLIME algorithm. Our findings suggest that in most instances of spoofing the model is using information in the first 400 milliseconds of each audio instance to make the class prediction. Knowledge of the characteristics that spoofing detection systems are exploiting can help build less vulnerable ASV systems, other spoofing detection systems, as well as better evaluation databases.
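    The temporal explanations described above can be illustrated with a much cruder occlusion analysis; the sketch below is hypothetical and not the paper's SLIME method (which explains predictions via interpretable time–frequency segmentations): zero out one 400 ms segment at a time and record how much the model's score drops. The `predict` callable and the toy energy-based model are assumptions for illustration only.

    ```python
    import numpy as np

    def temporal_saliency(audio, sr, predict, seg_ms=400):
        """Crude occlusion analysis: silence one segment at a time and
        record how much the score from `predict` (any callable mapping a
        waveform to a scalar) drops. A large drop marks an important segment."""
        seg = int(sr * seg_ms / 1000)
        base = predict(audio)
        drops = []
        for start in range(0, len(audio), seg):
            occluded = audio.copy()
            occluded[start:start + seg] = 0.0   # silence this segment
            drops.append(base - predict(occluded))
        return np.array(drops)

    # Toy "model" whose score is simply the energy in the first 400 ms,
    # mimicking a detector that relies on the start of the recording.
    sr = 16000
    audio = np.ones(sr)  # one second of constant signal
    score = lambda x: float(np.square(x[: int(0.4 * sr)]).mean())
    print(temporal_saliency(audio, sr, score))
    ```

    With this toy model, only occluding the first segment changes the score, which is the kind of pattern the paper reports for the opening 400 ms of each audio instance.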

  • 2.
    Chettri, Bhusan
    et al.
    Queen Mary Univ London, Sch EECS, London, England.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. Queen Mary Univ London, Sch EECS, London, England.
    Benetos, Emmanouil
    Queen Mary Univ London, Sch EECS, London, England.
    Analysing Replay Spoofing Countermeasure Performance under Varied Conditions. 2018. In: 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP) / [ed] Pustelnik, N., Ma, Z., Tan, Z.-H., Larsen, J., IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    In this paper, we aim to understand what makes replay spoofing detection difficult in the context of the ASVspoof 2017 corpus. We use FFT spectra, mel frequency cepstral coefficients (MFCC) and inverted MFCC (IMFCC) frontends and investigate different back-ends based on Convolutional Neural Networks (CNNs), Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs). On this database, we find that IMFCC frontend based systems show smaller equal error rate (EER) for high quality replay attacks but higher EER for low quality replay attacks in comparison to the baseline. However, we find that it is not straightforward to understand the influence of an acoustic environment (AE), a playback device (PD) and a recording device (RD) of a replay spoofing attack. One reason is the unavailability of metadata for genuine recordings. Second, it is difficult to account for the effects of the factors: AE, PD and RD, and their interactions. Finally, our frame-level analysis shows that the presence of cues (recording artefacts) in the first few frames of genuine signals (missing from replayed ones) influence class prediction.
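    The equal error rate (EER) used throughout the abstract is the operating point where the false-acceptance rate (spoofs accepted) equals the false-rejection rate (genuine trials rejected). A minimal sketch of estimating it from countermeasure scores — the function name and the convention that higher scores mean "more genuine" are assumptions, not taken from the paper:

    ```python
    import numpy as np

    def equal_error_rate(genuine_scores, spoof_scores):
        """Estimate the EER by sweeping a threshold over all observed
        scores and finding where false-acceptance rate (FAR) and
        false-rejection rate (FRR) cross."""
        thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
        far = np.array([(spoof_scores >= t).mean() for t in thresholds])
        frr = np.array([(genuine_scores < t).mean() for t in thresholds])
        idx = np.argmin(np.abs(far - frr))   # closest FAR/FRR crossing
        return (far[idx] + frr[idx]) / 2.0

    # Toy example: well-separated score distributions give a low EER.
    rng = np.random.default_rng(0)
    genuine = rng.normal(2.0, 1.0, 1000)
    spoof = rng.normal(-2.0, 1.0, 1000)
    print(equal_error_rate(genuine, spoof))
    ```

    A smaller EER for one frontend than another (e.g., IMFCC vs. the baseline on high-quality replays) means its score distributions for genuine and spoofed trials overlap less.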

  • 3.
    Hallström, Eric
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Mossmyr, Simon
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Vegeborn, Victor
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Wedin, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    From Jigs and Reels to Schottisar och Polskor: Generating Scandinavian-like Folk Music with Deep Recurrent Networks. 2019. Conference paper (Refereed)
    Abstract [en]

    The use of recurrent neural networks for modeling and generating music has been shown to be quite effective for compact, textual transcriptions of traditional music from Ireland and the UK. We explore how well these models perform for textual transcriptions of traditional music from Scandinavia. This type of music has characteristics that are similar to and different from those of Irish music, e.g., mode, rhythm, and structure. We investigate the effects of different architectures and training regimens, and evaluate the resulting models using three methods: a comparison of statistics between real and generated transcriptions, an appraisal of generated transcriptions via a semi-structured interview with an expert in Swedish folk music, and an exercise conducted with students of Scandinavian folk music. We find that some of our models can generate new transcriptions sharing characteristics with Scandinavian folk music, but which often lack the simplicity of real transcriptions. One of our models has been implemented online at http://www.folkrnn.org for anyone to try.

  • 4.
    Holzapfel, André
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Coeckelbergh, Mark
    Department of Philosophy, University of Vienna, Vienna, Austria.
    Ethical Dimensions of Music Information Retrieval Technology. 2018. In: Transactions of the International Society for Music Information Retrieval, E-ISSN 2514-3298, Vol. 1, no. 1, p. 44-55. Article in journal (Refereed)
    Abstract [en]

    This article examines ethical dimensions of Music Information Retrieval (MIR) technology.  It uses practical ethics (especially computer ethics and engineering ethics) and socio-technical approaches to provide a theoretical basis that can inform discussions of ethics in MIR. To help ground the discussion, the article engages with concrete examples and discourse drawn from the MIR field. This article argues that MIR technology is not value-neutral but is influenced by design choices, and so has unintended and ethically relevant implications. These can be invisible unless one considers how the technology relates to wider society. The article points to the blurring of boundaries between music and technology, and frames music as “informationally enriched” and as a “total social fact.” The article calls attention to biases that are introduced by algorithms and data used for MIR technology, cultural issues related to copyright, and ethical problems in MIR as a scientific practice. The article concludes with tentative ethical guidelines for MIR developers, and calls for addressing key ethical problems with MIR technology and practice, especially those related to forms of bias and the remoteness of the technology development from end users.

  • 5.
    Mishra, Saumitra
    et al.
    Queen Mary University of London.
    Stoller, Daniel
    Queen Mary University of London.
    Benetos, Emmanouil
    Queen Mary University of London.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Dixon, Simon
    Queen Mary University of London.
    GAN-Based Generation and Automatic Selection of Explanations for Neural Networks. 2019. Conference paper (Refereed)
    Abstract [en]

    One way to interpret trained deep neural networks (DNNs) is by inspecting characteristics that neurons in the model respond to, such as by iteratively optimising the model input (e.g., an image) to maximally activate specific neurons. However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow. We introduce a new metric that uses Fréchet Inception Distance (FID) to encourage similarity between model activations for real and generated data. This provides an efficient way to evaluate a set of generated examples for each setting of hyper-parameters. We also propose a novel GAN-based method for generating explanations that enables an efficient search through the input space and imposes a strong prior favouring realistic outputs. We apply our approach to a classification model trained to predict whether a music audio recording contains singing voice. Our results suggest that this proposed metric successfully selects hyper-parameters leading to interpretable examples, avoiding the need for manual evaluation. Moreover, we see that examples synthesised to maximise or minimise the predicted probability of singing voice presence exhibit vocal or non-vocal characteristics, respectively, suggesting that our approach is able to generate suitable explanations for understanding concepts learned by a neural network.
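    The Fréchet distance mentioned above fits a Gaussian to each of two sets of model activations and measures the distance between the fits. A minimal sketch — the function name is an assumption, and the paper applies this to activations of a singing-voice classifier, which is not reproduced here:

    ```python
    import numpy as np
    from scipy.linalg import sqrtm

    def frechet_distance(acts_real, acts_gen):
        """Fréchet distance between Gaussian fits to two activation sets
        (rows = examples, columns = activation dimensions):
        ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})."""
        mu1, mu2 = acts_real.mean(axis=0), acts_gen.mean(axis=0)
        c1 = np.cov(acts_real, rowvar=False)
        c2 = np.cov(acts_gen, rowvar=False)
        covmean = sqrtm(c1 @ c2)
        if np.iscomplexobj(covmean):      # numerical noise can introduce
            covmean = covmean.real        # tiny imaginary components
        diff = mu1 - mu2
        return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))
    ```

    Identically distributed activation sets score near zero, so hyper-parameter settings whose generated examples activate the model like real data are ranked best.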

  • 6.
    Rodríguez-Algarra, Francisco
    et al.
    Queen Mary University of London.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Dixon, Simon
    Queen Mary University of London.
    Characterising Confounding Effects in Music Classification Experiments through Interventions. 2019. In: Transactions of the International Society for Music Information Retrieval, p. 52-66. Article in journal (Refereed)
    Abstract [en]

    We address the problem of confounding in the design of music classification experiments, that is, the inability to distinguish the effects of multiple potential influencing variables in the measurements. Confounding affects the validity of conclusions at many levels, and so must be properly accounted for. We propose a procedure for characterising effects of confounding in the results of music classification experiments by creating regulated test conditions through interventions in the experimental pipeline, including a novel resampling strategy. We demonstrate this procedure on the GTZAN genre collection, which is known to give rise to confounding effects.

  • 7.
    Sturm, Bob
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    What do these 5,599,881 parameters mean?: An analysis of a specific LSTM music transcription model, starting with the 70,281 parameters of its softmax layer. 2018. In: Proceedings of the 6th International Workshop on Musical Metacreation (MUME 2018), 2018. Conference paper (Refereed)
    Abstract [en]

    A folk-rnn model is a long short-term memory network (LSTM) that generates music transcriptions. We have evaluated these models in a variety of ways – from statistical analyses of generated transcriptions, to their use in music practice – but have yet to understand how their behaviours precipitate from their parameters. This knowledge is essential for improving such models, calibrating them, and broadening their applicability. In this paper, we analyse the parameters of the softmax output layer of a specific model realisation. We discover some key aspects of the model’s local and global behaviours, for instance, that its ability to construct a melody is highly reliant on a few symbols. We also derive a way to adjust the output of the last hidden layer of the model to attenuate its probability of producing specific outputs.
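    The abstract's idea of attenuating the probability of specific outputs can be illustrated by penalising selected logits before the softmax. This is a hypothetical illustration, not the paper's derivation, which works from the learned parameters of the softmax layer itself:

    ```python
    import numpy as np

    def softmax(z):
        """Numerically stable softmax over a vector of logits."""
        e = np.exp(z - z.max())
        return e / e.sum()

    def attenuate(logits, idx, penalty=5.0):
        """Reduce the probability of the symbols indexed by `idx` by
        subtracting a penalty from their logits before the softmax;
        the remaining probability mass is redistributed automatically."""
        z = logits.copy()
        z[idx] -= penalty
        return softmax(z)

    # Toy 4-symbol vocabulary: suppress the most likely symbol (index 0).
    logits = np.array([2.0, 1.0, 0.5, -1.0])
    print(softmax(logits))
    print(attenuate(logits, [0]))
    ```

    Because the softmax renormalises, the distribution stays valid while the targeted symbols become much less likely, which is the qualitative effect the paper achieves by adjusting the last hidden layer's output.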

  • 8.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Ben-Tal, Oded
    Kingston University, UK.
    Let’s Have Another Gan Ainm: An experimental album of Irish traditional music and computer-generated tunes. 2018. Report (Other academic)
    Abstract [en]

    This technical report details the creation and public release of an album of folk music, most of which comes from material generated by computer models trained on transcriptions of traditional music of Ireland and the UK. For each computer-generated tune appearing on the album, we provide below the original version and the alterations made.

  • 9.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Ben-Tal, Oded
    Kingston University, UK.
    Monaghan, Úna
    Cambridge University, UK.
    Collins, Nick
    Durham University, UK.
    Herremans, Dorien
    University of Technology and Design, Singapore.
    Chew, Elaine
    Queen Mary University of London, UK.
    Hadjeres, Gäetan
    Sony CSL, Paris.
    Deruty, Emmanuel
    Sony CSL, Paris.
    Pachet, François
    Spotify, Paris.
    Machine Learning Research that Matters for Music Creation: A Case Study. In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027. Article in journal (Refereed)
    Abstract [en]

    Research applying machine learning to music modeling and generation typically proposes model architectures, training methods and datasets, and gauges system performance using quantitative measures like sequence likelihoods and/or qualitative listening tests. Rarely does such work explicitly question and analyse its usefulness for and impact on real-world practitioners, and then build on those outcomes to inform the development and application of machine learning. This article attempts to do these things for machine learning applied to music creation. Together with practitioners, we develop and use several applications of machine learning for music creation, and present a public concert of the results. We reflect on the entire experience to arrive at several ways of advancing these and similar applications of machine learning to music creation.

  • 10.
    Sturm, Bob
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Iglesias, Maria
    Joint Research Centre, European Commission.
    Ben-Tal, Oded
    Kingston University.
    Miron, Marius
    Joint Research Centre, European Commission.
    Gómez, Emilia
    Joint Research Centre, European Commission.
    Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. 2019. In: MDPI Arts, ISSN 2076-0752, Vol. 8, no. 3, article id 115. Article in journal (Refereed)
    Abstract [en]

    The application of artificial intelligence (AI) to music stretches back many decades, and presents numerous unique opportunities for a variety of uses, such as the recommendation of recorded music from massive commercial archives, or the (semi-)automated creation of music. Due to unparalleled access to music data and effective learning algorithms running on high-powered computational hardware, AI is now producing surprising outcomes in a domain fully entrenched in human creativity—not to mention a revenue source around the globe. These developments call for a close inspection of what is occurring, and consideration of how it is changing and can change our relationship with music for better and for worse. This article looks at AI applied to music from two perspectives: copyright law and engineering praxis. It grounds its discussion in the development and use of a specific application of AI in music creation, which raises further and unanticipated questions. Most of the questions collected in this article remain open, as their answers are not yet clear, but they are nonetheless important to consider as AI technologies develop and are applied more widely to music, not to mention other domains centred on human creativity.
