Digitala Vetenskapliga Arkivet

Search results 1-50 of 206
  • 1. Abrahamsson, M.
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Subglottal pressure variation in actors’ stage speech (2007). In: Voice and Gender: Journal for the Voice and Speech Trainers Association / [ed] Rees, M., VASTA Publishing, 2007, p. 343-347. Chapter in book (Refereed)
  • 2. Alku, Paavo
    et al.
    Airas, Matti
    Björkner, Eva
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    An amplitude quotient based method to analyze changes in the shape of the glottal pulse in the regulation of vocal intensity (2006). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 120, no 2, p. 1052-1062. Article in journal (Refereed)
    Abstract [en]

    This study presents an approach to visualizing intensity regulation in speech. The method expresses a voice sample in a two-dimensional space using amplitude-domain values extracted from the glottal flow estimated by inverse filtering. The two-dimensional presentation is obtained by expressing a time-domain measure of the glottal pulse, the amplitude quotient (AQ), as a function of the negative peak amplitude of the flow derivative (dpeak). The regulation of vocal intensity was analyzed with the proposed method from voices varying from extremely soft to very loud, with an SPL range of approximately 55 dB. When vocal intensity was increased, the speech samples first showed a rapidly decreasing trend as expressed on the proposed AQ-dpeak graph. When intensity was further raised, the location of the samples converged toward a horizontal line, the asymptote of a hypothetical hyperbola. This behavior of the AQ-dpeak graph indicates that the intensity regulation strategy changes from laryngeal to respiratory mechanisms, and the method chosen makes it possible to quantify how the control mechanisms underlying the regulation of vocal intensity change gradually between the two means. The proposed presentation constitutes an easy-to-implement method to visualize the function of voice production in intensity regulation, because the only information needed is the glottal flow waveform estimated by inverse filtering the acoustic speech pressure signal.
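
    The amplitude-domain measures named above (AQ, NAQ, dpeak) can be illustrated with a short Python sketch; the toy flow pulse, sampling rate, and variable names below are assumptions for demonstration only, not material from this record:

        import numpy as np

        # Minimal sketch (assumed setup): derive d_peak, AQ and NAQ from one
        # synthetic glottal flow cycle of the kind obtained by inverse filtering.
        fs = 44100.0                                  # sampling rate in Hz (assumed)
        t = np.arange(0, 0.005, 1.0 / fs)             # one 200 Hz cycle (T0 = 5 ms)
        flow = np.maximum(0.0, np.sin(2 * np.pi * 200 * t)) ** 2  # toy glottal pulse

        d_flow = np.gradient(flow, 1.0 / fs)          # differentiated flow glottogram
        u_ptp = flow.max() - flow.min()               # peak-to-peak pulse amplitude
        d_peak = -d_flow.min()                        # magnitude of the negative peak (MFDR)
        T0 = len(t) / fs                              # fundamental period in seconds

        AQ = u_ptp / d_peak                           # amplitude quotient (seconds)
        NAQ = AQ / T0                                 # normalized amplitude quotient
        print(f"AQ = {AQ * 1000:.2f} ms, NAQ = {NAQ:.3f}")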

  • 3. Baker, C. P.
    et al.
    Sundberg, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Purdy, S. C.
    Rakena, T. O.
    Female adolescent singing voice characteristics: an exploratory study using LTAS and inverse filtering (2022). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, p. 1-13. Article in journal (Refereed)
    Abstract [en]

    Background and Aim: To date, little research is available that objectively quantifies female adolescent singing-voice characteristics in light of the physiological and functional developments that occur from puberty to adulthood. This exploratory study sought to augment the pool of data available that offers objective voice analysis of female singers in late adolescence. Methods: Using long-term average spectra (LTAS) and inverse filtering techniques, dynamic range and voice-source characteristics were determined in a cohort of vocally healthy cis-gender female adolescent singers (17 to 19 years) from high-school choirs in Aotearoa New Zealand. Non-parametric statistics were used to determine associations and significant differences. Results: Wide intersubject variation was seen in dynamic range, spectral measures of harmonic organisation (formant cluster prominence, FCP), noise components in the spectrum (high-frequency energy ratio, HFER), and the normalised amplitude quotient (NAQ), suggesting great variability in the ability to control phonatory mechanisms such as subglottal pressure (Psub), glottal configuration and adduction, and vocal tract shaping. A strong association between the HFER and NAQ suggests that these non-invasive measures may offer complementary insights into vocal function, specifically with regard to glottal adduction and turbulent noise in the voice signal. Conclusion: Knowledge of the range of variation within healthy adolescent singers is necessary for the development of effective and inclusive pedagogical practices, and for vocal-health professionals working with singers of this age. LTAS and inverse filtering are useful non-invasive tools for determining such characteristics.

  • 4. Baker, C. P.
    et al.
    Sundberg, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Purdy, S. C.
    Rakena, T. O.
    Leão, S. H. D. S.
    CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice (2024). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 38, no 3, p. 549-560. Article in journal (Refereed)
    Abstract [en]

    Introduction: In recent years, cepstral analysis and specific cepstrum-based measures such as smoothed cepstral peak prominence (CPPS) have become increasingly researched and utilized in attempts to determine the extent of overall dysphonia in voice signals. Yet, few studies have extensively examined how specific voice-source parameters affect CPPS values. Objective: Using a range of synthesized tones, this exploratory study sought to systematically analyze the effect of fundamental frequency (fo), vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental on CPPS values. Materials and Methods: A series of scales were synthesised using the freeware Madde. Fundamental frequency, vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental were systematically and independently varied. The tones were analysed in PRAAT, and statistical analyses were conducted in SPSS. Results: CPPS was significantly affected by both fo and source-spectrum tilt, independently. A nonlinear association was seen between vibrato extent and CPPS, where CPPS values increased from 0 to 0.6 semitones (ST), then rapidly decreased approaching 1.0 ST. No relationship was seen between the amplitude of the voice-source fundamental and CPPS. Conclusion: The large effect of fo should be taken into account when analyzing the voice, particularly in singing-voice research, when comparing pre- and post-treatment data, and when comparing inter-subject CPPS data.
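
    For orientation, a simplified cepstral peak prominence computation is sketched below in Python; it implements only the generic CPP idea (dB spectrum, cepstrum, peak measured against a regression trend line), not Praat's exact CPPS with its time and quefrency smoothing, and all names and parameter values are assumptions:

        import numpy as np

        def cpp(frame, fs, f0_range=(60.0, 880.0)):
            """Simplified cepstral peak prominence for one frame (assumed setup).

            Generic CPP only: dB spectrum -> cepstrum -> peak in the quefrency band
            implied by f0_range, measured against a straight regression line.
            The frame should cover a few pitch periods (e.g. 40-50 ms of audio).
            """
            n = len(frame)
            spec_db = 20.0 * np.log10(np.abs(np.fft.fft(frame * np.hanning(n))) + 1e-12)
            cepstrum = np.abs(np.fft.ifft(spec_db))
            quef = np.arange(n) / fs                       # quefrency axis in seconds

            lo, hi = 1.0 / f0_range[1], 1.0 / f0_range[0]  # search band, e.g. 1/880 .. 1/60 s
            band = np.where((quef >= lo) & (quef <= hi))[0]
            peak = band[np.argmax(cepstrum[band])]

            slope, intercept = np.polyfit(quef[band], cepstrum[band], 1)
            return cepstrum[peak] - (slope * quef[peak] + intercept)

        # Example call with random noise, just to show the signature (assumed):
        # cpp(np.random.randn(4096), 44100.0)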

  • 5. Baker, Calvin P.
    et al.
    Sundberg, Johan
    Stockholm University, Faculty of Humanities, Department of Linguistics. KTH (Royal Institute of Technology), Sweden; University College of Music Education Stockholm, Sweden.
    Purdy, Suzanne C.
    Rakena, Te Oti
    Female adolescent singing voice characteristics: an exploratory study using LTAS and inverse filtering (2024). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 49, no 2, p. 83-95. Article in journal (Refereed)
    Abstract [en]

    Background and Aim: To date, little research is available that objectively quantifies female adolescent singing-voice characteristics in light of the physiological and functional developments that occur from puberty to adulthood. This exploratory study sought to augment the pool of data available that offers objective voice analysis of female singers in late adolescence.

    Methods: Using long-term average spectra (LTAS) and inverse filtering techniques, dynamic range and voice-source characteristics were determined in a cohort of vocally healthy cis-gender female adolescent singers (17 to 19 years) from high-school choirs in Aotearoa New Zealand. Non-parametric statistics were used to determine associations and significant differences.

    Results: Wide intersubject variation was seen in dynamic range, spectral measures of harmonic organisation (formant cluster prominence, FCP), noise components in the spectrum (high-frequency energy ratio, HFER), and the normalised amplitude quotient (NAQ), suggesting great variability in the ability to control phonatory mechanisms such as subglottal pressure (Psub), glottal configuration and adduction, and vocal tract shaping. A strong association between the HFER and NAQ suggests that these non-invasive measures may offer complementary insights into vocal function, specifically with regard to glottal adduction and turbulent noise in the voice signal.

    Conclusion: Knowledge of the range of variation within healthy adolescent singers is necessary for the development of effective and inclusive pedagogical practices, and for vocal-health professionals working with singers of this age. LTAS and inverse filtering are useful non-invasive tools for determining such characteristics.

  • 6. Baker, Calvin P.
    et al.
    Sundberg, Johan
    Stockholm University, Faculty of Humanities, Department of Linguistics. KTH (Royal Institute of Technology), Sweden; University College of Music Education, Sweden.
    Purdy, Suzanne C.
    Rakena, Te Oti
    de S. Leão, Sylvia H.
    CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice (2024). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 38, no 3, p. 549-560. Article in journal (Refereed)
    Abstract [en]

    Introduction. In recent years, cepstral analysis and specific cepstrum-based measures such as smoothed cepstral peak prominence (CPPS) have become increasingly researched and utilized in attempts to determine the extent of overall dysphonia in voice signals. Yet, few studies have extensively examined how specific voice-source parameters affect CPPS values.

    Objective. Using a range of synthesized tones, this exploratory study sought to systematically analyze the effect of fundamental frequency (fo), vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental on CPPS values.

    Materials and Methods. A series of scales were synthesised using the freeware Madde. Fundamental frequency, vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental were systematically and independently varied. The tones were analysed in PRAAT, and statistical analyses were conducted in SPSS.

    Results. CPPS was significantly affected by both fo and source-spectrum tilt, independently. A nonlinear association was seen between vibrato extent and CPPS, where CPPS values increased from 0 to 0.6 semitones (ST), then rapidly decreased approaching 1.0 ST. No relationship was seen between the amplitude of the voice-source fundamental and CPPS.

    Conclusion. The large effect of fo should be taken into account when analyzing the voice, particularly in singing-voice research, when comparing pre- and post-treatment data, and when comparing inter-subject CPPS data.

  • 7. Baptista La, Filipa Martins
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Pregnancy and the Singing Voice: Reports From a Case Study (2012). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 26, no 4, p. 431-439. Article in journal (Refereed)
    Abstract [en]

    Objectives. Significant changes in body tissues occur during pregnancy; however, literature concerning the effects of pregnancy on the voice is sparse, especially concerning the professional classically trained voice. Hypotheses. Hormonal variations and associated bodily changes during pregnancy affect phonatory conditions, such as vocal fold motility and glottal adduction. Design. Longitudinal case study with a semiprofessional classically trained singer. Methods. Audio, electrolaryngograph, oral pressure, and air flow signals were recorded once a week during the last 12 weeks of pregnancy, 48 hours after birth, and during the following consecutive 11 weeks. Vocal tasks included diminuendo sequences of the syllable /pae/ sung at various pitches, and performing a Lied. Phonation threshold pressures (PTPs) and collision threshold pressures (CTPs), normalized amplitude quotient (NAQ), alpha ratio, and the dominance of the voice source fundamental were determined. Concentrations of female sex steroid hormones were measured on three occasions. A listening test of timbral brightness and vocal fatigue was carried out. Results. Results demonstrated significantly elevated concentrations of estrogen and progesterone during pregnancy, which were considerably reduced after birth. During pregnancy, CTPs and PTPs were high; and NAQ, alpha ratio, and dominance of the voice source fundamental suggested elevated glottal adduction. In addition, a perceptible decrease of vocal brightness was noted. Conclusions. The elevated CTPs and PTPs during pregnancy suggest reduced vocal fold motility and increased glottal adduction. These changes are compatible with expected effects of elevated concentrations of estrogen and progesterone on tissue viscosity and water retention.

  • 8. Björklund, Staffan
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. University College of Music Education, Sweden.
    Relationship Between Subglottal Pressure and Sound Pressure Level in Untrained Voices (2016). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 30, no 1, p. 15-20. Article in journal (Refereed)
    Abstract [en]

    Objectives: Subglottal pressure (Ps) is strongly correlated with sound pressure level (SPL) and is easy to measure by means of commonly available equipment. The SPL/Ps ratio is strongly dependent on the efficiency of the phonatory apparatus and should be of great relevance to clinical practice. However, published normative data are still missing. Method: The subjects produced sequences of the syllable [pæ], and Ps was measured as the oral pressure during the [p] occlusion. The Ps to SPL relationship was determined at four pitches produced by 16 female and 15 male healthy voices and analyzed by means of regression analysis. Average correlation between Ps and SPL, average SPL produced with a Ps of 10 cm H2O, and average SPL increase produced by a doubling of Ps were calculated for the female and for the male subjects. The significance of sex and pitch conditions was analyzed by means of analysis of variance (ANOVA). Results: Pitch was found to be an insignificant condition. The average correlation between Ps and SPL was 0.83 and did not differ significantly between the female and male subjects. In female and male subjects, Ps = 10 cm H2O produced 78.1 dB and 80.0 dB SPL at 0.3 m, and a doubling of Ps generated 11.1 dB and 9.3 dB increase of SPL. Both these gender differences were statistically significant. Conclusions: The relationship between Ps and SPL can be reliably established from series of repetitions of the syllable [pæ] produced with a continuously changing degree of vocal loudness. Male subjects produce slightly higher SPL for a given pressure than female subjects but gain less for a doubling of Ps. As these relationships appear to be affected by phonation type, it seems possible that in the future, the method can be used for documenting degree of phonatory hypofunction and hyperfunction.
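
    The reported figures (about 78-80 dB SPL at Ps = 10 cm H2O and roughly 9-11 dB per doubling of Ps) amount to a linear regression of SPL on log2(Ps); the sketch below illustrates that computation with invented example values, so neither the numbers nor the variable names come from the study:

        import numpy as np

        # Illustrative only: regress SPL (dB at 0.3 m) on log2 of subglottal
        # pressure (cm H2O), mirroring the kind of summary statistics reported.
        ps = np.array([4.0, 6.0, 8.0, 10.0, 14.0, 20.0])      # assumed pressures
        spl = np.array([66.0, 71.5, 75.0, 78.0, 82.5, 87.0])  # assumed SPL values

        slope, intercept = np.polyfit(np.log2(ps), spl, 1)
        print(f"SPL at Ps = 10 cm H2O: {slope * np.log2(10) + intercept:.1f} dB")
        print(f"SPL gain per doubling of Ps: {slope:.1f} dB")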

  • 9.
    Björkner, Eva
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Alku, P.
    Subglottal pressure and NAQ variation in voice production of classically trained baritone singers (2005). In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, p. 1057-1060. Conference paper (Refereed)
    Abstract [en]

    The subglottal pressure (Ps) and voice source characteristics of five professional baritone singers were analyzed. Glottal adduction was estimated with amplitude quotient (AQ), defined as the ratio between peak-to-peak pulse amplitude and the negative peak of the differentiated flow glottogram, and with normalized amplitude quotient (NAQ), defined as AQ divided by fundamental period length. Previous studies show that NAQ and its variation with Ps represent an effective parameter in the analysis of voice source characteristics. Therefore, the present study aims at increasing our knowledge of these two parameters further by finding out how they vary with pitch and Ps in operatic baritone singers, singing at high and low pitch. Ten equally spaced Ps values were selected from three takes of the syllable [pae], repeated with a continuously decreasing vocal loudness and initiated at maximum vocal loudness. The vowel sounds following the selected Ps peaks were inverse filtered. Data on peak-to-peak pulse amplitude, maximum flow declination rate, AQ and NAQ will be presented.

  • 10.
    Björkner, Eva
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Cleveland, T
    Stone, E
    Voice source differences between registers in female musical theater singers (2006). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 20, no 2, p. 187-197. Article in journal (Refereed)
    Abstract [en]

    Musical theater singing typically requires women to use two vocal registers. Our investigation considered voice source and subglottal pressure (Ps) characteristics of the speech pressure signal recorded for a sequence of /pae/ syllables sung at constant pitch and decreasing vocal loudness in each register by seven female musical theater singers. Ten equally spaced Ps values were selected, and the relationships between Ps and several parameters were examined: closed quotient (Qclosed), peak-to-peak pulse amplitude (Up-t-p), amplitude of the negative peak of the differentiated flow glottogram, i.e., the maximum flow declination rate (MFDR), and the normalized amplitude quotient (NAQ) [Up-t-p/(T0*MFDR)], where T0 is the fundamental period. Ps was typically slightly higher in chest than in head register. As Ps influences the measured glottogram parameters, these were also compared at an approximately identical Ps of 11 cm H2O. Results showed that for typical tokens, MFDR and Qclosed were significantly greater, whereas Up-t-p and therefore NAQ were significantly lower in chest than in head.

  • 11.
    Björkner, Eva
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Cleveland, T.
    Stone, R. E.
    Voice source register differences in female musical theatre singers (2004). In: Proc Baltic-Nordic Acoustics Meeting 2004, BNAM04, Mariehamn, 2004. Conference paper (Refereed)
  • 12.
    Björkner, Eva
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Cleveland, Tom
    Vanderbilt Voice Center, Dept. of Otolaryngology, Vanderbilt University Medical Center, Nashville.
    Stone, R E
    Vanderbilt Voice Center, Dept. of Otolaryngology, Vanderbilt University Medical Center, Nashville.
    Voice source characteristics in different registers in classically trained female musical theatre singers (2004). In: Proceedings of ICA 2004: the 18th International Congress on Acoustics, Kyoto International Conference Hall, 4-9 April, Kyoto, Japan: acoustical science and technology for quality of life, Kyoto, Japan, 2004, p. 297-300. Conference paper (Refereed)
    Abstract [en]

    Musical theatre singing requires the use of two vocal registers in the female voice. The voice source and subglottal pressure (Ps) characteristics of these registers are analysed by inverse filtering. The relationships between Ps and closed quotient (Qclosed), peak-to-peak pulse amplitude (Up-t-p), maximum flow declination rate (MFDR), and the normalised amplitude quotient (NAQ) were examined. Ps was typically slightly higher in chest than in head register. For typical tokens, MFDR and Qclosed were significantly greater, while NAQ and Up-t-p were significantly lower in chest than in head.

  • 13. Borch, D. Zangger
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Some Phonatory and Resonatory Characteristics of the Rock, Pop, Soul, and Swedish Dance Band Styles of Singing (2011). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 25, no 5, p. 532-537. Article in journal (Refereed)
    Abstract [en]

    This investigation aims at describing voice function in four nonclassical styles of singing: Rock, Pop, Soul, and Swedish Dance Band. A male singer, professionally experienced in performing in these genres, sang representative tunes, both with their original lyrics and on the syllable /pae/. In addition, he sang tones in a triad pattern ranging from the pitch Bb2 to the pitch C4 on the syllable /pae/ in pressed and neutral phonation. An expert panel was successful in classifying the samples, thus suggesting that the samples were representative of the various styles. Subglottal pressure was estimated from oral pressure during the occlusion for the consonant [p]. Flow glottograms were obtained from inverse filtering. The four lowest formant frequencies differed between the styles. The mean of the subglottal pressure and the mean of the normalized amplitude quotient (NAQ), that is, the ratio between the flow pulse amplitude and the product of period and maximum flow declination rate, were plotted against the mean of fundamental frequency. In these graphs, Rock and Swedish Dance Band assumed opposite extreme positions with respect to subglottal pressure and mean phonation frequency, whereas the mean NAQ values differed less between the styles.

  • 14.
    Bresin, Roberto
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Director musices: The KTH performance rules system (2002). In: Proceedings of SIGMUS-46, Information Processing Society of Japan, 2002, p. 43-48. Conference paper (Refereed)
    Abstract [en]

    Director Musices is a program that transforms notated scores into musical performances. It implements the performance rules emerging from research projects at the Royal Institute of Technology (KTH). Rules in the program model performance aspects such as phrasing, articulation, and intonation, and they operate on performance variables such as tone, inter-onset duration, amplitude, and pitch. By manipulating rule parameters, the user can act as a metaperformer controlling different features of the performance, leaving the technical execution to the computer. Different interpretations of the same piece can easily be obtained. Features of Director Musices include MIDI file input and output, rule palettes, graphical display of all performance variables (along with the notation), and user-defined performance rules. The program is implemented in Common Lisp and is available free as a stand-alone application for both Macintosh and Windows platforms. Further information, including music examples, publications, and the program itself, is located online at http://www.speech.kth.se/music/performance. This paper is a revised and updated version of a previous paper published in the Computer Music Journal in 2000 that was mainly written by Anders Friberg (Friberg, Colombo, Frydén and Sundberg, 2000).

  • 15.
    Carlson, Rolf
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Speech and music performance. Parallels and contrasts (1987). In: STL-QPSR, Vol. 28, no 4, p. 007-023. Article in journal (Other academic)
  • 16.
    Carlson, Rolf
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Speech and music performance: parallels and contrasts (1989). In: Contemporary Music Review, ISSN 0749-4467, E-ISSN 1477-2256, Vol. 4, p. 389-402. Article in journal (Refereed)
    Abstract [en]

    Speech and music performance are two important systems for interhuman communication by means of acoustic signals. These signals must be adapted to the human perceptual and cognitive systems. Hence a comparative analysis of speech and music performances is likely to shed light on these systems, particularly regarding basic requirements for acoustic communication. Two computer programs are compared, one for text-to-speech conversion and one for note-to-tone conversion. Similarities are found in the need for placing emphasis on unexpected elements, for increasing the dissimilarities between different categories, and for flagging structural constituents. Similarities are also found in the code chosen for conveying this information, e.g. emphasis by lengthening and constituent marking by final lengthening.

  • 17.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Lindblom, Björn
    Risberg, Arne
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gunnar Fant 1920-2009 In Memoriam (2009). In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 66, no 4, p. 249-250. Article in journal (Refereed)
  • 18. Dong, Li
    et al.
    Kong, Jiangping
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Long-term-average spectrum characteristics of Kunqu Opera singers' speaking, singing and stage speech (2014). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 39, no 2, p. 72-80. Article in journal (Refereed)
    Abstract [en]

    Long-term-average spectrum (LTAS) characteristics were analyzed for ten Kunqu Opera singers, two in each of five roles. Each singer performed singing, stage speech, and conversational speech. Differences between the roles and between their performances of these three conditions are examined. After compensating for the Leq difference, the LTAS characteristics still differ between the roles but are similar across the three conditions, especially for the Colorful face (CF) and Old man roles, and especially between reading and singing. The curves show no evidence of a singer's formant cluster peak, but the CF role demonstrates a speaker's formant peak near 3 kHz. The LTAS characteristics deviate markedly from non-singers' standard conversational speech as well as from those of Western opera singing.

  • 19. Dong, Li
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Kong, Jiangping
    Loudness and Pitch of Kunqu Opera (2014). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 28, no 1, p. 14-19. Article in journal (Refereed)
    Abstract [en]

    Equivalent sound level (Leq), sound pressure level (SPL), and fundamental frequency (F0) are analyzed in each of five Kunqu Opera roles: Young girl and Young woman, Young man, Old man, and Colorful face. Their pitch ranges are similar to those of some western opera singers (alto, alto, tenor, baritone, and baritone, respectively). Differences among tasks, conditions (stage speech, singing, and reading lyrics), singers, and roles are examined. For all singers, Leq of stage speech and singing was considerably higher than that of conversational speech. Interrole differences of Leq among tasks and singers were larger than the intrarole differences. For most roles, the time-domain variation of SPL differed between roles both in singing and stage speech. In singing, as compared with stage speech, the SPL distribution was more concentrated and the variation of SPL with time was smaller. With regard to gender and age, male roles had a higher mean Leq and a lower mean fundamental frequency (MF0) than female roles. Female singers showed a wider F0 distribution for singing than for stage speech, whereas the opposite was true for male singers. The Leq of stage speech was higher than in singing for young personages. Younger female personages showed higher Leq, whereas older male personages had higher Leq. The roles performed with higher Leq tended to be sung at a lower MF0.

  • 20. Echternach, Matthias
    et al.
    Birkholz, Peter
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. University College of Music Education, Stockholm, Sweden.
    Traser, Louisa
    Korvink, Jan Gerrit
    Richter, Bernhard
    Resonatory Properties in Professional Tenors Singing Above the Passaggio (2016). In: Acta Acustica united with Acustica, ISSN 1610-1928, E-ISSN 1861-9959, Vol. 102, no 2, p. 298-306. Article in journal (Refereed)
    Abstract [en]

    Introduction: The question of formant tuning in male professional voices has been a matter of discussion for many years. Material and Methods: In this study four very successful Western classically trained tenors of different repertoire were analysed. They sang a scale on the vowel conditions /a, e, i, o, u/ from the pitch C4 (250 Hz) to A4 (440 Hz) in their stage voice, avoiding a register shift to falsetto. Formant frequencies were calculated from inverse filtering of the audio signal and from two-dimensional MRI data. Results: Both estimations showed a tuning of the first formant (F1) to the first harmonic only for vowel conditions with a low F1. For other vowel conditions, however, no clear systematic formant tuning was observed. Conclusion: For most vowel conditions the data are not able to support the hypothesis of a systematic formant tuning for professional classically trained tenors.

  • 21. Echternach, Matthias
    et al.
    Dippold, Sebastian
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Arndt, Susan
    Zander, Mark F.
    Richter, Bernhard
    High-Speed Imaging and Electroglottography Measurements of the Open Quotient in Untrained Male Voices' Register Transitions (2010). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 24, no 6, p. 644-650. Article in journal (Refereed)
    Abstract [en]

    Vocal fold oscillation patterns in vocal register transitions are still unclarified. The vocal fold oscillations and the open quotient were analyzed with high-speed digital imaging (HSDI) and electroglottography (EGG) in 18 male untrained subjects singing a glissando from modal to the falsetto register. Results reveal that the open quotient changed with register in both HSDI and EGG. The intraclass correlations for different HSDI and EGG determinations of the open quotient were high. However, we found only weak interclass correlations between the two methods. In 10 subjects, irregularities of vocal fold vibration occurred during the register transition. Our results confirm previous observations that falsetto register is associated with a higher open quotient compared with modal register. These data furthermore suggest that irregularities typically observed in audio and electroglottographic signals during register transitions are caused by irregularities in vocal fold vibration.

  • 22. Echternach, Matthias
    et al.
    Doellinger, Michael
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Traser, Louisa
    Richter, Bernhard
    Vocal fold vibrations at high soprano fundamental frequencies (2013). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 133, no 2, p. EL82-EL87. Article in journal (Refereed)
    Abstract [en]

    Human voice production at very high fundamental frequencies is not yet understood in detail. It was hypothesized that these frequencies are produced by turbulences, vocal tract/vocal fold interactions, or vocal fold oscillations without closure. Hitherto it has been impossible to visually analyze the vocal mechanism due to technical limitations. The latest high-speed technology, capturing 20,000 frames/s via transnasal endoscopy, was applied. Up to 1568 Hz, human vocal folds do exhibit oscillations with complete closure. Therefore, the recent results suggest that human voice production at very high F0s, up to 1568 Hz, is not caused by turbulence, but rather by airflow modulation from vocal fold oscillations.

  • 23. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Arndt, Susan
    Breyer, Tobias
    Markl, Michael
    Schumacher, Martin
    Richter, Bernhard
    Vocal tract and register changes analysed by real-time MRI in male professional singers - a pilot study (2008). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 33, no 2, p. 67-73. Article in journal (Refereed)
    Abstract [en]

    Changes of vocal tract shape accompanying changes of vocal register and pitch in singing have remained an unclear field. Dynamic real-time magnetic resonance imaging (MRI) was applied to two professional classical singers (a tenor and a baritone) in this pilot study. The singers sang ascending scales from B3 to G#4 on the vowel /a/, keeping the modal register throughout or shifting to falsetto register for the highest pitches. The results show that these singers made few and minor modifications of vocal tract shape when they changed from modal to falsetto and some clear modifications when they kept the register. In this case the baritone increased his tongue dorsum height, widened his jaw opening, and decreased his jaw protrusion, while the tenor merely lifted his uvula. The method used seems promising and should be applied to a greater number of singer subjects in the future.

  • 24. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Arndt, Susan
    Markl, Michael
    Schumacher, Martin
    Richter, Bernhard
    Vocal Tract in Female Registers: A Dynamic Real-Time MRI Study (2010). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 24, no 2, p. 133-139. Article in journal (Refereed)
    Abstract [en]

    The area of vocal registers is still unclarified. In a previous investigation, dynamic real-time magnetic resonance imaging (MRI), which is able to produce up to 10 frames per second, was successfully applied for examinations of vocal tract modifications in register transitions in male singers. In the present study, the same MRI technique was used to study vocal tract shapes during four professional young sopranos' lower and upper register transitions. The subjects were asked to sing a scale on the vowel /a/ across their transitions. The transitions were acoustically identified by four raters. In neither of these transitions could clear vocal tract changes be ascertained. However, substantial changes, that is, widening of the lips, opening of the jaw, elevation of the tongue dorsum, and continuous widening of the pharynx, were observed when the singers reached fundamental frequencies close to the frequency of the first formant of the vowel sung. These findings suggest that in these subjects register transition was not primarily the result of modifications of the vocal tract.

  • 25. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Baumann, Tobias
    Markl, Michael
    Richter, Bernhard
    Vocal tract area functions and formant frequencies in opera tenors' modal and falsetto registers (2011). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 129, no 6, p. 3955-3963. Article in journal (Refereed)
    Abstract [en]

    According to recent model investigations, vocal tract resonance is relevant to vocal registers. However, no experimental corroboration of this claim has been published so far. In the present investigation, ten professional tenors' vocal tract configurations were analyzed using MRI volumetry. All subjects produced a sustained tone on the pitch F4 (349 Hz) on the vowel /a/ (1) in modal and (2) in falsetto register. The area functions were estimated from the MRI data and their associated formant frequencies were calculated. In a second condition the same subjects repeated the same tasks in a sound treated room and their formant frequencies were estimated by means of inverse filtering. In both recordings similar formant frequencies were observed. Vocal tract shapes differed between modal and falsetto register. In modal as compared to falsetto, the lip opening and the oral cavity were wider and the first formant frequency was higher. In this sense the presented results are in agreement with the claim that the formant frequencies differ between registers.

  • 26. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Markl, Michael
    Richter, Bernhard
    Professional Opera Tenors' Vocal Tract Configurations in Registers (2010). In: Folia Phoniatrica et Logopaedica, ISSN 1021-7762, E-ISSN 1421-9972, Vol. 62, no 6, p. 278-287. Article in journal (Refereed)
    Abstract [en]

    Objective: Tenor singers may reach their top pitch range either by shifting from modal to falsetto register or by using their so-called 'voix mixte'. Material and Methods: In this study, dynamic real-time MRI of 8 frames per second was used to analyze the vocal tract profile in 10 professional opera tenors, who sang an ascending scale from C4 (262 Hz) to A4 (440 Hz) on the vowel /a/. The scale included their register transition and the singers applied both register techniques in different takes. Results: Modal to falsetto register changes were associated with only minor vocal tract modifications, including elevation and tilting of the larynx and a lifted tongue dorsum. Transitions to voix mixte, by contrast, were associated with major vocal tract modifications. Under these conditions, the subjects widened their pharynges, their lip and jaw openings, and increased their jaw protrusion. These modifications were stronger in more 'heavy' tenors than in more 'light' tenors. The acoustic consequences of these articulatory changes are discussed.

  • 27. Echternach, Matthias
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Zander, Mark F.
    Richter, Bernhard
    Perturbation Measurements in Untrained Male Voices' Transitions From Modal to Falsetto Register (2011). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 25, no 6, p. 663-669. Article in journal (Refereed)
    Abstract [en]

    Purpose. Voice periodicity during transitions from modal to falsetto register still remains an unclarified question. Method. We examined the acoustic and electroglottographic signals of 20 healthy untrained male voices' transitions from modal to falsetto register on the vowels /a, e, i, o, u, and ae/. Results. In addition to discontinuities in fundamental frequency (F0), an independent increase of jitter, relative average perturbation, and shimmer was observed during and apparently caused by the register transition. In falsetto, the jitter was higher than in the modal register. The contact quotient derived from the electroglottographic signal tended to be lower for higher than for lower F0. Conclusion. Register transitions are associated with increase of perturbation.

  • 28.
    Ekström, Axel G.
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Moran, Steven
    Sundberg, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Lameira, Adriano R.
    PREQUEL: SUPERVISED PHONETIC APPROACHES TO ANALYSES OF GREAT APE QUASI-VOWELS (2023). In: ICPhS 2023, 2023. Conference paper (Refereed)
    Abstract [en]

    There is renewed interest in potential vowel production by nonhuman primates, but no agreed-upon methodologies for its estimation from real-life vocalizations. Here, we present a set of supervised approaches for estimating primate vowel-like articulation, with reference to orangutan long call pulses (N=36). We summarize our approach as a cohesive framework, the Primate Quasi-Vowel (PREQUEL) protocol. We (1) estimated f0 from correlograms and (2) vocal tract resonances (formants) from spectrograms, (3) compared the results against synthesized vowels for those frequency values, and (4) presented these to uninformed listeners (N=16), who largely agreed on the categorization of vowel-like qualities for the vocalizations (Cronbach’s alpha = .701). We also provide descriptions of methods that are seemingly inadequate for formant estimation in great ape calls. We argue that a combination of phonetic methods is required to develop a science of nonhuman primate articulation.

  • 29.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. Linköping Univ, Sweden.
    Herbst, C.T.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    McAllister, A.
    Comparing Vocal Fold Contact Criteria Derived From Audio and Electroglottographic Signals (2016). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 30, no 4, p. 381-388. Article in journal (Refereed)
    Abstract [en]

    Objectives. Collision threshold pressure (CTP), that is, the lowest subglottal pressure facilitating vocal fold contact during phonation, is likely to reflect relevant vocal fold properties. The amplitude of an electroglottographic (EGG) signal or the amplitude of its first derivative (dEGG) has been used as a criterion of such contact. Manual measurement of CTP is time consuming, making the development of a simpler, alternative method desirable. Method. In this investigation, we compare CTP values measured manually to values automatically derived from dEGG and to values derived from a set of alternative parameters, some obtained from audio and some from EGG signals. One of the parameters was the novel EGG wavegram, which visualizes sequences of EGG or dEGG cycles, normalized with respect to period and amplitude. Raters with and without previous acquaintance with EGG analysis marked the disappearance of vocal fold contact in dEGG and in wavegram displays of /pa:/ sequences produced with continuously decreasing vocal loudness by seven singer subjects. Results. Vocal fold contact was mostly identified accurately in displays of both dEGG amplitude and wavegram. Automatically derived CTP values showed high correlation with those measured manually and with those derived from the ratings of the visual displays. Seven other parameters were tested as criteria of such contact. Mainly because of noise in the EGG signal, most of them yielded CTP values differing considerably from those derived from the manual and the automatic methods, although the EGG spectrum slope showed a high correlation. Conclusion. The possibility of measuring CTP automatically seems promising for future investigations.

  • 30.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Threshold Pressure For Vocal Fold Collision (2007). In: Proceedings of Pan European Voice Conference 7 (PEVOC 7), Groningen, The Netherlands, 2007, p. 69. Conference paper (Refereed)
  • 31.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Vocal fold collision threshold pressure: An alternative to phonation threshold pressure? (2009). In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 34, no 4, p. 210-217. Article in journal (Refereed)
    Abstract [en]

    Phonation threshold pressure (PTP), frequently used for characterizing vocal fold properties, is often difficult to measure. This investigation analyses the lowest pressure initiating vocal fold collision (CTP). Microphone, electroglottograph (EGG), and oral pressure signals were recorded, before and after vocal warm-up, in 15 amateur singers, repeating the syllable /pa:/ at several fundamental frequencies with gradually decreasing vocal loudness. Subglottal pressure was estimated from oral pressure during the p-occlusion, using the audio and the EGG amplitudes as criteria for PTP and CTP. The coefficient of variation was mostly lower for CTP than for PTP. Both CTP and PTP tended to be higher before than after the warm-up. The results support the conclusion that CTP is a promising parameter in investigations of vocal fold characteristics.
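
    As a rough illustration of the measurement idea described above (subglottal pressure approximated by oral pressure during the /p/ occlusion, with the EGG amplitude as the contact criterion), here is a schematic Python sketch; the signal names, thresholds, and peak-picking settings are assumptions, not details from the study:

        import numpy as np
        from scipy.signal import find_peaks

        def estimate_ctp(oral_pressure, egg, fs, egg_floor=0.05):
            """Schematic CTP estimate from synchronized oral-pressure and EGG signals.

            Oral-pressure peaks during /p/ occlusions approximate subglottal pressure;
            CTP is taken as the pressure of the last syllable in a diminuendo whose
            EGG peak-to-peak amplitude still exceeds an (assumed) contact criterion.
            """
            # /p/ occlusions appear as pressure peaks at roughly syllable rate.
            peaks, _ = find_peaks(oral_pressure, distance=int(0.3 * fs), height=1.0)

            ctp = None
            for p in peaks:                                 # loudness decreases over time
                seg = egg[p: p + int(0.3 * fs)]             # vowel following the /p/
                if seg.size and np.ptp(seg) >= egg_floor:   # vocal fold contact present
                    ctp = oral_pressure[p]                  # last pressure with contact
            return ctp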

  • 32.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    McAllister, Anita
    Collision and Phonation Threshold Pressures Before and After Loud, Prolonged Vocalization in Trained and Untrained Voices (2013). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 27, no 5, p. 527-530. Article in journal (Refereed)
    Abstract [en]

    The phonation threshold pressure (PTP) is defined as the lowest subglottal pressure needed for obtaining and sustaining vocal fold oscillation. It has been found to increase during vocal fatigue. In the present study, PTP is measured together with the threshold pressure needed for vocal fold collision; henceforth, the collision threshold pressure (CTP). PTP and CTP are compared before and after loud, prolonged vocalization in singer and nonsinger voices. Ten subjects repeated the vowel sequence /a, e, i, o, u/ at a sound pressure level of at least 80 dB at 0.3 m for 20 minutes. Audio and electroglottography signals were recorded before and after this exercise. At the same time, oral pressure was registered while the subjects produced a diminuendo repeating the syllable /pa:/, thus acquiring an approximation of the subglottal pressure. CTP and PTP increased significantly after the vocal loading in the nonsinger subjects. On the other hand, singers reported no substantial effect of the exercise, and most singers had a mean after-to-before ratio close to 1 for both CTP and PTP.

  • 33.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Pabst, Friedemann
    Hospital Dresden Friedrichstadt.
    Collision Threshold Pressure Before and After Vocal Loading (2009). In: INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association 2009, 2009, p. 764-767. Conference paper (Refereed)
    Abstract [en]

    The phonation threshold pressure (PTP) has been found to increase during vocal fatigue. In the present study we compare PTP and collision threshold pressure (CTP) before and after vocal loading in singer and non-singer voices. Seven subjects repeated the vowel sequence /a,e,i,o,u/ at an SPL of at least 80 dB @ 0.3 m for 20 min. Before and after this loading, the subjects' voices were recorded while they produced a diminuendo repeating the syllable /pa/. Oral pressure during the /p/ occlusion was used as a measure of subglottal pressure. Both CTP and PTP increased significantly after the vocal loading.

  • 34.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Pabst, Friedemann
    Hospital Dresden Friedrichstadt, Dresden, Germany.
    Effects of vocal loading on the phonation and collision threshold pressures (2009). In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, p. 24-27. Conference paper (Other academic)
    Abstract [en]

    Phonation threshold pressures (PTP) have been commonly used for obtaining a quantitative measure of vocal fold motility. However, as these measures are quite low, it is typically difficult to obtain reliable data. As the amplitude of an electroglottograph (EGG) signal decreases substantially at the loss of vocal fold contact, it is mostly easy to determine the collision threshold pressure (CTP) from an EGG signal. In an earlier investigation (Enflo & Sundberg, forthcoming) we measured CTP and compared it with PTP in singer subjects. Results showed that in these subjects CTP was on average about 4 cm H2O higher than PTP. The PTP has been found to increase during vocal fatigue. In the present study we compare PTP and CTP before and after vocal loading in singer and non-singer voices, applying a loading procedure previously used by co-author FP. Seven subjects repeated the vowel sequence /a,e,i,o,u/ at an SPL of at least 80 dB @ 0.3 m for 20 min. Before and after the loading the subjects’ voices were recorded while they produced a diminuendo repeating the syllable /pa/. Oral pressure during the /p/ occlusion was used as a measure of subglottal pressure. Both CTP and PTP increased significantly after the vocal loading.

  • 35.
    Enflo, Laura
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Romedahl, Camilla
    McAllister, Anita
    Effects on Vocal Fold Collision and Phonation Threshold Pressure of Resonance Tube Phonation With Tube End in Water (2013). In: Journal of Speech, Language and Hearing Research, ISSN 1092-4388, E-ISSN 1558-9102, Vol. 56, no 5, p. 1530-1538. Article in journal (Refereed)
    Abstract [en]

    Purpose: Resonance tube phonation in water (RTPW) or in air is a voice therapy method successfully used for treatment of several voice pathologies. Its effect on the voice has not been thoroughly studied. This investigation analyzes the effects of RTPW on collision and phonation threshold pressures (CTP and PTP), the lowest subglottal pressure needed for vocal fold collision and phonation, respectively. Method: Twelve mezzo-sopranos phonated into a glass tube, the end of which was placed under the water surface in a jar. Subglottal pressure, electroglottography, and audio signals were recorded before and after exercise. Also, the perceptual effects were assessed in a listening test with an expert panel, who also rated the subjects' singing experience. Results: Resonance tube phonation significantly increased CTP and also tended to improve perceived voice quality. The latter effect was mostly greater in singers who did not practice singing daily. In addition, a more pronounced perceptual effect was found in singers rated as being less experienced. Conclusion: Resonance tube phonation significantly raised CTP and tended to improve perceptual ratings of voice quality. The effect on PTP did not reach significance.

  • 36. Eyben, Florian
    et al.
    Salomão, Gláucia Laís
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. Department of Linguistics, Stockholm University.
    Sundberg, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Scherer, Klaus R.
    Schuller, Bjorn W.
    Emotion in the singing voice - a deeper look at acoustic features in the light of automatic classification (2015). In: EURASIP Journal on Audio, Speech, and Music Processing, ISSN 1687-4714, E-ISSN 1687-4722, Vol. 2015, no 1, article id 19. Article in journal (Refereed)
    Abstract [en]

    We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics ChallengE (ComParE). A feature importance analysis with respect to classification accuracy and correlation of features with the targets is provided in the paper. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further improve the classification accuracy in a leave-one-singer-out cross validation significantly.

  • 37. Eyben, Florian
    et al.
    Salomão, Gláucia Laís
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. KTH (Royal Institute of Technology), Sweden.
    Sundberg, Johan
    Scherer, Klaus R.
    Schuller, Björn W.
    Emotion in the singing voice - a deeper look at acoustic features in the light of automatic classification (2015). In: EURASIP Journal on Audio, Speech, and Music Processing, ISSN 1687-4714, E-ISSN 1687-4722, article id 19. Article in journal (Refereed)
    Abstract [en]

    We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics ChallengE (ComParE). A feature importance analysis with respect to classification accuracy and correlation of features with the targets is provided in the paper. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further improve the classification accuracy in a leave-one-singer-out cross validation significantly.

    Download full text (pdf)
    fulltext
  • 38. Eyben, Florian
    et al.
    Scherer, Klaus R.
    Schuller, Bjoern W.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Andre, Elisabeth
    Busso, Carlos
    Devillers, Laurence Y.
    Epps, Julien
    Laukka, Petri
    Narayanan, Shrikanth S.
    Truong, Khiet P.
    The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing2016In: IEEE Transactions on Affective Computing, E-ISSN 1949-3045, Vol. 7, no 2, p. 190-202Article in journal (Refereed)
    Abstract [en]

    Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
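    The abstract above notes that the reference implementation is distributed with the openSMILE toolkit. Assuming its Python wrapper (the opensmile package; the enum names below follow that wrapper's documented interface and should be checked against the installed version), extracting the GeMAPS functionals for one file might look like this:

```python
# Sketch: GeMAPS functionals for a single audio file via the openSMILE Python wrapper.
# "example.wav" is a placeholder path; the feature_set / feature_level names are
# assumptions based on the opensmile package's documented enums.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,        # the minimalistic GeMAPS set
    feature_level=opensmile.FeatureLevel.Functionals,   # one row of functionals per file
)
features = smile.process_file("example.wav")            # pandas DataFrame
print(features.shape)              # (1, number of GeMAPS functionals)
print(list(features.columns)[:5])  # first few parameter names
```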

  • 39. Fornhammar, L.
    et al.
    Sundberg, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. University College of Music Education, Stockholm, Sweden.
    Fuchs, M.
    Pieper, L.
    Measuring Voice Effects of Vibrato-Free and Ingressive Singing: A Study of Phonation Threshold Pressures2022In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 36, no 4, p. 479-486Article in journal (Refereed)
    Abstract [en]

    Background: Phonation threshold pressure (PTP), the lowest subglottal pressure that produces vocal fold vibration, has been found useful for documenting various effects of phonatory conditions. The need for such documentation is relevant also to the teaching of singing, particularly in view of the vocal demands raised in some contemporary as well as early music compositions. The aim of the present study was to test the usefulness of PTP measurement for evaluating phonatory effects of vibrato-free and ingressive singing in professional singers. Methods: PTP was measured at a middle, a high, and a low pitch in two female and two male singers before and after recording voice range profiles (i) with habitual technique, i.e., with vibrato, (ii) with vibrato-free phonation, and (iii) with ingressive phonation. Effects on vocal fold status were examined by videolaryngostroboscopy. Results: After careful instruction of the singers, no problems were found in applying the PTP method. In some singers, videolaryngostroboscopy showed effects after the experiment, e.g., increased mucus and more complete glottal closure. After ingressive phonation, PTP increased substantially at the high pitch in one singer but changed only marginally in the other singers. Conclusion: The method seems useful for assessing and interpreting effects of singing in different styles and as a part of voice diagnostics. Therefore, it seems worthwhile to automate PTP measurement.
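    Phonation threshold pressure, as used above, is the lowest subglottal pressure at which vocal fold vibration starts. One common way to reduce a set of trials to a single number is to bracket the threshold between the softest phonated and the loudest non-phonated attempts; the sketch below does only that, on invented trial data, and is not the authors' measurement procedure.

```python
# Toy estimate of a phonation threshold from (pressure, phonated) trials.
# The trial values are invented; in practice subglottal pressure is usually estimated
# from intraoral pressure during /p/ occlusion while the subject phonates as softly
# as possible.
from statistics import mean

trials = [  # (estimated subglottal pressure in cm H2O, did the token phonate?)
    (6.1, True), (4.8, True), (3.9, True), (3.2, True),
    (2.9, False), (2.6, False), (2.3, False),
]

phonated = [p for p, ok in trials if ok]
silent = [p for p, ok in trials if not ok]

# Threshold bracketed between the loudest non-phonated and the softest phonated token.
low, high = max(silent), min(phonated)
print(f"PTP bracketed in [{low}, {high}] cm H2O, midpoint ~ {mean([low, high]):.2f}")
```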

  • 40.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bresin, Roberto
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Fryden, Lars
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Musical punctuation on the microlevel: Automatic identification and performance of small melodic units1998In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 27, no 3, p. 271-292Article in journal (Refereed)
    Abstract [en]

    In this investigation we use the term musical punctuation for the marking of melodic structure by commas inserted at the boundaries that separate small structural units. Two models are presented that automatically try to locate the positions of such commas. Both use the score as input and operate on a short context of at most five notes. The first model is based on a set of subrules. One group of subrules marks possible comma positions, each provided with a weight value. Another group alters or removes these weight values according to different conditions. The second model is an artificial neural network using input similar to that of the rule system. The commas proposed by either model are realized as micropauses and small lengthenings of interonset durations. The models are evaluated using a set of 52 musical excerpts, which were marked with punctuations according to the preference of an expert performer.
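    The rule-based model's architecture (subrules that mark candidate comma positions with weights, further subrules that alter or remove those weights, and realization as micropauses plus lengthening) can be illustrated with a toy sketch. The two subrules, the weights, and the threshold below are invented for illustration and are not the published rule set.

```python
# Toy weight-based comma marking and its realization as micropauses.
# All subrules and numeric values are invented; the published model uses a richer
# set of subrules operating on a context of at most five notes.
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI note number
    ioi_ms: float    # interonset interval in milliseconds

def mark_commas(notes, threshold=1.0):
    weights = [0.0] * len(notes)
    for i in range(1, len(notes) - 1):
        # Marking subrule 1: a note clearly longer than its successor suggests a boundary.
        if notes[i].ioi_ms > 1.5 * notes[i + 1].ioi_ms:
            weights[i] += 1.0
        # Marking subrule 2: a large leap to the next note strengthens the boundary.
        if abs(notes[i + 1].pitch - notes[i].pitch) >= 5:
            weights[i] += 0.5
    # Alteration subrule: suppress a comma immediately after an accepted comma.
    for i in range(1, len(weights)):
        if weights[i - 1] >= threshold:
            weights[i] = 0.0
    return [i for i, w in enumerate(weights) if w >= threshold]

def realize(notes, commas, micropause_ms=40.0, lengthen=1.1):
    # A comma is performed as a small lengthening of the note plus a micropause after it.
    for i in commas:
        notes[i].ioi_ms = notes[i].ioi_ms * lengthen + micropause_ms
    return notes

melody = [Note(60, 400), Note(62, 400), Note(64, 800), Note(60, 400),
          Note(67, 400), Note(65, 400), Note(64, 800), Note(62, 400)]
print(realize(melody, mark_commas(melody)))
```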

  • 41.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bresin, Roberto
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Musical punctuation on the microlevel: Automatic identification and performance of small melodic units1998In: Journal of New Music Research, ISSN 0929-8215, E-ISSN 1744-5027, Vol. 27, no 3, p. 271-292Article in journal (Refereed)
    Abstract [en]

    In this investigation we use the term musical punctuation for the marking of melodic structure by commas inserted at the boundaries that separate small structural units. Two models are presented that automatically try to locate the positions of such commas. Both use the score as input and operate on a short context of at most five notes. The first model is based on a set of subrules. One group of subrules marks possible comma positions, each provided with a weight value. Another group alters or removes these weight values according to different conditions. The second model is an artificial neural network using input similar to that of the rule system. The commas proposed by either model are realized as micropauses and small lengthenings of interonset durations. The models are evaluated using a set of 52 musical excerpts, which were marked with punctuations according to the preference of an expert performer. * Sound examples are available in the JNMR Electronic Appendix (EA), which can be found on the WWW at http://www.swets.nl/jnmr/jnmr.html

    Download full text (pdf)
    data set
  • 42.
    Friberg, Anders
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Analysis by synthesis2014In: Music in the Social and Behavioral Sciences / [ed] Thompson, W. F., Los Angeles: Sage Publications, 2014Chapter in book (Refereed)
  • 43.
    Friberg, Anders
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Expressive timing2014In: Music in the Social and Behavioral Sciences / [ed] Thompson, W. F., Los Angeles: Sage Publications, 2014, p. 440-442Chapter in book (Refereed)
  • 44.
    Friberg, Anders
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Bresin, Roberto
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Overview of the KTH rule system for musical performance2006In: Advances in Cognitive Psychology, E-ISSN 1895-1171, Vol. 2, no 2-3, p. 145-161Article in journal (Refereed)
    Abstract [en]

    The KTH rule system models performance principles used by musicians when performing a musical score, within the realm of Western classical, jazz and popular music. An overview is given of the major rules involving phrasing, micro-level timing, metrical patterns and grooves, articulation, tonal tension, intonation, ensemble timing, and performance noise. By using selections of rules and rule quantities, semantic descriptions such as emotional expressions can be modeled. A recent real-time implementation provides the means for controlling the expressive character of the music. The communicative purpose and meaning of the resulting performance variations are discussed as well as limitations and future improvements.
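    Conceptually, each rule in such a system is applied to the score scaled by a rule quantity k, and a selection of rules with their k values can stand for an expressive character. The sketch below only schematizes that idea with a single invented phrase-final lengthening rule; it is not any of the published KTH rules.

```python
# Schematic of "rule scaled by quantity k" performance shaping; the rule is invented.
from dataclasses import dataclass

@dataclass
class ScoreNote:
    duration_ms: float
    sound_level_db: float
    phrase_final: bool = False

def phrase_final_lengthening(notes, k=1.0):
    """Lengthen and soften phrase-final notes; k scales how strongly the rule acts."""
    for n in notes:
        if n.phrase_final:
            n.duration_ms *= 1.0 + 0.15 * k   # up to +15% per unit of k
            n.sound_level_db -= 1.0 * k       # slight decrescendo into the boundary
    return notes

# A "rule palette": (rule, k) pairs; different palettes could stand for different
# expressive characters, in the spirit of the abstract above.
palette = [(phrase_final_lengthening, 1.5)]

phrase = [ScoreNote(400, 70), ScoreNote(400, 70), ScoreNote(800, 70, phrase_final=True)]
for rule, k in palette:
    phrase = rule(phrase, k)
print(phrase)
```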

  • 45.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bodin, L. G.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Performance Rules for Computer-Controlled Contemporary Keyboard Music1991In: Computer music journal, ISSN 0148-9267, E-ISSN 1531-5169, Vol. 15, no 2, p. 49-55Article in journal (Refereed)
    Abstract [en]

    A computer program for synthesis of music performance, originally developed for traditional tonal music by means of an analysis-by-synthesis strategy, is applied to contemporary piano music as well as to various computer-generated random music. The program consists of rules that manipulate the durations and sound levels of the tones in a context-dependent way. When applying the rules to this music, the concept of harmonic charge, which has been found useful, for example, for generating crescendi and diminuendi in performances of traditional tonal music, is replaced by chromatic charge. The music is performed on a Casio sampler controlled by a Macintosh II microcomputer. A listening panel of five experts on contemporary piano music or electroacoustic music clearly preferred performances processed by the performance program to "deadpan" performances mechanically replicating the durations and sound levels nominally written in the music score.

    Download full text (pdf)
    fulltext
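    The abstract of item 45 above describes rules that adjust tone durations and sound levels from a per-note "charge" value, with chromatic charge replacing harmonic charge for non-tonal material. The sketch below illustrates only that general mechanism; the charge() function here is a placeholder (mean semitone distance from the recent pitch context) and is not the published definition of chromatic charge.

```python
# Placeholder "charge"-driven shaping of duration and sound level.
# charge() is NOT the published chromatic charge; it is an invented stand-in.
from statistics import mean

def charge(pitch, context):
    """Toy charge: average semitone distance from the preceding pitches."""
    return mean(abs(pitch - c) for c in context) if context else 0.0

def shape(pitches, base_dur_ms=400.0, base_level_db=70.0, k=0.5, window=4):
    performed = []
    for i, p in enumerate(pitches):
        c = charge(p, pitches[max(0, i - window):i])
        performed.append({
            "pitch": p,
            "duration_ms": base_dur_ms * (1.0 + 0.01 * k * c),  # higher charge -> longer
            "sound_level_db": base_level_db + 0.3 * k * c,      # higher charge -> louder
        })
    return performed

for note in shape([60, 61, 66, 72, 59, 70]):
    print(note)
```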
  • 46.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Bodin, L-G
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Performance rules for computer controlled performance of contemporary keyboard music1987In: STL-QPSR, Vol. 28, no 4, p. 079-085Article in journal (Other academic)
    Download full text (pdf)
    fulltext
  • 47.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    A rule for automatic musical punctuation of melodies1997In: Proc of 3rd Triennial ESCOM Conference, 1997, p. 719-723Conference paper (Refereed)
  • 48.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Iwarsson, JennyKTH, Superseded Departments (pre-2005), Speech, Music and Hearing.Jansson, ErikKTH, Superseded Departments (pre-2005), Speech, Music and Hearing.Sundberg, JohanKTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Proceedings of the Stockholm Music Acoustics Conference 19931994Conference proceedings (editor) (Other academic)
  • 49.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    A Lisp Environment for Creating and Applying Rules for Musical Performance1986In: Proceedings of the International Computer Music Conference 1986, 1986, p. 1-3Conference paper (Refereed)
    Download full text (pdf)
    fulltext
  • 50.
    Friberg, Anders
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Comparing runners' decelerations and final ritards1997In: Proc of 3rd Triennial ESCOM Conference, 1997, p. 582-586Conference paper (Refereed)