Digitala Vetenskapliga Arkivet

A Speech Corpus for Modeling Language Acquisition: CAREGIVER
Aalto Univ. School of Science and Tech., Dept. of Signal Proc. & Acoustics.
Radboud University Nijmegen, Language and Speech unit.
Univ. of Sheffield, Speech & Hearing group, Dept. of Computer Science.
KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
2010 (English). In: 7th International Conference on Language Resources and Evaluation (LREC) 2010, Valletta, Malta / [ed] Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias, European Language Resources Association (ELRA), 2010, p. 1062-1068. Conference paper, Published paper (Refereed)
Abstract [en]

A multilingual speech corpus for modeling language acquisition, called CAREGIVER, has been designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The paper describes the motivation behind the corpus and its design, which relies on current knowledge of infant language acquisition. Instead of recording infants and children, the voices of their primary and secondary caregivers were captured as read speech in both infant-directed and adult-directed speech modes across four languages. The challenges and methods applied to obtain prompts of similar complexity and semantics across the languages, as well as the normalized recording procedures employed at the different locations, are covered. The corpus contains nearly 66000 utterance-based audio files spoken over a two-year period by 17 male and 17 female native speakers of Dutch, English, Finnish, and Swedish. An orthographic transcription is available for every utterance, and time-aligned word and phone annotations exist for many of the sub-corpora. The CAREGIVER corpus will be published via ELRA.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2010. p. 1062-1068
National subject category
Language Technology (Computational Linguistics); Other Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-39052; ISI: 000356879505138; Scopus ID: 2-s2.0-84891053656; ISBN: 9782951740860 (print); OAI: oai:DiVA.org:kth-39052; DiVA, id: diva2:439329
Conference
LREC 2010, Seventh International Conference on Language Resources and Evaluation, 17-23 May, Malta
Note

QC 20111017

Available from: 2011-09-07 Created: 2011-09-07 Last updated: 2020-01-24 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Scopus
http://www.lrec-conf.org/proceedings/lrec2010/pdf/597_Paper.pdf

Search further in DiVA

By author/editor
Koniaris, Christos