Digitala Vetenskapliga Arkivet

1 - 12 of 12
  • 1.
    Bjerva, Johannes
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Genetic Algorithms in the Brill Tagger: Moving towards language independence (2013). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    The viability of using rule-based systems for part-of-speech tagging was revitalised when a simple rule-based tagger was presented by Brill (1992). This tagger is based on an algorithm which automatically derives transformation rules from a corpus, using an error-driven approach. In addition to performing on par with state of the art stochastic systems for part-of-speech tagging, it has the advantage that the automatically derived rules can be presented in a human-readable format.

    In spite of its strengths, the Brill tagger is quite language dependent, and performs much better on languages similar to English than on languages with richer morphology. This issue is addressed in this paper through defining rule templates automatically with a search that is optimised using Genetic Algorithms. This allows the Brill GA-tagger to search a large search space for templates which in turn generate rules which are appropriate for various target languages, which has the added advantage of removing the need for researchers to define rule templates manually.

    The Brill GA-tagger performs significantly better (p < 0.001) than the standard Brill tagger on all 9 target languages (Chinese, Japanese, Turkish, Slovene, Portuguese, English, Dutch, Swedish and Icelandic), with an error rate reduction of between 2% and 15% for each language.
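
    The error-driven rule derivation described above can be illustrated with a minimal sketch: a toy corpus, hypothetical tags, and a single hand-picked rule template (the thesis instead searches over templates with Genetic Algorithms; this is not the thesis code):

    ```python
    from collections import Counter

    # Toy corpus of (word, gold_tag) pairs; words and tags are made up.
    corpus = [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB"),
              ("she", "PRON"), ("can", "VERB"), ("run", "VERB"),
              ("a", "DET"), ("can", "NOUN")]
    words = [w for w, _ in corpus]
    gold = [t for _, t in corpus]

    # Baseline tagger: assign each word its most frequent tag in the corpus.
    freq = {}
    for w, t in corpus:
        freq.setdefault(w, Counter())[t] += 1
    tags = [freq[w].most_common(1)[0][0] for w in words]

    def errors(t):
        return sum(a != b for a, b in zip(t, gold))

    def apply_rule(t, frm, to, prev):
        """One template: change tag `frm` to `to` when the previous tag is `prev`."""
        out = list(t)
        for i in range(1, len(t)):
            if t[i] == frm and t[i - 1] == prev:
                out[i] = to
        return out

    # Error-driven search: instantiate the template at every baseline error
    # and greedily keep the rule with the largest net error reduction.
    best_rule, best_tags = None, tags
    for i in range(1, len(tags)):
        if tags[i] != gold[i]:
            rule = (tags[i], gold[i], tags[i - 1])
            fixed = apply_rule(tags, *rule)
            if errors(fixed) < errors(best_tags):
                best_rule, best_tags = rule, fixed

    # The learned rule is human-readable, e.g. "NOUN -> VERB after PRON".
    print(best_rule, errors(best_tags))
    ```

    A real Brill tagger iterates this greedy step, appending rules until no candidate reduces the error; the GA layer in the thesis optimises which templates get instantiated rather than fixing one by hand as done here.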

  • 2.
    Bjerva, Johannes
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Predicting the N400 Component in Manipulated and Unchanged Texts with a Semantic Probability Model (2012). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    Within the field of computational linguistics, recent research has made successful advances in integrating word space models with n-gram models. This is of particular interest when a model that encapsulates both semantic and syntactic information is desirable. A potential application for this can be found in the field of psycholinguistics, where the neural response N400 has been found to occur in contexts with semantic incongruities. Previous research has found correlations between cloze probabilities and N400, while more recent research has found correlations between cloze probabilities and language models.

    This essay attempts to uncover whether or not a more direct connection between integrated models and N400 can be found, hypothesizing that low probabilities elicit strong N400 responses and vice versa. In an EEG experiment, participants read a text manipulated using a language model, and a text left unchanged. Analysis of the results shows that the manipulations to some extent yielded results supporting the hypothesis. Further results are found when analysing responses to the unchanged text. However, no significant correlations between N400 and the computational model are found. Future research should improve the experimental paradigm, so that a larger scale EEG recording can be used to construct a large EEG corpus.

  • 3.
    Bjerva, Johannes
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. University of Groningen, The Netherlands.
    Börstell, Carl
    Stockholm University, Faculty of Humanities, Department of Linguistics, Sign Language.
    Morphological complexity influences Verb–Object order in Swedish Sign Language (2016). In: Computational Linguistics for Linguistic Complexity: Proceedings of the Workshop / [ed] Dominique Brunato, Felice Dell'Orletta, Giulia Venturi, Thomas François, Philippe Blache. International Committee on Computational Linguistics (ICCL), 2016, p. 137-141. Conference paper (Refereed).
    Abstract [en]

    Computational linguistic approaches to sign languages could benefit from investigating how complexity influences structure. We investigate whether morphological complexity has an effect on the order of Verb (V) and Object (O) in Swedish Sign Language (SSL), on the basis of elicited data from five Deaf signers. We find a significant difference in the distribution of the orderings OV vs. VO, based on an analysis of morphological weight. While morphologically heavy verbs exhibit a general preference for OV, humanness seems to affect the ordering in the opposite direction, with [+human] Objects pushing towards a preference for VO.

  • 4.
    Bjerva, Johannes
    et al.
    Grigonyte, Gintare
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Plank, Barbara
    Neural Networks and Spelling Features for Native Language Identification (2017). In: The Twelfth Workshop on Innovative Use of NLP for Building Educational Applications: Proceedings of the Workshop. Association for Computational Linguistics, 2017, p. 235-239. Conference paper (Refereed).
    Abstract [en]

    We present the RUG-SU team's submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest-ranking one, we outperform the baseline by a clear margin.
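
    The kind of ensembling the abstract describes can be sketched as majority voting over the component systems' label predictions. The labels and system names below are illustrative; the submission's actual combination scheme may differ:

    ```python
    from collections import Counter

    def majority_vote(predictions):
        """Combine per-system label sequences by majority vote;
        ties go to the earliest-listed system's prediction."""
        combined = []
        for labels in zip(*predictions):
            counts = Counter(labels)
            top = max(counts.values())
            combined.append(next(l for l in labels if counts[l] == top))
        return combined

    # Hypothetical native-language predictions from three component systems.
    spelling_system  = ["GER", "ITA", "TUR", "GER"]
    shallow_network  = ["GER", "ITA", "ITA", "GER"]
    residual_network = ["ITA", "ITA", "TUR", "GER"]

    # Each essay gets the label most component systems agree on.
    print(majority_vote([spelling_system, shallow_network, residual_network]))
    ```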

  • 5.
    Bjerva, Johannes
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Marklund, Ellen
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Engdahl, Johan
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Lacerda, Francisco
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Anticipatory Looking in Infants and Adults (2011). In: Proceedings of EyeTrackBehavior 2011, 2011. Conference paper (Other academic).
    Abstract [en]

    Infant language acquisition research faces the challenge of dealing with subjects who are unable to provide spoken answers to research questions. To obtain comprehensible data from such subjects, eye tracking is a suitable research tool, as the infants’ gaze can be interpreted as behavioural responses. The purpose of the current study was to investigate the amount of training necessary for participants to learn an audio-visual contingency and present anticipatory looking behaviour in response to an auditory stimulus. Infants (n=22) and adults (n=16) were presented with training sequences, every fourth of which was followed by a test sequence. Training sequences contained implicit audio-visual contingencies consisting of a syllable (/da/ or /ga/) followed by an image appearing on the left/right side of the screen. Test sequences were identical to training sequences except that no image appeared. The latency to first fixation towards the non-target area during test sequences was used as a measurement of whether the participants had grasped the contingency. Infants were found to present anticipatory looking behaviour after 24 training trials. Adults were found to present anticipatory looking behaviour after 28-36 training trials. In future research, a more interactive experiment design will be employed in order to individualise the amount of training, which will increase the time span available for testing.

  • 6.
    Bjerva, Johannes
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Marklund, Ellen
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Engdahl, Johan
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Tengstrand, Lisa
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Lacerda, Francisco
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Preceding non-linguistic stimuli affect categorisation of Swedish plosives (2012). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 131, no. 4. Article in journal (Refereed).
    Abstract [en]

    Speech perception is highly context-dependent. Sounds preceding speech stimuli affect how listeners categorise the stimuli, regardless of whether the context consists of speech or non-speech. This effect is acoustically contrastive; a preceding context with high-frequency acoustic energy tends to skew categorisation towards speech sounds possessing lower-frequency acoustic energy, and vice versa (Mann, 1980; Holt, Lotto, & Kluender, 2000; Holt, 2005). Partially replicating Holt's 2005 study, the present study investigates the effect of non-linguistic contexts in different frequency bands on speech categorisation. Adult participants (n=15) were exposed to Swedish syllables from a speech continuum ranging from /da/ to /ga/, varying in the onset frequencies of the second and third formants in equal steps. Contexts preceding the speech stimuli consisted of sequences of sine tones distributed in different frequency bands: high, mid and low. Participants were asked to categorise the syllables as /da/ or /ga/. As hypothesised, high-frequency contexts shift the category boundary towards /da/, while low-frequency contexts shift the boundary towards /ga/, compared to the mid-frequency context.

  • 7.
    Bjerva, Johannes
    et al.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations (2017). In: Proceedings of the 21st Nordic Conference on Computational Linguistics / [ed] Jörg Tiedemann. Linköping: Linköping University Electronic Press, 2017, p. 211-215, article id 024. Conference paper (Refereed).
    Abstract [en]

    Assessing the semantic similarity between sentences in different languages is challenging. We approach this problem by leveraging multilingual distributional word representations, where similar words in different languages are close to each other. The availability of parallel data allows us to train such representations for a large number of languages, which in turn lets us leverage semantic similarity data for languages for which no such data exists. We train and evaluate on five language pairs, including English, Spanish, and Arabic. We are able to train well-performing systems for several language pairs, without any labelled data for those pairs.
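
    The core idea (comparing sentences across languages through a shared word-representation space) can be sketched with averaged word vectors and cosine similarity. The vectors below are invented for illustration; the paper's actual system is more elaborate:

    ```python
    import math

    # Toy multilingual embedding space: translation-equivalent words get
    # nearby vectors (all values are made up for illustration).
    emb = {
        "dog":   [0.90, 0.10, 0.00],
        "perro": [0.88, 0.12, 0.05],
        "barks": [0.10, 0.90, 0.10],
        "ladra": [0.12, 0.85, 0.10],
        "the":   [0.02, 0.00, 0.95],
        "el":    [0.00, 0.00, 1.00],
    }

    def sent_vec(tokens):
        """Represent a sentence as the average of its word vectors."""
        dims = len(next(iter(emb.values())))
        v = [0.0] * dims
        for t in tokens:
            for i, x in enumerate(emb[t]):
                v[i] += x
        return [x / len(tokens) for x in v]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    en = sent_vec("the dog barks".split())
    es = sent_vec("el perro ladra".split())
    print(round(cosine(en, es), 3))  # close to 1.0 for similar sentences
    ```

    Because similar words land near each other regardless of language, the similarity score needs no labelled data for the specific language pair.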

  • 8.
    Bjerva, Johannes
    et al.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Han Veiga, Maria
    Tiedemann, Jörg
    Augenstein, Isabelle
    What Do Language Representations Really Represent? (2019). In: Computational Linguistics (Association for Computational Linguistics), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 45, no. 2, p. 381-389. Article in journal (Other academic).
    Abstract [en]

    A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships, a convenient benchmark used for evaluation in previous work, appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
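
    The correlation analysis the abstract mentions can be sketched by comparing pairwise language distances in two spaces: one learned, one typological. The vectors and feature encodings below are toy stand-ins, not the paper's data:

    ```python
    import math

    # Hypothetical per-language vectors: one set learned from translations,
    # one encoding structural/typological features (0/1 feature values).
    learned = {"swe": [0.10, 0.80], "dan": [0.15, 0.75], "fin": [0.90, 0.20]}
    struct  = {"swe": [1, 1, 0],    "dan": [1, 1, 0],    "fin": [0, 0, 1]}

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def pairwise(space):
        """Distances between all unordered language pairs, in a fixed order."""
        langs = sorted(space)
        return [dist(space[a], space[b])
                for i, a in enumerate(langs) for b in langs[i + 1:]]

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    # High correlation here means structurally similar languages also have
    # similar learned representations.
    print(round(pearson(pairwise(learned), pairwise(struct)), 2))
    ```

    The paper goes beyond this toy correlation, also probing causal relationships and controlling for confounds such as genetic relatedness.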

  • 9.
    de Lhoneux, Miryam
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Bjerva, Johannes
    University of Copenhagen, Department of Computer Science, Copenhagen, Denmark.
    Augenstein, Isabelle
    University of Copenhagen, Department of Computer Science, Copenhagen, Denmark.
    Søgaard, Anders
    University of Copenhagen, Department of Computer Science, Copenhagen, Denmark.
    Parameter sharing between dependency parsers for related languages (2018). In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018, p. 4992-4997. Conference paper (Refereed).
    Abstract [en]

    Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on which parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a monolingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, although in that case sharing too many parameters does not.

  • 10.
    Engdahl, Johan
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Bjerva, Johannes
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Marklund, Ellen
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Byström, Emil
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Lacerda, Francisco
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Acoustic analysis of adults imitating infants: a cross-linguistic perspective (2012). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 131, no. 4. Article in journal (Refereed).
    Abstract [en]

    The present study investigates adult imitations of infant vocalizations in a cross-linguistic perspective. Japanese-learning and Swedish-learning infants were recorded at ages 16-21 and 78-79 weeks. Vowel-like utterances (n=210) were selected from the recordings and presented to Japanese (n=3) and Swedish (n=3) adults. The adults were asked to imitate what they heard, simulating a spontaneous feedback situation between caregiver and infant. Formant data (F1 and F2) was extracted from all utterances and validated by comparing original and formant re-synthesized utterances. The data was normalized for fundamental frequency and time, and the accumulated spectral difference was calculated between each infant utterance and each imitation of that utterance. The mean spectral difference was calculated and compared, grouped by native language of infant and adult, as well as age of the infant. Preliminary results show smaller spectral difference in the imitations of older infants compared to imitations of the younger group, regardless of infant and adult native language. This may be explained by the increasing stability and more speech-like quality of infants' vocalizations as they grow older (and thus have been exposed to their native language for a longer period of time), making their utterances easier for adults to imitate.

  • 11.
    Sjons, Johan
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Hörberg, Thomas
    Stockholm University, Faculty of Humanities, Department of Linguistics, General Linguistics.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Bjerva, Johannes
    Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for (2017). In: / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak. The International Speech Communication Association (ISCA), 2017, p. 1794-1798. Conference paper (Refereed).
    Abstract [en]

    In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper, we show at the utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds, even when surprisal, utterance length, and differences in articulation rate between speakers are controlled for. These results indicate that adults adjust their articulation rate to make it fit the linguistic capacity of the child.
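
    Surprisal, as defined in the abstract (the negative log probability of a unit in context), can be computed from any language model. A minimal bigram sketch with a made-up toy corpus standing in for the actual CDS data:

    ```python
    import math
    from collections import Counter

    # Tiny toy corpus; these counts stand in for a real corpus of speech.
    tokens = "titta en boll titta en bil titta en boll".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    def surprisal(prev, word):
        """Surprisal in bits: -log2 P(word | prev), with MLE bigram estimates."""
        p = bigrams[(prev, word)] / unigrams[prev]
        return -math.log2(p)

    # "boll" after "en" is more frequent than "bil", so it is less surprising.
    print(surprisal("en", "boll"), surprisal("en", "bil"))
    ```

    A per-utterance surprisal score (e.g. the mean over its words) can then be entered as a control predictor alongside utterance length, as in the analysis described above.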

  • 12.
    Östling, Robert
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Bjerva, Johannes
    SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological inflection with attentional sequence-to-sequence models (2017). In: Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection / [ed] Mans Hulden. Vancouver, Canada: Association for Computational Linguistics, 2017, p. 110-113. Conference paper (Refereed).
    Abstract [en]

    This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline by a large margin, and our submission ranks as the 4th-best team in the track we participated in (task 1, high resource).
