The viability of rule-based systems for part-of-speech tagging was revitalised when Brill (1992) presented a simple rule-based tagger. This tagger is based on an algorithm which automatically derives transformation rules from a corpus, using an error-driven approach. In addition to performing on par with state-of-the-art stochastic systems for part-of-speech tagging, it has the advantage that the automatically derived rules can be presented in a human-readable format.
In spite of its strengths, the Brill tagger is quite language-dependent, and performs much better on languages similar to English than on languages with richer morphology. This paper addresses the issue by defining rule templates automatically, using a search optimised with Genetic Algorithms. This allows the Brill GA-tagger to explore a large space of templates, which in turn generate rules appropriate for the target language, with the added advantage that researchers no longer need to define rule templates manually.
The Brill GA-tagger performs significantly better (p<0.001) than the standard Brill tagger on all 9 target languages (Chinese, Japanese, Turkish, Slovene, Portuguese, English, Dutch, Swedish and Icelandic), with an error rate reduction of between 2% and 15% for each language.
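The core of transformation-based error-driven learning can be illustrated with a minimal sketch: start from a most-frequent-tag baseline, then greedily select the transformation rule that most reduces tagging errors against the gold standard. The toy corpus, tags, and the single rule template ("change tag A to B when the previous tag is C") below are invented for illustration, not the tagger's actual rule inventory.

```python
from collections import Counter, defaultdict

# Toy corpus of (word, gold_tag) pairs; purely illustrative.
corpus = [("I", "PRP"), ("can", "MD"), ("swim", "VB"),
          ("a", "DT"), ("can", "NN"), ("of", "IN"), ("beans", "NNS"),
          ("I", "PRP"), ("can", "MD"), ("run", "VB"),
          ("the", "DT"), ("can", "NN")]

# Baseline tagger: assign each word its most frequent tag.
counts = defaultdict(Counter)
for w, t in corpus:
    counts[w][t] += 1
baseline = {w: c.most_common(1)[0][0] for w, c in counts.items()}
tags = [baseline[w] for w, _ in corpus]

def errors(tagging):
    return sum(t != gold for t, (_, gold) in zip(tagging, corpus))

# One rule template: "change tag A to B when the previous tag is C".
def apply_rule(tagging, a, b, c):
    out = list(tagging)
    for i in range(1, len(out)):
        if out[i] == a and out[i - 1] == c:
            out[i] = b
    return out

# Error-driven step: propose a rule for each current error, keep the best one.
candidates = {(tags[i], corpus[i][1], tags[i - 1])
              for i in range(1, len(tags)) if tags[i] != corpus[i][1]}
best_rule = min(candidates, default=None,
                key=lambda r: errors(apply_rule(tags, *r)))
if best_rule and errors(apply_rule(tags, *best_rule)) < errors(tags):
    tags = apply_rule(tags, *best_rule)  # e.g. MD -> NN after DT
```

In the full algorithm this selection loop repeats until no rule improves accuracy, and the GA-tagger's contribution is to search over the space of templates themselves rather than fixing them by hand.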
Within the field of computational linguistics, recent research has made successful advances in integrating word space models with n-gram models. This is of particular interest when a model that encapsulates both semantic and syntactic information is desirable. A potential application for this can be found in the field of psycholinguistics, where the neural response N400 has been found to occur in contexts with semantic incongruities. Previous research has found correlations between cloze probabilities and N400, while more recent research has found correlations between cloze probabilities and language models.
This essay attempts to uncover whether a more direct connection between integrated models and N400 can be found, hypothesizing that low probabilities elicit strong N400 responses and vice versa. In an EEG experiment, participants read a text manipulated using a language model, and a text left unchanged. Analysis shows that the manipulations to some extent supported the hypothesis, and further results emerge when analysing responses to the unchanged text. However, no significant correlations between N400 and the computational model are found. Future research should improve the experimental paradigm, so that larger-scale EEG recording can be used to construct a large EEG corpus.
Computational linguistic approaches to sign languages could benefit from investigating how complexity influences structure. We investigate whether morphological complexity has an effect on the order of Verb (V) and Object (O) in Swedish Sign Language (SSL), on the basis of elicited data from five Deaf signers. We find a significant difference in the distribution of the orderings OV vs. VO, based on an analysis of morphological weight. While morphologically heavy verbs exhibit a general preference for OV, humanness seems to affect the ordering in the opposite direction, with [+human] Objects pushing towards a preference for VO.
We present the RUG-SU team's submission to the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble: spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest ranking one, we do outperform the baseline by far.
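A common way to build such an ensemble is to average the class-probability distributions of the individual systems and pick the highest-scoring label. The sketch below uses invented labels and scores, not the shared-task systems' actual outputs:

```python
# Hypothetical per-system probability distributions over native-language
# labels for one essay; the numbers are invented for illustration.
systems = [
    {"DE": 0.6, "FR": 0.3, "IT": 0.1},   # e.g. a spelling-error model
    {"DE": 0.5, "FR": 0.4, "IT": 0.1},   # e.g. a word-embedding network
    {"DE": 0.2, "FR": 0.7, "IT": 0.1},   # e.g. a character-level model
]

def ensemble(predictions):
    """Average class probabilities across systems and return the argmax label."""
    labels = predictions[0]
    avg = {lab: sum(p[lab] for p in predictions) / len(predictions)
           for lab in labels}
    return max(avg, key=avg.get)

# A single confident system can be outvoted by agreement among the others.
```

Averaging probabilities (soft voting) tends to be more robust than majority voting when the component systems are well calibrated.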
Infant language acquisition research faces the challenge of dealing with subjects who are unable to provide spoken answers to research questions. To obtain comprehensible data from such subjects, eye tracking is a suitable research tool, as the infants’ gaze can be interpreted as behavioural responses. The purpose of the current study was to investigate the amount of training necessary for participants to learn an audio-visual contingency and present anticipatory looking behaviour in response to an auditory stimulus. Infants (n=22) and adults (n=16) were presented with training sequences, every fourth of which was followed by a test sequence. Training sequences contained implicit audio-visual contingencies consisting of a syllable (/da/ or /ga/) followed by an image appearing on the left/right side of the screen. Test sequences were identical to training sequences except that no image appeared. The latency to first fixation towards the non-target area during test sequences was used as a measurement of whether the participants had grasped the contingency. Infants were found to present anticipatory looking behaviour after 24 training trials. Adults were found to present anticipatory looking behaviour after 28-36 training trials. In future research, a more interactive experiment design will be employed in order to individualise the amount of training, which will increase the time span available for testing.
Speech perception is highly context-dependent. Sounds preceding speech stimuli affect how listeners categorise the stimuli, regardless of whether the context consists of speech or non-speech. This effect is acoustically contrastive; a preceding context with high-frequency acoustic energy tends to skew categorisation towards speech sounds possessing lower-frequency acoustic energy and vice versa (Mann, 1980; Holt, Lotto, & Kluender, 2000; Holt, 2005). Partially replicating Holt's (2005) study, the present study investigates the effect of non-linguistic contexts in different frequency bands on speech categorisation. Adult participants (n=15) were exposed to Swedish syllables from a speech continuum ranging from /da/ to /ga/, varying in the onset frequencies of the second and third formants in equal steps. Contexts preceding the speech stimuli consisted of sequences of sine tones distributed in different frequency bands: high, mid and low. Participants were asked to categorise the syllables as /da/ or /ga/. As hypothesised, high-frequency contexts shift the category boundary towards /da/, while lower-frequency contexts shift the boundary towards /ga/, compared to the mid-frequency context.
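A category boundary shift of this kind is typically quantified as the continuum step at which /da/ responses cross 50%. A minimal sketch, with invented response proportions (not the study's data), interpolates that crossover point per context condition:

```python
# Hypothetical proportions of /da/ responses at seven continuum steps
# (step 1 = most /da/-like), for two context conditions; values invented.
steps     = [1, 2, 3, 4, 5, 6, 7]
p_da_mid  = [0.97, 0.93, 0.80, 0.52, 0.25, 0.10, 0.04]
p_da_high = [0.98, 0.95, 0.88, 0.70, 0.41, 0.18, 0.07]

def boundary(steps, props, criterion=0.5):
    """Linearly interpolate the continuum step where /da/ responses cross 50%."""
    pairs = list(zip(steps, props))
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if p0 >= criterion >= p1:
            return s0 + (p0 - criterion) / (p0 - p1) * (s1 - s0)
    return None

# A later (higher) boundary under the high-frequency context means more of
# the continuum is heard as /da/, i.e. the boundary shifted towards /da/.
```

In practice a logistic function is usually fitted to the response proportions rather than interpolating, but the boundary estimate it yields is the same 50% point.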
Assessing the semantic similarity between sentences in different languages is challenging. We approach this problem by leveraging multilingual distributional word representations, where similar words in different languages are close to each other. The availability of parallel data allows us to train such representations on a large number of languages, which in turn allows us to leverage semantic similarity data for languages for which no such data exists. We train and evaluate on five language pairs, including English, Spanish, and Arabic. We are able to train well-performing systems for several language pairs, without any labelled data for that language pair.
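The key property of such a shared space is that translation equivalents lie close together under cosine similarity, so cross-lingual similarity can be read off directly from the vectors. The toy three-dimensional embeddings below are invented for illustration; real multilingual embeddings have hundreds of dimensions:

```python
import math

# Toy multilingual embeddings in one shared space (vectors invented).
emb = {
    ("en", "dog"):   [0.90, 0.10, 0.30],
    ("es", "perro"): [0.85, 0.15, 0.32],  # translation of "dog": nearby
    ("es", "mesa"):  [0.10, 0.90, 0.20],  # unrelated word: far away
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Cross-lingual word similarity falls out of vector geometry; sentence
# similarity can then be approximated e.g. by comparing averaged word vectors.
```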
A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on what parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a mono-lingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, although in that case sharing too many parameters does not.
The present study investigates adult imitations of infant vocalizations in a cross-linguistic perspective. Japanese-learning and Swedish-learning infants were recorded at ages 16-21 and 78-79 weeks. Vowel-like utterances (n=210) were selected from the recordings and presented to Japanese (n=3) and Swedish (n=3) adults. The adults were asked to imitate what they heard, simulating a spontaneous feedback situation between caregiver and infant. Formant data (F1 and F2) was extracted from all utterances and validated by comparing original and formant re-synthesized utterances. The data was normalized for fundamental frequency and time, and the accumulated spectral difference was calculated between each infant utterance and each imitation of that utterance. The mean spectral difference was calculated and compared, grouped by native language of infant and adult, as well as age of the infant. Preliminary results show smaller spectral difference in the imitations of older infants compared to imitations of the younger group, regardless of infant and adult native language. This may be explained by the increasing stability and more speech-like quality of infants' vocalizations as they grow older (and thus have been exposed to their native language for a longer period of time), making their utterances easier for adults to imitate.
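The accumulated spectral difference measure can be sketched as a frame-by-frame distance between two normalised formant tracks. The formant values and the absolute-difference distance below are invented for illustration and are not the study's exact procedure:

```python
# Toy time-aligned, normalised formant tracks: one infant utterance and one
# adult imitation, as (F1, F2) pairs in Hz per frame. Values are invented.
infant = [(650, 1700), (700, 1650), (720, 1600)]
adult  = [(620, 1750), (690, 1680), (710, 1590)]

def spectral_difference(track_a, track_b):
    """Accumulate absolute F1 and F2 differences across aligned frames."""
    return sum(abs(a1 - b1) + abs(a2 - b2)
               for (a1, a2), (b1, b2) in zip(track_a, track_b))

total_diff = spectral_difference(infant, adult)
mean_diff = total_diff / len(infant)   # per-frame mean, comparable across groups
```

Normalising for fundamental frequency and duration before computing the distance, as in the study, is what makes such per-frame means comparable across infants and adults.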
In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper we show at the utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds, even when surprisal along with utterance length and differences in articulation rate between speakers are controlled for. These results indicate that adults adjust their articulation rate to make it fit the linguistic capacity of the child.
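Surprisal, the negative log probability of a word in context, can be computed from any language model. A minimal sketch with a toy add-one-smoothed bigram model (the corpus and smoothing choice are illustrative, not the paper's model):

```python
import math
from collections import Counter

# Toy training corpus; a real model would be trained on a large corpus.
tokens = "the cat sat on the mat the cat ran".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
vocab = len(unigrams)

def surprisal(prev, word):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log2(p)

# Predictable continuations carry less surprisal than unexpected ones,
# e.g. surprisal("the", "cat") is lower than surprisal("the", "sat").
```

The negative correlation reported above then means that utterances with higher average surprisal tend to be articulated more slowly.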
This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline by a large margin, and our submission ranks 4th among the teams for the track we participate in (task 1, high resource).