As empirical methods have come to the fore in language technology and translation studies, the processing of parallel texts and parallel corpora has become a major issue. In this article we review the state of the art in alignment and data extraction tec
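To illustrate what sentence alignment involves, here is a minimal sketch of length-based alignment by dynamic programming, loosely in the spirit of Gale and Church's approach. This is not the method reviewed in the article. The cost function (relative character-length difference) and the bead penalties in `BEAD_PENALTY` are hypothetical simplifications chosen for illustration, not Gale and Church's probabilistic model.

```python
from typing import List, Optional, Tuple

# Hypothetical penalties for each bead type (source sentences, target sentences).
# Non-1:1 beads are penalized so that 1:1 alignments are preferred when lengths fit.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.3, (1, 2): 0.3}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Relative difference in character length between two text spans."""
    return abs(src_len - tgt_len) / max(1, src_len + tgt_len)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Align sentences by minimizing total length-mismatch cost plus bead penalties."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m) to (0, 0).
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Each returned bead pairs a group of source sentence indices with a group of target sentence indices; for example, two short English sentences aligned to one longer translated sentence come out as a single 2:1 bead. Real aligners typically add lexical cues (cognates, anchor words) on top of length.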
An approach to integrated multilingual document production is proposed. The basic idea of this approach is to use the analyzer of a modular, transfer-based machine translation system as the core of a language checker. The checker generates grammatical structures to be forwarded to the transfer and generation components for the various target languages. A precondition for such an approach is a controlled source language. The source language in focus in this presentation is ScaniaSwedish, which is to be defined by standardizing the language currently used by Scania in its truck maintenance documents. Here we concentrate on identifying the vocabulary of current ScaniaSwedish and present the results achieved so far. In parallel with the vocabulary inventory, the competence of the language checker is being developed.
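As a rough illustration of how a vocabulary inventory can feed a controlled-language check, the sketch below counts word forms in a set of documents, derives an approved vocabulary by a frequency threshold, and flags words outside it. The function names, the `min_count` threshold, and the Swedish example sentences are hypothetical and not taken from the Scania material; a realistic checker would work on lemmas and grammatical analyses rather than raw lowercased word forms.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Word tokens; in Python 3, \w on str patterns also matches å, ä, ö.
TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercased word forms."""
    return [t.lower() for t in TOKEN.findall(text)]


def vocabulary_inventory(documents: Iterable[str]) -> Counter:
    """Count how often each word form occurs across all documents."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def approved_vocabulary(counts: Counter, min_count: int = 1) -> Set[str]:
    """Hypothetical policy: approve word forms seen at least min_count times."""
    return {w for w, c in counts.items() if c >= min_count}


def check_sentence(sentence: str, approved: Set[str]) -> List[str]:
    """Return the word forms in a sentence that are not in the approved vocabulary."""
    return [tok for tok in tokenize(sentence) if tok not in approved]
```

For example, an inventory built from "Lossa muttern." and "Dra åt muttern." would flag "skruva" and "loss" in "Skruva loss muttern." as outside the approved vocabulary.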
This paper investigates the L2 acquisition of clausal syntax in post-puberty learners of German and Swedish with regard to V2, VP headedness and verb particle constructions. The learner data are tested against L2 theories according to which lower structural projections (VP) are acquired before higher functional projections (IP, CP), VP syntax is unproblematic (invulnerable), and grammatical operations related to the topmost level of syntactic structure (CP) are acquired late (e.g. Platzack’s (2001) vulnerable C-domain). It will be shown that such theories do not hold water: native speakers of Swedish learning German and native speakers of German learning Swedish both master V2 from early on. At the same time, these learners exhibit nontargetlike syntax at lower structural levels: residual VO in the case of the Swedish-L1 learners of German, and persistent nontargetlike transitive verb particle constructions in the German-L1 learners of Swedish. I argue that these findings are best explained by assuming full transfer of L1 syntax (e.g. Schwartz & Sprouse 1996).
This article investigates the information structure of verb-second (V2) declaratives in Swedish, German, and nonnative German. Even though almost any type of element can occur in the so-called prefield, the clause-initial preverbal position of V2 declaratives, we have found language-specific patterns in native-speaker corpora: The frequencies of prefield constituent types differ substantially between German and Swedish, and Swedish postpones new (rhematic) information and instead fills the prefield with given (thematic) elements and elements of no or low informational value (e.g., expletives) to a far greater extent than German. We compare Swedish learners of German to native controls matched for age and genre (Bohnacker 2005, 2006; Rosén 2006). These learners master the syntactic properties of V2 but start their sentences in nonnative ways. They overapply the Swedish principle of rheme later in their second language German, indicating first language (L1) transfer at the interface of syntax and information structure, especially for structures that are frequent in the L1.
String similarity metrics are important tools in computational linguistics, used extensively, e.g., for comparing words in a variety of problem domains. This paper examines the assumption, sometimes made, that the performance of such word comparison metho
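One of the most widely used string similarity metrics for comparing words is edit distance normalized by word length. The sketch below implements plain Levenshtein distance with one common normalization (dividing by the length of the longer word). It is offered only as an example of the kind of metric discussed; it is not necessarily one of the metrics evaluated in the paper, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance divided by the longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

For instance, `levenshtein("kitten", "sitting")` is 3, and the Swedish/German pair "natt"/"nacht" has distance 2 and normalized similarity 0.6. Other normalizations (e.g., by the average length) are also common and can rank word pairs differently.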
We describe work in progress on a corpus-based tutoring system for education in traditional and formal grammar. It is mainly intended for language and speech technology students and gives them the opportunity to learn grammar and grammatical analysis
This article introduces Token Dependency Semantics (TDS), a surface‐oriented and token‐based framework for compositional truth‐conditional semantics. It is motivated by Davidson's ‘paratactic’ analysis of semantic intensionality (‘On Saying That’, 1968, Synthese 19: 130–146), which has been much discussed in philosophy. This is the first fully‐fledged formal implementation of Davidson's proposal. Operator‐argument structure and scope are captured by means of relations among tokens. Intensional constituent tokens represent ‘propositional’ contents directly. They serve as arguments to the words introducing intensional contexts, rather than being ‘ordinary’ constituents. The treatment of de re readings involves the use of functions (‘anchors’) assigning entities to argument positions of lexical tokens. Quantifiers are thereby allowed to bind argument places on content tokens. This gives us a simple underspecification‐based account of scope ambiguity. The TDS framework is applied to indirect speech reports, mental attitude sentences, control verbs, and modal and agent‐relative sentence adverbs in English. This semantics is compatible with a traditional view of syntax. Here, it is integrated into a Head‐driven Phrase Structure Grammar (HPSG). The result is a straightforward and ontologically parsimonious analysis of truth‐conditional meaning and semantic intensionality.