1 - 16 of 16
  • 1. AAl Abdulsalam, Abdulrahman
    et al.
    Velupillai, Sumithra
    Meystre, Stephane
    UtahBMI at SemEval-2016 Task 12: Extracting Temporal Information from Clinical Text. 2016. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, 2016, p. 1256-1262. Conference paper (Refereed)
    Abstract [en]

    The 2016 Clinical TempEval continued the 2015 shared task on temporal information extraction with a new evaluation test set. Our team, UtahBMI, participated in all subtasks using machine learning approaches with ClearTK (LIBLINEAR), CRF++ and CRFsuite packages. Our experiments show that CRF-based classifiers yield, in general, higher recall for multi-word spans, while SVM-based classifiers are better at predicting correct attributes of TIMEX3. In addition, we show that an ensemble-based approach for TIMEX3 could yield improved results. Our team achieved competitive results in each subtask with an F1 75.4% for TIMEX3, F1 89.2% for EVENT, F1 84.4% for event relations with document time (DocTimeRel), and F1 51.1% for narrative container (CONTAINS) relations.

  • 2.
    Dalianis, Hercules
    et al.
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Nilsson, Gunnar
    Velupillai, Sumithra
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Is de-identification of electronic health records possible?: Or can we use health record corpora for research? 2009. In: Virtual healthcare interaction: Papers from AAAI fall symposium; [November 5 - 7, 2009, at the Westin Arlington Gateway in Arlington, Virginia USA], AAAI Press, 2009, p. 2-3. Conference paper (Refereed)
  • 3. Gkotsis, George
    et al.
    Oellrich, Anika
    Hubbard, Tim
    Dobson, Richard
    Liakata, Maria
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Dutta, Rina
    The language of mental health problems in social media. 2016. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, Association for Computational Linguistics, 2016, p. 63-73. Conference paper (Refereed)
    Abstract [en]

    Online social media, such as Reddit, has become an important resource to share personal experiences and communicate with others. Among other personal information, some social media users communicate about mental health problems they are experiencing, with the intention of getting advice, support or empathy from other users. Here, we investigate the language of Reddit posts specific to mental health, to define linguistic characteristics that could be helpful for further applications. The latter include attempting to identify posts that need urgent attention due to their nature, e.g. when someone announces their intentions of ending their life by suicide or harming others. Our results show that there are a variety of linguistic features that are discriminative across mental health user communities and that can be further exploited in subsequent classification tasks. Furthermore, while negative sentiment is almost uniformly expressed across the entire data set, we demonstrate that there are also condition-specific vocabularies used in social media to communicate about particular disorders. Source code and related materials are available from: https://github.com/gkotsis/reddit-mental-health.

  • 4. Gkotsis, George
    et al.
    Oellrich, Anika
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Liakata, Maria
    Hubbard, Tim J. P.
    Dobson, Richard J. B.
    Dutta, Rina
    Characterisation of mental health conditions in social media using Informed Deep Learning. 2017. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7. Article in journal (Refereed)
    Abstract [en]

    The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.

  • 5. Gkotsis, George
    et al.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Oellrich, Anika
    Dean, Harry
    Liakata, Maria
    Dutta, Rina
    Don’t Let Notes Be Misunderstood: A Negation Detection Method for Assessing Risk of Suicide in Mental Health Records. 2016. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, Association for Computational Linguistics, 2016, p. 95-105. Conference paper (Refereed)
    Abstract [en]

    Mental Health Records (MHRs) contain free-text documentation about patients’ suicide and suicidality. In this paper, we address the problem of determining whether grammatic variants (inflections) of the word “suicide” are affirmed or negated. To achieve this, we populate and annotate a dataset with over 6,000 sentences originating from a large repository of MHRs. The resulting dataset has high Inter-Annotator Agreement (κ 0.93). Furthermore, we develop and propose a negation detection method that leverages syntactic features of text. Using parse trees, we build a set of basic rules that rely on minimum domain knowledge and render the problem as binary classification (affirmed vs. negated). Since the overall goal is to identify patients who are expected to be at high risk of suicide, we focus on the evaluation of positive (affirmed) cases as determined by our classifier. Our negation detection approach yields a recall (sensitivity) value of 94.6% for the positive cases and an overall accuracy value of 91.9%. We believe that our approach can be integrated with other clinical Natural Language Processing tools in order to further advance information extraction capabilities.

  • 6.
    Grigonyte, Gintare
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Henriksson, Aron
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Swedification patterns of Latin and Greek affixes in clinical text. 2016. In: Nordic Journal of Linguistics, ISSN 0332-5865, E-ISSN 1502-4717, Vol. 39, no 1, p. 5-37. Article in journal (Refereed)
    Abstract [en]

    Swedish medical language is rich with Latin and Greek terminology which has undergone a Swedification since the 1980s. However, many original expressions are still used by clinical professionals. The goal of this study is to obtain precise quantitative measures of how the foreign terminology is manifested in Swedish clinical text. To this end, we explore the use of Latin and Greek affixes in Swedish medical texts in three genres: clinical text, scientific medical text and online medical information for laypersons. More specifically, we use frequency lists derived from tokenised Swedish medical corpora in the three domains, and extract word pairs belonging to types that display both the original and Swedified spellings. We describe six distinct patterns explaining the variation in the usage of Latin and Greek affixes in clinical text. The results show that to a large extent affixes in clinical text are Swedified and that prefixes are used more conservatively than suffixes.

  • 7.
    Grigonyté, Gintaré
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results. 2014. In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Stroudsburg, USA: Association for Computational Linguistics, 2014, p. 74-83. Conference paper (Refereed)
    Abstract [en]

    This paper describes part of an ongoing effort to improve the readability of Swedish electronic health records (EHRs). An EHR contains systematic documentation of a single patient’s medical history across time, entered by healthcare professionals with the purpose of enabling safe and informed care. Linguistically, medical records exemplify a highly specialised domain, which can be superficially characterised as having telegraphic sentences involving displaced or missing words, abundant abbreviations, spelling variations including misspellings, and terminology. We report results on lexical simplification of Swedish EHRs, by which we mean detecting the unknown, out-of-dictionary words and trying to resolve them either as compounded known words, abbreviations or misspellings.

  • 8.
    Grigonyté, Gintaré
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kvist, Maria
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institute, Sweden.
    Velupillai, Sumithra
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Spelling Variation of Latin and Greek words in Swedish Medical Text. 2014. Conference paper (Refereed)
  • 9. Ive, J.
    et al.
    Viani, N.
    Chandran, D.
    Bittar, A.
    Velupillai, Sumithra
    KTH, School of Electrical Engineering and Computer Science (EECS), Theoretical Computer Science, TCS. King's College London, IoPPN, London, SE5 8AF, United Kingdom.
    KCL-Health-NLP@CLEF eHealth 2018 Task 1: ICD-10 coding of French and Italian death certificates with character-level convolutional neural networks. 2018. In: CEUR Workshop Proceedings, CEUR-WS, 2018, Vol. 2125. Conference paper (Refereed)
    Abstract [en]

    In this paper we describe the participation of the KCL-Health-NLP team in the CLEF eHealth 2018 lab, specifically Task 1: Multilingual Information Extraction-ICD10 coding. The task involves the automatic coding of causes of death in death certificates in French, Italian and Hungarian according to the ICD-10 taxonomy. Choosing to work on the two Romance languages, we treated the task as a sequence-to-sequence prediction problem. Our system has an encoder-decoder architecture, with convolutional neural networks based on character embeddings as encoders and recurrent neural network decoders. Our hypothesis was that a character-level representation would allow our model to generalise across two genealogically related languages. Results obtained by pre-training our Italian model on the French data set confirmed this intuition. We also explored the impact of character-level features extracted from dictionary-matched ICD codes. We obtained F-measures of 0.72/0.64 and 0.78 on the French aligned/raw and Italian raw internal test data, respectively. On the blind test set released by the task organisers, our top results were 0.65/0.52 and 0.69 F-measure, respectively.

  • 10. Kalyanam, Janani
    et al.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA. KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Conway, Mike
    Lanckriet, Gert
    From Event Detection to Storytelling on Microblogs. 2016. In: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), IEEE, 2016, p. 437-442. Conference paper (Refereed)
    Abstract [en]

    The problem of detecting events from content published on microblogs has garnered much interest in recent times. In this paper, we address the questions of what happens after the outbreak of an event in terms of how the event gradually progresses and attains each of its milestones, and how it eventually dissipates. We propose a model-based approach to capture the gradual unfolding of an event over time. This enables the model to automatically produce entire timeline trajectories of events from the time of their outbreak to their disappearance. We apply our model to the Twitter messages collected about Ebola during the 2014 outbreak and obtain the progression timelines of several events that occurred during the outbreak. We also compare our model to several existing topic modeling and event detection baselines in the literature to demonstrate its efficiency.

  • 11.
    Neveol, Aurelie
    et al.
    Univ Paris Saclay, CNRS, LIMSI, Rue John von Neumann, F-91405 Orsay, France.
    Dalianis, Hercules
    Stockholm Univ, DSV, Kista, Sweden.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC). Kings Coll London, Inst Psychiat Psychol & Neurosci, London, England.
    Savova, Guergana
    Childrens Hosp Boston, Boston, MA USA; Harvard Med Sch, Boston, MA USA.
    Zweigenbaum, Pierre
    Univ Paris Saclay, CNRS, LIMSI, Rue John von Neumann, F-91405 Orsay, France.
    Clinical Natural Language Processing in languages other than English: opportunities and challenges. 2018. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 9, article id 12. Article, review/survey (Refereed)
    Abstract [en]

    Background: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion: We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

  • 12.
    Rosell, Magnus
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Revealing Relations between Open and Closed Answers in Questionnaires through Text Clustering Evaluation. 2008. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), 2008, p. 1-7. Conference paper (Refereed)
    Abstract [en]

    Open answers in questionnaires contain valuable information that is very time-consuming to analyze manually. We present a method for hypothesis generation from questionnaires based on text clustering. Text clustering is used interactively on the open answers, and the user can explore the cluster contents. The exploration is guided by automatic evaluation of the clusters against a closed answer regarded as a categorization. This simplifies the process of selecting interesting clusters. The user formulates a hypothesis from the relation between the cluster content and the closed answer categorization. We have applied our method on an open answer regarding occupation compared to a closed answer on smoking habits. With no prior knowledge of smoking habits in different occupation groups we have generated the hypothesis that farmers smoke less than the average. The hypothesis is supported by several separate surveys. Closed answers are easy to analyze automatically but are restricted and may miss valuable aspects. Open answers, on the other hand, fully capture the dynamics and diversity of possible outcomes. With our method the process of analyzing open answers becomes feasible.

  • 13.
    Rosell, Magnus
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Velupillai, Sumithra
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    The Impact of Phrases in Document Clustering for Swedish. 2005. In: Proceedings of the 15th NODALIDA conference, Joensuu 2005 / [ed] Werner, S., 2005, p. 173-179. Conference paper (Refereed)
    Abstract [en]

    We have investigated the impact of using phrases in the vector space model for clustering documents in Swedish in different ways. The investigation is carried out on two text sets from different domains: one set of newspaper articles and one set of medical papers. The use of phrases does not improve results relative to the ordinary use of words. The results differ significantly between the text types. This indicates that one could benefit from different text representations for different domains although a fundamentally different approach probably would be needed.

  • 14. Samuelsson, Y.
    et al.
    Täckström, O.
    Velupillai, Sumithra
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Eklund, J.
    Fišel, M.
    Saers, M.
    Mixing and blending syntactic and semantic dependencies. 2008. In: CoNLL - Proc. Twelfth Conf. Comput. Nat. Lang. Learn., 2008, p. 248-252. Conference paper (Refereed)
    Abstract [en]

    Our system for the CoNLL 2008 shared task uses a set of individual parsers, a set of stand-alone semantic role labellers, and a joint system for parsing and semantic role labelling, all blended together. The system achieved a macro averaged labelled F1-score of 79.79 (WSJ 80.92, Brown 70.49) for the overall task. The labelled attachment score for syntactic dependencies was 86.63 (WSJ 87.36, Brown 80.77) and the labelled F1-score for semantic dependencies was 72.94 (WSJ 74.47, Brown 60.18).

  • 15. Velupillai, Sumithra
    et al.
    Dalianis, Hercules
    Hassel, Martin
    Nilsson, Gunnar H.
    Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial. 2009. In: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 78, no 12, p. E19-E26. Article in journal (Refereed)
    Abstract [en]

    Background: Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish. Methods: This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure. Results: This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results. Conclusion: Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions of identifiable information are needed, as well as further developments both on the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.

  • 16.
    Velupillai, Sumithra
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Mowery, Danielle
    Conway, Mike
    Hurdle, John
    Kious, Brent
    Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes. 2016. In: Proceedings of BioNLP 2016, Association for Computational Linguistics, 2016, p. 92-101. Conference paper (Refereed)
    Abstract [en]

    Extracting information from mental health records can be useful for large-scale clinical studies (e.g., to predict medication adherence or to understand medication effects) in this clinical specialty largely underserved by the Natural Language Processing (NLP) community. Vocabularies that contain medical terms for specific clinical use-cases, such as signs, symptoms, histories, social risk factors, are valuable resources for the development of NLP systems that aid clinicians in extracting information from text. Substance abuse is an important variable for many clinical use-cases, but, to our knowledge, there are no publicly available vocabularies that cover these types of terms. In this study, we apply and combine three methods for generating vocabularies related to substance abuse. We propose a simple and systematic method to generate highly relevant vocabularies and evaluate these vocabularies with respect to size and content, as well as coverage and relevance when applied to authentic psychiatric notes.
