Technical Term Extraction Using Measures of Neology
KTH, School of Computer Science and Communication (CSC).
2016 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Facktermsdetektering medelst neologiska kriteria (Swedish)
Abstract [en]

This study aims to show that the frequency of occurrence over time of technical terms differs from that of general language terms, in that technical terms are strongly biased toward recent occurrence, and that this difference can be exploited for the automatic identification and extraction of technical terms from text. To this end, we propose two features, extracted from temporally labelled datasets, that are designed to capture surface-level n-gram neology. The analysis shows that these features, calculated over consecutive bigrams, are highly indicative of technical terms, which suggests that technical terms are strongly biased to be surface-level neologisms. Finally, we implement a technical term extractor using the proposed features and compare its performance against a number of baselines.
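
As a rough illustration of what a bigram-level neology feature over a temporally labelled dataset could look like, the sketch below computes two hypothetical features per consecutive bigram: the year of first occurrence and the share of occurrences falling in a recent period. The abstract does not define the two proposed features, so the function name `neology_features`, the `recent_from` cut-off, and both features are assumptions made for illustration only, not the thesis's method.

```python
from collections import Counter, defaultdict


def bigrams(tokens):
    """Return the consecutive bigrams of a token list."""
    return list(zip(tokens, tokens[1:]))


def neology_features(dated_docs, recent_from):
    """Compute illustrative neology features for every bigram.

    dated_docs:  iterable of (year, tokens) pairs (a temporally labelled corpus).
    recent_from: first year counted as "recent" (an assumed cut-off).

    Returns {bigram: (first_year, recent_share)}. Both features are
    hypothetical stand-ins, not the features defined in the thesis.
    """
    counts = defaultdict(Counter)  # bigram -> Counter mapping year to frequency
    for year, tokens in dated_docs:
        for bg in bigrams(tokens):
            counts[bg][year] += 1

    features = {}
    for bg, by_year in counts.items():
        total = sum(by_year.values())
        recent = sum(c for y, c in by_year.items() if y >= recent_from)
        features[bg] = (min(by_year), recent / total)
    return features


# Toy corpus, invented purely for this example.
docs = [
    (1990, "a review of the neural network".split()),
    (2010, "a review of the deep learning model".split()),
    (2015, "training of the deep learning model".split()),
]
feats = neology_features(docs, recent_from=2005)
```

In this toy corpus the bigram `("deep", "learning")` first appears in 2010 and occurs only in the recent period, whereas `("of", "the")` appears from 1990 onward. That is the kind of contrast between candidate technical terms and general language that the abstract describes.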

Abstract [sv]

Detta arbete ämnar visa att den tidsberoende frekvensen för facktermer skiljer sig från motsvarande frekvens för termer i vardagligt språk, i det avseendet att facktermer med hög sannolikhet är lingvistiska nybildningar, samt att denna iakttagelse kan nyttjas i syfte att automatiskt identifiera och extrahera facktermer i löptext. I detta syfte introducerar vi två särdrag extraherade från kronologiskt annoterade datamängder avsedda att fånga nybildningar av förekommande n-gram. Analysen visar att dessa särdrag, beräknade över konsekutiva bigram, är starkt indikativa för facktermer, vilket antyder att facktermer har en stark tendens att vara nybildningar. Slutligtvis implementerar vi en facktermsextraktor baserad på dessa särdrag och jämför dess prestanda med ett antal referenssärdrag.

Place, publisher, year, edition, pages
2016.
Keyword [en]
technical term extraction, keyphrase extraction, terminology
Keyword [sv]
facktermsdetektering, nyckelordsextraktion, terminologi
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-184186
OAI: oai:DiVA.org:kth-184186
DiVA: diva2:915512
External cooperation
University of Tokyo
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2016-04-06
Created: 2016-03-30
Last updated: 2016-04-06
Bibliographically approved

Open Access in DiVA

fulltext (1226 kB), 90 downloads
File information
File name: FULLTEXT01.pdf
File size: 1226 kB
Checksum SHA-512:
f1bebf0c470e51178d36d36366cdb8c4752cc352da0fca9a08cbc9d1cbb03d0d6c502dbd0e4780e78028da6278cb54884905d86974cc13f89764eff42286c9da
Type: fulltext
Mimetype: application/pdf

Total: 828 hits