The effect of noise in the training of convolutional neural networks for text summarisation
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2019 (English). Independent thesis, Advanced level (degree of Master, Two Years), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this thesis, we work towards bridging the gap between two distinct areas: noisy text handling and text summarisation. The overall goal of the thesis is to examine the effects of noise in the training of convolutional neural networks for text summarisation, with a view to understanding how to effectively create a noise-robust text-summarisation system. We look specifically at the problem of abstractive text summarisation of noisy data in the context of summarising error-containing documents from automatic speech recognition (ASR) output. We experiment with adding varying levels of noise (errors) to the 4 million-article Gigaword corpus and training an encoder-decoder CNN on it, with the aim of producing a noise-robust text-summarisation system. A total of six text-summarisation models are trained, each with a different level of noise. We find that the models trained with a high level of noise are indeed able to aptly summarise noisy data into clean summaries, despite a tendency for all models to overfit to the level of noise on which they were trained. Directions are given for future steps towards an even more noise-robust and flexible text-summarisation system.
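The abstract describes injecting varying levels of noise (errors) into the Gigaword training corpus to simulate ASR output. As an illustrative sketch only (the word-level error model, function name, and vocabulary below are assumptions, not the thesis's actual implementation), such noise injection might look like this:

```python
import random

def inject_noise(words, noise_level, vocab, rng):
    """Corrupt a token sequence with ASR-like errors.

    For each word, with probability `noise_level`, apply one of three
    error types: substitute a random vocabulary word, delete the word,
    or insert a random vocabulary word after it. This is a simplified,
    hypothetical noise model, not the one used in the thesis.
    """
    noisy = []
    for w in words:
        if rng.random() < noise_level:
            op = rng.choice(["sub", "del", "ins"])
            if op == "sub":
                noisy.append(rng.choice(vocab))  # substitution error
            elif op == "ins":
                noisy.append(w)
                noisy.append(rng.choice(vocab))  # insertion error
            # "del": drop the word entirely (deletion error)
        else:
            noisy.append(w)
    return noisy

rng = random.Random(0)
vocab = ["the", "of", "report", "market", "talks"]
clean = "officials said the talks would resume monday".split()
print(inject_noise(clean, 0.3, vocab, rng))
```

Training one model per noise level (six in the thesis) would then amount to regenerating the corpus with `noise_level` set to each value before training.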

Place, publisher, year, edition, pages
2019.
Keywords [en]
text summarisation, text summarization, summarization, nlp, computational linguistics, cnn, neural networks, machine learning
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-384607
OAI: oai:DiVA.org:uu-384607
DiVA, id: diva2:1321076
External cooperation
Inovia AI
Subject / course
Language Technology
Educational program
Master Programme in Language Technology
Presentation
(English)
Available from: 2019-06-12. Created: 2019-06-06. Last updated: 2019-06-12. Bibliographically approved.

Open Access in DiVA

fulltext (2258 kB), 56 downloads
File information
File name: FULLTEXT01.pdf
File size: 2258 kB
Checksum (SHA-512): b1c8421dd9c78ef8ed8e517291ad13cf6901455322ba6bb23381da04c46ac4f804c9e6db754e337a7c28bb9c80e109ba10a8578ade07305e4333eaf7c2df3eb4
Type: fulltext
Mimetype: application/pdf

