Digitala Vetenskapliga Arkivet

Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Using Attention-based Sequence-to-Sequence Neural Networks for Transcription of Historical Cipher Documents
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2020 (English)Independent thesis Advanced level (degree of Master (Two Years)), 80 credits / 120 HE creditsStudent thesis
Abstract [en]

Encrypted historical manuscripts (also called ciphers), containing encoded information, provides a useful resource for giving new insight into our history. Transcribing these manuscripts from image format to computer readable format is a necessary step for decrypting them. In this thesis project, we explore automatic approaches of Hand Written Text Recognition (HTR) for cipher image transcription line by line.In this thesis project, We applied an attention-based Sequence-to-Sequence (Seq2Seq) model for the automatic transcription of ciphers with three different writing systems. We tested/developed algorithms for the recognition of cipher symbols, and their location. To evaluate our method on different levels, the model is trained and tested on ciphers with various symbol sets, from digits to graphical signs. To find out the useful approaches for improving the transcription performance, we conducted ablation study regarding attention mechanism and other deep learning tricks. The results show an accuracy lower than 50% and indicate a big room for improvements and plenty of future work.

Place, publisher, year, edition, pages
2020. , p. 38
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-420322OAI: oai:DiVA.org:uu-420322DiVA, id: diva2:1470446
Educational program
Master Programme in Language Technology
Presentation
2020-09-07, Online, Engelska parken Thunbergsvägen 3H, Uppsala, 15:32 (English)
Supervisors
Examiners
Available from: 2020-09-29 Created: 2020-09-24 Last updated: 2020-09-29Bibliographically approved

Open Access in DiVA

fulltext(1485 kB)316 downloads
File information
File name FULLTEXT01.pdfFile size 1485 kBChecksum SHA-512
3db47a55b0cdd825fb0d4617bbbecf7a049908160cf9c3a072e404a1777c892971499d6504ea27b508673e280f356c3d7cb4bee77e51eba1c94579ecbe6001da
Type fulltextMimetype application/pdf

By organisation
Department of Linguistics and Philology
Language Technology (Computational Linguistics)

Search outside of DiVA

GoogleGoogle Scholar
Total: 317 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 359 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf