Digitala Vetenskapliga Arkivet

Using Attention-based Sequence-to-Sequence Neural Networks for Transcription of Historical Cipher Documents
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2020 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits. Student thesis
Abstract [en]

Encrypted historical manuscripts (also called ciphers), which contain encoded information, provide a useful resource for gaining new insight into our history. Transcribing these manuscripts from image format to a computer-readable format is a necessary step towards decrypting them. In this thesis project, we explore automatic approaches to Handwritten Text Recognition (HTR) for line-by-line transcription of cipher images. We applied an attention-based Sequence-to-Sequence (Seq2Seq) model to the automatic transcription of ciphers with three different writing systems. We developed and tested algorithms for recognizing cipher symbols and their locations. To evaluate our method at different levels, the model is trained and tested on ciphers with various symbol sets, from digits to graphical signs. To identify useful approaches for improving transcription performance, we conducted an ablation study on the attention mechanism and other deep learning techniques. The results show an accuracy lower than 50%, indicating substantial room for improvement and plenty of future work.
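
As a rough illustration of the attention step that such a Seq2Seq model relies on, the sketch below computes dot-product attention weights over a short sequence of encoder states. It is not taken from the thesis: the scoring function (plain dot product), the function names `softmax` and `attend`, and the toy vectors are all assumptions chosen for illustration, and the thesis may use a different attention variant.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed variant, for illustration only).

    Returns one weight per encoder position and the weighted sum of the
    encoder states (the context vector).
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Hypothetical toy data: three encoder positions (e.g. image columns of one
# text line), each a 2-dimensional feature vector.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

In a line-level transcription setting, the position with the largest weight at each decoding step can be read as a rough indication of where in the line image the current symbol sits, which is one way attention may relate to symbol location.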

Place, publisher, year, edition, pages
2020. p. 38
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:uu:diva-420322
OAI: oai:DiVA.org:uu-420322
DiVA, id: diva2:1470446
Educational program
Master Programme in Language Technology
Presentation
2020-09-07, Online, Engelska parken Thunbergsvägen 3H, Uppsala, 15:32 (English)
Supervisors
Examiners
Available from: 2020-09-29. Created: 2020-09-24. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

Full text (1485 kB), 409 downloads
File information
File name: FULLTEXT01.pdf
File size: 1485 kB
Checksum (SHA-512): 3db47a55b0cdd825fb0d4617bbbecf7a049908160cf9c3a072e404a1777c892971499d6504ea27b508673e280f356c3d7cb4bee77e51eba1c94579ecbe6001da
Type: fulltext
Mimetype: application/pdf

Total: 410 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 458 hits