Using Attention-based Sequence-to-Sequence Neural Networks for Transcription of Historical Cipher Documents
2020 (English) Independent thesis Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits
Student thesis
Abstract [en]
Encrypted historical manuscripts (also called ciphers), containing encoded information, provide a useful resource for gaining new insight into our history. Transcribing these manuscripts from image format to a computer-readable format is a necessary step towards decrypting them. In this thesis project, we explore automatic approaches to Handwritten Text Recognition (HTR) for line-by-line transcription of cipher images. We applied an attention-based Sequence-to-Sequence (Seq2Seq) model to the automatic transcription of ciphers with three different writing systems, and tested and developed algorithms for recognising cipher symbols and their locations. To evaluate our method at different levels, the model is trained and tested on ciphers with various symbol sets, ranging from digits to graphical signs. To identify approaches that improve transcription performance, we conducted an ablation study of the attention mechanism and other deep learning techniques. The results show an accuracy below 50% and indicate substantial room for improvement and plenty of future work.
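For illustration only, the sketch below shows what an attention-based Seq2Seq line-transcription model of the kind described in the abstract might look like in PyTorch; it is not the thesis's actual implementation, and the class names (LineEncoder, AttnDecoder), hidden sizes, image dimensions, and vocabulary size are all assumed. A small CNN encodes a text-line image into a horizontal feature sequence, and a GRU decoder with additive (Bahdanau-style) attention emits one symbol token per step under teacher forcing.

```python
# Minimal sketch (not the thesis code): attention-based Seq2Seq for line-image
# transcription. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LineEncoder(nn.Module):
    """CNN that turns a (B, 1, H, W) line image into a sequence of column features."""
    def __init__(self, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.proj = nn.Linear(128, hidden)

    def forward(self, img):                      # img: (B, 1, H, W)
        f = self.cnn(img)                        # (B, 128, H/4, W/4)
        f = f.mean(dim=2).permute(0, 2, 1)       # pool over height -> (B, W/4, 128)
        return self.proj(f)                      # (B, T_enc, hidden)


class AttnDecoder(nn.Module):
    """GRU decoder with additive attention over the encoder feature sequence."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn_w = nn.Linear(hidden * 2, hidden)
        self.attn_v = nn.Linear(hidden, 1)
        self.gru = nn.GRUCell(hidden * 2, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc, targets):             # enc: (B, T_enc, H), targets: (B, T_dec)
        B, T_enc, H = enc.shape
        state = enc.mean(dim=1)                  # init decoder state from encoder summary
        logits = []
        for t in range(targets.size(1)):
            # additive attention: score each encoder position against the current state
            scores = self.attn_v(torch.tanh(self.attn_w(torch.cat(
                [enc, state.unsqueeze(1).expand(B, T_enc, H)], dim=-1))))
            alpha = F.softmax(scores.squeeze(-1), dim=-1)      # (B, T_enc)
            context = (alpha.unsqueeze(-1) * enc).sum(dim=1)   # (B, H)
            emb = self.embed(targets[:, t])                    # teacher forcing
            state = self.gru(torch.cat([emb, context], dim=-1), state)
            logits.append(self.out(state))
        return torch.stack(logits, dim=1)        # (B, T_dec, vocab)


if __name__ == "__main__":
    # Toy shapes only: two 32x256 line images, a hypothetical 43-symbol vocabulary.
    enc, dec = LineEncoder(), AttnDecoder(vocab_size=43)
    imgs = torch.randn(2, 1, 32, 256)
    tgts = torch.randint(0, 43, (2, 12))
    logits = dec(enc(imgs), tgts)
    loss = F.cross_entropy(logits.reshape(-1, 43), tgts.reshape(-1))
    print(logits.shape, loss.item())
```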
Place, publisher, year, edition, pages: 2020, p. 38
National Category
Natural Language Processing
Identifiers URN: urn:nbn:se:uu:diva-420322 OAI: oai:DiVA.org:uu-420322 DiVA, id: diva2:1470446
Educational program Master Programme in Language Technology
Presentation
2020-09-07, Online, Engelska parken Thunbergsvägen 3H, Uppsala, 15:32 (English)
Available from: 2020-09-29 Created: 2020-09-24 Last updated: 2025-02-07 Bibliographically approved