Evaluating Transcription of Ciphers with Few-Shot Learning
2022 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Ciphers are encrypted documents created to hide their content from anyone other than the intended recipients of the message. Different types of symbols, such as zodiac signs, alchemical symbols, alphabet letters or digits, are used to compose the encrypted text, which must be decrypted to gain access to the content of the documents. The first step before decryption is transcribing the cipher. The purpose of this thesis is to evaluate a tool that automatically transcribes cipher images into text. We implement a supervised few-shot deep-learning model, test it on different types of encrypted documents, and assess the results with several evaluation metrics. We show that the few-shot model achieves promising results on seen data, with Symbol Error Rates (SER) ranging from 8.21% to 47.55% and accuracy scores from 80.13% to 90.27%, whereas SER on out-of-domain datasets reaches 79.91%. While a wide range of symbols is transcribed correctly, the erroneous symbols are mainly symbols with diacritics or punctuation marks.
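The abstract reports results as Symbol Error Rate (SER). The following is a minimal sketch of how such a metric is commonly computed. It assumes SER is defined like a character error rate: the Levenshtein distance (substitutions S, deletions D, insertions I) between the reference and predicted symbol sequences, divided by the number N of reference symbols. The functions levenshtein and symbol_error_rate are hypothetical illustrations, not the thesis's code. The thesis's exact definition, for example its handling of diacritics or punctuation, may differ. Symbols are passed as lists of strings on the assumption that a single cipher symbol may be transcribed as a multi-character code.

```python
from typing import Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions between ref and hyp."""
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference symbol missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra symbol in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def symbol_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Hypothetical SER = (S + D + I) / N, where N is the number of reference symbols."""
    if not ref:
        raise ValueError("reference transcription must contain at least one symbol")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = ["a", "b", "c", "d"]
    prediction = ["a", "x", "c"]  # one substitution (b -> x), one deletion (d)
    print(symbol_error_rate(reference, prediction))  # 0.5
```

Because insertions are counted against the reference length, SER under this definition can exceed 100% when a model outputs many spurious symbols.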
Place, publisher, year, edition, pages
2022, p. 65
Keywords [en]
Ciphers, Automatic Transcription, Decrypt project, Few-shot learning
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-477452
OAI: oai:DiVA.org:uu-477452
DiVA, id: diva2:1671276
Educational program
Master Programme in Language Technology
Presentation
2022-06-02, Uppsala, 13:15 (English)
Available from: 2022-06-17 Created: 2022-06-17 Last updated: 2022-06-17 Bibliographically approved