Learning based Word Search and Visualisation for Historical Manuscript Images
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction. ORCID iD: 0000-0002-6783-1744
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Today, work with historical manuscripts is nearly exclusively done manually, by researchers in the humanities as well as laypeople mapping out their personal genealogy. This is a highly time-consuming endeavour, as it is not uncommon to spend months with the same volume of a few hundred pages. The last few decades have seen an ongoing effort to digitise manuscripts, both for preservation purposes and to increase accessibility. This has the added effect of enabling the use of methods and algorithms from Image Analysis and Machine Learning that have great potential in both making existing work more efficient and creating new methodologies for manuscript-based research.

The first part of this thesis focuses on Word Spotting, the task of searching for a given text query in a manuscript collection. This can be broken down into two tasks, detecting where the words are located on the page, and then ranking the words according to their similarity to a search query. We propose Deep Learning models to do both, separately and then simultaneously, and successfully search through a large manuscript collection consisting of over a hundred thousand pages.
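The ranking step described above can be illustrated with a minimal sketch. It assumes that each detected word region and the text query have already been mapped to vectors in a shared embedding space, and it uses cosine similarity as the similarity measure; the region identifiers, the three-dimensional vectors and the function names are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_regions(query_embedding, region_embeddings):
    """Return region ids sorted from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), region_id)
        for region_id, emb in region_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [region_id for _, region_id in scored]


# Hypothetical toy embeddings for three detected word regions.
regions = {
    "page1_box3": [0.9, 0.1, 0.0],
    "page1_box7": [0.1, 0.8, 0.3],
    "page2_box1": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of the text query
ranking = rank_regions(query, regions)
```

In this toy setting the region whose vector points most nearly in the query's direction comes first; a real system would compute the embeddings with a trained model rather than hard-code them.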

A limiting factor in applying learning-based methods to historical manuscript images is the cost, and therefore, lack of annotated data needed to train machine learning models. We propose several ways to mitigate this problem, including generating synthetic data, augmenting existing data to get better value from it, and learning from pre-existing, partially annotated data that was previously unusable.

In the second part, a method for visualising manuscript collections called the Image-based Word Cloud is proposed. Much like its text-based counterpart, it arranges the most representative words in a collection into a cloud, where the size of each word is proportional to its frequency of occurrence. This grants a user a single-image overview of a manuscript collection, regardless of its size. We further propose a way to estimate a manuscript's production date. This can grant historians context that is crucial for correctly interpreting the contents of a manuscript.
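The proportional sizing rule of a word cloud can be sketched as follows. This is only an illustration under stated assumptions: the input is a list of hypothetical labels, one per word occurrence (in an image-based setting these would be identifiers of groups of visually similar word images rather than transcribed text), and the display size is scaled linearly so that the most frequent word gets `max_size`. The function name and parameters are invented for this example.

```python
from collections import Counter


def word_sizes(labels, top_k=3, max_size=100.0):
    """Map the top_k most frequent labels to sizes proportional to frequency."""
    counts = Counter(labels)
    most_common = counts.most_common(top_k)
    if not most_common:
        return {}
    top_count = most_common[0][1]
    # The most frequent label gets max_size; the rest scale linearly.
    return {label: max_size * count / top_count for label, count in most_common}


# Hypothetical occurrence labels for a tiny collection.
labels = ["anno", "domini", "anno", "rex", "anno", "domini", "et"]
sizes = word_sizes(labels, top_k=2)
```

Here `anno` occurs three times and `domini` twice, so `domini` is drawn at two thirds of the size of `anno`.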

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 82
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1798
Keywords [en]
Word Spotting, Convolutional Neural Networks, Deep Learning, Region Proposals, Historical Manuscripts, Computer Vision, Image Analysis, Visualisation, Document Analysis
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-381308
ISBN: 978-91-513-0633-9 (print)
OAI: oai:DiVA.org:uu-381308
DiVA, id: diva2:1303103
Public defence
2019-06-04, TLS (Tidskriftläsesalen), Carolina Rediviva, Dag Hammarskjölds väg 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Funding agencies
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-05-13 Created: 2019-04-08 Last updated: 2019-06-18
List of papers
1. Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Recent work in word spotting in handwritten documents has yielded impressive results. Yet this progress has largely been made by supervised learning systems which are dependent on manually annotated data, making deployment to new collections a significant effort. In this paper we propose an approach utilising transcriptions without bounding box annotations to train segmentation-free word spotting models, given a model partially trained with full annotations. This is done through an alignment procedure based on hidden Markov models. This model can create a tentative mapping between word region proposals and the transcriptions to automatically create additional weakly annotated training data. Using as little as 1% and 10% of the fully annotated training sets for partial convergence, we automatically annotate the remaining training data and successfully train using it. Across all datasets, our approach comes within a few mAP% of achieving the same performance as a model trained with only full ground truth. We believe that this will be a significant advance towards a more general use of word spotting, since digital transcription data will already exist for parts of many collections of interest.
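To give a feel for what aligning transcriptions with region proposals involves, here is a heavily simplified sketch. It is not the paper's HMM: it assumes proposals are already sorted in reading order, that every transcribed word matches exactly one proposal, and that a hypothetical matrix `scores[i][j]` holds a log-score for word `i` matching proposal `j`. Under those assumptions, a Viterbi-style dynamic programme finds the best order-preserving assignment, skipping spurious proposals.

```python
def align(scores):
    """Assign each transcribed word to one proposal, preserving reading order.

    scores[i][j] is a log-score that word i matches proposal j.
    Returns a list of strictly increasing proposal indices, one per word.
    """
    n_words = len(scores)
    n_props = len(scores[0]) if scores else 0
    if n_words == 0 or n_words > n_props:
        raise ValueError("need at least one word and no more words than proposals")

    neg_inf = float("-inf")
    best = [[neg_inf] * n_props for _ in range(n_words)]
    back = [[-1] * n_props for _ in range(n_words)]
    best[0] = list(scores[0])

    for i in range(1, n_words):
        run_best, run_arg = neg_inf, -1  # best predecessor among proposals < j
        for j in range(n_props):
            if j > 0 and best[i - 1][j - 1] > run_best:
                run_best, run_arg = best[i - 1][j - 1], j - 1
            if run_arg >= 0:
                best[i][j] = run_best + scores[i][j]
                back[i][j] = run_arg

    # Backtrack from the best final state.
    j = max(range(n_props), key=lambda k: best[n_words - 1][k])
    path = [j]
    for i in range(n_words - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path


# Hypothetical log-scores: two transcribed words, four proposals.
scores = [
    [-0.1, -2.0, -3.0, -3.0],
    [-3.0, -2.5, -0.2, -1.0],
]
alignment = align(scores)
```

The resulting word-to-box pairs are the kind of tentative mapping that could serve as weak annotations; a full HMM would additionally model transition probabilities and missing or merged proposals.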

Keywords
weakly supervised, segmentation-free word spotting, convolutional neural network, hidden Markov model
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-381304 (URN)
Projects
q2b
Funding agencies
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-04-07 Created: 2019-04-07 Last updated: 2019-04-08
2. Neural Word Search in Historical Manuscript Collections
(English) Manuscript (preprint) (Other academic)
Abstract [en]

We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query. This is commonly referred to as "word spotting". To this end, we first propose an end-to-end trainable model based on deep neural networks that we dub Ctrl-F-Net. The model simultaneously generates region proposals and embeds them into a word embedding space, wherein a search is performed. We further introduce a simplified version called Ctrl-F-Mini. It is faster with similar performance, though it is limited to more easily segmented manuscripts. We evaluate both models on common benchmark datasets and surpass the previous state of the art. Finally, in collaboration with historians, we employ the Ctrl-F-Net to search within a large manuscript collection of over 100 thousand pages, written across two centuries. With only 11 training pages, we enable large-scale data collection in manuscript-based historical research. This speeds up data collection and increases the number of manuscripts processed by orders of magnitude. Given the time-consuming manual work required to study old manuscripts in the humanities, quick and robust tools for word spotting have the potential to revolutionise domains like history, religion and language.

Keywords
Word spotting, Historical Manuscripts, Deep Convolutional Neural Network, Region Proposals
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-381306 (URN)
Projects
q2b
Funding agencies
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-04-07 Created: 2019-04-07 Last updated: 2019-04-08
3. Neural Ctrl-F: Segmentation-free query-by-string word spotting in handwritten manuscript collections
2017 (English). In: 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, p. 4443-4452. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we approach the problem of segmentation-free query-by-string word spotting for handwritten documents. In other words, we use methods inspired by computer vision and machine learning to search for words in large collections of digitized manuscripts. In particular, we are interested in historical handwritten texts, which are often far more challenging than modern printed documents. This task is important, as it provides people with a way to quickly find what they are looking for in large collections that are tedious and difficult to read manually. To this end, we introduce an end-to-end trainable model based on deep neural networks that we call Ctrl-F-Net. Given a full manuscript page, the model simultaneously generates region proposals, and embeds these into a distributed word embedding space, where searches are performed. We evaluate the model on common benchmarks for handwritten word spotting, outperforming the previous state-of-the-art segmentation-free approaches by a large margin, and in some cases even segmentation-based approaches. One interesting real-life application of our approach is to help historians to find and count specific words in court records that are related to women's sustenance activities and division of labor. We provide promising preliminary experiments that validate our method on this task.
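Region proposals typically overlap heavily, so detection pipelines commonly filter them with intersection-over-union (IoU) and greedy non-maximum suppression. The sketch below shows that generic post-processing step only; the abstract does not say whether or how Ctrl-F-Net applies it, and the box coordinates, scores and threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, threshold=0.5):
    """Greedily keep the highest-scoring boxes that do not overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: two near-duplicates around one word, one separate word.
boxes = [(10, 10, 110, 40), (12, 11, 112, 41), (200, 10, 280, 40)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.9 and is therefore dropped, leaving one box per word.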

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Computer Vision, E-ISSN 1550-5499
Keywords
Segmentation-free Word Spotting, Deep Learning, Convolutional Neural Network, Query-by-String
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-335926 (URN)
10.1109/ICCV.2017.475 (DOI)
000425498404054 (ISI)
978-1-5386-1032-9 (ISBN)
Conference
16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22-29, 2017
Projects
q2b
Funding agencies
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2017-12-11 Created: 2017-12-11 Last updated: 2019-04-08. Bibliographically approved
4. Visualizing document image collections using image-based word clouds
2015 (English). In: Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part I / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; Pavlidis, I; Feris, R; McGraw, T; Elendt, M; Kopper, R; Ragan, E; Ye, Z; Weber, G, Springer, 2015, p. 297-306. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9474
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-272193 (URN)
10.1007/978-3-319-27857-5_27 (DOI)
000376400300027 (ISI)
9783319278568 (ISBN)
9783319278575 (ISBN)
Conference
ISVC 2015, December 14–16, Las Vegas, NV
Projects
q2b
Funding agencies
Swedish Research Council, 2012-5743
Available from: 2015-12-18 Created: 2016-01-12 Last updated: 2019-04-08. Bibliographically approved
5. A novel word segmentation method based on object detection and deep learning
2015 (English). In: Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part I / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; Pavlidis, I; Feris, R; McGraw, T; Elendt, M; Kopper, R; Ragan, E; Ye, Z; Weber, G, Springer, 2015, p. 231-240. Conference paper, Published paper (Refereed)
Abstract [en]

The segmentation of individual words is a crucial step in several data mining methods for historical handwritten documents. Examples of applications include visual searching for query words (word spotting) and character-by-character text recognition. In this paper, we present a novel method for word segmentation that is adapted from recent advances in computer vision, deep learning and generic object detection. Our method has unique capabilities and it has found practical use in our current research project. It can easily be trained for different kinds of historical documents, uses full grayscale information, and requires neither binarization as pre-processing nor prior segmentation of individual text lines. We evaluate its performance using established error metrics, previously used in competitions for word segmentation, and demonstrate its usefulness for a 15th century handwritten document.

Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9474
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-272181 (URN)
10.1007/978-3-319-27857-5_21 (DOI)
000376400300021 (ISI)
9783319278568 (ISBN)
9783319278575 (ISBN)
Conference
ISVC 2015, December 14–16, Las Vegas, NV
Projects
q2b
Funding agencies
Swedish Research Council, 2012-5743
Available from: 2015-12-18 Created: 2016-01-12 Last updated: 2019-04-08. Bibliographically approved
6. Semantic and Verbatim Word Spotting using Deep Neural Networks
2016 (English). In: Proceedings of 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016, p. 307-312. Conference paper, Published paper (Refereed)
Abstract [en]

In the last few years, deep convolutional neural networks have become ubiquitous in computer vision, achieving state-of-the-art results on problems like object detection, semantic segmentation, and image captioning. However, they have not yet been widely investigated in the document analysis community. In this paper, we present a word spotting system based on convolutional neural networks. We train a network to extract a powerful image representation, which we then embed into a word embedding space. This allows us to perform word spotting using both query-by-string and query-by-example in a variety of word embedding spaces, both learned and handcrafted, for verbatim as well as semantic word spotting. Our novel approach is versatile and the evaluation shows that it outperforms the previous state of the art for word spotting on standard datasets.
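As an illustration of what a handcrafted word embedding for query-by-string search can look like, the sketch below builds a simplified pyramidal character-occurrence vector, loosely in the spirit of the pyramidal histogram of characters (PHOC). It is an assumption-laden toy, not the embedding used in the paper: the alphabet, the two pyramid levels and the rule of assigning each character to a region by its centre position are all choices made for this example.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed 36-symbol alphabet


def char_pyramid(word, levels=(1, 2)):
    """Binary vector marking which characters occur in which part of the word.

    At level L the word is split into L equal regions; a character is assigned
    to the region containing its normalised centre position.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            start, end = region / level, (region + 1) / level
            bits = [0] * len(ALPHABET)
            for idx, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue  # ignore symbols outside the assumed alphabet
                centre = (idx + 0.5) / n
                if start <= centre < end:
                    bits[ALPHABET.index(ch)] = 1
            vector.extend(bits)
    return vector


# A two-letter query: 'a' falls in the first half, 'b' in the second.
vec = char_pyramid("ab")
```

Because the vector depends only on the query string, a text query can be embedded without any example image; an image model trained to predict the same kind of vector for word images would then make the two directly comparable.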

Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords
handwritten word spotting, convolutional neural networks, deep learning, word embeddings
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-306667 (URN)
10.1109/ICFHR.2016.60 (DOI)
000400052400056 (ISI)
978-1-5090-0981-7 (ISBN)
Conference
15th International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Projects
q2b
Funding agencies
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
7. Historical Manuscript Production Date Estimation using Deep Convolutional Neural Networks
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Deep learning has thus far not been used for dating of pre-modern handwritten documents. In this paper, we propose ways of using deep convolutional neural networks (CNNs) to estimate production dates for such manuscripts. In our approach, a CNN can either be used directly for estimating the production date or as a feature learning framework for other regression techniques. We explore the feature learning approach using Gaussian Processes regression and Support Vector Regression. The evaluation is performed on a unique large dataset of over 10,000 medieval charters from the Swedish collection Svenskt Diplomatariums huvudkartotek (SDHK). We show that deep learning is applicable to the task of dating documents and that the performance is on average comparable to that of a human expert.

Place, publisher, year, edition, pages
IEEE, 2016
Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords
Document analysis, Manuscripts, Document dating, Digital Humanities
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-306685 (URN)
10.1109/ICFHR.2016.114 (DOI)
000400052400039 (ISI)
978-1-5090-0981-7 (ISBN)
Conference
International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Projects
q2b; q2b_vr2012
Funding agencies
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
8. CalligraphyNet: Augmenting handwriting generation with quill based stroke width
2019 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Realistic handwritten document generation garners a lot of interest from the document research community for its ability to generate annotated data. In the current approach we have used GAN-based stroke width enrichment and style-transfer-based refinement over generated data, which result in realistic-looking handwritten document images. The GAN part of data augmentation transfers the stroke variation introduced by a writing instrument onto images rendered from trajectories created by tracking coordinates along the stylus movement. The coordinates from stylus movement are augmented with the learned stroke width variations during the data augmentation block. An RNN model is then trained to learn the variation along the movement of the stylus along with the stroke variations corresponding to an input sequence of characters. This model is then used to generate images of words or sentences given an input character string. A document image thus created is used as a mask to transfer the style variations of the ink and the parchment. The generated image can capture the color content of the ink and parchment useful for creating annotated data.

National Category
Computer Systems
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-379633 (URN)
Conference
26th IEEE International Conference on Image Processing
Note

Currently under review

Available from: 2019-03-19 Created: 2019-03-19 Last updated: 2019-04-08

Open Access in DiVA

fulltext (2043 kB), 218 downloads
File information
File name: FULLTEXT01.pdf; File size: 2043 kB; Checksum: SHA-512
6be1551c9994c1ae95f9381262f3feba4c59de208e41d013adc882853011adbd8ab489443fe19f91d86e7dfa2468cbf6d29a851cefcf6bf6ed112973c1b4e1ec
Type: fulltext; Mimetype: application/pdf

By author/editor
Wilkinson, Tomas