Learning based Word Search and Visualisation for Historical Manuscript Images
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för visuell information och interaktion.
Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Bildanalys och människa-datorinteraktion.
ORCID iD: 0000-0002-6783-1744
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Today, work with historical manuscripts is done almost exclusively by hand, both by researchers in the humanities and by laypeople mapping out their personal genealogy. This is a highly time-consuming endeavour, as it is not uncommon to spend months with the same volume of a few hundred pages. The last few decades have seen an ongoing effort to digitise manuscripts, both for preservation purposes and to increase accessibility. This has the added effect of enabling the use of methods and algorithms from Image Analysis and Machine Learning, which have great potential both for making existing work more efficient and for creating new methodologies for manuscript-based research.

The first part of this thesis focuses on Word Spotting, the task of searching for a given text query in a manuscript collection. This can be broken down into two tasks: detecting where the words are located on the page, and then ranking the words according to their similarity to a search query. We propose Deep Learning models to do both, first separately and then simultaneously, and successfully search through a large manuscript collection consisting of over a hundred thousand pages.
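
The ranking half of the two-step view above can be illustrated in isolation. The following is a minimal Python sketch, not code from the thesis: it assumes that each detected word region and the query have already been mapped to vectors in a shared embedding space, and it assumes cosine similarity as the ranking score. The names `cosine_similarity` and `rank_regions` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_regions(query_vec, regions):
    """Sort (region_id, embedding) pairs by similarity to the query, best first."""
    scored = [(rid, cosine_similarity(query_vec, emb)) for rid, emb in regions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings for three detected word regions (assumed values).
candidates = [
    ("box_a", [1.0, 0.0, 0.0]),
    ("box_b", [0.6, 0.8, 0.0]),
    ("box_c", [0.0, 0.0, 1.0]),
]
ranking = rank_regions([1.0, 0.0, 0.0], candidates)
```

In this toy example the region whose vector points in the same direction as the query is ranked first, and the orthogonal one last.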

A limiting factor in applying learning-based methods to historical manuscript images is the cost, and therefore the scarcity, of the annotated data needed to train machine learning models. We propose several ways to mitigate this problem, including generating synthetic data, augmenting existing data to get better value from it, and learning from pre-existing, partially annotated data that was previously unusable.

In the second part, a method for visualising manuscript collections called the Image-based Word Cloud is proposed. Much like its text-based counterpart, it arranges the most representative words in a collection into a cloud, where the size of each word is proportional to its frequency of occurrence. This gives a user a single-image overview of a manuscript collection, regardless of its size. We further propose a way to estimate a manuscript's production date. This can give historians context that is crucial for correctly interpreting the contents of a manuscript.
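
The sizing rule described above (word size proportional to frequency of occurrence) can be sketched as follows. This is an illustrative Python sketch only, not the thesis implementation: it assumes the word images have already been grouped so that each occurrence carries a group label, and it uses plain strings as stand-ins for those groups. The function name `cloud_sizes` and the parameters `top_k` and `max_size` are hypothetical.

```python
from collections import Counter


def cloud_sizes(labels, top_k=3, max_size=100.0):
    """Map the top_k most frequent labels to display sizes.

    The most frequent label gets max_size; every other label is scaled
    linearly by its count relative to the most frequent one.
    """
    top = Counter(labels).most_common(top_k)
    if not top:
        return {}
    largest = top[0][1]
    return {label: max_size * count / largest for label, count in top}


# Assumed toy input: one label per word occurrence in a collection.
occurrences = ["anno", "anno", "anno", "anno", "domini", "domini", "rex"]
sizes = cloud_sizes(occurrences)
```

Here the label seen four times receives the full size, and the labels seen twice and once receive half and a quarter of it.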

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 82
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1798
Keywords [en]
Word Spotting, Convolutional Neural Networks, Deep Learning, Region Proposals, Historical Manuscripts, Computer Vision, Image Analysis, Visualisation, Document Analysis
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-381308
ISBN: 978-91-513-0633-9 (print)
OAI: oai:DiVA.org:uu-381308
DiVA, id: diva2:1303103
Public defence
2019-06-04, TLS (Tidskriftläsesalen), Carolina Rediviva, Dag Hammarskjölds väg 1, Uppsala, 10:15 (English)
Funders
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-05-13 Created: 2019-04-08 Last updated: 2019-06-18
List of papers
1. Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Recent work in word spotting in handwritten documents has yielded impressive results. Yet this progress has largely been made by supervised learning systems which are dependent on manually annotated data, making deployment to new collections a significant effort. In this paper we propose an approach utilising transcriptions without bounding box annotations to train segmentation-free word spotting models, given a model partially trained with full annotations. This is done through an alignment procedure based on hidden Markov models. This model can create a tentative mapping between word region proposals and the transcriptions to automatically create additional weakly annotated training data. Using as little as 1% and 10% of the fully annotated training sets for partial convergence, we automatically annotate the remaining training data and successfully train using it. Across all datasets, our approach comes within a few percentage points of mAP of the performance of a model trained with only full ground truth. We believe that this will be a significant advance towards a more general use of word spotting, since digital transcription data will already exist for parts of many collections of interest.
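
The idea of mapping ordered region proposals onto an ordered transcription can be illustrated with a generic Viterbi decoder for a left-to-right hidden Markov model. This Python sketch is not the paper's procedure; the abstract does not specify its states, transitions, or emission model. The sketch assumes that the states are the transcription words in reading order, that the observations are region proposals in reading order, that each proposal either stays on the current word or advances to the next one, and that emission scores are given. The function name `viterbi_align` and all numbers are hypothetical.

```python
import math


def viterbi_align(log_emit, log_stay=math.log(0.5), log_next=math.log(0.5)):
    """Align T ordered observations to S ordered states (left-to-right HMM).

    log_emit[t][s] is the log-likelihood that observation t shows state s.
    Returns a list giving the state index chosen for each observation.
    """
    num_obs, num_states = len(log_emit), len(log_emit[0])
    if num_obs < num_states:
        raise ValueError("need at least one observation per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * num_states for _ in range(num_obs)]
    back = [[0] * num_states for _ in range(num_obs)]
    score[0][0] = log_emit[0][0]  # the path must start in the first state
    for t in range(1, num_obs):
        for s in range(num_states):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_next if s > 0 else neg_inf
            if move > stay:
                score[t][s], back[t][s] = move + log_emit[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + log_emit[t][s], s
    # Trace back from the last state, which the path must end in.
    path = [num_states - 1]
    for t in range(num_obs - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Assumed toy scores: 4 proposals (rows) against 3 transcribed words (columns).
probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],  # a second proposal covering the first word
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],
]
log_emit = [[math.log(p) for p in row] for row in probs]
alignment = viterbi_align(log_emit)
```

With these toy scores the first two proposals are assigned to the first word and the remaining two to the second and third words. In a weakly supervised setting, an assignment of this kind is what would let a transcribed word lend its label to a proposed bounding box.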

Keywords
weakly supervised, segmentation-free word spotting, convolutional neural network, hidden Markov model
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-381304 (URN)
Projects
q2b
Funders
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-04-07 Created: 2019-04-07 Last updated: 2019-04-08
2. Neural Word Search in Historical Manuscript Collections
(English) Manuscript (preprint) (Other academic)
Abstract [en]

We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query. This is commonly referred to as "word spotting". To this end, we first propose an end-to-end trainable model based on deep neural networks that we dub Ctrl-F-Net. The model simultaneously generates region proposals and embeds them into a word embedding space, wherein a search is performed. We further introduce a simplified version called Ctrl-F-Mini. It is faster with similar performance, though it is limited to more easily segmented manuscripts. We evaluate both models on common benchmark datasets and surpass the previous state of the art. Finally, in collaboration with historians, we employ the Ctrl-F-Net to search within a large manuscript collection of over 100 thousand pages, written across two centuries. With only 11 training pages, we enable large-scale data collection in manuscript-based historical research. This speeds up data collection and increases the number of manuscripts processed by orders of magnitude. Given the time-consuming manual work required to study old manuscripts in the humanities, quick and robust tools for word spotting have the potential to revolutionise domains like history, religion and language.

Keywords
Word spotting, Historical Manuscripts, Deep Convolutional Neural Network, Region Proposals
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-381306 (URN)
Projects
q2b
Funders
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-04-07 Created: 2019-04-07 Last updated: 2019-04-08
3. Neural Ctrl-F: Segmentation-free query-by-string word spotting in handwritten manuscript collections
2017 (English) In: 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, p. 4443-4452. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we approach the problem of segmentation-free query-by-string word spotting for handwritten documents. In other words, we use methods inspired by computer vision and machine learning to search for words in large collections of digitized manuscripts. In particular, we are interested in historical handwritten texts, which are often far more challenging than modern printed documents. This task is important, as it provides people with a way to quickly find what they are looking for in large collections that are tedious and difficult to read manually. To this end, we introduce an end-to-end trainable model based on deep neural networks that we call Ctrl-F-Net. Given a full manuscript page, the model simultaneously generates region proposals and embeds these into a distributed word embedding space, where searches are performed. We evaluate the model on common benchmarks for handwritten word spotting, outperforming the previous state-of-the-art segmentation-free approaches by a large margin, and in some cases even segmentation-based approaches. One interesting real-life application of our approach is to help historians find and count specific words in court records that are related to women's sustenance activities and division of labor. We provide promising preliminary experiments that validate our method on this task.

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Computer Vision, E-ISSN 1550-5499
Keywords
Segmentation-free Word Spotting, Deep Learning, Convolutional Neural Network, Query-by-String
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-335926 (URN)
10.1109/ICCV.2017.475 (DOI)
000425498404054 ()
978-1-5386-1032-9 (ISBN)
Conference
16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22-29, 2017
Projects
q2b
Funders
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2017-12-11 Created: 2017-12-11 Last updated: 2019-04-08 Bibliographically approved
4. Visualizing document image collections using image-based word clouds
2015 (English) In: Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part I / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; Pavlidis, I; Feris, R; McGraw, T; Elendt, M; Kopper, R; Ragan, E; Ye, Z; Weber, G, Springer, 2015, p. 297-306. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9474
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-272193 (URN)
10.1007/978-3-319-27857-5_27 (DOI)
000376400300027 ()
9783319278568 (ISBN)
9783319278575 (ISBN)
Conference
ISVC 2015, December 14–16, Las Vegas, NV
Projects
q2b
Funders
Swedish Research Council, 2012-5743
Available from: 2015-12-18 Created: 2016-01-12 Last updated: 2019-04-08 Bibliographically approved
5. A novel word segmentation method based on object detection and deep learning
2015 (English) In: Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part I / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; Pavlidis, I; Feris, R; McGraw, T; Elendt, M; Kopper, R; Ragan, E; Ye, Z; Weber, G, Springer, 2015, p. 231-240. Conference paper, Published paper (Refereed)
Abstract [en]

The segmentation of individual words is a crucial step in several data mining methods for historical handwritten documents. Examples of applications include visual searching for query words (word spotting) and character-by-character text recognition. In this paper, we present a novel method for word segmentation that is adapted from recent advances in computer vision, deep learning and generic object detection. Our method has unique capabilities and it has found practical use in our current research project. It can easily be trained for different kinds of historical documents, uses full gray scale information, and requires neither binarization as pre-processing nor prior segmentation of individual text lines. We evaluate its performance using established error metrics, previously used in competitions for word segmentation, and demonstrate its usefulness for a 15th century handwritten document.

Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9474
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-272181 (URN)
10.1007/978-3-319-27857-5_21 (DOI)
000376400300021 ()
9783319278568 (ISBN)
9783319278575 (ISBN)
Conference
ISVC 2015, December 14–16, Las Vegas, NV
Projects
q2b
Funders
Swedish Research Council, 2012-5743
Available from: 2015-12-18 Created: 2016-01-12 Last updated: 2019-04-08 Bibliographically approved
6. Semantic and Verbatim Word Spotting using Deep Neural Networks
2016 (English) In: Proceedings of 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016, p. 307-312. Conference paper, Published paper (Refereed)
Abstract [en]

In the last few years, deep convolutional neural networks have become ubiquitous in computer vision, achieving state-of-the-art results on problems like object detection, semantic segmentation, and image captioning. However, they have not yet been widely investigated in the document analysis community. In this paper, we present a word spotting system based on convolutional neural networks. We train a network to extract a powerful image representation, which we then embed into a word embedding space. This allows us to perform word spotting using both query-by-string and query-by-example in a variety of word embedding spaces, both learned and handcrafted, for verbatim as well as semantic word spotting. Our novel approach is versatile and the evaluation shows that it outperforms the previous state-of-the-art for word spotting on standard datasets.
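
To make the notion of a handcrafted word embedding for query-by-string concrete, the Python sketch below builds a simplified pyramidal character-occurrence vector from a query string. It is loosely in the spirit of the Pyramidal Histogram of Characters (PHOC) but is not a faithful PHOC implementation, and the abstract does not name which embeddings the paper uses. The function name `pyramid_char_embedding`, the lowercase Latin alphabet, the choice of levels, and the rule of assigning each character to a region by its centre position are all assumptions.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed character set


def pyramid_char_embedding(word, levels=(1, 2)):
    """Binary vector recording which characters occur in which part of a word.

    At level k the word is split into k equal regions; each region gets one
    bit per alphabet character. A character belongs to the region that
    contains its centre position.
    """
    word = word.lower()
    length = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(ALPHABET)
            for index, char in enumerate(word):
                if char not in ALPHABET:
                    continue  # characters outside the alphabet are ignored
                centre = (index + 0.5) / length
                if region / level <= centre < (region + 1) / level:
                    bits[ALPHABET.index(char)] = 1
            vector.extend(bits)
    return vector


emb_ab = pyramid_char_embedding("ab")
emb_ba = pyramid_char_embedding("ba")
```

The two example words share the same level-1 bits, since they contain the same characters, but differ at level 2, where character position matters. A string vector of this kind can be compared with a vector predicted from a word image, which is what makes searching by typed text possible.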

Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords
handwritten word spotting, convolutional neural networks, deep learning, word embeddings
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-306667 (URN)
10.1109/ICFHR.2016.60 (DOI)
000400052400056 ()
978-1-5090-0981-7 (ISBN)
Conference
15th International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Projects
q2b
Funders
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
7. Historical Manuscript Production Date Estimation using Deep Convolutional Neural Networks
2016 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Deep learning has thus far not been used for dating of pre-modern handwritten documents. In this paper, we propose ways of using deep convolutional neural networks (CNNs) to estimate production dates for such manuscripts. In our approach, a CNN can either be used directly for estimating the production date or as a feature learning framework for other regression techniques. We explore the feature learning approach using Gaussian process regression and Support Vector Regression. The evaluation is performed on a unique large dataset of over 10000 medieval charters from the Swedish collection Svenskt Diplomatariums huvudkartotek (SDHK). We show that deep learning is applicable to the task of dating documents and that the performance is on average comparable to that of a human expert.

Place, publisher, year, edition, pages
IEEE, 2016
Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords
Document analysis, Manuscripts, Document dating, Digital Humanities
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-306685 (URN)
10.1109/ICFHR.2016.114 (DOI)
000400052400039 ()
978-1-5090-0981-7 (ISBN)
Conference
International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Projects
q2b
q2b_vr2012
Funders
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
8. CalligraphyNet: Augmenting handwriting generation with quill based stroke width
2019 (English) Manuscript (preprint) (Other academic)
Abstract [en]

Realistic handwritten document generation garners a lot of interest from the document research community for its ability to generate annotated data. In the current approach we have used GAN-based stroke width enrichment and style transfer based refinement over generated data which result in realistic looking handwritten document images. The GAN part of data augmentation transfers the stroke variation introduced by a writing instrument onto images rendered from trajectories created by tracking coordinates along the stylus movement. The coordinates from stylus movement are augmented with the learned stroke width variations during the data augmentation block. An RNN model is then trained to learn the variation along the movement of the stylus along with the stroke variations corresponding to an input sequence of characters. This model is then used to generate images of words or sentences given an input character string. A document image thus created is used as a mask to transfer the style variations of the ink and the parchment. The generated image can capture the color content of the ink and parchment useful for creating annotated data.

Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-379633 (URN)
Conference
26th IEEE International Conference on Image Processing
Note
Currently under review
Available from: 2019-03-19 Created: 2019-03-19 Last updated: 2019-04-08

Open Access in DiVA

File information
File: FULLTEXT01.pdf, file size: 2043 kB, checksum (SHA-512):
6be1551c9994c1ae95f9381262f3feba4c59de208e41d013adc882853011adbd8ab489443fe19f91d86e7dfa2468cbf6d29a851cefcf6bf6ed112973c1b4e1ec
Type: fulltext, MIME type: application/pdf

Author/editor
Wilkinson, Tomas