Semantic and Verbatim Word Spotting using Deep Neural Networks
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
2016 (English). In: Proceedings of 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016, pp. 307-312. Conference paper, Published paper (Refereed)
Abstract [en]

In the last few years, deep convolutional neural networks have become ubiquitous in computer vision, achieving state-of-the-art results on problems like object detection, semantic segmentation, and image captioning. However, they have not yet been widely investigated in the document analysis community. In this paper, we present a word spotting system based on convolutional neural networks. We train a network to extract a powerful image representation, which we then embed into a word embedding space. This allows us to perform word spotting using both query-by-string and query-by-example in a variety of word embedding spaces, both learned and handcrafted, for verbatim as well as semantic word spotting. Our novel approach is versatile and the evaluation shows that it outperforms the previous state of the art for word spotting on standard datasets.
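
The retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed_string` is a hypothetical toy character-histogram embedding standing in for the handcrafted or learned embeddings used in the paper, the `database` vectors are faked from labels where a trained network would predict them from word images, and cosine similarity is assumed as the ranking measure.

```python
import math
import string

ALPHABET = string.ascii_lowercase


def embed_string(word):
    """Toy handcrafted embedding (assumption): a character-count histogram."""
    counts = [0.0] * len(ALPHABET)
    for ch in word.lower():
        if ch in ALPHABET:
            counts[ALPHABET.index(ch)] += 1.0
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, database):
    """Return database keys sorted by decreasing similarity to the query."""
    return sorted(database, key=lambda k: cosine(query_vec, database[k]), reverse=True)


# Stand-in for network outputs: a trained CNN would predict these vectors
# from word images. Here they are derived from the labels for illustration.
database = {
    "img_001": embed_string("letter"),
    "img_002": embed_string("orders"),
    "img_003": embed_string("letters"),
}

# Query-by-string: embed the text query and rank the word images.
qbs = rank(embed_string("letter"), database)

# Query-by-example: use the embedding of a word image as the query.
qbe = rank(database["img_002"], database)
```

Because strings and images share one embedding space in this sketch, the same `rank` function serves both query types. Only the way the query vector is obtained differs.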

Place, publisher, year, edition, pages
2016. pp. 307-312
Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords [en]
handwritten word spotting, convolutional neural networks, deep learning, word embeddings
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-306667; DOI: 10.1109/ICFHR.2016.60; ISI: 000400052400056; ISBN: 978-1-5090-0981-7 (print); OAI: oai:DiVA.org:uu-306667; DiVA, id: diva2:1044046
Conference
15th International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Project
q2b
Funder
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
In thesis
1. Learning based Word Search and Visualisation for Historical Manuscript Images
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Today, work with historical manuscripts is nearly exclusively done manually, by researchers in the humanities as well as laypeople mapping out their personal genealogy. This is a highly time-consuming endeavour, as it is not uncommon to spend months with the same volume of a few hundred pages. The last few decades have seen an ongoing effort to digitise manuscripts, both for preservation purposes and to increase accessibility. This has the added effect of enabling the use of methods and algorithms from Image Analysis and Machine Learning that have great potential both in making existing work more efficient and in creating new methodologies for manuscript-based research.

The first part of this thesis focuses on Word Spotting, the task of searching for a given text query in a manuscript collection. This can be broken down into two tasks: detecting where the words are located on the page, and then ranking the words according to their similarity to a search query. We propose Deep Learning models to do both, separately and then simultaneously, and successfully search through a large manuscript collection consisting of over a hundred thousand pages.

A limiting factor in applying learning-based methods to historical manuscript images is the cost, and therefore the lack, of the annotated data needed to train machine learning models. We propose several ways to mitigate this problem, including generating synthetic data, augmenting existing data to get better value from it, and learning from pre-existing, partially annotated data that was previously unusable.

In the second part, a method for visualising manuscript collections called the Image-based Word Cloud is proposed. Much like its text-based counterpart, it arranges the most representative words in a collection into a cloud, where the size of each word is proportional to its frequency of occurrence. This grants a user a single-image overview of a manuscript collection, regardless of its size. We further propose a way to estimate a manuscript's production date. This can grant historians context that is crucial for correctly interpreting the contents of a manuscript.
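
The sizing rule for the word cloud, where size is proportional to frequency of occurrence, can be illustrated with a short sketch. The function name `cloud_sizes`, the parameters `top_k` and `max_height`, and the use of text labels as stand-ins for clusters of word images are all assumptions made for illustration. The thesis abstract does not specify how occurrences are grouped or scaled.

```python
from collections import Counter


def cloud_sizes(labels, top_k=3, max_height=100.0):
    """Map the top_k most frequent labels to display heights.

    Heights are proportional to frequency, scaled so that the most
    frequent label gets max_height. Each label stands in for one
    cluster of visually similar word images (an assumption).
    """
    most_common = Counter(labels).most_common(top_k)
    if not most_common:
        return {}
    top_count = most_common[0][1]
    return {label: max_height * count / top_count for label, count in most_common}


# Hypothetical cluster assignments for seven detected word images.
labels = ["anno", "anno", "anno", "anno", "domini", "domini", "rex"]
sizes = cloud_sizes(labels)
```

With these made-up counts (4, 2 and 1), the three entries receive heights in the ratio 4:2:1, so the overview stays a single image no matter how many word images the collection contains.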

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 82
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1798
Keywords
Word Spotting, Convolutional Neural Networks, Deep Learning, Region Proposals, Historical Manuscripts, Computer Vision, Image Analysis, Visualisation, Document Analysis
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-381308; ISBN: 978-91-513-0633-9
Public defence
2019-06-04, TLS (Tidskriftläsesalen), Carolina Rediviva, Dag Hammarskjölds väg 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Funder
Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-05-13 Created: 2019-04-08 Last updated: 2019-06-18

Open Access in DiVA

fulltext (699 kB), 427 downloads
File information
File name: FULLTEXT01.pdf; File size: 699 kB; Checksum: SHA-512
fbc1d8ffa156dfe637ae2423fd2c85ee2055d9a5dc7d1d6498844f88cfe45b7d6f05e7b3aa1893944720538fcc8a5c2f2f3469eb847371531838764bbc85f36d
Type: fulltext; Mimetype: application/pdf
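
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The sketch below uses Python's standard `hashlib` module. The helper names `sha512_of_file` and `matches_checksum` and the chunk size are illustrative choices, not part of the record.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA-512 digest with an expected hex string."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum` with the local path of the downloaded PDF and the hex string from the record returns `True` only if the file is byte-for-byte identical to the one the checksum was computed from.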

Other links

Publisher's full text

Search in DiVA

By author/editor
Wilkinson, Tomas; Brun, Anders
By organisation
Division of Visual Information and Interaction; Computerized Image Analysis and Human-Computer Interaction
Computer Vision and Robotics (Autonomous Systems)

Total: 427 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 1028 hits