Clustering of Image Search Results to Support Historical Document Recognition
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2014 (English). Independent thesis, Advanced level (degree of Master (One Year)). Student thesis.
Abstract [en]

Context. Image searching in historical handwritten documents is a challenging problem in computer vision and pattern recognition. The number of digitized documents increases every day, and the task of finding occurrences of a selected sub-image in a collection of documents is of special interest to historians and genealogists.

Objectives. This thesis develops a technique for image searching in historical documents. The technique has three phases. First, the document is segmented into sub-images according to the words in it. Second, each sub-image is described by a feature vector of measurable attributes of its content. Third, a clustering algorithm computes the distances between these vectors to decide which images match the one selected by the user.

Methods. The research methodology is experimentation. A quasi-experiment is designed based on repeated measures over a single group of data. The image processing, feature selection, and clustering approach are the independent variables, whereas the accuracy measurements are the dependent variable. This design provides a measurement net based on a set of interrelated outcomes.

Results. The statistical analysis uses the F1 score to measure the accuracy of the experimental results. This score assesses the accuracy of the experiment in terms of the true positives, false positives, and false negatives detected. The average F-measure for the experiment is F1 = 0.59, whereas the actual performance value of the method is a matching ratio of 66.4%.

Conclusions. This thesis provides a starting point for developing a search engine for historical document collections based on pattern recognition. The main research findings concern image enhancement and segmentation for degraded documents, and image matching based on feature definition and cluster analysis.
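As a rough illustration of the matching and scoring ideas in the abstract, the sketch below compares feature vectors by distance and computes an F1 score from true positives, false positives, and false negatives. It is not the thesis implementation: the abstract does not name the distance metric, the features, or the clustering algorithm, so the Euclidean distance, the fixed threshold (a simple stand-in for the clustering step), the function names, and all sample values are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_by_distance(query, candidates, threshold):
    """Return ids of candidate vectors within `threshold` of the query vector.

    A hypothetical stand-in for the clustering step: it only thresholds distances.
    """
    return [cid for cid, vec in candidates.items()
            if euclidean(query, vec) <= threshold]


def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical word images described by made-up 2-D feature vectors.
candidates = {"word_a": (3.0, 4.0), "word_b": (1.0, 0.0), "word_c": (6.0, 8.0)}
matches = match_by_distance((0.0, 0.0), candidates, threshold=5.0)

# Hypothetical counts, not taken from the thesis.
score = f1_score(tp=6, fp=2, fn=4)
```

With these made-up counts, precision is 6/8 and recall is 6/10, so the score is the harmonic mean of the two. The counts are not related to the F1 = 0.59 reported in the abstract.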

Place, publisher, year, edition, pages
2014. 42 p.
Keyword [en]
Historical documents, Computer vision, Features extraction, Clustering
National Category
Computer Science
URN: urn:nbn:se:bth-5577
Local ID: diva2:832962
Available from: 2015-04-22 Created: 2014-10-31 Last updated: 2015-06-30
Bibliographically approved

Open Access in DiVA

fulltext (7322 kB), 58 downloads
File information
File name: FULLTEXT01.pdf
File size: 7322 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
