Subimage matching in historical documents using SIFT keypoints and clustering
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Context: This thesis tests subimage matching in historical handwritten documents using SIFT (Scale-Invariant Feature Transform) keypoints. SIFT features are invariant to scale and rotation and have gained a lot of interest in the research community. The historical documents used in this thesis originate from the 16th century onward. The following steps were executed: binarization, word segmentation, feature identification, and clustering. The binarization step converts the images into binary images. The word segmentation step separates the words into individual subimages. In the feature identification step, SIFT keypoints were found and descriptors were computed. The last step clusters the images based on the distances between their sets of identified image features.

Objectives: The main objectives are to find a good configuration for the binarization step, implement a good word segmentation, identify image features, and lastly cluster the images based on their similarity. The content of the subimages is matched against each other rather than predicted for each subimage, simply because the data used is unlabeled.

Methods: Implementation was the main methodology, combined with experimentation. Measurements were taken throughout development, and the accuracy of the word segmentation and of the clustering was measured.

Results: The word segmentation achieved an average accuracy of 89% correct segmentation, which is comparable to other word segmentation results. The clustering, however, matched 0% correctly.

Conclusions: The conclusion drawn from this study is that SIFT keypoints are not well suited for this type of problem, which involves a lot of handwritten text. The descriptors were not discriminative enough, and different keypoints were found in different images of the same handwritten text, which led to the poor clustering results.
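The final step of the abstract, clustering word images by the distance between their descriptor sets, can be illustrated with a minimal sketch. The abstract does not state which set distance or clustering algorithm the thesis used, so everything here is an assumption made for illustration: a symmetric mean nearest-neighbour distance, single-linkage grouping under a threshold, and the hypothetical names `descriptor_distance`, `set_distance`, and `cluster_images`. Real SIFT descriptors are 128-dimensional; the toy example uses 2-dimensional vectors to stay readable.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def descriptor_distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def set_distance(set_a: List[Descriptor], set_b: List[Descriptor]) -> float:
    """Symmetric mean nearest-neighbour distance between two descriptor sets.

    This measure is an assumption for illustration, not the thesis's method.
    """
    if not set_a or not set_b:
        return math.inf  # an image without keypoints matches nothing

    def one_way(src: List[Descriptor], dst: List[Descriptor]) -> float:
        # Average, over src, of the distance to the closest descriptor in dst.
        return sum(min(descriptor_distance(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (one_way(set_a, set_b) + one_way(set_b, set_a))


def cluster_images(descriptor_sets: List[List[Descriptor]], threshold: float) -> List[int]:
    """Single-linkage grouping: images closer than `threshold` share a label."""
    n = len(descriptor_sets)
    parent = list(range(n))  # union-find forest, one node per image

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if set_distance(descriptor_sets[i], descriptor_sets[j]) < threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy data: word_a and word_b have nearly identical descriptors, word_c does not.
word_a = [[0.0, 0.0], [1.0, 0.0]]
word_b = [[0.1, 0.0], [1.0, 0.1]]
word_c = [[5.0, 5.0], [6.0, 5.0]]
print(cluster_images([word_a, word_b, word_c], threshold=0.5))  # [0, 0, 1]
```

A grouping like this only works when images of the same word yield similar descriptor sets. The abstract reports that different keypoints were found in different images of the same handwritten text, which is the situation in which a distance of this kind stops separating matching from non-matching words.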

Place, publisher, year, edition, pages
2015. 36 p.
Keyword [en]
Handwritten, Image matching, SIFT, Segmentation
National Category
Computer Science
URN: urn:nbn:se:bth-10417
OAI: diva2:839793
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
Educational program
DVACS Master of Science Programme in Computer Science
Available from: 2015-08-05. Created: 2015-07-05. Last updated: 2015-08-05. Bibliographically approved.

Open Access in DiVA

fulltext (7035 kB), 103 downloads
File information
File name: FULLTEXT02.pdf
File size: 7035 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Author/editor
Åberg, Hampus
