Automatic identification and cropping of rectangular objects in digital images
Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits
Student thesis
Today, digital images are commonly used to preserve and present analogue media. To minimize the need for digital storage space, it is important that the object covers as large a part of the image as possible. This paper presents a robust methodology, based on common edge and line detection techniques, to automatically identify rectangular objects in digital images. The methodology is tailored to identify posters, photographs and books digitized at the National Library of Sweden (the KB). The methodology has been implemented as part of DocCrop, a computer program written in Java to automatically identify and crop documents in digital images. With the aid of the developed tool, the KB hopes to decrease the time and manual labour required to crop its digital images.
Three multi-page documents digitized at the KB have been used to evaluate the tool's performance. Each document has different characteristics. The overall identification results, as well as an in-depth analysis of the different methodology stages, are presented in this paper. On average, the developed software successfully identified 98% of the digitized document pages. The software's identification success rate never fell below 95% for any of the three documents. The robustness and execution speed of the methodology suggest that it can be a compelling alternative to the manual identification used at the KB today.
IT, 12 040
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-180381
OAI: oai:DiVA.org:uu-180381
DiVA: diva2:549806
Brun, Anders
Bol, Roland