Multifont recognition System for Ethiopic Script
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
2006 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, we present a general framework for a multi-font, multi-size and multi-style Ethiopic character recognition system. We propose structural and syntactic techniques for recognizing Ethiopic characters, in which graphically complex characters are represented by less complex primitive structures and their spatial interrelationships. For each Ethiopic character, the primitive structures and their spatial interrelationships form a unique set of patterns.

The interrelationships of primitives are represented by a special tree structure that resembles a binary search tree in that it groups child nodes as left and right and keeps the spatial positions of primitives in an orderly manner. For better computational efficiency, the primitive tree is converted into a string pattern by in-order traversal, which is also used to generate a knowledge base of the alphabet that stores the possibly occurring string patterns for each character. Characters are then recognized by matching the generated patterns against each pattern in the stored knowledge base.
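The tree-to-string step described above can be sketched as follows. This is a minimal illustration, not the thesis's data structure: the class name `PrimitiveNode`, the one-letter primitive labels, and the assumption that left/right children simply hold primitives on either side of their parent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimitiveNode:
    """A primitive structure in a character image (hypothetical encoding).

    `label` is an illustrative one-letter code for the primitive type; the
    left/right children hold primitives positioned to either side of it.
    """
    label: str
    left: Optional["PrimitiveNode"] = None
    right: Optional["PrimitiveNode"] = None


def in_order_pattern(node: Optional[PrimitiveNode]) -> str:
    """Flatten a primitive tree into a string pattern by in-order traversal."""
    if node is None:
        return ""
    # Visit the left subtree, then the node itself, then the right subtree.
    return in_order_pattern(node.left) + node.label + in_order_pattern(node.right)


# Hypothetical character made of four primitives (labels invented for illustration).
tree = PrimitiveNode("V",
                     left=PrimitiveNode("H", left=PrimitiveNode("a")),
                     right=PrimitiveNode("c"))
print(in_order_pattern(tree))  # aHVc
```

Note that a bare in-order string does not determine the tree shape on its own: a node `V` with left child `H` and a node `H` with right child `V` both flatten to `"HV"`. A practical encoding would therefore likely add relation or connection codes to the string; the abstract does not say which.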

Structural features are extracted using the direction field tensor, which is also used for character segmentation. In general, the recognition system does not need size normalization, thinning or other preprocessing procedures. The only parameter that needs to be adjusted during the recognition process is the size of the Gaussian window, which should be chosen optimally in relation to font size. We also constructed an Ethiopic Document Image Database (EDIDB) from real-life documents and tested the recognition system with respect to variations in font type, size, style, document skew and document type. Experimental results are reported.
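To show what a direction field tensor with a Gaussian window computes, here is a sketch of the classical complex ("doubled-angle") formulation: the squared complex gradient is averaged in a Gaussian window and its angle is halved. Whether the thesis uses exactly this variant, these derivative filters, or this certainty measure is an assumption; the function names are hypothetical, and `sigma` stands in for the Gaussian window size mentioned above.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D normalized Gaussian sampled out to about 3 standard deviations."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def smooth(arr: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing; works for real and complex arrays.

    Assumes both image dimensions exceed the kernel length, because
    np.convolve(mode="same") returns max(len(a), len(v)) samples.
    """
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, arr)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def direction_field(img: np.ndarray, sigma: float, eps: float = 1e-12):
    """Return (orientation, certainty) of the local dominant gradient direction.

    orientation is in radians, modulo pi; certainty lies in [0, 1].
    """
    fy, fx = np.gradient(img.astype(float))  # derivatives along rows (y), columns (x)
    g = fx + 1j * fy                         # gradient as a complex number
    i20 = smooth(g ** 2, sigma)              # doubled-angle field averaged in the window
    i11 = smooth(np.abs(g) ** 2, sigma)      # total gradient energy in the window
    orientation = 0.5 * np.angle(i20)        # halve the doubled angle
    certainty = np.abs(i20) / (i11 + eps)    # 1 where one direction dominates
    return orientation, certainty
```

The orientation returned is that of the gradient, so strokes run perpendicular to it. Squaring the gradient makes opposite gradient directions (the two edges of one stroke) reinforce instead of cancel, and a larger `sigma` averages over a larger neighbourhood, which is one way to read the advice that the window size should match the font size.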

Place, publisher, year, edition, pages
Göteborg: Department of Signals and Systems, Chalmers University of Technology, 2006. 46 p.
Series
Technical report ; 2006:21
Keywords [en]
Ethiopic character recognition, OCR, Multifont recognition, Amharic, Direction fields, Structural and syntactic pattern recognition
Identifiers
URN: urn:nbn:se:hh:diva-1978; Local ID: 2082/2373; OAI: oai:DiVA.org:hh-1978; DiVA id: diva2:239196
Presentation (English)
Available from: 2008-09-29 Created: 2008-09-29 Last updated: 2018-03-23. Bibliographically approved
List of papers
1. Recognition of Modification-based Scripts Using Direction Tensors
2004 (English). In: Proc. 4th Indian Conference on Computer Vision, Graphics and Image Processing, 2004, pp. 587-592. Conference paper, Published paper (Other academic)
Abstract [en]

Research on OCR technology for Latin-based scripts has succeeded to the point where image scanners come with built-in OCR facilities. However, most modification-based scripts, such as the Brahmi-descended South Asian scripts and the Ethiopic script, have yet to reach this status. This indicates the difficulty of adapting the recognition methods proposed so far for Latin-based scripts to modification-based scripts. In this paper we propose a novel method that can be adopted to recognise modification-based printed scripts consisting of a large character set, without the need for prior segmentation. The major strength of this method is that the direction features used as the main principle for recognition are further used to separate confusing characters, detect the skew angle, and segment script and graphic objects, which substantially improves computational efficiency. Algorithms developed initially for the Brahmi-descended Sinhala script used in Sri Lanka have been extended successfully to the Ethiopic script, which evolved in a different geographical region, yielding consistently accurate results. Together, these two scripts are used by a population of ninety million.

Keywords
OCR technology, language, scripts
Identifiers
urn:nbn:se:hh:diva-14923 (URN); 81-7764-707-5 (ISBN)
Conference
4th Indian Conference on Computer Vision, Graphics and Image Processing, December 16-18, 2004, Kolkata, India
Available from: 2011-04-04 Created: 2011-04-04 Last updated: 2018-03-23. Bibliographically approved
2. Structural and Syntactic Techniques for Recognition of Ethiopic Characters
2006 (English). In: Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshops SSPR 2006 and SPR 2006, Hong Kong, China, August 17-19, 2006, Proceedings, Lecture Notes in Computer Science (Volume 4109/2006), Berlin: Springer Berlin/Heidelberg, 2006, pp. 118-126. Conference paper, Published paper (Refereed)
Abstract [en]

OCR technology for Latin scripts is well advanced compared with other scripts. However, the results available for Latin cannot always be adopted directly for other scripts such as the Ethiopic script. In this paper, we propose a novel approach that uses structural and syntactic techniques for the recognition of Ethiopic characters. We show that primitive structures and their spatial relationships form a unique set of patterns for each character. The relationships of primitives are represented by a special tree structure, which is also used to generate a pattern. A knowledge base of the alphabet that stores the possibly occurring patterns for each character is built. Recognition is then achieved by matching the generated pattern against each pattern in the knowledge base. Structural features are extracted using the direction field tensor. Experimental results are reported; the recognition system is insensitive to variations in font type, size and style.
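A knowledge base that stores several possible patterns per character, together with a check of the uniqueness claim above, might be sketched like this. The function names and the pattern strings are invented for illustration, and exact matching is an assumption: the abstract does not say whether the matching tolerates differences. A dictionary lookup is used here because it gives the same result as comparing the pattern against every stored entry.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_knowledge_base(samples: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each observed string pattern to the set of characters that produced it.

    `samples` holds (character, pattern) pairs, e.g. one pair per font, size or
    style variant in which the character was observed.
    """
    kb: Dict[str, Set[str]] = defaultdict(set)
    for char, pattern in samples:
        kb[pattern].add(char)
    return dict(kb)


def ambiguous_patterns(kb: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Patterns shared by more than one character, i.e. violations of uniqueness."""
    return {p: chars for p, chars in kb.items() if len(chars) > 1}


def recognize(pattern: str, kb: Dict[str, Set[str]]) -> Set[str]:
    """Exact-match lookup; an empty set means the pattern is unknown."""
    return set(kb.get(pattern, set()))


# Invented patterns: two variants for one character, one for another.
kb = build_knowledge_base([("ሀ", "aVb"), ("ሀ", "aVbb"), ("ሁ", "aVbc")])
result = recognize("aVbb", kb)   # {'ሀ'}
clashes = ambiguous_patterns(kb)  # {} because every pattern maps to one character
```

Storing pattern-to-character rather than character-to-pattern makes a uniqueness violation visible directly as a key with more than one character.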

Place, publisher, year, edition, pages
Berlin: Springer Berlin/Heidelberg, 2006
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 4109
Keywords
Pattern recognition, Image analysis, OCR
Identifiers
urn:nbn:se:hh:diva-2166 (URN); 10.1007/11815921 (DOI); 000240075100012; 2-s2.0-33749587617 (Scopus ID); 2082/2563 (Local ID); 978-3-540-37236-3 (ISBN); 2082/2563 (Archive number); 2082/2563 (OAI)
Conference
Joint IAPR International Workshops, SSPR 2006 and SPR 2006, Hong Kong, China, August 17-19, 2006
Available from: 2008-11-27 Created: 2008-11-27 Last updated: 2018-03-23. Bibliographically approved
3. Ethiopic Character Recognition Using Direction Field Tensor
2006 (English). In: The 18th International Conference on Pattern Recognition: Proceedings, 20-24 August 2006, Hong Kong, Los Alamitos, Calif.: IEEE Computer Society, 2006, pp. 284-287. Conference paper, Published paper (Refereed)
Abstract [en]

Many languages in Ethiopia use a unique alphabet called Ethiopic for writing. However, no OCR system has been developed for it to date. In an effort to develop automatic recognition of Ethiopic script, a novel system is designed by applying structural and syntactic techniques. The recognition system is developed by extracting primitive structural features and their spatial relationships. A special tree structure is used to represent the spatial relationships of primitive structures. For each character, a unique string pattern is generated from the tree, and recognition is achieved by matching the string against a stored knowledge base of the alphabet. To implement the recognition system, we use the direction field tensor as a tool for character segmentation and for the extraction of structural features and their spatial relationships. Experimental results are reported.
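The abstract states that the direction field tensor is used for character segmentation but not how. As a rough illustration only, the sketch below assumes a binary stroke mask has already been obtained (for instance by thresholding a tensor magnitude or certainty map, which is not shown) and then isolates characters as 8-connected components. Connected-component labelling is a standard, swapped-in technique, not the paper's documented procedure, and `segment_components` is a hypothetical name.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def segment_components(mask: Sequence[Sequence[bool]]) -> List[Box]:
    """Bounding boxes of 8-connected foreground regions, ordered left to right."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    boxes.sort(key=lambda b: b[1])  # reading order for a single left-to-right text line
    return boxes


rows = ["......##..",
        ".##...##..",
        ".##...##..",
        ".##...##..",
        "......##.."]
mask = [[c == "#" for c in r] for r in rows]
print(segment_components(mask))  # [(1, 1, 3, 2), (0, 6, 4, 7)]
```

A character whose parts do not touch would be split into several boxes by this approach, so a real segmenter would need an extra merging step.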

Place, publisher, year, edition, pages
Los Alamitos, Calif.: IEEE Computer Society, 2006
Series
International Conference on Pattern Recognition. Proceedings, ISSN 1051-4651
Keywords
character recognition, feature extraction, image segmentation, knowledge based systems, natural language interfaces, string matching, tensors
Identifiers
urn:nbn:se:hh:diva-2123 (URN); 10.1109/ICPR.2006.507 (DOI); 000240705600067; 2-s2.0-34147145904 (Scopus ID); 2082/2518 (Local ID); 0-7695-2521-0 (ISBN); 2082/2518 (Archive number); 2082/2518 (OAI)
Conference
18th International Conference on Pattern Recognition, ICPR 2006, Hong Kong, 20 - 24 August, 2006
Note

©2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Available from: 2008-11-11 Created: 2008-11-11 Last updated: 2018-03-23. Bibliographically approved
4. Ethiopic Document Image Database for Testing Character Recognition Systems
2006 (English). Report (Other academic)
Abstract [en]

In this paper we describe the acquisition and content of a large database of Ethiopic documents for testing and evaluating character recognition systems. The Ethiopic Document Image Database (EDIDB) contains documents written in the Amharic and Geez languages. The database was built from a variety of documents such as printouts, books, newspapers and magazines. Documents written in various font types, sizes and styles were included. Degraded and poor-quality documents were also included to represent real-life conditions. A total of 1,204 pages were scanned at a resolution of 300 dpi and saved as grayscale JPEG images. We also describe an evaluation protocol for standardizing the comparison of recognition systems and their results. The database is made available to the research community through http://www.hh.se/staff/josef/.
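The evaluation protocol is mentioned but not detailed here. One widely used way to compare recognizers on such a database is character accuracy derived from the Levenshtein edit distance between the ground-truth text and the recognizer output. The sketch below shows that measure; it is an assumption for illustration, not the EDIDB protocol itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """1 - edit_distance / len(reference), clipped at 0."""
    if not reference:
        raise ValueError("reference must be non-empty")
    errors = levenshtein(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


# One substituted Ethiopic character out of four gives 0.75.
score = character_accuracy("ሀሁሂሃ", "ሀሁሄሃ")
```

Because each Ethiopic syllable is a single Unicode code point, the string length counts characters directly, so no extra tokenization is needed for this measure.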

Place, publisher, year, edition, pages
Halmstad: Halmstad University, 2006. 6 p.
Identifiers
urn:nbn:se:hh:diva-14930 (URN)
Available from: 2011-04-04 Created: 2011-04-04 Last updated: 2018-03-23. Bibliographically approved

Author/editor
Assabie Lake, Yaregal