Extracting Knowledge for Cultural Heritage Knowledge Base Population
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2013 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

The entity-oriented description of the world is a major current trend, motivated by the need for semantic services that support people in finding information, learning and discovering new knowledge, and broadening their existing knowledge horizons. Entities, managed in semantic knowledge bases, have the potential to be the backbone for these new and innovative services. Automatically extracting facts from various data sources and populating knowledge bases is therefore the challenge studied in this work.

This thesis proposes methods for knowledge extraction for the cultural heritage domain. Extracting knowledge from cultural heritage metadata is by no means a trivial task, and there are often problems with missing or ambiguous information. Therefore, an inherent part of this work is dedicated to developing pattern-based techniques that extract knowledge from natural language documents to complement the knowledge we extract from metadata. However, the proposed framework is not limited to working in conjunction with metadata extraction: it additionally supports an independent, continuous mode of operation, i.e. patterns learned during extraction are subsequently used to mine new knowledge.

In summary, the main contributions of this thesis are:

  • FRBR-ML: a generic framework for exploiting metadata which includes: (i) a method to extract entities, attributes and relationships from existing legacy metadata, (ii) novel techniques for correction, enhancement and semantic enrichment of the metadata, and (iii) metrics to assess the quality of extraction.
  • SPIDER: a prototype that supports extraction of relational facts at Web scale. Contrary to most knowledge extraction approaches, we tackle the problem of uniquely identifying entities, both to extend their list of spelling forms and to facilitate matching to LOD entities. Furthermore, in addition to a flexible pattern definition scheme, SPIDER enables a provenance-aware extraction method, which prudently refines extracted facts by considering the PageRank, SpamScore and relevance score of the source document.
  • KIEV: a prototype that takes the development of SPIDER to the next stage by enabling verification of facts using two evidence-based techniques: (i) classification, which checks the type of relationship with a machine learning approach, and (ii) linking, which uses existing semantic knowledge bases to discover a local entity’s correspondence in another data source.
  • FRBRpedia: a prototype developed to utilize attribute-oriented linking of local entities to the corresponding entities in external semantic knowledge bases. As one of the most basic tasks of knowledge base population, linking demonstrates the power of Linked Data applications. Finally, linking is commonly seen as a required step for publishing data on LOD.
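
As an illustration of the provenance-aware refinement described for SPIDER, the following is a minimal sketch only, not the thesis's actual scoring model. It assumes that PageRank, SpamScore and relevance are each normalised to [0, 1], that a source's weight is their simple product, and that evidence from several documents is combined with a noisy-or rule. The names `Evidence`, `source_weight`, `fact_confidence` and `refine` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One occurrence of a candidate fact in a source document (hypothetical schema)."""
    page_rank: float   # assumed normalised to [0, 1]; higher means more authoritative
    spam_score: float  # assumed normalised to [0, 1]; higher means more spam-like
    relevance: float   # assumed normalised to [0, 1]; relevance of the source document


def source_weight(ev: Evidence) -> float:
    """Assumed weighting: authoritative, relevant, non-spam sources count more."""
    return ev.page_rank * (1.0 - ev.spam_score) * ev.relevance


def fact_confidence(evidence: list[Evidence]) -> float:
    """Noisy-or combination: every additional supporting source raises confidence."""
    miss = 1.0
    for ev in evidence:
        miss *= 1.0 - source_weight(ev)
    return 1.0 - miss


def refine(facts: dict[str, list[Evidence]], threshold: float) -> dict[str, float]:
    """Keep only the candidate facts whose combined confidence reaches the threshold."""
    scored = {fact: fact_confidence(evs) for fact, evs in facts.items()}
    return {fact: score for fact, score in scored.items() if score >= threshold}
```

Under these assumptions, a fact supported by a single low-authority, spam-like page is discarded, while a fact repeated across reputable and relevant pages is retained; the threshold and the form of the weighting would need to be tuned against real data.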

The methods and solutions proposed in this thesis provide a solid foundation for automatically populating knowledge bases using a wide range of sources. The feasibility of the presented approaches has been tested through experimental evaluation using real-world datasets. A general conclusion is that complementing knowledge extraction from metadata with external sources results in less missing and ambiguous information and in a more complete knowledge base.

Place, publisher, year, edition, pages
NTNU, 2013.
Doctoral theses at NTNU, ISSN 1503-8181 ; 2013:289
Keyword [en]
Knowledge bases, Knowledge extraction, Metadata, FRBR, Entity matching
National Category
Information and communication systems
URN: urn:nbn:no:ntnu:diva-23381
ISBN: 978-82-471-4709-2 (printed ver.)
ISBN: 978-82-471-4710-8 (electronic ver.)
OAI: diva2:662303
Public defence
2013-11-07, 00:00
Available from: 2013-11-06
Created: 2013-11-06
Bibliographically approved

Open Access in DiVA

Full text (4478 kB), 472 downloads
File information
File name: FULLTEXT01.pdf
File size: 4478 kB
Checksum SHA-512
Type: fulltext
Mimetype: application/pdf
