Over the last decades, document sharing has become vastly more accessible to the general public, with large document collections made available on the internet and, inside organizations, on intranets. In addition, each of us has an ever-increasing archive of private digital documents. At the same time, efforts to enable more efficient document retrieval have succeeded only marginally. Finding the right document is therefore like looking for a needle in a haystack, only now the haystack is bigger. Because of this lack of overview of existing document resources, large amounts of scarce human resources are still spent creating resources similar to ones that already exist.
A key reason why we face this challenge is that few documents receive metadata descriptions sufficient to enable efficient retrieval. Too often, document metadata is insufficient or even incorrect. Few document creators are aware of the need to describe their documents with metadata. Trained librarians and archivists can help authors create and publish metadata, but this is a costly and time-consuming process. Advanced metadata formats, such as IEEE Learning Object Metadata (LOM), enable detailed and precise metadata descriptions. However, such formats are challenging to use, and their potential is often not realized. Consequently, document formats that require such metadata, e.g. SCORM Learning Objects (LOs), are not used to their full potential because the metadata is difficult to create.
This thesis shows how Automatic Metadata Generation (AMG) can serve as a foundation for the creation, publishing, and discovery of document resources with rich and correct metadata descriptions. It further shows how high-quality metadata can be created automatically from the documents themselves and from contextual data sources. Finally, it shows how metadata descriptions can be combined with the original document to create SCORM LOs, enabling educational resources to be shared together with educational metadata descriptions.
The main contributions of this thesis are:
C1: Establishing an overview of the research literature, projects, and products that use AMG, and of the quality of the metadata they generate.
C2: Establishing that AMG efforts can be combined not only to expand the range of elements and entities that can be generated, but also to increase the quality of the generated entities.
C3: Establishing that AMG efforts can generate high-quality metadata from non-homogeneous document collections, greatly expanding the practical usefulness of AMG.
C4: Establishing that AMG efforts can contribute substantially to knowledge sharing through the creation of shareable SCORM LOs that contain both the educational resources themselves and extensive metadata descriptions enabling efficient discovery and use.