Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Creating a Graph Database from a Set of Documents
KTH, School of Computer Science and Communication (CSC).
2015 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Skapandet av en grafdatabas från ett set av dokument (Swedish)
Abstract [en]

In the context of search, it may be advantageous in some use-cases to have documents saved in a graph database rather than a document-orientated database. Graph databases are able to model relationships between objects, in this case documents, in ways which allow for efficient retrieval, as well as search queries that are slightly more specific or complex.

This report will attempt to explore the possibilities of storing an existing set of documents into a graph database. A Named Entity Recognizer was used on a set of news articles in order to extract entities from each news article’s body of text. News articles that contain the same entities are then connected to each other in the graph. Ideas to improve this entity extraction are also explored.

The method of evaluation that was utilized in this report proved not to be ideal for this task in that only a relative measure was given, not an absolute one. As such, no absolute answer with regards to the quality of the method can be presented. It is clear that improvements can be made, and the result should be subject to further study.

Abstract [sv]

I ett sökkontext kan det vara födelaktigt att i några användarscenarion utgå från dokument lagrade i en grafdatabas gentemot en dokument-orienterad databas. Grafdatabaser kan modellera förhållanden mellan objekt, som i detta fall är dokument, på ett sätt som ökar effektiviteten för vissa mer specifika eller komplexa sökfrågor.

Denna rapport utforskar möjligheterna i att lagra existerande dokument i en grafdatabas. En Named Entity Recognizer används för att extrahera entiter från en stor samling nyhetsartiklar. Nyhetsartiklar som innehåller samma entiteter är sedan kopplade till varandra i grafen. Dessutom undersöks möjligheter till att förbättra extraheringen av entiteter.

Evalueringsmetoden som användes visade sig mindre än ideal, då endast en relativ snarare än absolut bedömning kan göras av den slutgiltiga grafen. Därav kan inget slutgiltigt svar ges angående grafens och metodens kvalitet, men resultatet bör vara av intresse för framtida undersökningar.

Place, publisher, year, edition, pages
2015.
Keyword [en]
graph database, documents, NER, named entity recognition, named entity recognizer, relations
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-176042OAI: oai:DiVA.org:kth-176042DiVA: diva2:865639
External cooperation
Findwise AB
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2015-10-29 Created: 2015-10-28 Last updated: 2015-10-29Bibliographically approved

Open Access in DiVA

xjobb_vladan_nikolic(1012 kB)96 downloads
File information
File name FULLTEXT01.pdfFile size 1012 kBChecksum SHA-512
033e97a9475021b10d53e86ec94225e3fa273e5f008b29985e2585b855a99fab3fafb8eb2172eefdd157c2d58d0e4def784e1129373a6d5080d9e36938adc713
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 96 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 219 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf