Parallel Community Detection For Cross-Document Coreference
RISE, Swedish ICT, SICS. Computer Systems Laboratory.
RISE, Swedish ICT, SICS. Computer Systems Laboratory.
RISE, Swedish ICT, SICS. Computer Systems Laboratory.
Number of Authors: 3
2014 (English) Report (Other academic)
Abstract [en]

This document presents a highly parallel solution for cross-document coreference resolution that can handle the billions of documents on the current web. At the core of our solution lies a novel algorithm for community detection in large-scale graphs. We operate on graphs that we construct by representing documents' keywords as nodes and the co-location of those keywords in a document as edges. We then exploit the particular nature of such graphs, in which coreferent words are topologically clustered and can be efficiently discovered by our community detection algorithm. The accuracy of our technique is considerably higher than that of the state of the art, while the convergence time is far shorter. In particular, we increase the accuracy for a baseline dataset by more than 15% compared to the best reported result so far. Moreover, we outperform the best reported result for a dataset provided for the Word Sense Induction task in SemEval 2010.
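The graph construction described in the abstract (keywords as nodes, co-location within a document as edges) can be illustrated with a minimal sketch. This is not the report's implementation: the function name `build_cooccurrence_graph`, the choice to weight an edge by the number of documents in which both keywords occur, and the toy documents are all assumptions made for illustration. Keyword extraction is assumed to have happened already, and the community detection step itself is not shown because the record does not describe it.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Build an undirected keyword co-occurrence graph.

    `documents` is an iterable of keyword lists (hypothetical input format).
    Returns a Counter mapping an alphabetically ordered keyword pair to the
    number of documents in which both keywords appear (assumed weighting).
    """
    edges = Counter()
    for keywords in documents:
        # Deduplicate and sort so each pair has a single canonical order.
        unique = sorted(set(keywords))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Toy input, invented for illustration only.
docs = [
    ["mercury", "planet", "orbit"],
    ["mercury", "metal", "thermometer"],
    ["planet", "orbit", "sun"],
]
graph = build_cooccurrence_graph(docs)
```

In this toy input the keyword "mercury" is linked to two groups of neighbours that share no edges with each other, which is the kind of topological clustering the abstract says a community detection algorithm can exploit.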

Place, publisher, year, edition, pages
Kista, Sweden: Swedish Institute of Computer Science, 2014, 6.
Series
SICS Technical Report, ISSN 1100-3154 ; 2014:01
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:ri:diva-24302
OAI: oai:DiVA.org:ri-24302
DiVA: diva2:1043382
Available from: 2016-10-31
Created: 2016-10-31

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1451 kB
Checksum (SHA-512): cb4f3e6154222562110ef8f98c0a4da221ccc74c265066168b4bcc1f7efb0c9fbbe987f7f6209e913223f813b46b4c3f2d09780fac8d75e1ad2b41e7bcdd2bfa
Type: fulltext
Mimetype: application/pdf
