CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases
2019 (English). In: Methods in Ecology and Evolution, ISSN 2041-210X, E-ISSN 2041-210X, Vol. 10, no 5, p. 744-751. Article in journal (Refereed). Published.
Abstract [en]

Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect geo-referencing or dating, can diminish their usefulness. Manual cleaning is time-consuming, error-prone, difficult to reproduce and limited to known geographical areas and taxonomic groups, making it impractical for datasets with thousands or millions of records.

Here, we present CoordinateCleaner, an R package to scan datasets of species occurrence records for geo-referencing and dating imprecisions and data entry errors in a standardized and reproducible way. CoordinateCleaner is tailored to problems common in biological and palaeontological databases and can handle datasets with millions of records. The software includes (a) functions to flag potentially problematic coordinate records based on geographical gazetteers, (b) a global database of 9,691 geo-referenced biodiversity institutions to identify records that are likely from horticulture or captivity, (c) novel algorithms to identify datasets with rasterized data, conversion errors and strong decimal rounding and (d) spatio-temporal tests for fossils.

We describe the individual functions available in CoordinateCleaner and demonstrate them on more than 90 million occurrences of flowering plants from the Global Biodiversity Information Facility (GBIF) and 19,000 fossil occurrences from the Palaeobiology Database (PBDB). We find that in GBIF more than 3.4 million records (3.7%) are potentially problematic and that 179 of the tested contributing datasets (18.5%) might be biased by rasterized coordinates. In PBDB, 1,205 records (6.3%) are potentially problematic.

All cleaning functions and the biodiversity institution database are open-source and available within the CoordinateCleaner R package.
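
To make the idea of flagging problematic coordinate records more concrete, the following is a minimal sketch in Python. It is not the CoordinateCleaner implementation or its API (the package itself is written in R); the `Record` type, the `flag_record` function, the flag names and the `zero_tol` threshold are all hypothetical and chosen only for illustration. It shows three simple tests of the kind the abstract alludes to: coordinates outside the valid range, coordinates at or near 0/0, and identical absolute latitude and longitude, which often indicate data entry errors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """A hypothetical occurrence record with decimal-degree coordinates."""
    species: str
    lat: float
    lon: float


def flag_record(rec: Record, zero_tol: float = 0.5) -> List[str]:
    """Return a list of flag names for one record (empty list = no flag).

    zero_tol is an assumed tolerance in degrees around the 0/0 point.
    """
    # Coordinates outside the valid latitude/longitude range cannot be
    # tested further, so return immediately.
    if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        return ["invalid"]

    flags = []
    # Records at or very near 0/0 often result from missing values
    # stored as zeros.
    if abs(rec.lat) < zero_tol and abs(rec.lon) < zero_tol:
        flags.append("zero")
    # Identical absolute latitude and longitude suggest a copy error.
    if abs(rec.lat) == abs(rec.lon):
        flags.append("equal")
    return flags


records = [
    Record("Species a", 0.0, 0.0),
    Record("Species b", 95.0, 10.0),
    Record("Species c", 57.7, 11.97),
]
for rec in records:
    print(rec.species, flag_record(rec))
```

A record is reported as potentially problematic rather than deleted, which mirrors the abstract's wording that records are "flagged"; whether to drop a flagged record is left to the user.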

Place, publisher, year, edition, pages
Wiley-Blackwell, 2019. Vol. 10, no 5, p. 744-751
Keywords [en]
biodiversity institutions, data quality, fossils, GBIF, geo-referencing, palaeobiology database (PBDB), r package, species distribution modelling
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:umu:diva-161543
DOI: 10.1111/2041-210X.13152
ISI: 000471332800014
OAI: oai:DiVA.org:umu-161543
DiVA, id: diva2:1336848
Funder
Swedish Research Council, 2015-04748
Swedish Foundation for Strategic Research
Wallenberg Foundations
Available from: 2019-07-10. Created: 2019-07-10. Last updated: 2019-07-10. Bibliographically approved.

Open Access in DiVA

File name: FULLTEXT01.pdf
File size: 1009 kB
Type: fulltext
Mimetype: application/pdf
Checksum (SHA-512): 9d60128d74abfcd92df8a358ab3c5b812715f60366557c433e8064733bde7afaf2bc56c1668397091be8922cfb2d19c13cb28bd3b8a57759f1cbf9d93ba6b353

By author/editor
Zizka, Alexander; Silvestro, Daniele; Edler, Daniel; Herdean, Andrei
By organisation
Department of Physics