What is the Required Level of Data Cleaning? A Research Evaluation Case
2016 (English). In: Journal of Scientometric Research, ISSN 2321-6654, Vol. 5, no 1, p. 7-12. Article in journal (Refereed), Published
Bibliometric methods depend heavily on data quality, and cleaning and disambiguating data is very time-consuming. Considerable effort is therefore devoted to developing better and faster disambiguation tools (e.g., Gurney et al. 2012). In parallel, one may ask how much data cleaning is actually needed, given the intended use of the data. To what extent is there a trade-off between the type of question asked and the level of cleaning and disambiguation required? When evaluating individuals, a very high level of data cleaning is required, but for other types of research questions one may accept a certain level of error, as long as the errors do not correlate with the variables under study. In this paper, we revisit an earlier case study that handled the data in a rather crude way, on the expectation that the unavoidable errors would even out. We apply a sophisticated cleaning and disambiguation procedure to the same dataset and then repeat the original analysis. We compare the results and discuss what they imply about the required level of data cleaning.
Place, publisher, year, edition, pages
Wolters Kluwer Health and Medknow Publications, 2016. Vol. 5, no 1, p. 7-12.
Coupling data sets, Data cleaning disambiguation, Data error
Other Social Sciences not elsewhere specified
Research subject Industrial Engineering and Management
Identifiers: URN: urn:nbn:se:kth:diva-191463; DOI: 10.5530/jscires.5.1.3; OAI: oai:DiVA.org:kth-191463; DiVA: diva2:956590
QC 20160907; 2016-08-30; 2016-08-30; 2016-09-07; Bibliographically approved