Code Cloning Habits Of The Jupyter Notebook Community
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Code reuse has the benefits of saving time and resources but poses a risk when attempting to tailor copied code for a new purpose or in cases when such copies are buggy or otherwise faulty. In the field of data science, the web application Jupyter Notebook is a popular tool for creating computational notebooks, documents containing both plain text and code snippets, many of which are publicly available on code hosting sites such as GitHub. This thesis describes the acquisition of approximately 2.6 million computational notebooks and analysis of this data set. By hashing the contents of every code snippet, using the MD5 hashing algorithm, cloned snippets were found through snippets producing identical hashes. By subsequently mapping the snippets to their corresponding notebooks, the relative originality of a notebook could be determined. This analysis shows that nearly 95% of notebooks are written in some version of Python. Furthermore, nearly 54% of notebooks in the data set are comprised of code blocks also found in other notebooks and, on average, approximately 70% of the code in any given notebook is copied from elsewhere.
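
The hashing approach described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the sample data, the choice to skip empty cells, and the decision to count a snippet as cloned only when its hash also appears in a different notebook are all assumptions made for illustration. The sketch assumes notebooks are already loaded as nbformat-4-style dictionaries with a "cells" list.

```python
import hashlib
from collections import defaultdict


def snippet_hash(source: str) -> str:
    """Return the MD5 hex digest of a snippet's exact text."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


def code_snippets(notebook: dict) -> list:
    """Collect the source text of every non-empty code cell.

    Assumes an nbformat-4-style dict, where "source" is either a
    string or a list of strings that concatenate to the cell text.
    """
    snippets = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():  # assumption: empty cells are ignored
            snippets.append(source)
    return snippets


def clone_fractions(notebooks: dict) -> dict:
    """Map each notebook name to the fraction of its code snippets
    whose hash also occurs in at least one other notebook."""
    owners = defaultdict(set)  # hash -> names of notebooks containing it
    hashes_by_notebook = {}
    for name, notebook in notebooks.items():
        hashes = [snippet_hash(s) for s in code_snippets(notebook)]
        hashes_by_notebook[name] = hashes
        for digest in hashes:
            owners[digest].add(name)

    fractions = {}
    for name, hashes in hashes_by_notebook.items():
        if not hashes:
            fractions[name] = 0.0
            continue
        cloned = sum(1 for digest in hashes if len(owners[digest]) > 1)
        fractions[name] = cloned / len(hashes)
    return fractions


# Hypothetical sample data: a.ipynb and b.ipynb share one snippet.
SAMPLE_NOTEBOOKS = {
    "a.ipynb": {"cells": [
        {"cell_type": "code", "source": "import numpy as np\n"},
        {"cell_type": "code", "source": "print(1)\n"},
        {"cell_type": "markdown", "source": "# Title"},
    ]},
    "b.ipynb": {"cells": [
        {"cell_type": "code", "source": ["import numpy ", "as np\n"]},
        {"cell_type": "code", "source": "x = 2\n"},
    ]},
    "c.ipynb": {"cells": [
        {"cell_type": "code", "source": "y = 3\n"},
    ]},
}
```

Calling clone_fractions(SAMPLE_NOTEBOOKS) gives 0.5 for a.ipynb and b.ipynb and 0.0 for c.ipynb. Because the digest is computed over the exact text, only byte-identical snippets match; a copy that differs by a single space or renamed variable would count as original under this scheme.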

Place, publisher, year, edition, pages
2019. p. 50
Series
IT ; 19032
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-396822
OAI: oai:DiVA.org:uu-396822
DiVA, id: diva2:1369190
Educational program
Bachelor Programme in Computer Science
Supervisors
Examiners
Available from: 2019-11-11. Created: 2019-11-11. Last updated: 2019-11-11. Bibliographically approved.

Open Access in DiVA

fulltext (947 kB), 14 downloads
File information
File name: FULLTEXT01.pdf
File size: 947 kB
Checksum (SHA-512):
821850868ef08b31b3e760fc935ce837880dbfa90f458f07767f05e4521b9d0d4e05be79969ee3b088a88e3c14178f39a2962624de955dee7d3207388658894b
Type: fulltext
Mimetype: application/pdf
