Digitala Vetenskapliga Arkivet

Measuring Information Diffusion in Code Review at Spotify
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0001-8879-6450
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. fortiss. ORCID iD: 0000-0003-0619-6027
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0002-1729-5154
Spotify.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Background

Code review, a core practice in software engineering, has been widely studied as a collaborative process, with prior work suggesting it functions as a communication network. Despite its popularity, this theory has not been formalized and remains untested, limiting its practical and theoretical significance.

Objective

This study aims to (1) formalize the theory of code review as a communication network and (2) empirically test its validity by quantifying the extent of information diffusion (the spread of information) in code review across social, organizational, and software architectural boundaries.

Method

We conduct a large-scale empirical analysis of 220,733 code reviews by 2,246 developers at Spotify during 2019. We conceptualize information diffusion along three distinct boundaries: social (dissimilarity among review participants), organizational (involvement of developers across teams), and architectural (interconnections among the components under review).

Results

We find that over 99.6% of review pairs have completely distinct participant sets, indicating high diffusion across social boundaries. Approximately 18% of code reviews involve developers from multiple teams, evidencing nontrivial diffusion across organizational boundaries. Of the 5.82% of code reviews linked to others, 99.0% span distinct repositories, reflecting architectural diffusion.
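
As an illustration only, the first two measures can be sketched on toy data. The review identifiers, developer names, team mapping, and function names below are hypothetical, and using the Jaccard distance as the dissimilarity measure is an assumption made for this sketch, not necessarily the study's exact operationalization.

```python
from itertools import combinations

# Hypothetical toy data: review id -> set of participants (not from the study).
reviews = {
    "r1": {"alice", "bob"},
    "r2": {"carol", "dave"},
    "r3": {"alice", "erin"},
}
# Hypothetical mapping of developers to teams.
team_of = {"alice": "A", "bob": "A", "carol": "B", "dave": "B", "erin": "C"}


def jaccard_distance(a, b):
    """Dissimilarity of two participant sets: 1.0 means no shared participant."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def share_of_disjoint_pairs(reviews):
    """Fraction of review pairs whose participant sets are completely distinct."""
    pairs = list(combinations(reviews.values(), 2))
    if not pairs:
        return 0.0
    disjoint = sum(1 for a, b in pairs if jaccard_distance(a, b) == 1.0)
    return disjoint / len(pairs)


def share_of_cross_team_reviews(reviews, team_of):
    """Fraction of reviews whose participants belong to more than one team."""
    if not reviews:
        return 0.0
    cross = sum(
        1 for people in reviews.values() if len({team_of[p] for p in people}) > 1
    )
    return cross / len(reviews)


if __name__ == "__main__":
    print(share_of_disjoint_pairs(reviews))
    print(share_of_cross_team_reviews(reviews, team_of))
```

In the toy data, two of the three review pairs share no participant, and one of the three reviews involves developers from more than one team.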

Conclusion

The substantial diffusion of information across social, organizational, and architectural boundaries empirically supports the theory of code review as a communication network. These findings indicate that code review plays a role not only in quality assurance, but also in enabling communication and coordination in large-scale, distributed software projects. They further support its use as a measurable proxy for cross-border collaboration in the context of tax compliance, but also raise concerns about the impact of integrating LLMs on its communicative function.

Keywords [en]
code review, theory, communication network, information diffusion
National Category
Software Engineering
Research subject
Software Engineering
Identifiers
URN: urn:nbn:se:bth-28564
OAI: oai:DiVA.org:bth-28564
DiVA, id: diva2:1993897
Part of project
SERT - Software Engineering ReThought, Knowledge Foundation
Available from: 2025-09-01 Created: 2025-09-01 Last updated: 2025-09-30 Bibliographically approved
In thesis
1. Code Review as a Communication Network
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Background: Modern software systems are often too large and complex for an individual developer to fully oversee, making it difficult to understand the implications of changes. Therefore, most collaborative software projects rely on code review as a communication network to foster asynchronous discussions about changes before they are merged. Although prior qualitative studies have revealed that practitioners view code review as a communication network, no formal theory or empirical validation exists. Without formalization and confirmatory evidence, the theory remains uncertain, limiting its credibility, practical relevance, and future development.

Objective: In this thesis, our objective is to (1) formalize the theory of code review as a communication network, (2) empirically evaluate the theory across varied perspectives, contexts, and conditions by quantifying the capability of code review to diffuse information among its participants, (3) demonstrate its practical relevance by applying the theory to the domain of tax compliance in collaborative software engineering, and (4) examine how the role of code review as a communication network for collaborative software engineering may evolve in the future.

Methods: To formalize the theory of code review as a communication network, we developed and validated a simulation model that operationalizes its core propositions about information diffusion among participants. To empirically evaluate the theory, we employed two complementary research approaches. First, we used the simulation model to conduct in silico experiments with closed-source code review systems from Microsoft, Spotify, and Trivago, as well as open-source code review systems from Android, Visual Studio Code, and React, to estimate the upper bound of information diffusion in code review. Second, through an observational study, we quantified the diffusion of information in code review across social, organizational, and architectural boundaries at Spotify. To demonstrate the practical relevance of the theory, we analyzed the code review system of a multinational enterprise as a communication network to reveal the latent collaboration structure among developers across borders, which is taxable. To explore the future of code review as a communication network, we conducted a questionnaire survey with 92 practitioners to gather their expectations and discuss how these anticipated changes may reshape our understanding of code review.

Results: By formalizing the theory of code review as a communication network modelled as a time-varying hypergraph, we were able to empirically demonstrate that traditional time-agnostic models substantially overestimate information diffusion in code review. Throughout our empirical studies, we found substantial evidence supporting the theory of code review as a communication network: We confirmed that code review is capable of diffusing information quickly and widely among participants, even at a large scale. We also observed extensive information diffusion across social, organizational, and architectural boundaries at Spotify, corroborating our theory. However, we also found that information diffusion patterns in open-source code review systems differ significantly, suggesting that findings from open-source environments may not directly apply to closed-source contexts. By applying the theory of code review as a communication network in the domain of tax compliance, we were able to uncover significant and previously unrecognized tax risks associated with collaborative software engineering within multinational enterprises. While practitioners expect code review to remain a core practice in collaborative software engineering, we identify a potential risk that generative AI may undermine code review’s role as a human communication network.
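
To make the contrast between time-varying and time-agnostic models concrete, the following is a minimal sketch, not the thesis's simulation model. It assumes each review is a hyperedge with a single timestamp and a set of participants, and that information passes to every participant of a review once any participant already holds it. The function names and toy data are hypothetical.

```python
def time_respecting_reach(reviews, source):
    """Participants reachable from `source` when reviews are processed in time order.

    `reviews` is a list of (timestamp, participants) tuples; reviews with equal
    timestamps are processed in their input order (a simplifying assumption).
    """
    informed = {source}
    for _, participants in sorted(reviews, key=lambda review: review[0]):
        if informed & participants:
            informed |= participants
    return informed


def time_agnostic_reach(reviews, source):
    """Participants reachable from `source` when timestamps are ignored."""
    informed = {source}
    changed = True
    while changed:
        changed = False
        for _, participants in reviews:
            if informed & participants and not participants <= informed:
                informed |= participants
                changed = True
    return informed


if __name__ == "__main__":
    # Hypothetical toy data: bob and carol meet in a review before bob meets alice.
    toy_reviews = [(1, {"bob", "carol"}), (2, {"alice", "bob"})]
    print(sorted(time_respecting_reach(toy_reviews, "alice")))
    print(sorted(time_agnostic_reach(toy_reviews, "alice")))
```

In the toy data, information originating from alice cannot reach carol when time order is respected, because the review shared by bob and carol took place before alice and bob met. Ignoring time makes carol appear reachable, which is the kind of overestimation described above.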

Conclusion: Our work on understanding code review as a communication network contributes not only to theory-driven, empirical software engineering research but also lays the groundwork for practical applications, particularly in the context of tax compliance. Future research is needed to explore the evolving role of code review as a communication network.

Place, publisher, year, edition, pages
Karlskrona, Sweden: Blekinge Tekniska Högskola, 2025. p. 188
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2025:10
Keywords
code review, software engineering, tax compliance, collaborative software engineering, communication network
National Category
Software Engineering
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-28424 (URN)
978-91-7295-508-0 (ISBN)
Public defence
2025-09-23, J1630, Valhallavägen 1, Karlskrona, 14:00 (English)
Available from: 2025-08-22 Created: 2025-08-22 Last updated: 2025-09-30 Bibliographically approved

Open Access in DiVA

fulltext (292 kB), 35 downloads
File information
File name: FULLTEXT01.pdf
File size: 292 kB
Checksum (SHA-512): fa5c169aa7567b282c7a51b7f987a963127e042272647bf72e973738d45d511a18c8593e1ed6e86fb5313dbe12dff7895042c303e8bf8e9d2e633eef9a000f0b
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Dorner, Michael; Mendez, Daniel; Zabardast, Ehsan; Floryan, Marcin