Digitala Vetenskapliga Arkivet

Natural Language Processing in Software Engineering: A Systematic Literature Review
Universidade Federal de São Paulo, Brazil. ORCID iD: 0009-0000-6804-9474
Universidade Federal de São Paulo, Brazil. ORCID iD: 0000-0002-7266-5840
Universidade Estadual de Campinas, Brazil.
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0002-0535-1761
2025 (English). In: Journal of Software Engineering Research and Development, Vol. 13, no 2. Article, review/survey (Refereed). Published
Abstract [en]

Context: Software engineering (SE) artifacts and documents, such as requirements specifications, user stories, test cases, and concepts of operations (ConOps), are typically written in natural language, making their manipulation challenging. Natural Language Processing (NLP) is a viable solution for managing these tasks.

Objective: To conduct a systematic literature review exploring the current use of NLP in SE artifacts and tasks, supplemented by a tertiary study focusing on the emerging role of Large Language Models (LLMs) in software engineering research.

Method: We searched digital libraries for relevant papers and applied inclusion and exclusion criteria to filter the primary studies. We then analyzed NLP techniques applied to SE documents and examined their usage in this context. Our research methodology followed Kitchenham and Charters’ guidelines. Additionally, we conducted a tertiary study to synthesize findings from existing systematic literature reviews and surveys specifically addressing LLMs in software engineering.

Results: We selected 60 primary studies to identify the most common methods for NLP pipelines, feature extraction, language models, and machine learning algorithms used in SE. We also assessed the purposes of these methods, their benefits for SE, their difficulty, and their contribution to SE advancement. The tertiary study revealed a rapid proliferation of LLM-focused research, with comprehensive reviews documenting exponential growth in publications and widespread adoption across diverse SE tasks.

Conclusion: Requirements are the most frequently addressed artifacts using NLP techniques, with preprocessing and part-of-speech (POS) tagging being widely used. There is a notable increase in the use of large language models for various SE tasks, such as requirements elicitation, source code generation, bug fixing, and software testing. The tertiary study confirms that LLMs represent a pivotal shift in the research landscape, warranting dedicated investigation to understand their transformative impact on NLP applications in software engineering.

Place, publisher, year, edition, pages
Sociedade Brasileira de Computação, 2025. Vol. 13, no 2
Keywords [en]
Natural Language Processing, Software Engineering, Machine Learning, Literature Review
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-28959
DOI: 10.5753/jserd.2025.5097
OAI: oai:DiVA.org:bth-28959
DiVA, id: diva2:2018848
Available from: 2025-12-04 Created: 2025-12-04 Last updated: 2025-12-04 Bibliographically approved

Open Access in DiVA

fulltext (509 kB), 35 downloads
File information
File name: FULLTEXT01.pdf
File size: 509 kB
Checksum SHA-512:
b639df46bd446125489853261e0857b0a664937c255c1b903d49fefdf05b9438ef54b4cc67fb3a2918f22b2d0e2f53e4471a7437e68f18b0ecf1b846199ff27c
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Nogueira Pacheco, Gabriel; Galvão Martins, Luiz Eduardo; Lavesson, Niklas; Gorschek, Tony
By organisation
Department of Software Engineering
Software Engineering

The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
