Explainable AI as a Defence Mechanism for Adversarial Examples
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title
Förklarbar AI som en försvarsmekanism mot motstridiga exempel (Swedish)
Abstract [en]

Deep learning is the gold standard for image classification tasks. Its introduction brought many impressive improvements in computer vision, outperforming all earlier machine learning models. Despite this success, however, deep neural networks have been shown to be easily fooled by adversarial examples: data that have been modified slightly to cause the networks to make incorrect classifications. This significant weakness has raised doubts about neural networks and about whether they are safe to use in practice. In this thesis we propose a new defence mechanism against adversarial examples that uses explainable AI metrics of neural network predictions to filter out adversarial examples prior to model inference. We evaluate the filters against various attacks and models targeting the MNIST, Fashion-MNIST, and CIFAR-10 datasets. The results show that the filters can detect adversarial examples constructed with regular attacks, but that they are not robust against adaptive attacks that specifically exploit the architecture of the defence mechanism.

Abstract [sv]

Djupinlärning är den bästa metoden för bildklassificeringsuppgifter. Med dess introduktion kom många imponerande förbättringar inom datorseende som överträffade samtliga tidigare maskininlärningsmodeller. Samtidigt har det i kontrast till alla framgångar visat sig att djupa neuronnät lätt luras av motstridiga exempel, data som har modifierats för att få neurala nätverk att göra felaktiga klassificeringar. Denna nackdel har orsakat ett ökat tvivel gällande huruvida neuronnät är säkra att använda i praktiken. I detta examensarbete föreslås en ny försvarsmekanism mot motstridiga exempel som utnyttjar förklarbar AI för att filtrera bort motstridiga exempel innan de kommer i kontakt med modellerna. Vi utvärderar filtren mot olika attacker och modeller riktade till MNIST-, Fashion-MNIST-, och Cifar10-dataseten. Resultaten visar att filtren kan upptäcka motstridiga exempel konstruerade med vanliga attacker, men att de inte är robusta mot adaptiva attacker som specifikt utnyttjar försvarsmekanismens arkitektur.
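The defence described in the abstract filters inputs using explanations of the model's predictions before they reach the classifier. The thesis's actual filters and models are not detailed here, so the following is only a minimal toy sketch of the general idea: a logistic-regression "classifier", an FGSM-style adversarial perturbation, an input-gradient saliency map as the explanation, and a threshold filter on the saliency norm. All weights, inputs, and the threshold are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy binary "classifier": logistic regression with fixed, invented weights.
W = [1.0, -2.0, 0.5]
B = 0.0

def predict_prob(x):
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)

def input_gradient(x):
    """Gradient of the cross-entropy loss (taken w.r.t. the predicted
    label) with respect to the input -- a simple saliency-style
    explanation of the prediction."""
    p = predict_prob(x)
    # d/dx of -log(p) is -(1-p)*W; d/dx of -log(1-p) is p*W
    scale = -(1.0 - p) if p >= 0.5 else p
    return [scale * w for w in W]

def fgsm(x, eps):
    """Fast Gradient Sign Method: step the input in the sign of the loss
    gradient (here assuming the true label is 1, the clean prediction)."""
    p = predict_prob(x)
    grad = [(p - 1.0) * w for w in W]  # grad of -log(p) w.r.t. x
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

def saliency_filter(x, threshold=1.0):
    """Flag an input as suspicious if the L1 norm of its saliency map
    exceeds a threshold calibrated on clean data (invented here)."""
    return sum(abs(g) for g in input_gradient(x)) > threshold

x_clean = [0.5, -0.5, 1.0]
x_adv = fgsm(x_clean, eps=0.6)

print(predict_prob(x_clean))     # ~0.881 -> class 1
print(predict_prob(x_adv))       # ~0.475 -> prediction flipped to class 0
print(saliency_filter(x_clean))  # False: clean input passes the filter
print(saliency_filter(x_adv))    # True: filtered out before inference
```

In this toy setup the adversarial input lands near the decision boundary, which inflates its input-gradient norm and trips the filter, while the confidently classified clean input passes. As the abstract notes, an adaptive attacker aware of such a filter can craft perturbations that evade it.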

Place, publisher, year, edition, pages
2019, p. 43
Series
TRITA-EECS-EX ; 2019:466
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-260347
OAI: oai:DiVA.org:kth-260347
DiVA, id: diva2:1355328
Available from: 2019-10-08 Created: 2019-09-27 Last updated: 2019-10-08
Bibliographically approved

Open Access in DiVA

fulltext (3229 kB)
File information
File name: FULLTEXT01.pdf
File size: 3229 kB
Checksum (SHA-512): c8f9fba7b7c064aa3c5c3da775c9c9f9d2dfddb3aa6706b33d03ccbf11cc42f414bf43d44c41ceae41d8965766253cfb8306e18a43f00d26032c10dd8845287e
Type: fulltext
Mimetype: application/pdf


