Behaviour of logits in adversarial examples: a hypothesis
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2017 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Beteendet hos logits för kontradiktorisk indata: en hypotes (Swedish)
Abstract [en]

It has been suggested that the existence of adversarial examples, i.e. slightly perturbed images that are classified incorrectly, implies either that the theory that deep neural networks learn to identify a hierarchy of concepts does not hold, or that the network has not managed to learn the true underlying concepts. Previous work has, however, only reported either that adversarial examples are misclassified or the output probabilities of the network, neither of which gives a good understanding of the activations inside the network.

We propose a hypothesis concerning the input to the final softmax layer, i.e. the logits. More precisely, we hypothesize that the logit of the target class does not increase when the perturbation is applied. When testing this hypothesis experimentally using a single network architecture and a single attack algorithm, we find that it does not hold.
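
The distinction between logits and output probabilities can be illustrated with a minimal sketch. The logit values below are invented for illustration and are not taken from the thesis; the helper names `softmax` and `target_logit_increased` are likewise hypothetical. The sketch shows why output probabilities alone cannot settle the question: the probability of the target class can rise sharply even when its logit stays fixed, because softmax depends only on the differences between logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_logit_increased(clean_logits, adv_logits, target):
    """True if the target-class logit is larger after the perturbation."""
    return adv_logits[target] > clean_logits[target]


# Hypothetical logits for a 3-class problem (illustrative values only).
clean = [5.0, 2.0, 1.0]  # class 0 has the largest logit on the clean input
adv = [1.0, 2.0, 0.5]    # after perturbation: class 0 dropped, class 1 unchanged
target = 1               # class the attack is assumed to aim for

p_clean = softmax(clean)
p_adv = softmax(adv)

pred_clean = max(range(len(clean)), key=lambda i: clean[i])
pred_adv = max(range(len(adv)), key=lambda i: adv[i])

print("prediction:", pred_clean, "->", pred_adv)
print("target probability:", round(p_clean[target], 3), "->", round(p_adv[target], 3))
print("target logit increased:", target_logit_increased(clean, adv, target))
```

In this made-up case the prediction flips to the target class and its probability grows, yet the target logit itself does not increase. Whether real attacks behave this way is the empirical question the thesis tests, and the abstract reports that the hypothesis does not hold for the network and attack examined.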

Abstract [sv]

Det har föreslagits att förekomsten av kontradiktorisk indata (adversarial examples), dvs indata med en liten förändring som blir felklassificerad, implicerar att teorin att djupa neurala nätverk lär sig att identifiera en hierarki av koncept är felaktigt, eller att nätverket inte har lärt sig att identifiera de korrekta koncepten. Tidigare artiklar har emellertid endast rapporterat att kontradiktorisk indata blir felklassificerad eller de sannolikheter som nätverket ger som utdata. Inget av dessa mått ger en bra insikt i aktiviteten inuti nätverket.

Vi föreslår en hypotes rörande indata till softmax-lagret, de så kallade logits. Mer precist är vår hypotes att värdet hos logiten för målklassen inte ökar när indata till nätverket ändras så att det blir kontradiktoriskt. Vi testar vår hypotes på en nätverksarkitektur och med en attackalgoritm och finner att den inte stämmer.

Place, publisher, year, edition, pages
2017.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-208906
OAI: oai:DiVA.org:kth-208906
DiVA: diva2:1108742
Supervisors
Examiners
Available from: 2017-06-17
Created: 2017-06-12
Last updated: 2017-06-17
Bibliographically approved

Open Access in DiVA

fulltext (5670 kB), 54 downloads
File information
File name: FULLTEXT01.pdf
File size: 5670 kB
Checksum (SHA-512): 267cd2d3466f98ac2d7f0f2b259955bed099e75f5d8ecc02be25c8dd4af7cadfb61a2caf4020babac47a5e2afcb4bb4ce88ab2c8b2779621f73f696b43178fb5
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Svedin, Martin
Geuna, Trolle
By organisation
School of Computer Science and Communication (CSC)
Computer Science
