A comparison of image and object level annotation performance of image recognition cloud services and custom Convolutional Neural Network models
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Recent advancements in machine learning have contributed to explosive growth in the image recognition field. Simultaneously, multiple Information Technology (IT) service providers such as Google and Amazon have embraced cloud solutions and software as a service. These factors have helped many computer vision tasks mature from scientific curiosities into practical applications. As image recognition is now accessible to the general developer community, a need arises for a comparison of its capabilities and of what can be gained from choosing a cloud service over a custom implementation.

This thesis empirically studies the performance of five general image recognition services (Google Cloud Vision, Microsoft Computer Vision, IBM Watson, Clarifai and Amazon Rekognition) and image recognition models of the Convolutional Neural Network (CNN) architecture that we ourselves have configured and trained. Image and object level annotations of images extracted from different datasets were tested, both in their original state and after being subjected to one of the following six types of distortions: brightness, color, compression, contrast, blurriness and rotation. The output labels and confidence scores were compared to the ground truth of multiple levels of concepts, such as food, soup and clam chowder.
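
The comparison of output labels and confidence scores against a multi-level ground truth can be sketched as follows. This is a minimal illustration, not the thesis's evaluation code: the function name `score_levels`, the level names and the sample service output are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical ground truth for one image, from general to specific.
GROUND_TRUTH = {"high": "food", "mid": "soup", "low": "clam chowder"}


def score_levels(
    predictions: List[Tuple[str, float]],
    ground_truth: Dict[str, str],
) -> Dict[str, float]:
    """Return, per concept level, the highest confidence of a matching label.

    A level scores 0.0 when no predicted label equals its ground-truth
    concept. Matching is case-insensitive and ignores surrounding whitespace.
    """
    best: Dict[str, float] = {}
    for label, confidence in predictions:
        key = label.strip().lower()
        best[key] = max(best.get(key, 0.0), confidence)
    return {
        level: best.get(concept.lower(), 0.0)
        for level, concept in ground_truth.items()
    }


# Made-up service output, for illustration only.
example_output = [("Food", 0.98), ("Soup", 0.91), ("Dish", 0.88), ("Bisque", 0.40)]
scores = score_levels(example_output, GROUND_TRUTH)
print(scores)
```

In this made-up case the high-level and mid-level concepts are found, while the most specific concept is missed and scores 0.0.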

The results show that, among the services tested, there is currently no clear top performer across all categories. Their outputs show both variations and similarities, but on average Google Cloud Vision performs best by a small margin. The services are all adept at identifying high-level concepts such as food and most mid-level ones such as soup. For more specific concepts, such as clam chowder, their results start to vary, with some services performing better than others in different categories. Amazon Rekognition was found to be the most capable at identifying multiple unique objects within the same image on the chosen dataset. Additionally, using synonyms of the ground truth labels increased performance, as it narrowed the semantic gap between our expectations and the actual output from the services. The services all showed vulnerability to image distortions, especially compression, blurriness and rotation. The custom models all performed noticeably worse, around half as well as the cloud services, possibly due to the difference in training data standards. The best model, configured with three convolutional layers, 128 nodes and a layer density of two, reached an average performance of almost 0.2, or 20%.
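
The effect of accepting synonyms of a ground truth label can be illustrated with a small sketch. The synonym table, the function name `best_confidence` and the sample predictions below are hypothetical; the synonym lists actually used in the thesis are not given in this record.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical synonym table, for illustration only.
SYNONYMS: Dict[str, Set[str]] = {
    "clam chowder": {"chowder"},
    "soup": {"broth"},
}


def best_confidence(
    predictions: List[Tuple[str, float]],
    concept: str,
    synonyms: Optional[Dict[str, Set[str]]] = None,
) -> float:
    """Highest confidence among labels matching the concept or its synonyms.

    Without a synonym table only an exact (case-insensitive) match counts.
    Returns 0.0 when nothing matches.
    """
    accepted = {concept.lower()}
    if synonyms is not None:
        accepted |= {s.lower() for s in synonyms.get(concept.lower(), set())}
    return max(
        (conf for label, conf in predictions if label.strip().lower() in accepted),
        default=0.0,
    )


# Made-up service output: the service says "Chowder", not "clam chowder".
predictions = [("Food", 0.97), ("Chowder", 0.72)]
strict = best_confidence(predictions, "clam chowder")
relaxed = best_confidence(predictions, "clam chowder", SYNONYMS)
print(strict, relaxed)
```

With exact matching the specific concept counts as missed, while the synonym-aware match credits the service for a label that a human reader would likely accept.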

In conclusion, if one is limited by a lack of machine learning experience, computational resources or time, using one of the cloud services is recommended to reach a more acceptable performance level. Which service to choose depends on the intended application, as the services perform differently in certain categories. The services are all vulnerable to multiple image distortions, which potentially allows adversarial attacks. Finally, there is definitely room for improvement in the performance of these services and in the computer vision field as a whole.

Place, publisher, year, edition, pages
2019. p. 43
Keywords [en]
machine learning, cnn, image recognition, cloud services
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-18074
OAI: oai:DiVA.org:bth-18074
DiVA, id: diva2:1327682
Subject / course
PA1445 Kandidatkurs i Programvaruteknik
Educational program
PAGPT Software Engineering
Available from: 2019-06-24. Created: 2019-06-19. Last updated: 2019-06-24. Bibliographically approved.

Open Access in DiVA

Full text (4817 kB), 56 downloads
File information
File name: FULLTEXT01.pdf. File size: 4817 kB. Checksum (SHA-512):
76f8f8eb51cf8c644072bc79617985040bb77afe3b707ff3a0a0ea18d2a2dead8d9bf6ba735f3baa4cbdc207f1ce9f40258d4432d4b4db5d7c438f36dd209889
Type: fulltext. Mimetype: application/pdf

Author/editor
Nilsson, Kristian; Jönsson, Hans-Eric