Optimizing t-SNE using random sampling techniques
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
This thesis concerns t-SNE, a dimensionality reduction technique that has become popular for its ability to preserve well-separated clusters from a high-dimensional space. Our goal is twofold. First, we give an introduction to the use of dimensionality reduction techniques in visualization and, following recent research, show that t-SNE in particular is successful at preserving well-separated clusters. Second, we perform a thorough series of experiments that let us draw conclusions about the quality of embeddings obtained by running t-SNE on samples of data drawn with different sampling techniques. We compare pure random sampling, random walk sampling and so-called hubness sampling on a dataset, attempting to find a sampling method that is consistently better at preserving local information than pure random sampling. In our tests, a specific variant of random walk sampling distinguished itself as a better alternative to pure random sampling.
Place, publisher, year, edition, pages
2019. p. 38
Keywords [en]
Machine learning, t-SNE, visualization, sampling
National Category
Mathematics
Identifiers
URN: urn:nbn:se:lnu:diva-88585
OAI: oai:DiVA.org:lnu-88585
DiVA, id: diva2:1345416
Subject / course
Mathematics
Educational program
Applied Mathematics Programme, 180 credits
Presentation
2019-06-05, D1072, P G Vejdes väg 29, Växjö, 08:00 (English)
Supervisors
Examiners
2019-08-26, 2019-08-23, 2019-08-26. Bibliographically approved