Predicting user preferences is a common problem for many companies and
services. With the growth of Internet services, it becomes both more important
and more lucrative to predict which products a user would like and to
recommend those products to them. There are many approaches to this problem;
this study investigates whether random indexing can be used to solve it.
Random indexing is a method that has been used successfully when studying
the similarity between words, and allows entities to be represented as vectors
with relatively small dimensionality. This would allow for fast and memory-efficient implementations of prediction systems.
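To make the idea concrete, the following is a minimal sketch of random indexing applied to rating data. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the dimensionality of 64, the 4 nonzero entries per index vector, and the choice to weight each index vector by the rating are all hypothetical.

```python
import math
import random

def index_vector(rng, dim, nonzeros):
    """Sparse ternary index vector: `nonzeros` entries are +1 or -1, the rest 0."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec

def build_context_vectors(ratings, dim=64, nonzeros=4, seed=0):
    """Accumulate one fixed-size context vector per product.

    `ratings` is an iterable of (user, product, rating) triples.
    Weighting by the rating is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    user_index = {}        # user -> random index vector
    product_context = {}   # product -> accumulated context vector
    for user, product, rating in ratings:
        if user not in user_index:
            user_index[user] = index_vector(rng, dim, nonzeros)
        context = product_context.setdefault(product, [0.0] * dim)
        for i, value in enumerate(user_index[user]):
            context[i] += rating * value
    return product_context

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0
```

In this sketch every product is represented by a vector of length `dim` regardless of how many users exist, which is the source of the memory efficiency mentioned above. Products rated similarly by the same users end up with context vectors that have a high cosine similarity.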
This study uses the Amazon Fine Food Reviews dataset, which contains
product reviews, each with a rating. The study attempts to predict these
ratings, and the results of random indexing are compared to the results of
collaborative filtering on the same dataset. Several parameters of the random
indexing method are also varied to study their effect on the results. The
methods are evaluated using root mean square error and mean absolute error.
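The two error measures follow their standard definitions, sketched below. The function names and the toy ratings in the usage note are hypothetical and are not taken from the study's data.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)
```

For example, with actual ratings `[5, 3, 4, 1]` and predictions `[4, 3, 2, 1]`, the errors are 1, 0, 2 and 0, so `mae` returns 0.75 and `rmse` returns the square root of 1.25. Because the errors are squared before averaging, the root mean square error penalizes large individual errors more heavily than the mean absolute error does.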
The results indicate that random indexing does not produce results as good
as those of collaborative filtering. However, the difference is small enough
to warrant further study of the other strengths of random indexing, such as
speed and memory efficiency. It is theorized that the sparsity of the dataset
might have caused the difference in error between the methods, and that with
a denser dataset the results of random indexing might be better.