Many companies today, regardless of size or field of operation, have access
to vast amounts of data that were not available only a couple of years ago.
This raises the question of how best to organize and use the data.
In this thesis, a large database of pricing data for products within various market
segments is analysed. The pricing data comes from both external and internal
sources and is therefore confidential. Because of this confidentiality, the
database labels are replaced with generic ones in this thesis and the
company is not referred to by name, but the analysis is carried out on the
real data set. The data is initially unstructured and difficult to
survey, so it is first classified. This is done by feeding a set of manually
labelled training data into an algorithm which builds a decision tree. The
decision tree is used to divide the rest of the products in the database into
classes. Then, for each class, a multivariate time series model is built, from
which the future price of each product within the class can be predicted. A
front end is also developed for interacting with the classification and the
price prediction.
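The classification step can be illustrated with a minimal sketch. It is not the algorithm used in the thesis: it assumes a simple greedy tree grown on Gini impurity, and the feature columns (a weight and a list price), the class names and the helper names `gini`, `best_split`, `build_tree` and `classify` are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels):
    """Grow a tree recursively; leaves are labels, inner nodes are tuples."""
    split = best_split(rows, labels)
    if len(set(labels)) == 1 or split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def classify(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical manually labelled training data: (weight in kg, list price).
train_rows = [(0.2, 15.0), (0.3, 18.0), (5.0, 120.0),
              (6.5, 150.0), (0.25, 300.0), (0.4, 280.0)]
train_labels = ["class_A", "class_A", "class_B",
                "class_B", "class_C", "class_C"]

tree = build_tree(train_rows, train_labels)
# Classify a product that was not part of the training data.
print(classify(tree, (5.5, 130.0)))
```

Once the tree is built, classifying a product takes only a handful of comparisons.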
The results show that the classification algorithm both performs well and is
fast enough to operate in real time. The time series analysis shows that the
information within each class can be used for prediction, and a simple vector
autoregressive model used for this purpose gives good predictive results.
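As an illustration of the kind of model referred to, the following is a minimal sketch of a first-order vector autoregressive model, x_t = c + A x_{t-1}, fitted by ordinary least squares. The lag order, the estimation method, the price figures and the helper names `solve`, `fit_var1` and `forecast` are assumptions made for this example and are not taken from the thesis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_var1(series):
    """Fit x_t = c + A x_{t-1} by least squares.

    series is a list of observation vectors, oldest first. The result has
    one row per variable: [intercept, coefficient on each lagged variable].
    """
    k = len(series[0])
    X = [[1.0] + list(prev) for prev in series[:-1]]  # intercept + lagged values
    Y = series[1:]
    p = k + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    coefs = []
    for v in range(k):
        xty = [sum(row[i] * y[v] for row, y in zip(X, Y)) for i in range(p)]
        coefs.append(solve(xtx, xty))  # normal equations for variable v
    return coefs

def forecast(coefs, last, steps):
    """Iterate the fitted recursion forward from the last observation."""
    out, current = [], list(last)
    for _ in range(steps):
        current = [c[0] + sum(a * x for a, x in zip(c[1:], current)) for c in coefs]
        out.append(current)
    return out

# Illustrative (made-up) price history for two products in the same class.
prices = [[10.0, 20.0], [10.4, 20.1], [10.3, 20.6], [10.9, 20.8],
          [11.2, 21.5], [11.1, 21.9], [11.8, 22.0]]

coefs = fit_var1(prices)
for step in forecast(coefs, prices[-1], steps=3):
    print([round(p, 2) for p in step])
```

Because every product's forecast depends on the lagged prices of all products in its class, the model uses information shared within the class rather than treating each price series in isolation.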
2014. 79 p.