Analysis of stock forum texts to examine correlation to stock prices
Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
In this thesis, four classification methods from statistical learning are used to examine correlations between stock forum discussions and stock prices. The classifiers Naive Bayes, support vector machine, AdaBoost and random forest were applied to text data from two different stock forums to see whether the text had any predictive power for the stock prices of five different companies. Two quantities were measured over a day: the volatility and the direction of the price, that is, whether it would go up or down. The highest accuracy for predicting high or low volatility was 85.2 %, obtained with random forest. For the price difference, the highest accuracy was 69.2 %, obtained with the support vector machine. The average accuracy was 73.4 % for predicting the volatility and 58.6 % for predicting the price difference. This thesis was written in collaboration with the company Scila, which works with stock market security.
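As a rough illustration of the kind of text classification described in the abstract, the following is a minimal sketch of a multinomial Naive Bayes classifier that labels forum posts as preceding an "up" or "down" price move. It is not the thesis implementation: the bag-of-words features, the whitespace tokenisation, the Laplace smoothing, the function names and the four example posts are all assumptions made up for this sketch.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (bag-of-words assumption)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed word likelihood
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Hypothetical toy posts; real input would be forum text paired with
# the observed price direction for the same day.
train_docs = [
    "strong report buy more",
    "great earnings buy now",
    "weak report sell everything",
    "bad earnings sell now",
]
train_labels = ["up", "up", "down", "down"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "buy great report")` classifies a new post, and `accuracy(model, docs, labels)` gives the share of correct predictions on a labelled set, which is the type of figure quoted in the abstract. The same labelled data could be passed to a support vector machine, AdaBoost or random forest for comparison.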
Place, publisher, year, edition, pages
2016. 47 p.
UPTEC F, ISSN 1401-5757 ; 16030
Keywords
statistical learning, machine learning
Computer and Information Science; Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-298484
OAI: oai:DiVA.org:uu-298484
DiVA: diva2:946691
Master Programme in Engineering Physics
Nyberg, Tomas; Ashcroft, Michael