Abstract

For many years, molecular fingerprints and machine learning algorithms have been widely adapted and applied in quantitative structure-activity relationship (QSAR) modelling. This study focused on ligand-based modelling, in which classical machine learning techniques (Random Forest, C-Support Vector Classifier (C-SVC) and Neural Network) were applied to datasets obtained from biological assays, with the compounds represented as Morgan fingerprints. More specifically, the predictive performance and computational cost of the machine learning algorithms, as well as of the Morgan algorithm, were evaluated. The results showed no clear differences in predictive performance between these classical algorithms. However, increasing the hash size of the Morgan algorithm had a strong positive effect on predictive performance in every case, with the unhashed version outperforming the hashed versions. Increasing the fingerprint radius produced slight negative trends in ROC-AUC scores with Random Forest and slight positive trends with C-SVC and Neural Network. The FEST implementation of Random Forest and C-SVC were highly efficient with regard to memory usage and runtime, respectively. Scikit-learn's Random Forest Classifier (RFC) showed great robustness in predictive performance, whereas Neural Network, FEST and particularly C-SVC were more sensitive to hyperparameter settings. Considering feasibility, Random Forest would be a valid initial baseline model to implement, and the superior predictive performance of unhashed Morgan fingerprints suggests that hashing (compression) of the fingerprints should be avoided.
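
One plausible reason why a smaller hash size could hurt predictive performance is that distinct substructure identifiers may collide on the same bit when a fingerprint is compressed. The following is a minimal sketch of that effect, not the implementation used in this study: it assumes that folding is done by taking each identifier modulo the hash size, and the function names and the example identifiers are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def fold_identifiers(identifiers: Iterable[int], n_bits: int) -> List[int]:
    """Fold integer feature identifiers into a bit vector of length n_bits.

    Assumption: folding by modulo, used here only for illustration.
    """
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1
    return bits


def count_collisions(identifiers: Iterable[int], n_bits: int) -> int:
    """Count how many distinct identifiers are lost because they share a bit."""
    unique = set(identifiers)
    occupied = {ident % n_bits for ident in unique}
    return len(unique) - len(occupied)


if __name__ == "__main__":
    # Hypothetical substructure identifiers for a single compound.
    example_ids = [98513984, 2245384272, 3217380708, 864662311, 2246728737]
    for n_bits in (4, 64, 2048):
        lost = count_collisions(example_ids, n_bits)
        print(f"hash size {n_bits}: {lost} identifier(s) lost to collisions")
```

In this sketch, the unhashed representation corresponds to keeping the set of identifiers as it is, so that no two substructures can share a feature.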