Software Defects Classification Prediction Based On Mining Software Repository
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
An important goal during the software development cycle is to find and fix existing defects as early as possible. This depends heavily on software defect prediction and management. Nowadays, many large software development companies maintain their own development repository, which typically includes a version control system and a bug tracking system. Such repositories have proved useful for software defect prediction. Since the 1990s, researchers have been mining software repositories to gain a deeper understanding of the data, and as a result they have proposed a number of software defect prediction models in the past few years. These models fall into two basic categories. The first predicts how many defects still remain, based on the defect data already captured in earlier stages of the software life cycle. The second predicts how many defects a newer version of the software will contain, based on the defect data of earlier versions. The complexity of software development raises many issues related to software defects. These issues must be taken into account as far as possible to obtain precise prediction results, which makes the modeling more complex.
This thesis presents the current state of research on software defect classification prediction and the key techniques in this area, including software metrics, classifiers, data pre-processing, and the evaluation of prediction results. We then propose a way to predict software defect classification based on mining software repositories. We describe a way to collect all the defects recorded during software development from the Eclipse version control systems and to map them to the defect information contained in the software defect tracking system, in order to obtain statistical information about software defects. The Eclipse metrics plug-in is then used to compute the software metrics of the files and packages that contain defects. After the dataset has been analyzed and preprocessed, the tool R is used to build prediction models on the training dataset, to predict software defect classification at different levels on the testing dataset, to evaluate the performance of the models, and to compare the performance of different models.
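The mapping step described above, linking version control commits to entries in the defect tracking system, can be illustrated with a minimal sketch. It is not the thesis's actual procedure, which the abstract does not specify. The commit message format, the `BUG_ID` pattern, the function names, and the sample data are all assumptions made for illustration.

```python
import re

# Assumed convention: commit messages mention a bug report as
# "bug 12345", "Fix #12345", "Fixes 12345", and so on.
BUG_ID = re.compile(r"\b(?:bug|fix(?:es|ed)?)\b\s*#?\s*(\d+)", re.IGNORECASE)


def count_defects_per_file(commits, fixed_bug_ids):
    """Count, per file, the distinct confirmed bugs whose commits touched it.

    commits: iterable of (message, list_of_changed_paths) pairs.
    fixed_bug_ids: set of bug IDs known to the defect tracking system.
    """
    bugs_by_file = {}
    for message, files in commits:
        # Keep only IDs that the tracker actually knows about.
        ids = {int(m) for m in BUG_ID.findall(message)} & fixed_bug_ids
        for path in files:
            bugs_by_file.setdefault(path, set()).update(ids)
    return {path: len(ids) for path, ids in bugs_by_file.items()}


def label_files(defect_counts):
    """Turn defect counts into binary labels (True = defect-prone)."""
    return {path: n > 0 for path, n in defect_counts.items()}


# Hypothetical sample data, not taken from the Eclipse repository.
commits = [
    ("Fix bug 101: null check in parser", ["src/Parser.java", "src/Lexer.java"]),
    ("Fixes #202 and bug 101 regression", ["src/Parser.java"]),
    ("Refactor layout code", ["src/Layout.java"]),
    ("bug 999 workaround", ["src/Lexer.java"]),
]
fixed_bug_ids = {101, 202}

counts = count_defects_per_file(commits, fixed_bug_ids)
labels = label_files(counts)
```

Counts or labels of this kind could then be joined with per-file metrics to form a training dataset for a classifier. Bug 999 is ignored in the sample because it is not in the assumed tracker data.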
Place, publisher, year, edition, pages
IT, 14 004
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-216554
OAI: oai:DiVA.org:uu-216554
DiVA: diva2:690303
Master Programme in Computer Science
Feng, Du Qing
Christoff, Ivan
Eriksson, Olle