uu.se Uppsala University Publications
Modelling of patterns between operational data, diagnostic trouble codes and workshop history using big data and machine learning
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
2016 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The work presented in this thesis is part of a large research and development project on condition-based maintenance for heavy trucks and buses at Scania. The aim of this thesis was to predict the status of a component (the starter motor) using data mining methods and to create models that can predict the failure of that component. Based on workshop history data, error codes and operational data, three sets of classification models were built and evaluated. The first model aims to find patterns in a set of error codes, to see which codes are related to a starter motor failure. The second model aims to see whether there are patterns in operational data that lead to the occurrence of an error code. Finally, the two data sets were merged, and a classifier was trained and evaluated on this larger data set. Two machine learning algorithms were used and compared throughout the model building: AdaBoost and random forest. There is no statistically significant difference in their performance, and both algorithms had error rates of approximately 13%, 5% and 13% for the three classification models, respectively. However, random forest is much faster and is therefore the preferable option for an industrial implementation. Variable analysis was conducted for the error codes and operational data, resulting in rankings of informative variables. From the evaluation metric precision, it can be derived that if our random forest model predicts a starter motor failure, there is an 85.7% chance that it has actually failed. This model finds 32% of the failed starter motors (the model's recall). It is also shown that four error codes (2481, 2639, 2657 and 2597) have the highest predictive power for starter motor failure classification. For the operational data, variables that concern the starter motor lifetime and battery health are generally ranked as important by the models. The random forest model finds 81.9% of the cases where the 2481 error code occurs. If the random forest model predicts that the error code 2481 will occur, there is an 88.2% chance that it will. The classification performance did not improve when the two data sets were merged, indicating that the patterns detected by the first two classification models do not add value to one another.
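
The abstract reports precision and recall for the failure class. As a minimal sketch of how these two metrics are derived from classifier counts, the Python snippet below computes both. The function name `precision_recall` and the counts are hypothetical, chosen only for illustration; they are not taken from the thesis.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the positive class.

    tp: predicted failure, actually failed (true positives)
    fp: predicted failure, did not fail (false positives)
    fn: predicted no failure, actually failed (false negatives)
    """
    predicted_positive = tp + fp
    actual_positive = tp + fn
    # Precision: share of predicted failures that were real failures.
    precision = tp / predicted_positive if predicted_positive else 0.0
    # Recall: share of real failures that the model found.
    recall = tp / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical counts for illustration only (NOT from the thesis).
tp, fp, fn = 8, 2, 12
precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints "precision = 80.0%, recall = 40.0%"
```

A model with high precision and low recall, as described in the abstract, raises few false alarms but misses many of the actual failures.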

Place, publisher, year, edition, pages
2016. 64 p.
Series
UPTEC STS, ISSN 1650-8319 ; 16002
Keyword [en]
Data mining, random forest, adaboost, error codes
National Category
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-279823
OAI: oai:DiVA.org:uu-279823
DiVA: diva2:909003
External cooperation
Scania
Educational program
Systems in Technology and Society Programme
Supervisors
Examiners
Available from: 2016-03-04 Created: 2016-03-04 Last updated: 2016-03-04 Bibliographically approved

Open Access in DiVA

fulltext (1690 kB), 381 downloads
File information
File name: FULLTEXT01.pdf
File size: 1690 kB
Checksum (SHA-512): 58f3eac4217ff4f5309b89c22dc15503da63abb80f200ac9b70e4f99f1fe0b5945504149479109a61b7b3c13fc91514ca747073c645f8da2ba17a4873d913288
Type: fulltext
Mimetype: application/pdf

Total: 381 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 884 hits