uu.se Uppsala University Publications
Feature Selection and Classification of cDNA Microarray Samples in ROSETTA
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2008 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

The advent of cDNA microarray technology makes it possible to measure the expression level of thousands of genes simultaneously. This creates large volumes of data that require computational analysis.

One application of microarray technology is cancer studies, where supervised learning may be used on microarray data to predict tumour subtypes and other clinical parameters. The number of available objects (microarrays) is much smaller than the number of attributes (genes). Hence it is necessary to determine which attributes are important for predicting a parameter. Feature selection methods determine which attributes are related to the predicted parameter, and classifiers are then trained on data sets consisting only of the selected attributes.
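To make the select-then-train workflow concrete, here is a minimal Python sketch of one common filter-style approach: each attribute is scored by a signal-to-noise ratio between two classes, and only the top-ranked attributes are kept. The scoring function, the helper names (`signal_to_noise`, `select_top_k`, `project`) and the toy data are illustrative assumptions; the abstract does not say which feature selection methods the thesis implements.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels):
    """Score one attribute: absolute class-mean difference over summed spread.

    Assumption: binary labels encoded as 1 and 0 (hypothetical encoding).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # Constant within both classes: perfectly separating or uninformative.
        return float("inf") if diff > 0 else 0.0
    return diff / spread


def select_top_k(samples, labels, k):
    """Return the indices of the k highest-scoring attributes, best first."""
    n_attributes = len(samples[0])
    scores = [
        signal_to_noise([row[j] for row in samples], labels)
        for j in range(n_attributes)
    ]
    ranked = sorted(range(n_attributes), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def project(samples, selected):
    """Keep only the selected attribute columns of every sample."""
    return [[row[j] for j in selected] for row in samples]


# Toy data: 4 microarrays (rows) x 3 genes (columns); values are made up.
samples = [
    [5.0, 1.0, 2.0],
    [5.2, 0.9, 7.0],
    [1.0, 1.1, 3.0],
    [1.2, 1.0, 6.0],
]
labels = [1, 1, 0, 0]

selected = select_top_k(samples, labels, k=1)
reduced = project(samples, selected)  # a classifier would be trained on this
```

In this toy example only the first gene separates the two classes clearly, so it is the one retained; a classifier would then be trained on `reduced` instead of the full attribute set.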

This thesis examines the performance of several feature selection methods on real-life data sets. The implementation is based on ROSETTA, a toolkit that contains several rough set learning algorithms as well as discretization methods, but lacks algorithms for performing feature selection. These missing algorithms are written in the C++ programming language as part of this thesis.

The conclusion is that although the current implementation appears to perform well in binary classification, its multi-class classification does not perform as well as the other methods studied in this thesis. If multi-class classification using binary classifiers and additional optimization were implemented, it would be possible to compare the performance of rough set based classifiers to other methods in a fair and meaningful way.
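One standard way to build a multi-class classifier from binary classifiers is the one-vs-rest scheme: train one binary model per class and predict the class whose model gives the highest score. The Python sketch below illustrates that scheme only. It uses a nearest-centroid scorer as a stand-in for the binary classifier, not a rough set classifier, and the abstract does not say which decomposition the thesis has in mind; all function names and the toy data are hypothetical.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length samples."""
    return [mean(col) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(samples, is_positive):
    """Fit a stand-in binary model: the centroids of the two groups."""
    pos = [s for s, p in zip(samples, is_positive) if p]
    neg = [s for s, p in zip(samples, is_positive) if not p]
    return centroid(pos), centroid(neg)


def binary_score(model, sample):
    """Positive when the sample is closer to the positive centroid."""
    pos_c, neg_c = model
    return squared_distance(sample, neg_c) - squared_distance(sample, pos_c)


def train_one_vs_rest(samples, labels):
    """Train one binary model per class: that class against all others."""
    return {
        c: train_binary(samples, [y == c for y in labels])
        for c in sorted(set(labels))
    }


def predict(models, sample):
    """Pick the class whose binary model scores the sample highest."""
    return max(models, key=lambda c: binary_score(models[c], sample))


# Toy data: three classes in two dimensions; values are made up.
samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ["A", "A", "B", "B", "C", "C"]

models = train_one_vs_rest(samples, labels)
predictions = [predict(models, s) for s in ([0.2, 0.4], [5, 5], [9, 1])]
```

Because each binary model is trained independently, any binary learner that produces a comparable score could be substituted for the centroid scorer.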

Place, publisher, year, edition, pages
2008.
Series
UPTEC IT, ISSN 1401-5749 ; 08 006
Identifiers
URN: urn:nbn:se:uu:diva-88731
OAI: oai:DiVA.org:uu-88731
DiVA, id: diva2:158952
Presentation
(English)
Uppsok
Technology
Supervisors
Examiners
Available from: 2009-02-10. Created: 2009-02-05. Last updated: 2009-11-18. Bibliographically approved.

Open Access in DiVA

fulltext (4041 kB), 1009 downloads
File information
File name: FULLTEXT01.pdf
File size: 4041 kB
Checksum (SHA-512): b4f1b1fe6ddc8093aaca4cb0e070551a531bc60e0ed90ec9539011fb0f5ecfdacb5180ce0a7665f68d59c3f6f2c65ac282b274f4dbe75f20bce461d61261cec3
Type: fulltext
Mimetype: application/pdf
