Logo: to the web site of Uppsala University

uu.sePublications from Uppsala University
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Discovering Networks of Interdependent Features in High-Dimensional Problems
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.ORCID iD: 0000-0002-4922-8415
Show others and affiliations
2016 (English)In: Big Data Analysis: New Algorithms for a New Society / [ed] Japkowicz, Nathalie; Stefanowski, Jerzy, Cham: Springer, 2016, p. 285-304Chapter in book (Refereed)
Abstract [en]

The availability of very large data sets in Life Sciences provided earlier by the technological breakthroughs such as microarrays and more recently by various forms of sequencing has created both challenges in analyzing these data as well as new opportunities. A promising, yet underdeveloped approach to Big Data, not limited to Life Sciences, is the use of feature selection and classification to discover interdependent features. Traditionally, classifiers have been developed for the best quality of supervised classification. In our experience, more often than not, rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying observations (objects, samples) into distinct classes and what the interdependencies between the features that describe the observation. Our underlying hypothesis is that the interdependent features and rule networks do not only reflect some syntactical properties of the data and classifiers but also may convey meaningful clues about true interactions in the modeled biological system. In this chapter we develop further our method of Monte Carlo Feature Selection and Interdependency Discovery (MCFS and MCFS-ID, respectively), which are particularly well suited for high-dimensional problems, i.e., those where each observation is described by very many features, often many more features than the number of observations. Such problems are abundant in Life Science applications. Specifically, we define Inter-Dependency Graphs (termed, somewhat confusingly, ID Graphs) that are directed graphs of interactions between features extracted by aggregation of information from the classification trees constructed by the MCFS algorithm. We then proceed with modeling interactions on a finer level with rule networks. We discuss some of the properties of the ID graphs and make a first attempt at validating our hypothesis on a large gene expression data set for CD4+ T-cells. The MCFS-ID and ROSETTA including the Ciruvis approach offer a new methodology for analyzing Big Data from feature selection, through identification of feature interdependencies, to classification with rules according to decision classes, to construction of rule networks. Our preliminary results confirm that MCFS-ID is applicable to the identification of interacting features that are functionally relevant while rule networks offer a complementary picture with finer resolution of the interdependencies on the level of feature-value pairs.

Place, publisher, year, edition, pages
Cham: Springer, 2016. p. 285-304
Series
Studies in Big Data, ISSN 2197-6503 ; 16
National Category
Bioinformatics and Systems Biology
Research subject
Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-325593DOI: 10.1007/978-3-319-26989-4_12ISBN: 978-3-319-26989-4 (print)OAI: oai:DiVA.org:uu-325593DiVA, id: diva2:1115326
Available from: 2017-06-26 Created: 2017-06-26 Last updated: 2017-06-29Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Diamanti, KlevKomorowski, Jan

Search in DiVA

By author/editor
Diamanti, KlevKomorowski, Jan
By organisation
Computational Biology and Bioinformatics
Bioinformatics and Systems Biology

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 341 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf