Uppsala University Publications (uu.se)
Rule-Based Approaches for Large Biological Datasets Analysis: A Suite of Tools and Methods
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis presents new and improved computational methods for analyzing complex biological data produced by advanced biotechnologies. Such data sets are not only very large but also characterized by very high numbers of features. To address these needs, we developed a set of methods and tools suitable for analyzing large data sets, including next-generation sequencing data, and built transparent models that can be interpreted by researchers who are not necessarily experts in computing. We focused on brain-related diseases.

The first aim of the thesis was to apply the meta-server approach to finding peaks in ChIP-seq data. Building on existing peak finders, we created an algorithm whose consensus results are better than those of any single peak finder.
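
As an illustration of the consensus idea only, the sketch below keeps genomic regions that are supported by a minimum number of peak finders. The abstract does not describe the actual metaserver algorithm, so the voting rule, the function name `consensus_peaks`, the `min_support` parameter and the `(chrom, start, end)` half-open interval format are all assumptions made for this example.

```python
from collections import defaultdict


def consensus_peaks(peak_sets, min_support=2):
    """Return regions covered by peaks from at least `min_support` peak finders.

    peak_sets: one list per peak finder of (chrom, start, end) half-open intervals.
    This is a hypothetical voting scheme, not the thesis's algorithm.
    """
    events = defaultdict(list)  # chrom -> [(position, +1 or -1), ...]
    for peaks in peak_sets:
        by_chrom = defaultdict(list)
        for chrom, start, end in peaks:
            if end > start:  # ignore empty intervals
                by_chrom[chrom].append((start, end))
        # Merge overlapping peaks within one finder so it votes at most once per base.
        for chrom, intervals in by_chrom.items():
            intervals.sort()
            cur_start, cur_end = intervals[0]
            for start, end in intervals[1:]:
                if start <= cur_end:
                    cur_end = max(cur_end, end)
                else:
                    events[chrom] += [(cur_start, 1), (cur_end, -1)]
                    cur_start, cur_end = start, end
            events[chrom] += [(cur_start, 1), (cur_end, -1)]

    consensus = []
    for chrom in sorted(events):
        depth, region_start = 0, None
        # Tuples sort so that, at equal positions, ends (-1) come before starts (+1).
        for pos, delta in sorted(events[chrom]):
            previous, depth = depth, depth + delta
            if previous < min_support <= depth:
                region_start = pos  # support just reached the threshold
            elif depth < min_support <= previous and pos > region_start:
                consensus.append((chrom, region_start, pos))
    return consensus


# Toy input: three hypothetical peak finders reporting overlapping peaks.
calls = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300), ("chr2", 10, 20)],
]
print(consensus_peaks(calls, min_support=2))
```

Raising `min_support` trades sensitivity for agreement: with the toy input, requiring all three finders shrinks the consensus region to the part of chr1 where every call overlaps.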

The second aim was to use supervised machine learning to identify features that are significant in the predictive diagnosis of Alzheimer's disease in patients with mild cognitive impairment. This experience led to the development of a better feature selection method for rough sets, a machine learning method.

The third aim was to deepen the understanding of the role that the STAT3 transcription factor plays in gliomas. Interestingly, we found that STAT3, in addition to being an activator, is also a repressor in certain rat and human glioma models. This was achieved by analyzing STAT3 binding sites in combination with epigenetic marks. STAT3 regulation was determined using expression data from untreated cells and from cells after JAK2/STAT3 inhibition.

The four papers constituting the thesis are preceded by an exposition of the biological, biotechnological and computational background that provides foundations for the papers.

The overall results of this thesis bear witness to the mutually beneficial role that bioinformatics plays in the modern life sciences and computer science.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2013. 40 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1066
Keyword [en]
Rough sets, peak finding, gliomas, Alzheimer disease, STAT3, machine learning, feature selection, next generation sequencing
National Category
Cell and Molecular Biology; Bioinformatics and Systems Biology; Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-206137; ISBN: 978-91-554-8733-1 (print); OAI: oai:DiVA.org:uu-206137; DiVA: diva2:644044
Public defence
2013-10-11, C8:301, Husargatan 3, Uppsala, 13:00 (English)
Available from: 2013-09-19 Created: 2013-08-28 Last updated: 2014-01-23
List of papers
1. Peak Finder Metaserver - a novel application for finding peaks in ChIP-seq data
(English) In: Article in journal (Refereed) Submitted
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:uu:diva-206126 (URN)
Available from: 2013-08-28 Created: 2013-08-28 Last updated: 2017-02-02
2. Random Reducts: A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets
2013 (English). In: Fundamenta Informaticae, ISSN 0169-2968, E-ISSN 1875-8681, Vol. 127, no. 1-4, p. 273-288. Article in journal (Refereed) Published
Abstract [en]

An important step prior to constructing a classifier for a very large data set is feature selection. With many problems it is possible to find a subset of attributes that have the same discriminative power as the full data set. There are many feature selection methods but in none of them are Rough Set models tied up with statistical argumentation. Moreover, known methods of feature selection usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR) - a feature selection method which precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features of which the user should be made aware, and to find interactions in the data set.
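
To make the general idea concrete, here is a much-simplified sketch: draw random feature subsets, reduce each one to a rough-set reduct by backward elimination, and count how often each feature survives. It is not the published RR algorithm. The function names, the positive-region criterion, the greedy elimination and the toy XOR data are all assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def positive_region_size(rows, labels, features):
    """Count objects whose values on `features` determine their label uniquely."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[f] for f in features)].append(label)
    return sum(len(ls) for ls in groups.values() if len(set(ls)) == 1)


def greedy_reduct(rows, labels, features, rng):
    """Drop features one by one while the positive region stays the same size."""
    target = positive_region_size(rows, labels, features)
    kept = list(features)
    order = kept[:]
    rng.shuffle(order)  # random elimination order, so ties are broken randomly
    for f in order:
        trial = [g for g in kept if g != f]
        if positive_region_size(rows, labels, trial) == target:
            kept = trial
    return kept


def random_reduct_scores(rows, labels, n_features, subset_size, n_iter, seed=0):
    """Score features by how often they appear in reducts of random subsets."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        subset = rng.sample(range(n_features), subset_size)
        counts.update(greedy_reduct(rows, labels, subset, rng))
    return counts


# Toy decision table (assumed): all combinations of six binary features,
# with the label defined by an interaction (XOR) of features 0 and 1.
rows = list(product([0, 1], repeat=6))
labels = [r[0] ^ r[1] for r in rows]
scores = random_reduct_scores(rows, labels, n_features=6, subset_size=3, n_iter=200)
print(scores.most_common())
```

In this toy table only features 0 and 1 accumulate counts, and only when a sampled subset contains both, which illustrates how a reduct-based score can surface an interaction that neither feature reveals on its own.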

National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-206127 (URN); 10.3233/FI-2013-909 (DOI); 000325745600021 ()
Available from: 2013-08-28 Created: 2013-08-28 Last updated: 2017-12-06 Bibliographically approved
3. Monte Carlo feature selection and rule-based models to predict Alzheimer's disease in mild cognitive impairment
2012 (English). In: Journal of Neural Transmission, ISSN 0300-9564, E-ISSN 1435-1463, Vol. 119, no. 7, p. 821-831. Article in journal (Refereed) Published
Abstract [en]

The objective of the present study was to evaluate a Monte Carlo feature selection (MCFS) and rough set Rosetta pipeline for generating rule-based models as a tool for comprehensive risk estimates for future Alzheimer's disease (AD) in individual patients with mild cognitive impairment (MCI). Risk estimates were generated on the basis of age, gender, Mini-Mental State Examination scores, apolipoprotein E (APOE) genotype and the cerebrospinal fluid (CSF) biomarkers total tau (T-tau), phospho-tau(181) (P-tau) and the 42 amino acid form of amyloid beta (Aβ42) in two sets of longitudinally followed MCI patients (n = 217 in total). The predictive model was created in Rosetta, evaluated with the standard tenfold cross-validation approach and tested on an external set. Features were ranked and selected by the MCFS algorithm. Using the combined pipeline of MCFS and Rosetta, it was possible to predict AD among patients with MCI with an area under the receiver operating characteristic curve of 0.92. Risk estimates were produced for the individual patients and showed good correlation with actual diagnosis in cross-validation, and on an external dataset from a new study. Analysis of the importance of attributes showed that the biochemical CSF markers contributed the most to the predictions, and that added value was gained by combining several biochemical markers. Despite a correlation with the biochemical markers, the genetic marker APOE ε4 did not contribute to the predictive power of the model.
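
For readers unfamiliar with the evaluation terms used above, the following minimal sketch shows how a tenfold split and an area under the ROC curve can be computed. It does not use Rosetta or the study's data: the helper names `kfold_indices` and `roc_auc` and the toy labels and risk scores are assumptions for illustration only.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Each fold serves once as the held-out set; the rest would train the model.
folds = kfold_indices(217, k=10)
print([len(fold) for fold in folds])

# Toy risk estimates (made up), scored against toy outcomes.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 1.0 means every converter received a higher risk estimate than every non-converter, while 0.5 corresponds to random ranking.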

Keyword
Alzheimer's disease, Decision support, Monte Carlo feature selection, Rosetta, Rough sets, Biomarkers, Cerebrospinal fluid
National Category
Neurosciences; Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-177572 (URN); 10.1007/s00702-012-0812-0 (DOI); 000305525800012 ()
Available from: 2012-07-16 Created: 2012-07-16 Last updated: 2017-12-07 Bibliographically approved
4. Integration of genome-wide of Stat3 binding and epigenetic modifications with transcriptome allowed identification of novel Stat3 target genes in glioma cells
(English) Manuscript (preprint) (Other academic)
National Category
Cell and Molecular Biology; Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:uu:diva-206129 (URN)
Available from: 2013-08-28 Created: 2013-08-28 Last updated: 2013-09-19