Rule-Based Approaches for Large Biological Datasets Analysis: A Suite of Tools and Methods
2013 (English) Doctoral thesis, comprehensive summary (Other academic)
This thesis presents new and improved computational methods for analyzing complex biological data produced by advanced biotechnologies. Such data sets are not only very large but are also characterized by very high numbers of features. To address these needs, we developed a set of methods and tools suitable for analyzing large data sets, including next-generation sequencing data, and built transparent models that can be interpreted by researchers who are not necessarily experts in computing. We focused on brain-related diseases.
The first aim of the thesis was to apply the meta-server approach to finding peaks in ChIP-seq data. Building on existing peak finders, we created an algorithm that produces consensus results better than those of any single peak finder.
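To make the idea of a consensus over several peak finders concrete, the following is a minimal sketch that assumes a simple voting scheme: a region is kept wherever at least `min_votes` callers report a peak. This is not the algorithm developed in the thesis, whose details are not given here; the function names, the caller labels, and the half-open interval representation are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge_overlaps(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so one caller votes at most once per position."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_peaks(calls: Dict[str, List[Interval]], min_votes: int) -> List[Interval]:
    """Return regions covered by at least `min_votes` peak callers (simple voting, an assumption)."""
    events: List[Tuple[int, int]] = []
    for peaks in calls.values():
        for start, end in merge_overlaps(peaks):
            events.append((start, 1))   # a caller starts supporting this position
            events.append((end, -1))    # a caller stops supporting this position
    events.sort()  # at equal positions, ends (-1) are processed before starts (+1)

    result: List[Interval] = []
    depth = 0
    region_start = 0
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_votes <= depth:
            # Region opens; if it touches the previous region, continue that one.
            if result and result[-1][1] == pos:
                region_start = result.pop()[0]
            else:
                region_start = pos
        elif depth < min_votes <= previous:
            result.append((region_start, pos))
    return result


if __name__ == "__main__":
    # Made-up coordinates for three hypothetical callers.
    example = {
        "caller_a": [(100, 200), (500, 600)],
        "caller_b": [(150, 250)],
        "caller_c": [(180, 300), (550, 650)],
    }
    print(consensus_peaks(example, min_votes=2))
```

Raising `min_votes` trades sensitivity for agreement between callers; a real meta-server would likely also weigh each caller's peak scores, which this sketch ignores.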
The second aim was to use supervised machine learning to identify features that are significant in the predictive diagnosis of Alzheimer disease in patients with mild cognitive impairment. This work led to the development of an improved feature selection method for rough sets, a machine learning technique.
The third aim was to deepen the understanding of the role that the STAT3 transcription factor plays in gliomas. Interestingly, we found that STAT3, in addition to being an activator, also acts as a repressor in certain rat and human glioma models. This was achieved by analyzing STAT3 binding sites in combination with epigenetic marks. STAT3 regulation was determined using expression data from untreated cells and from cells after JAK2/STAT3 inhibition.
The four papers constituting the thesis are preceded by an exposition of the biological, biotechnological and computational background that provides foundations for the papers.
The overall results of this thesis attest to the mutually beneficial role that Bioinformatics plays in modern Life Sciences and Computer Science.
Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2013. 40 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1066
Rough sets, peak finding, gliomas, Alzheimer disease, STAT3, machine learning, feature selection, next generation sequencing
Cell and Molecular Biology; Bioinformatics and Systems Biology; Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-206137
ISBN: 978-91-554-8733-1
OAI: oai:DiVA.org:uu-206137
DiVA: diva2:644044
2013-10-11, C8:301, Husargatan 3, Uppsala, 13:00 (English)
List of papers