Uppsala University Publications
Umer, Husen Muhammad
Publications (6 of 6)
Umer, H. M. (2018). Computational Modelling of Gene Regulation in Cancer: Coding the noncoding genome. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Technological advancements have enabled quantification of processes within and around us. The information stored within our bodies translates into petabytes of data. Processing and learning from such data requires comprehensive computational programs and software systems. We developed software programs to systematically investigate the process of gene regulation in the human genome. Gene regulation is a complex process in which several genomic elements control the expression of a gene by recruiting many transcription factor (TF) proteins. The TFs recognize specific DNA sequences known as motifs. DNA mutations in regulatory elements, and particularly in TF motifs, may cause gene deregulation. Therefore, defining the landscape of regulatory elements and their roles in cancer and complex diseases is of major importance.

We developed an algorithm (tfNet) to identify regulatory elements based on transcription factor binding sites. tfNet identified nearly 144,000 regulatory elements in five human cell lines. By investigating these elements, we identified TF interaction networks and an enrichment of many GWAS SNPs. We also defined the regulatory landscape for other conditions and species. Next, we investigated the role of regulatory elements in cancer. Cancer is initiated and driven by genetic aberrations in the genome. The genetic changes present in a cancer genome are identified through whole-genome sequencing technologies. We analyzed somatic mutations that had been detected in 326 whole genomes of liver cancer patients. Our results indicated 907 candidate mutations affecting TF motifs. Genome-wide alignment of the mutated motifs revealed a significant enrichment of mutations in a highly conserved position of the CTCF motif. Gene expression analysis indicated disruption of topologically associated domains in the mutated samples. We also confirmed the mutational pattern in pancreatic, gastric and esophageal cancers. Finally, enrichment of cancer-associated gene sets and pathways suggested a major role for noncoding mutations in cancer.

To systematically analyze DNA mutations in TF motifs, we developed an online database system (funMotifs). Publicly available datasets were collected for thousands of experiments. The datasets were integrated using a logistic regression model. Functionality annotations and scores for motifs of 519 TFs were derived. The database allows for identification of variants affecting functional motifs in a selected tissue type. Finally, a comprehensive analysis was performed to identify mutations overlapping functional TF motifs in 37 cancer types. Somatic mutations from a pan-cancer cohort of 2,515 cancer whole genomes were investigated. A significant enrichment of mutations in the CpG site of the CEBPB motif was identified. Overall, 10,806 mutated regulatory elements were identified, including 406 highly recurrent ones. Genes associated with the mutated elements were highly enriched for cancer-related pathways. Our analyses provide further insights into the role of regulatory elements and their impact on cancer development.
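
To illustrate how a logistic regression model can combine several annotation tracks into a single functionality score per motif, here is a minimal sketch. The feature names, weights, and bias are hypothetical placeholders chosen for illustration; they are not the features or fitted values used in funMotifs.

```python
import math


def functionality_score(features, weights, bias=0.0):
    """Logistic score in (0, 1) from a weighted sum of annotation features.

    features: dict mapping a feature name to its numeric value for one motif.
    weights:  dict mapping the same feature names to coefficients
              (hypothetical here; a real model would fit them to data).
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical annotations for one motif instance in one tissue.
example_weights = {"tf_chip_peak": 2.0, "open_chromatin": 1.5, "conservation": 0.5}
example_motif = {"tf_chip_peak": 1.0, "open_chromatin": 1.0, "conservation": 0.2}

score = functionality_score(example_motif, example_weights, bias=-2.0)
print(f"functionality score: {score:.3f}")
```

A motif would then be called functional in a tissue when its score exceeds a chosen cutoff; the cutoff is likewise an assumption of this sketch.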

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 54
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1627
Keywords
Regulatory elements, gene regulation, cancer, motif, integrative database, software solutions for cancer data
National Category
Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-339937 (URN); 978-91-513-0220-1 (ISBN)
Public defence
2018-03-14, A1:111a, BMC, Husargatan 3, 09:00 (English)
Available from: 2018-02-21 Created: 2018-01-24 Last updated: 2018-03-07
Diamanti, K., Umer, H. M., Kruczyk, M., Dabrowski, M. J., Cavalli, M., Wadelius, C. & Komorowski, J. (2016). Maps of context-dependent putative regulatory regions and genomic signal interactions. Nucleic Acids Research, 44(19), 9110-9120
2016 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 44, no 19, p. 9110-9120. Article in journal (Refereed). Published
Abstract [en]

Gene transcription is regulated mainly by transcription factors (TFs). ENCODE and Roadmap Epigenomics provide global binding profiles of TFs, which can be used to identify regulatory regions. To this end, we implemented a method to systematically construct cell-type and species-specific maps of regulatory regions and TF-TF interactions. We illustrated the approach by developing maps for five human cell-lines and two other species. We detected ~144k putative regulatory regions among the human cell-lines, with the majority of them being ~300 bp. We found ~20k putative regulatory elements in the ENCODE heterochromatic domains, suggesting a large regulatory potential in the regions presumed transcriptionally silent. Among the most significant TF interactions identified in the heterochromatic regions were CTCF and the cohesin complex, which is in agreement with previous reports. Finally, we investigated the enrichment of the obtained putative regulatory regions in the 3D chromatin domains. More than 90% of the regions were discovered in the 3D contacting domains. We found a significant enrichment of GWAS SNPs in the putative regulatory regions. These significant enrichments provide evidence that the regulatory regions play a crucial role in the genomic structural stability. Additionally, we generated maps of putative regulatory regions for prostate and colorectal cancer human cell-lines.
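
The general idea of deriving regulatory regions and TF-TF co-occurrence from binding profiles can be sketched as follows. This is a simplified illustration, not the method published in the article: it merges overlapping or nearby binding sites into regions and counts how often two TFs fall in the same region. The function names, the `max_gap` parameter, and the toy coordinates are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_regions(sites, max_gap=0):
    """Merge binding sites into putative regions.

    sites: iterable of (chrom, start, end, tf) with half-open coordinates.
    Sites on the same chromosome that overlap, or lie within `max_gap`
    bases of the current region, are merged into it.
    Returns a list of (chrom, start, end, frozenset_of_tfs).
    """
    regions = []
    for chrom, start, end, tf in sorted(sites):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + max_gap:
            last = regions[-1]
            regions[-1] = (chrom, last[1], max(last[2], end), last[3] | {tf})
        else:
            regions.append((chrom, start, end, frozenset([tf])))
    return regions


def tf_pair_counts(regions):
    """Count the regions in which each unordered pair of TFs co-occurs."""
    counts = Counter()
    for _chrom, _start, _end, tfs in regions:
        for pair in combinations(sorted(tfs), 2):
            counts[pair] += 1
    return counts


# Toy input: invented coordinates, for illustration only.
toy_sites = [
    ("chr1", 100, 150, "CTCF"),
    ("chr1", 140, 200, "RAD21"),
    ("chr1", 500, 550, "CTCF"),
]
toy_regions = build_regions(toy_sites)
toy_pairs = tf_pair_counts(toy_regions)
```

Assessing which TF pairs co-occur more often than expected would need a background model on top of these raw counts, which this sketch leaves out.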

National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:uu:diva-310761 (URN); 10.1093/nar/gkw800 (DOI); 000388016900012 (); 27625394 (PubMedID)
Funder
AstraZeneca; Swedish Research Council; Swedish Diabetes Association; eSSENCE - An eScience Collaboration
Note

The first two authors share first authorship.

Available from: 2016-12-19 Created: 2016-12-19 Last updated: 2018-01-25. Bibliographically approved
Dabrowski, M., Pilot, M., Kruczyk, M., Zmihorski, M., Umer, H. M. & Gliwicz, J. (2014). Reliability assessment of null allele detection: inconsistencies between and within different methods. Molecular Ecology Resources, 14(2), 361-373
2014 (English). In: Molecular Ecology Resources, ISSN 1755-098X, E-ISSN 1755-0998, Vol. 14, no 2, p. 361-373. Article in journal (Refereed). Published
Abstract [en]

Microsatellite loci are widely used in population genetic studies, but the presence of null alleles may lead to biased results. Here, we assessed five methods that indirectly detect null alleles and found large inconsistencies among them. Our analysis was based on 20 microsatellite loci genotyped in a natural population of Microtus oeconomus sampled during 8 years, together with 1200 simulated populations without null alleles, but experiencing bottlenecks of varying duration and intensity, and 120 simulated populations with known null alleles. In the natural population, 29% of positive results were consistent between the methods in pairwise comparisons, and in the simulated data set, this proportion was 14%. The positive results were also inconsistent between different years in the natural population. In the null-allele-free simulated data set, the number of false positives increased with increased bottleneck intensity and duration. We also found a low concordance in null allele detection between the original simulated populations and their 20% random subsets. In the populations simulated to include null alleles, between 22% and 42% of true null alleles remained undetected, which highlighted that detection errors are not restricted to false positives. None of the evaluated methods clearly outperformed the others when both false-positive and false-negative rates were considered. Accepting only the positive results consistent between at least two methods should considerably reduce the false-positive rate, but this approach may increase the false-negative rate. Our study demonstrates the need for novel null allele detection methods that could be reliably applied to natural populations.
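
The rule of accepting only positives reported by at least two methods, and its trade-off between false positives and false negatives, can be sketched as below. The method names, locus labels, and calls are invented for illustration; the abstract does not name the five methods, and this is not the authors' evaluation code.

```python
def consensus_calls(calls_by_method, min_methods=2):
    """Return loci flagged as null-allele carriers by >= min_methods methods.

    calls_by_method: dict mapping a method name to the set of loci it flags.
    """
    counts = {}
    for loci in calls_by_method.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    return {locus for locus, n in counts.items() if n >= min_methods}


def error_rates(called, truth, all_loci):
    """False-positive and false-negative rates against known simulated truth."""
    negatives = all_loci - truth
    false_pos = len(called & negatives)
    false_neg = len(truth - called)
    fpr = false_pos / len(negatives) if negatives else 0.0
    fnr = false_neg / len(truth) if truth else 0.0
    return fpr, fnr


# Hypothetical simulated data set: 6 loci, 2 of which carry null alleles.
all_loci = {"L1", "L2", "L3", "L4", "L5", "L6"}
truth = {"L1", "L2"}
calls = {
    "method_a": {"L1", "L3"},
    "method_b": {"L1", "L4"},
    "method_c": {"L2", "L5"},
}

any_method = consensus_calls(calls, min_methods=1)
two_methods = consensus_calls(calls, min_methods=2)
print("any method:  ", error_rates(any_method, truth, all_loci))
print("two or more: ", error_rates(two_methods, truth, all_loci))
```

In this toy case, requiring agreement removes the false positives but misses the true null allele that only one method detected, which mirrors the trade-off described in the abstract.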

Keywords
heterozygosity, bottleneck, microsatellite loci, genotyping errors, null alleles, Microtus oeconomus
National Category
Natural Sciences
Identifiers
urn:nbn:se:uu:diva-220981 (URN); 10.1111/1755-0998.12177 (DOI); 000331469500013 ()
Available from: 2014-03-25 Created: 2014-03-24 Last updated: 2017-12-05. Bibliographically approved
Kruczyk, M., Umer, H. M., Enroth, S. & Komorowski, J. (2013). Peak Finder Metaserver - a novel application for finding peaks in ChIP-seq data. BMC Bioinformatics, 14, 280
2013 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. 280. Article in journal (Refereed). Published
Abstract [en]

Background: Finding peaks in ChIP-seq data is an important step in biological inference. In some cases, such as positioning nucleosomes with specific histone modifications or finding transcription factor binding specificities, the precision of the detected peak plays a significant role. There are several applications for finding peaks (called peak finders) based on different algorithms (e.g. MACS, Erange and HPeak). Benchmark studies have shown that the existing peak finders identify different peaks for the same dataset, and it is not known which one is the most accurate. We present the first meta-server, called Peak Finder MetaServer (PFMS), that collects results from several peak finders and produces consensus peaks. Our application accepts three standard ChIP-seq data formats: BED, BAM, and SAM.

Results: Sensitivity and specificity of seven widely used peak finders were examined. For the experiments, we used three previously studied transcription factor (TF) ChIP-seq datasets and identified three of the selected peak finders that returned results with high specificity and very good sensitivity compared to the remaining four. We also ran PFMS using the three selected peak finders on the same TF datasets and achieved higher specificity and sensitivity than the peak finders individually.

Conclusions: We show that combining outputs from up to seven peak finders yields better results than individual peak finders. In addition, three of the seven peak finders outperform the remaining four, and running PFMS with these three returns even more accurate results. Another added value of PFMS is a separate report of the peaks returned by each of the included peak finders.
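
One simple way to derive consensus peaks from several peak finders is to keep the genomic intervals covered by at least a minimum number of them. The sketch below implements that voting rule with a sweep over interval boundaries. It is an assumption made for illustration: the abstract does not specify how PFMS computes its consensus, and the finder names and coordinates are invented.

```python
from collections import defaultdict


def consensus_peaks(peak_sets, min_support=2):
    """Return intervals covered by peaks from >= min_support finders.

    peak_sets: dict mapping a finder name to a list of (chrom, start, end)
    with half-open, BED-style coordinates and start < end.
    """
    events = defaultdict(list)  # chrom -> list of (position, +1 or -1)
    for peaks in peak_sets.values():
        by_chrom = defaultdict(list)
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))
        # Merge overlapping peaks within one finder so that each finder
        # contributes at most one vote at any position.
        for chrom, intervals in by_chrom.items():
            intervals.sort()
            cur_start, cur_end = intervals[0]
            for start, end in intervals[1:]:
                if start <= cur_end:
                    cur_end = max(cur_end, end)
                else:
                    events[chrom] += [(cur_start, 1), (cur_end, -1)]
                    cur_start, cur_end = start, end
            events[chrom] += [(cur_start, 1), (cur_end, -1)]

    result = []
    for chrom in sorted(events):
        depth = 0
        region_start = None
        # At equal positions, ends (-1) sort before starts (+1), so peaks
        # that merely touch are not counted as overlapping.
        for pos, delta in sorted(events[chrom]):
            previous = depth
            depth += delta
            if previous < min_support <= depth:
                region_start = pos
            elif depth < min_support <= previous:
                result.append((chrom, region_start, pos))
    return result


# Invented example: three hypothetical finders reporting peaks on chr1.
example_sets = {
    "finder_a": [("chr1", 100, 200)],
    "finder_b": [("chr1", 150, 250)],
    "finder_c": [("chr1", 400, 500)],
}
print(consensus_peaks(example_sets, min_support=2))
```

Raising `min_support` makes the consensus stricter, while setting it to 1 returns the union of all reported peaks.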

Keywords
Transcription factor, Peak finder, ChIP-seq, Metaserver
National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:uu:diva-213840 (URN); 10.1186/1471-2105-14-280 (DOI); 000327524600002 ()
Available from: 2014-01-05 Created: 2014-01-04 Last updated: 2017-12-06. Bibliographically approved
Umer, H. M., Komorowski, J., Wadelius, C. & the PCAWG Drivers and Functional Interpretation Group, t.-C. I. Functional annotation of noncoding mutations identifies candidate regulatory aberrations in cancer.
(English). Manuscript (preprint) (Other academic)
Keywords
noncoding genome, transcription factor motifs, regulatory elements, cancer
National Category
Bioinformatics and Systems Biology; Medical Genetics
Research subject
Bioinformatics; Medical Genetics
Identifiers
urn:nbn:se:uu:diva-339913 (URN)
Available from: 2018-01-24 Created: 2018-01-24 Last updated: 2018-01-25
Umer, H. M., Khaliq, Z., Marzouka, N.-a., Smolinska, K., Wadelius, C. & Komorowski, J. funMotifs: Tissue-specific transcription factor motifs.
(English). Manuscript (preprint) (Other academic)
Keywords
noncoding genome, transcription factor motifs, database, annotation, genetic variants
National Category
Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-339915 (URN)
Available from: 2018-01-24 Created: 2018-01-24 Last updated: 2018-01-25