uu.se: Publications from Uppsala University
Genetic Cartography at Massively Parallel Scale
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular Medicine. (Molecular Medicine) ORCID iD: 0000-0001-6962-1460
2018 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Massively parallel sequencing (MPS) is revolutionizing genomics. In this work we use, refine, and develop new tools for the discipline.

MPS has led to the discovery of multiple novel subtypes in Acute Lymphoblastic Leukemia (ALL). In Study I we screen for fusion genes in 134 pediatric ALL patients, including patients without an assigned subtype. In approximately 80% of these patients we detect novel or known fusion gene families, most of which display distinct methylation and expression patterns. This shows the potential for improvements in the clinical stratification of ALL. Large sample sizes are important to detect recurrent somatic variation. In Study II we investigate if a non-index overlapping pooling schema can be used to increase sample size and detect somatic variation. We designed a schema for 172 ALL samples and show that it is possible to use this method to call somatic variants.

Around the globe there are many ongoing and completed genome projects. In Study III we sequenced the genomes of 1000 Swedes to create a reference data set for the Swedish population. We identified more than 10 million variants that were not present in publicly available databases, highlighting the need for population-specific resources. The data and the tools developed during this study have been made publicly available as a resource for genomics in Sweden and abroad.

The increased amount of sequencing data has created a greater need for automation. In Study IV we present Arteria, a computational automation system for sequencing core facilities. This system has been adopted by multiple facilities and has been used to analyze thousands of samples. In Study V we developed CheckQC, a program that provides automated quality control of Illumina sequencing runs. These tools make scaling up MPS less labour intensive, a key to unlocking the full future potential of genomics.

The tools and data presented here are a valuable contribution to the scientific community. Collectively they showcase the power of MPS and genomics to bring about new knowledge of human health and disease.
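
The overlapping pooling idea in Study II can be illustrated with a small combinatorial sketch. The abstract does not describe the actual design, so everything below is an assumption made for illustration: each sample is placed in a unique pair of pools, and a variant seen in exactly that pair of pools points back to one sample. The function names and the pool count (20 pools give 190 distinct pairs, enough for 172 samples) are hypothetical.

```python
from itertools import combinations


def design_pools(n_samples, n_pools, pools_per_sample=2):
    """Assign each sample a unique combination of pools (illustrative only)."""
    combos = combinations(range(n_pools), pools_per_sample)
    # zip stops at the shorter input, so a short design is detected below.
    design = {sample: frozenset(combo)
              for sample, combo in zip(range(n_samples), combos)}
    if len(design) < n_samples:
        raise ValueError("not enough pool combinations for all samples")
    return design


def decode(design, positive_pools):
    """Return the samples whose pool set equals the pools where a variant was seen."""
    positive = frozenset(positive_pools)
    return [sample for sample, pools in design.items() if pools == positive]


# Hypothetical numbers: 172 samples spread over 20 pools, two pools per sample.
design = design_pools(n_samples=172, n_pools=20)
carrier = 57
print(decode(design, design[carrier]))
```

This decoding is only unambiguous when a single sample carries the variant; recurrent variants light up more pools and need a more careful design or follow-up genotyping.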

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018, p. 68
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 1651-6206; 1492
Keywords [en]
Acute Lymphoblastic Leukemia (ALL), RNA-Sequencing, Bioinformatics, Pooling, Whole Genome Sequencing
National Category
Medical Genetics; Cancer and Oncology; Hematology; Computer Systems; Bioinformatics (Computational Biology)
Research subject
Medical Genetics; Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-358289; ISBN: 978-91-513-0428-1 (print); OAI: oai:DiVA.org:uu-358289; DiVA, id: diva2:1242206
Public defence
2018-10-19, E10:1307-1309 (Trippelrummet), Navet, Biomedicinskt centrum, Husargatan 3, Uppsala, 09:00 (English)
Opponent
Supervisors
Available from: 2018-09-20 Created: 2018-08-27 Last updated: 2018-10-02
List of papers
1. Transcriptome sequencing in pediatric acute lymphoblastic leukemia identifies fusion genes associated with distinct DNA methylation profiles
2017 (English) In: Journal of Hematology & Oncology, E-ISSN 1756-8722, Vol. 10, article id 148. Article in journal (Refereed) Published
Abstract [en]

Background: Structural chromosomal rearrangements that lead to expressed fusion genes are a hallmark of acute lymphoblastic leukemia (ALL). In this study, we performed transcriptome sequencing of 134 primary ALL patient samples to comprehensively detect fusion transcripts.

Methods: We combined fusion gene detection with genome-wide DNA methylation analysis, gene expression profiling, and targeted sequencing to determine molecular signatures of emerging ALL subtypes.

Results: We identified 64 unique fusion events distributed among 80 individual patients, of which over 50% have not previously been reported in ALL. Although the majority of the fusion genes were found only in a single patient, we identified several recurrent fusion gene families defined by promiscuous fusion gene partners, such as ETV6, RUNX1, PAX5, and ZNF384, or recurrent fusion genes, such as DUX4-IGH. Our data show that patients harboring these fusion genes displayed characteristic genome-wide DNA methylation and gene expression signatures in addition to distinct patterns in single nucleotide variants and recurrent copy number alterations.

Conclusion: Our study delineates the fusion gene landscape in pediatric ALL, including both known and novel fusion genes, and highlights fusion gene families with shared molecular etiologies, which may provide additional information for prognosis and therapeutic options in the future.

Keywords
Pediatric acute lymphoblastic leukemia, RNA sequencing, Fusion genes, BCP-ALL, T-ALL, Translocation
National Category
Cancer and Oncology; Pediatrics
Identifiers
urn:nbn:se:uu:diva-332658 (URN); 10.1186/s13045-017-0515-y (DOI); 000408001300001 (); 28806978 (PubMedID)
Funder
Swedish Foundation for Strategic Research, RBc08-008; Swedish Cancer Society, 130440, 160711; Swedish Childhood Cancer Foundation, 11098; Swedish Research Council, C0524801, 2016-03691_3
Note

The last two authors share last authorship.

Available from: 2017-10-31 Created: 2017-10-31 Last updated: 2023-10-02 Bibliographically approved
2. Identification of somatic variants by targeted sequencing of pooled cancer samples
(English) Manuscript (preprint) (Other academic)
National Category
Medical Genetics
Research subject
Medical Genetics
Identifiers
urn:nbn:se:uu:diva-269752 (URN)
Available from: 2015-12-18 Created: 2015-12-18 Last updated: 2018-08-27
3. SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population
2017 (English) In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 25, no 11, p. 1253-1260. Article in journal (Refereed) Published
Abstract [en]

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.
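
The aggregated variant frequencies mentioned above reduce, per site, to counting alternate alleles across the called genotypes. A minimal sketch, assuming diploid genotypes encoded as alternate-allele counts (0, 1 or 2) with None for a missing call; the encoding and the function name are illustrative and are not taken from the SweGen pipelines.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency at one site from diploid genotype calls.

    genotypes: iterable of 0, 1, 2 (alternate-allele count) or None (no call).
    Returns None when no sample has a call at the site.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called diploid sample contributes two alleles to the denominator.
    return sum(called) / (2 * len(called))


# Four called samples carrying 0 + 1 + 2 + 0 = 3 alternate alleles out of 8.
print(allele_frequency([0, 1, 2, None, 0]))
```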

Place, publisher, year, edition, pages
Nature Publishing Group, 2017
National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:uu:diva-337314 (URN); 10.1038/ejhg.2017.130 (DOI); 000412823800012 (); 28832569 (PubMedID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Knut and Alice Wallenberg Foundation, 2014.0272; Swedish Research Council; Swedish National Infrastructure for Computing (SNIC), sens2016003; EU, European Research Council, 282330
Available from: 2018-01-08 Created: 2018-01-08 Last updated: 2022-01-29 Bibliographically approved
4. Arteria: An automation system for a sequencing core facility
2019 (English) In: GigaScience, E-ISSN 2047-217X, Vol. 8, no 12, article id giz135. Article in journal (Refereed) Published
Abstract [en]

Background: In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities.

Findings: Arteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/.

Conclusions: We describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.
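
The three conceptual levels named in the Findings (orchestration, process, execution) can be pictured with a toy sketch. This is not Arteria's API: the event name, the step functions and the `handle` dispatcher are all hypothetical, and plain Python functions stand in for what the abstract describes as RESTful micro-services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Execution level: stand-ins for micro-services (real ones would sit behind REST endpoints).
def demultiplex(runfolder: str) -> str:
    return f"{runfolder}: demultiplexed"


def quality_control(runfolder: str) -> str:
    return f"{runfolder}: qc passed"


# Process level: a workflow is an ordered list of processing steps.
WORKFLOWS: Dict[str, List[Callable[[str], str]]] = {
    "runfolder_ready": [demultiplex, quality_control],
}


@dataclass
class Event:
    name: str
    runfolder: str


# Orchestration level: an incoming event selects and runs the matching workflow.
def handle(event: Event) -> List[str]:
    steps = WORKFLOWS.get(event.name, [])
    return [step(event.runfolder) for step in steps]


print(handle(Event("runfolder_ready", "run_001")))
```

Keeping the levels separate means a step can be replaced or a new workflow registered without touching the event handling, which is one way to read the flexibility the abstract describes.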

National Category
Bioinformatics (Computational Biology); Computer Systems
Identifiers
urn:nbn:se:uu:diva-357972 (URN); 10.1093/gigascience/giz135 (DOI); 000506804600004 (); 31825479 (PubMedID)
Funder
Swedish Research Council; Knut and Alice Wallenberg Foundation
Available from: 2018-08-23 Created: 2018-08-23 Last updated: 2023-02-06 Bibliographically approved
5. CheckQC: Quick quality control of Illumina sequencing runs
2018 (English) In: Journal of Open Source Software, E-ISSN 2475-9066, Vol. 3, no 22, article id 556. Article in journal (Refereed) Published
Keywords
bioinformatics, sequencing
National Category
Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-349255 (URN); 10.21105/joss.00556 (DOI)
Available from: 2018-04-24 Created: 2018-04-24 Last updated: 2022-09-15 Bibliographically approved

Open Access in DiVA

fulltext (1367 kB), 379 downloads
File information
File name: FULLTEXT01.pdf
File size: 1367 kB
Checksum (SHA-512): 46e5a63dc32f49b36a0abc7f35e7e8c7beadc22a10b4d0af728722a90572a9719b82a8e7ed566c6622e2e520a2599b0d0e831ddd4bb6375efa436c807d9efaf4
Type: fulltext
Mimetype: application/pdf
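
A downloaded copy of the full text can be checked against the SHA-512 checksum listed in the file information. A minimal sketch using Python's standard `hashlib`; the local path is an assumption (it presumes the PDF was saved under the file name shown in the record), and the helper name is illustrative.

```python
import hashlib
import os

# Checksum as listed in the record's file information.
EXPECTED_SHA512 = (
    "46e5a63dc32f49b36a0abc7f35e7e8c7beadc22a10b4d0af728722a90572a971"
    "9b82a8e7ed566c6622e2e520a2599b0d0e831ddd4bb6375efa436c807d9efaf4"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed local path; the check is skipped if the file has not been downloaded.
local_copy = "FULLTEXT01.pdf"
if os.path.exists(local_copy):
    print("checksum matches:", sha512_of_file(local_copy) == EXPECTED_SHA512)
```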

Authority records

Dahlberg, Johan

Total: 379 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
