Uppsala University Publications
Publications (10 of 11)
Åslin, M., Brandt, M. & Dahlberg, J. (2018). CheckQC: Quick quality control of Illumina sequencing runs. The Journal of Open Source Software, 3(22), Article ID 556.
2018 (English). In: The Journal of Open Source Software, ISSN 2475-9066, Vol. 3, no 22, article id 556. Article in journal (Refereed), Published
Keywords
bioinformatics, sequencing
National Category
Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-349255 (URN); 10.21105/joss.00556 (DOI)
Available from: 2018-04-24. Created: 2018-04-24. Last updated: 2018-08-27. Bibliographically approved
Ameur, A., Che, H., Martin, M., Bunikis, I., Dahlberg, J., Höijer, I., . . . Gyllensten, U. B. (2018). De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes, 9(10), Article ID 486.
2018 (English). In: Genes, ISSN 2073-4425, E-ISSN 2073-4425, Vol. 9, no 10, article id 486. Article in journal (Refereed), Published
Abstract [en]

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

Keywords
de novo assembly, SMRT sequencing, GRCh38, human reference genome, human whole-genome sequencing, population sequencing, Swedish population
National Category
Genetics
Identifiers
urn:nbn:se:uu:diva-369762 (URN); 10.3390/genes9100486 (DOI); 000448656700024 (); 30304863 (PubMedID)
Funder
Knut and Alice Wallenberg Foundation, 2014.0272; Swedish Research Council
Available from: 2018-12-17. Created: 2018-12-17. Last updated: 2019-10-23. Bibliographically approved
Dahlberg, J. (2018). Genetic Cartography at Massively Parallel Scale. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Massively parallel sequencing (MPS) is revolutionizing genomics. In this work we use, refine, and develop new tools for the discipline.

MPS has led to the discovery of multiple novel subtypes in Acute Lymphoblastic Leukemia (ALL). In Study I we screen for fusion genes in 134 pediatric ALL patients, including patients without an assigned subtype. In approximately 80% of these patients we detect novel or known fusion gene families, most of which display distinct methylation and expression patterns. This shows the potential for improvements in the clinical stratification of ALL. Large sample sizes are important to detect recurrent somatic variation. In Study II we investigate if a non-index overlapping pooling schema can be used to increase sample size and detect somatic variation. We designed a schema for 172 ALL samples and show that it is possible to use this method to call somatic variants.

Around the globe there are many ongoing and completed genome projects. In Study III we sequenced the genomes of 1000 Swedes to create a reference data set for the Swedish population. We identified more than 10 million variants that were not present in publicly available databases, highlighting the need for population-specific resources. Data, and the tools developed during this study, have been made publicly available as a resource for genomics in Sweden and abroad.

The increased amount of sequencing data has created a greater need for automation. In Study IV we present Arteria, a computational automation system for sequencing core facilities. This system has been adopted by multiple facilities and has been used to analyze thousands of samples. In Study V we developed CheckQC, a program that provides automated quality control of Illumina sequencing runs. These tools make scaling up MPS less labour intensive, a key to unlocking the full future potential of genomics.

The tools and data presented here are a valuable contribution to the scientific community. Collectively they showcase the power of MPS and genomics to bring about new knowledge of human health and disease.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 68
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 1651-6206 ; 1492
Keywords
Acute Lymphoblastic Leukemia (ALL), RNA-Sequencing, Bioinformatics, Pooling, Whole Genome Sequencing
National Category
Medical Genetics; Cancer and Oncology; Hematology; Computer Systems; Bioinformatics (Computational Biology)
Research subject
Medical Genetics; Bioinformatics
Identifiers
urn:nbn:se:uu:diva-358289 (URN); 978-91-513-0428-1 (ISBN)
Public defence
2018-10-19, E10:1307-1309 (Trippelrummet), Navet, Biomedicinskt centrum, Husargatan 3, Uppsala, 09:00 (English)
Available from: 2018-09-20. Created: 2018-08-27. Last updated: 2018-10-02
Ameur, A., Dahlberg, J., Olason, P., Vezzi, F., Karlsson, R., Martin, M., . . . Gyllensten, U. B. (2017). SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. European Journal of Human Genetics, 25(11), 1253-1260
2017 (English). In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 25, no 11, p. 1253-1260. Article in journal (Refereed), Published
Abstract [en]

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.

Place, publisher, year, edition, pages
Nature Publishing Group, 2017
National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:uu:diva-337314 (URN); 10.1038/ejhg.2017.130 (DOI); 000412823800012 (); 28832569 (PubMedID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Knut and Alice Wallenberg Foundation, 2014.0272; Swedish Research Council; Swedish National Infrastructure for Computing (SNIC), sens2016003; EU, European Research Council, 282330
Available from: 2018-01-08. Created: 2018-01-08. Last updated: 2018-08-27. Bibliographically approved
Marincevic-Zuniga, Y., Dahlberg, J., Nilsson, S., Raine, A., Nystedt, S., Lindqvist, C. M., . . . Syvänen, A.-C. (2017). Transcriptome sequencing in pediatric acute lymphoblastic leukemia identifies fusion genes associated with distinct DNA methylation profiles. Journal of Hematology & Oncology, 10, Article ID 148.
2017 (English). In: Journal of Hematology & Oncology, ISSN 1756-8722, E-ISSN 1756-8722, Vol. 10, article id 148. Article in journal (Refereed), Published
Abstract [en]

Background: Structural chromosomal rearrangements that lead to expressed fusion genes are a hallmark of acute lymphoblastic leukemia (ALL). In this study, we performed transcriptome sequencing of 134 primary ALL patient samples to comprehensively detect fusion transcripts. Methods: We combined fusion gene detection with genome-wide DNA methylation analysis, gene expression profiling, and targeted sequencing to determine molecular signatures of emerging ALL subtypes. Results: We identified 64 unique fusion events distributed among 80 individual patients, of which over 50% have not previously been reported in ALL. Although the majority of the fusion genes were found only in a single patient, we identified several recurrent fusion gene families defined by promiscuous fusion gene partners, such as ETV6, RUNX1, PAX5, and ZNF384, or recurrent fusion genes, such as DUX4-IGH. Our data show that patients harboring these fusion genes displayed characteristic genome-wide DNA methylation and gene expression signatures in addition to distinct patterns in single nucleotide variants and recurrent copy number alterations. Conclusion: Our study delineates the fusion gene landscape in pediatric ALL, including both known and novel fusion genes, and highlights fusion gene families with shared molecular etiologies, which may provide additional information for prognosis and therapeutic options in the future.

Keywords
Pediatric acute lymphoblastic leukemia, RNA sequencing, Fusion genes, BCP-ALL, T-ALL, Translocation
National Category
Cancer and Oncology; Pediatrics
Identifiers
urn:nbn:se:uu:diva-332658 (URN); 10.1186/s13045-017-0515-y (DOI); 000408001300001 (); 28806978 (PubMedID)
Funder
Swedish Foundation for Strategic Research, RBc08-008; Swedish Cancer Society, 130440, 160711; Swedish Childhood Cancer Foundation, 11098; Swedish Research Council, C0524801, 2016-03691_3
Note

The last two authors share last authorship.

Available from: 2017-10-31. Created: 2017-10-31. Last updated: 2019-10-23. Bibliographically approved
Spjuth, O., Bongcam-Rudloff, E., Dahlberg, J., Dahlö, M., Kallio, A., Pireddu, L., . . . Korpelainen, E. (2016). Recommendations on e-infrastructures for next-generation sequencing. GigaScience, 5, Article ID 26.
2016 (English). In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 5, article id 26. Article in journal (Refereed), Published
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-296012 (URN); 10.1186/s13742-016-0132-7 (DOI); 000377153700001 (); 27267963 (PubMedID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish National Infrastructure for Computing (SNIC); Swedish e-Science Research Center
Available from: 2016-06-07. Created: 2016-06-12. Last updated: 2018-01-10. Bibliographically approved
Nordlund, J., Bäcklin, C., Zachariadis, V., Cavelier, L., Dahlberg, J., Öfverholm, I., . . . Syvänen, A.-C. (2015). DNA methylation-based subtype prediction for pediatric acute lymphoblastic leukemia. Clinical Epigenetics, 7, Article ID 11.
2015 (English). In: Clinical Epigenetics, E-ISSN 1868-7083, Vol. 7, article id 11. Article in journal (Refereed), Published
Abstract [en]

Background

We present a method that utilizes DNA methylation profiling for prediction of the cytogenetic subtypes of acute lymphoblastic leukemia (ALL) cells from pediatric ALL patients. The primary aim of our study was to improve risk stratification of ALL patients into treatment groups using DNA methylation as a complement to current diagnostic methods. A secondary aim was to gain insight into the functional role of DNA methylation in ALL.

Results

We used the methylation status of ~450,000 CpG sites in 546 well-characterized patients with T-ALL or seven recurrent B-cell precursor ALL subtypes to design and validate sensitive and accurate DNA methylation classifiers. After repeated cross-validation, a final classifier was derived that consisted of only 246 CpG sites. The mean sensitivity and specificity of the classifier across the known subtypes was 0.90 and 0.99, respectively. We then used DNA methylation classification to screen for subtype membership of 210 patients with undefined karyotype (normal or no result) or non-recurrent cytogenetic aberrations (‘other’ subtype). Nearly half (n = 106) of the patients lacking cytogenetic subgrouping displayed highly similar methylation profiles as the patients in the known recurrent groups. We verified the subtype of 20% of the newly classified patients by examination of diagnostic karyotypes, array-based copy number analysis, and detection of fusion genes by quantitative polymerase chain reaction (PCR) and RNA-sequencing (RNA-seq). Using RNA-seq data from ALL patients where cytogenetic subtype and DNA methylation classification did not agree, we discovered several novel fusion genes involving ETV6, RUNX1, and PAX5.

Conclusions

Our findings indicate that DNA methylation profiling contributes to the clarification of the heterogeneity in cytogenetically undefined ALL patient groups and could be implemented as a complementary method for diagnosis of ALL. The results of our study provide clues to the origin and development of leukemic transformation. The methylation status of the CpG sites constituting the classifiers also highlights relevant biological characteristics in otherwise unclassified ALL patients.
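
The mean sensitivity and specificity quoted in the Results can be illustrated with a small worked example. This is only a sketch: it assumes a one-vs-rest definition per subtype and an unweighted mean across subtypes (the abstract does not say how the means were formed), and the labels `A`, `B`, `C` and the toy predictions are invented for illustration, not study data.

```python
def per_class_sens_spec(y_true, y_pred):
    """Return {label: (sensitivity, specificity)} using one-vs-rest counts.

    Assumes at least two distinct labels occur in y_true, so every
    class has both positives and negatives.
    """
    pairs = list(zip(y_true, y_pred))
    result = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        result[label] = (tp / (tp + fn), tn / (tn + fp))
    return result


def mean_sens_spec(y_true, y_pred):
    """Unweighted mean of per-class sensitivity and specificity."""
    values = list(per_class_sens_spec(y_true, y_pred).values())
    mean_sens = sum(s for s, _ in values) / len(values)
    mean_spec = sum(s for _, s in values) / len(values)
    return mean_sens, mean_spec


# Hypothetical toy data: six samples, three made-up subtypes.
toy_true = ["A", "A", "B", "B", "C", "C"]
toy_pred = ["A", "A", "B", "C", "C", "C"]
toy_mean_sens, toy_mean_spec = mean_sens_spec(toy_true, toy_pred)
print(f"mean sensitivity={toy_mean_sens:.3f}, mean specificity={toy_mean_spec:.3f}")
```

In the toy data one `B` sample is misclassified as `C`, which lowers the sensitivity for `B` and the specificity for `C` while leaving `A` untouched.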

National Category
Hematology
Identifiers
urn:nbn:se:uu:diva-242351 (URN); 10.1186/s13148-014-0039-z (DOI); 000350260800001 (); 25729447 (PubMedID)
Funder
Swedish Foundation for Strategic Research, RBc08-008
Note

The last two authors share last authorship.

Available from: 2015-01-25. Created: 2015-01-25. Last updated: 2019-10-23. Bibliographically approved
Lindqvist, C. M., Nordlund, J., Ekman, D., Johansson, A., Moghadam, B. T., Raine, A., . . . Berglund, E. C. (2015). The Mutational Landscape in Pediatric Acute Lymphoblastic Leukemia Deciphered by Whole Genome Sequencing. Human Mutation, 36(1), 118-128
2015 (English). In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 36, no 1, p. 118-128. Article in journal (Refereed), Published
Abstract [en]

Genomic characterization of pediatric acute lymphoblastic leukemia (ALL) has identified distinct patterns of genes and pathways altered in patients with well-defined genetic aberrations. To extend the spectrum of known somatic variants in ALL, we performed whole genome and transcriptome sequencing of three B-cell precursor patients, of which one carried the t(12;21)ETV6-RUNX1 translocation and two lacked a known primary genetic aberration, and one T-ALL patient. We found that each patient had a unique genome, with a combination of well-known and previously undetected genomic aberrations. By targeted sequencing in 168 patients, we identified KMT2D and KIF1B as novel putative driver genes. We also identified a putative regulatory non-coding variant that coincided with overexpression of the growth factor MDK. Our results contribute to an increased understanding of the biological mechanisms that lead to ALL and suggest that regulatory variants may be more important for cancer development than recognized to date. The heterogeneity of the genetic aberrations in ALL renders whole genome sequencing particularly well suited for analysis of somatic variants in both research and diagnostic applications.

National Category
Medical Genetics; Cancer and Oncology
Identifiers
urn:nbn:se:uu:diva-238183 (URN); 10.1002/humu.22719 (DOI); 000347076700016 (); 25355294 (PubMedID)
Available from: 2014-12-10. Created: 2014-12-10. Last updated: 2019-10-23. Bibliographically approved
Tyden, E., Dahlberg, J., Karlberg, O. & Hoglund, J. (2014). Deep amplicon sequencing of preselected isolates of Parascaris equorum in beta-tubulin codons associated with benzimidazole resistance in other nematodes. Parasites & Vectors, 7, 410
2014 (English). In: Parasites & Vectors, ISSN 1756-3305, E-ISSN 1756-3305, Vol. 7, p. 410. Article in journal (Refereed), Published
Abstract [en]

Background: The development of anthelmintic resistance (AR) to macrocyclic lactones in the equine roundworm Parascaris equorum has resulted in benzimidazoles now being the most widely used substance to control Parascaris infections. However, over-reliance on one drug class is a risk factor for the development of AR. Consequently, benzimidazole resistance is widespread in several veterinary parasites, where it is associated with single nucleotide polymorphisms (SNPs) in drug targets encoded by the beta-tubulin genes. The importance of these SNPs varies between different parasitic nematodes, but it has been hypothesised that they occur, at low allele frequencies, even in unselected populations. This study investigated whether these SNPs exist in the P. equorum population and tested the hypothesis that benzimidazole (BZ) resistance can develop from pre-existing SNPs in codons 167, 198 and 200 of the beta-tubulin isotype 1 and 2 genes, reported to be associated with AR in strongylids. The efficacy of the oral paste formula fenbendazole on 11 farms in Sweden was also assessed.
Methods: Two isotype-specific primer pairs were designed, one on either side of codon 167 and one on either side of codons 198 and 200. A pool of 100 000 larvae was sequenced using deep amplicon sequencing by Illumina HiSeq. A faecal egg count reduction test was used to assess the efficacy of fenbendazole.
Results: No SNPs were observed in codons 167, 198 or 200 of the beta-tubulin isotype 1 or 2 genes of P. equorum, even though 100 000 larvae were sequenced. Faecal egg count reduction testing of fenbendazole showed that this anthelmintic was still 100% effective, meaning that the likelihood of finding a high allele frequency of SNPs associated with benzimidazole resistance in P. equorum was low. Unexpectedly, the allele frequencies observed in single worms were comparable to those in pooled samples.
Conclusions: We concluded that fenbendazole does not exert selection pressure on the beta-tubulin genes of isotypes 1 and 2 in P. equorum. The fact that no pre-existing SNPs were found in codons 167, 198 and 200 in P. equorum also illustrates the difficulties in generalising about AR mechanisms between different taxonomic groups of nematodes.
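
The faecal egg count reduction figure mentioned in the abstract can be illustrated with a short calculation. This is a minimal sketch that assumes the common arithmetic-mean form, reduction = 100 × (1 − mean post-treatment count / mean pre-treatment count); the abstract does not state which variant of the test the authors used, and the egg counts below are invented toy values, not data from the study.

```python
def fecr_percent(pre_counts, post_counts):
    """Faecal egg count reduction in percent, arithmetic-mean form.

    pre_counts and post_counts are egg counts (e.g. eggs per gram)
    before and after treatment. This formula choice is an assumption.
    """
    if not pre_counts or not post_counts:
        raise ValueError("both count lists must be non-empty")
    pre_mean = sum(pre_counts) / len(pre_counts)
    post_mean = sum(post_counts) / len(post_counts)
    if pre_mean == 0:
        raise ValueError("pre-treatment mean is zero; reduction is undefined")
    return 100.0 * (1.0 - post_mean / pre_mean)


# Hypothetical toy counts for illustration only.
full_effect = fecr_percent([400, 250, 350], [0, 0, 0])  # no eggs after treatment
partial_effect = fecr_percent([400, 200], [40, 20])     # tenfold drop in mean count
print(full_effect, partial_effect)
```

A result of 100% corresponds to no eggs being counted after treatment, which is the situation the abstract describes for fenbendazole.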

Keywords
Parascaris equorum, Anthelmintic resistance, beta-tubulin, SNP, Fenbendazole, Illumina HiSeq
National Category
Medical Genetics
Identifiers
urn:nbn:se:uu:diva-233007 (URN); 10.1186/1756-3305-7-410 (DOI); 000341205100001 ()
Available from: 2014-10-09. Created: 2014-09-29. Last updated: 2018-01-11. Bibliographically approved
Dahlberg, J., Hermansson, J., Sturlaugsson, S., Smeds, P., Ladenvall, C., Valls Guimera, R., . . . Larsson, P. Arteria: An automation system for a sequencing core facility.
(English). Manuscript (preprint) (Other academic)
National Category
Bioinformatics (Computational Biology); Computer Systems
Identifiers
urn:nbn:se:uu:diva-357972 (URN)
Available from: 2018-08-23. Created: 2018-08-23. Last updated: 2018-08-27
Identifiers
ORCID iD: orcid.org/0000-0001-6962-1460