Genotyping of SNPs in bread wheat at reduced cost from pooled experiments and imputation
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0009-0006-3654-6525
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. Uppsala University, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0003-0458-6902
2024 (English). In: Theoretical and Applied Genetics, ISSN 0040-5752, E-ISSN 1432-2242, Vol. 137, no 1, article id 26. Article in journal (Refereed). Published
Abstract [en]

The plant breeding industry has shown growing interest in using genotype data of relevant markers to select new competitive varieties. Selection usually benefits from large amounts of marker data, so it is crucial to have data collection methods that are both cost-effective and reliable. Computational methods such as genotype imputation have been proposed in several earlier plant science studies to address the cost challenge. Genotype imputation methods have, however, been used more frequently and investigated more extensively in human genetics research. The various existing algorithms have shown lower accuracy when inferring the genotypes of genetic variants that occur at low frequency, even though these rare variants can have great significance and impact in the genetic studies that underlie selection. In contrast, pooling is a technique that can efficiently identify low-frequency items in a population, and it has been used successfully to detect the samples that carry rare variants. In this study, we propose combining pooling and imputation, and we demonstrate this by simulating a hypothetical microarray for genotyping a population of recombinant inbred lines in a cost-effective and accurate manner, even for rare variants. We show that, with an adequate imputation model, it is feasible to predict the individual genotypes accurately, at lower cost than sample-wise genotyping, and time-effectively. Moreover, we provide code resources for reproducing the results presented in this study in the form of a containerized workflow.
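As a rough illustration of why pooling can find rare-variant carriers with fewer tests, the sketch below uses a simple row/column grid design. This design, the 4 x 4 block size, and the function names are assumptions made for the example; the abstract does not specify the pooling scheme used in the study, and real genotype data (allele counts rather than carrier/non-carrier) is more involved.

```python
from itertools import product

def pool_rows_and_columns(carriers, n=4):
    """Simulate pooled tests on an n x n grid of samples (hypothetical design).

    A pool is positive if at least one sample in it carries the variant.
    Returns one boolean per row pool and one per column pool.
    """
    rows = [any((r, c) in carriers for c in range(n)) for r in range(n)]
    cols = [any((r, c) in carriers for r in range(n)) for c in range(n)]
    return rows, cols

def candidate_carriers(rows, cols):
    """Samples lying at the intersection of a positive row and a positive column."""
    return {
        (r, c)
        for r, c in product(range(len(rows)), range(len(cols)))
        if rows[r] and cols[c]
    }

# One rare-variant carrier among 16 samples.
carriers = {(2, 1)}
rows, cols = pool_rows_and_columns(carriers)
decoded = candidate_carriers(rows, cols)
n_tests = len(rows) + len(cols)  # 8 pooled tests instead of 16 individual ones

print(decoded, n_tests)
```

With a single carrier in the block, the positive row and column intersect at exactly one sample, so the carrier is recovered from half as many tests as sample-wise genotyping would need.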

Place, publisher, year, edition, pages
Springer Nature, 2024. Vol. 137, no 1, article id 26
Keywords [en]
genotyping, imputation, MAGIC population, pooling, wheat
National Category
Bioinformatics (Computational Biology)
Research subject
Scientific Computing
Identifiers
URN: urn:nbn:se:uu:diva-518436
DOI: 10.1007/s00122-023-04533-5
ISI: 001145311600001
PubMedID: 38243086
OAI: oai:DiVA.org:uu-518436
DiVA, id: diva2:1825836
Projects
eSSENCE - An eScience Collaboration
Funder
Swedish Research Council Formas, 2017-00453
Available from: 2024-01-10 Created: 2024-01-10 Last updated: 2025-01-07 Bibliographically approved
In thesis
1. A computational and statistical framework for cost-effective genotyping combining pooling and imputation
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The information conveyed by genetic markers, such as single nucleotide polymorphisms (SNPs), has been widely used in biomedical research to study human diseases and is increasingly valued in agriculture for genomic selection purposes. Specific markers can be identified as a genetic signature that correlates with certain characteristics in a living organism, e.g. a susceptibility to disease or high-yield traits. Capturing these signatures with sufficient statistical power often requires large volumes of data, with thousands of samples to be analysed and potentially millions of genetic markers to be screened. Relevant effects are particularly difficult to detect when the genetic variations involved occur at low frequencies.

The cost of producing such marker genotype data is therefore a critical part of the analysis. Despite recent technological advances, production costs can still be prohibitive on a large scale, and genotype imputation strategies have been developed to address this issue. Genotype imputation methods have been extensively studied in human data and, to a lesser extent, in crop and animal species. A recognised weakness of imputation methods is their lower accuracy in predicting the genotypes of rare variants, even though such variants can be highly informative in association studies and improve the accuracy of genomic selection. In this respect, pooling strategies can be well suited to complement imputation, as pooling is efficient at capturing the low-frequency items in a population. Pooling also reduces the number of genotyping tests required, making its use in combination with imputation a cost-effective compromise between accurate but expensive high-density genotyping of each sample individually and stand-alone imputation. However, due to the nature of genotype data and the limitations of genotype testing techniques, decoding pooled genotypes into unique data resolutions is challenging.
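The difficulty of decoding pooled outcomes into a unique resolution can be seen in a small sketch. It assumes a hypothetical 4 x 4 row/column pooling grid and a simplified carrier/non-carrier outcome per pool; neither the design nor the function names come from the thesis, and actual genotype decoding has more states than this.

```python
from itertools import combinations

def pool_outcomes(carriers, n=4):
    """Row and column pool results for a set of carrier positions (r, c)."""
    rows = tuple(any(r == i for r, _ in carriers) for i in range(n))
    cols = tuple(any(c == j for _, c in carriers) for j in range(n))
    return rows, cols

def consistent_carrier_sets(rows, cols):
    """Enumerate every carrier configuration that reproduces the observed pools."""
    n = len(rows)
    # Only samples in a positive row AND a positive column can be carriers.
    candidates = [
        (r, c)
        for r in range(n) if rows[r]
        for c in range(n) if cols[c]
    ]
    observed = (tuple(rows), tuple(cols))
    return [
        set(subset)
        for k in range(len(candidates) + 1)
        for subset in combinations(candidates, k)
        if pool_outcomes(subset, n) == observed
    ]

# Two carriers in different rows and columns of the same block.
true_carriers = {(0, 0), (1, 1)}
rows, cols = pool_outcomes(true_carriers)
solutions = consistent_carrier_sets(rows, cols)

print(len(solutions))
```

In this toy case, several distinct carrier configurations explain the same eight pool results, so the pooled data alone cannot single out the true one. This is the kind of ambiguity that a subsequent inference or imputation step would have to resolve.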

In this work, we study the characteristics of genotype data decoded from pooled observations with a specific pooling scheme, using the examples of a human cohort and a population of inbred wheat lines. We propose different inference strategies to reconstruct the genotypes before providing them as input to imputation, and we reflect on how the reconstructed distributions affect the results of imputation methods such as tree-based haplotype clustering or coalescent models.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2024. p. 81
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2354
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-519887
ISBN: 978-91-513-2006-9
Public defence
2024-03-08, 101195 (Heinz-Otto Kreiss), Ångströmlaboratoriet, Lägerhyddsvägen 1, hus 10, Uppsala, 10:15 (English)
Funder
Swedish Research Council Formas, 2017-00453
Available from: 2024-02-08 Created: 2024-01-10 Last updated: 2024-02-08

Authority records

Clouard, Camille; Nettelblad, Carl
