Uppsala University Publications
Kähäri, Andreas
Publications (2 of 2)
Ruffier, M., Kähäri, A., Komorowska, M., Keenan, S., Laird, M., Longden, I., . . . Flicek, P. (2017). Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation. Database: The Journal of Biological Databases and Curation, Article ID bax020.
Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation
2017 (English). In: Database: The Journal of Biological Databases and Curation, ISSN 1758-0463, E-ISSN 1758-0463, article id bax020. Article in journal (Refereed), Published.
Abstract [en]

The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Although the framework was initially intended to provide annotation for the reference human genome, we have extended it to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework, which stores a large diversity of genome annotations in one location, serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released under a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list (http://www.ensembl.org/info/about/contact/index.html).

National Category: Bioinformatics (Computational Biology)
Identifiers: urn:nbn:se:uu:diva-320207 (URN), 10.1093/database/bax020 (DOI), 000397532900001, 28365736 (PubMedID)
Funders: Wellcome Trust, WT108749/Z/15/Z, WT200990/Z/16/Z, WT201535/Z/16/Z; EU, European Research Council, LSHG-CT-2003-503265, 284209
Available from: 2017-04-18. Created: 2017-04-18. Last updated: 2018-01-13. Bibliographically approved.
Ameur, A., Dahlberg, J., Olason, P., Vezzi, F., Karlsson, R., Martin, M., . . . Gyllensten, U. B. (2017). SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. European Journal of Human Genetics, 25(11), 1253-1260.
SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population
2017 (English). In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 25, no. 11, p. 1253-1260. Article in journal (Refereed), Published.
Abstract [en]

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nationwide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.

National Category: Medical and Health Sciences
Identifiers: urn:nbn:se:uu:diva-337314 (URN), 10.1038/ejhg.2017.130 (DOI), 000412823800012, 28832569 (PubMedID)
Funders: Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Knut and Alice Wallenberg Foundation, 2014.0272; Swedish Research Council; Swedish National Infrastructure for Computing (SNIC), sens2016003; EU, European Research Council, 282330
Available from: 2018-01-08. Created: 2018-01-08. Last updated: 2018-08-27. Bibliographically approved.