Uppsala University Publications
A machine learning pipeline for predicting success rates in PrEST production
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
2019 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Protein epitope signature tags (PrESTs) are antigens that are produced in Escherichia coli at Atlas Antibodies and used to immunize rabbits for antibody production. This project uses machine learning models to predict success rates for production and immunization and to identify features that are important for success. The features are generated from the PrEST sequences using web servers, downloadable software and Python scripts. An additional analysis of the effect of rabbit-related and environmental features on immunization success is performed. Many different models and model architectures, and a few thousand features, were tried. For a target outcome divided into two classes, the models reached a maximum F1 score of about 0.55 in both the production and the immunization analysis. No features could be identified as significantly important.
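The abstract reports model quality as an F1 score for a two-class target. As a minimal sketch of what that metric measures, the code below computes the F1 score for one class (the harmonic mean of precision and recall) and a macro average over both classes. The labels are hypothetical (1 = success, 0 = failure) and are not data from the thesis; whether the thesis reports a single-class, macro-averaged or weighted F1 is not stated in this record.

```python
def class_f1(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall for that class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over all classes present."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(class_f1(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical outcomes (1 = success, 0 = failure); not data from the thesis.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
print(class_f1(y_true, y_pred, positive=1))  # 2 TP, 2 FP, 2 FN -> 0.5
print(macro_f1(y_true, y_pred))              # both classes score 0.5 -> 0.5
```

Because F1 ignores true negatives for the class being scored, the choice of which class counts as "positive" and how the per-class scores are averaged both affect the reported number.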

The analysis of rabbit-related and environmental features showed that these features are more important for PrEST immunization success than the PrEST-related features: the F1 score rose to about 0.6, and the environmental features ranked higher by information gain. More data is needed to draw definitive conclusions, but this indicates that Atlas Antibodies should in the future focus on recording environmental features during production to improve the chances of predicting success rates.
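Features here are ranked by information gain, i.e. how much knowing a feature's value reduces the entropy of the outcome. The sketch below shows that calculation for discrete features; the feature names (`housing_room`, `peptide_length_bin`) and all values are hypothetical illustrations, not variables or data from the thesis, and the exact tool the thesis used to compute information gain is not stated in this record.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the expected entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical binary immunization outcomes and discrete features.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "housing_room": ["A", "A", "B", "B", "A", "B", "A", "B"],  # hypothetical environmental feature
    "peptide_length_bin": ["short", "long", "short", "long",
                           "short", "long", "long", "short"],  # hypothetical sequence feature
}

# Rank features from most to least informative about the outcome.
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
for name in ranking:
    print(name, round(information_gain(features[name], labels), 3))
```

In this toy data `housing_room` separates the outcomes perfectly (gain 1.0 bit) while `peptide_length_bin` carries no information (gain 0.0). Continuous features would need to be binned, or scored with a mutual-information estimator, before this discrete formula applies.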

Place, publisher, year, edition, pages
2019. 84 p.
Series
UPTEC X ; 19009
Keywords [en]
bioinformatics, proteomics, machine learning
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-385494
OAI: oai:DiVA.org:uu-385494
DiVA, id: diva2:1324575
External cooperation
Atlas Antibodies
Educational program
Molecular Biotechnology Engineering Programme
Supervisors
Examiners
Note

Confidential

Available from: 2019-06-14. Created: 2019-06-13. Last updated: 2019-06-14. Bibliographically approved.

Open Access in DiVA

No full text in DiVA
