On Deep Learning for Low-Dimensional Representations
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Artificial Intelligence. ORCID iD: 0000-0003-4397-9952
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Description
Abstract [en]

In science and engineering, we are often concerned with creating mathematical models from data. These models are abstractions of observed real-world processes, and the goal is often to understand these processes or to use the models to predict future instances of them. Natural processes often exhibit low-dimensional structures which we can embed into the model. In mechanistic models, we directly include this structure through mathematical equations, often inspired by physical constraints. In contrast, within machine learning, and particularly in deep learning, we often deal with high-dimensional data such as images and learn a model without imposing a low-dimensional structure. Instead, we learn representations that are useful for the task at hand. While representation learning arguably enables the power of deep neural networks, it is less clear how to understand real-world processes from these models or whether we can benefit from including a low-dimensional structure in the model.

This dissertation studies learning from data with intrinsic low-dimensional structure and how to replicate this structure in machine learning models. While we put specific emphasis on deep neural networks, we also consider kernel machines in the context of Gaussian processes, as well as linear models, for example by studying the generalisation of models with an explicit low-dimensional structure. First, we argue that many real-world observations have an intrinsic low-dimensional structure; we can find evidence of this structure, for example, through low-rank approximations of many real-world data sets. We then address two open-ended research questions. First, we study the behaviour of machine learning models when they are trained on data with low-dimensional structure, investigating fundamental aspects of learning low-dimensional representations and how well models with explicit low-dimensional structures perform. Second, we focus on applications in the modelling of dynamical systems and in the medical domain, investigating how these applications benefit from low-dimensional representations and exploring the potential of low-dimensional model structures for predictive tasks. Finally, we give a brief outlook on how to go beyond learning low-dimensional structures and identify the underlying mechanisms that generate the data, in order to better model and understand these processes.

This dissertation provides an overview of learning low-dimensional structures in machine learning models. It covers a wide range of topics, from representation learning, through the study of generalisation in overparameterised models, to applications in time-series modelling and medicine. Each contribution, however, opens up a range of questions to study in the future. This dissertation therefore serves as a starting point for further exploring the learning of low-dimensional structures and representations.
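
To make the low-rank evidence mentioned above concrete, the following sketch (illustrative only, not part of the thesis) computes a truncated singular value decomposition of a standard image data set and reports how few directions capture most of its variance; the data set and the 95% threshold are arbitrary choices.

```python
# Minimal sketch: evidence of low-dimensional structure via a low-rank
# approximation of a real-world data set (illustrative, not from the thesis).
import numpy as np
from sklearn.datasets import load_digits

X = load_digits().data.astype(float)          # (1797, 64) images of handwritten digits
X = X - X.mean(axis=0)                        # centre the data

# Singular value decomposition of the data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)    # cumulative variance explained

k = int(np.searchsorted(explained, 0.95)) + 1
print(f"{k} of {X.shape[1]} directions explain 95% of the variance")

# Rank-k reconstruction error as a fraction of the total energy.
X_k = (U[:, :k] * s[:k]) @ Vt[:k]
rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(f"relative error of the rank-{k} approximation: {rel_err:.3f}")
```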

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2024, p. 110
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2392
Keywords [en]
Machine Learning, Deep Learning, Gaussian Process, Low-Dimensional Representations, Representation Learning, Dynamical Systems, Electrocardiogram
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-526130; ISBN: 978-91-513-2102-8 (print); OAI: oai:DiVA.org:uu-526130; DiVA id: diva2:1849417
Public defence
2024-06-14, room 80121, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 09:15 (English)
Available from: 2024-05-02 Created: 2024-04-07 Last updated: 2024-05-02
List of papers
1. Uncertainty Estimation with Recursive Feature Machines
2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In conventional regression analysis, predictions are typically represented as point estimates derived from covariates. Gaussian processes (GPs) offer a kernel-based framework that predicts and additionally quantifies the associated uncertainties. However, kernel-based methods often underperform ensemble-based decision tree approaches in regression tasks involving tabular and categorical data. Recently, Recursive Feature Machines (RFMs) were proposed as a novel feature-learning kernel which strengthens the capabilities of kernel machines. In this study, we harness the power of RFMs in a probabilistic GP-based approach to enhance uncertainty estimation through feature extraction within kernel methods. We employ this learned kernel for in-depth uncertainty analysis. On tabular datasets, our RFM-based method surpasses other leading uncertainty estimation techniques, including NGBoost and CatBoost ensembles. Additionally, when assessing out-of-distribution performance, we find that boosting-based methods are surpassed by our RFM-based approach.
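
As a rough illustration of the approach described above, the sketch below combines a feature-learning kernel in the spirit of RFMs (a Mahalanobis metric updated from the average gradient outer product of a kernel ridge predictor) with the standard GP predictive equations for mean and variance. All function names and hyperparameters are illustrative assumptions; this is not the paper's implementation.

```python
import numpy as np

def k_M(X, Z, M, gamma=1.0):
    """Gaussian kernel with a learned Mahalanobis metric M."""
    d = X[:, None, :] - Z[None, :, :]                         # pairwise differences, (n, m, p)
    return np.exp(-gamma * np.einsum('nmp,pq,nmq->nm', d, M, d))

def learn_metric(X, y, iters=3, gamma=1.0, noise=1e-2):
    """RFM-style metric learning via the average gradient outer product (AGOP)."""
    n, p = X.shape
    M = np.eye(p)
    for _ in range(iters):
        K = k_M(X, X, M, gamma)
        alpha = np.linalg.solve(K + noise * np.eye(n), y)     # kernel ridge weights
        d = X[:, None, :] - X[None, :, :]
        grads = -2 * gamma * np.einsum('nm,pq,nmq->np', K * alpha[None, :], M, d)
        M = grads.T @ grads / n                               # AGOP update
        M *= p / np.trace(M)                                  # keep the scale stable
    return M

def gp_predict(Xtr, y, Xte, M, gamma=1.0, noise=1e-2):
    """GP regression with the learned kernel: predictive mean and variance."""
    K = k_M(Xtr, Xtr, M, gamma) + noise * np.eye(len(Xtr))
    Ks = k_M(Xte, Xtr, M, gamma)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T)) + noise
    return mean, var

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)          # only the first feature matters
M = learn_metric(X, y)
mean, var = gp_predict(X, y, X[:5], M)
print(np.round(np.diag(M), 2))                                # the metric concentrates on feature 0
```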

National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:uu:diva-526129 (URN)
Conference
The 40th Conference on Uncertainty in Artificial Intelligence
Available from: 2024-04-04 Created: 2024-04-04 Last updated: 2024-08-28. Bibliographically approved
2. Invertible Kernel PCA With Random Fourier Features
2023 (English) In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 30, p. 563-567. Article in journal (Refereed), Published
Abstract [en]

Kernel principal component analysis (kPCA) is a widely studied method to construct a low-dimensional data representation after a nonlinear transformation. The prevailing method to reconstruct the original input signal from kPCA, an important task for denoising, requires us to solve a supervised learning problem. In this paper, we present an alternative method where the reconstruction follows naturally from the compression step. We first approximate the kernel with random Fourier features. Then, we exploit the fact that the nonlinear transformation is invertible in a certain subdomain; hence the name invertible kernel PCA (ikPCA). We experiment with different data modalities and show that ikPCA performs similarly to kPCA with supervised reconstruction on denoising tasks, making it a strong alternative.
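
The following toy sketch illustrates the mechanics described above: approximate an RBF kernel with random Fourier features, compress with PCA in feature space, and reconstruct by inverting the cosine map on the subdomain where it is invertible, followed by a least-squares solve. The scales and dimensions are arbitrary assumptions and the construction is a simplification, not the exact method in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, D, k = 500, 10, 300, 5                       # samples, input dim, RFF dim, PCA dim
X = rng.normal(scale=0.2, size=(n, p))             # small scale keeps W @ x + b inside (0, pi)

# Random Fourier features approximating an RBF kernel.
W = rng.normal(size=(D, p))
b = np.full(D, np.pi / 2)                          # centre the argument of the cosine
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Compression: PCA in the random feature space.
Z = phi(X)
mu = Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Z - mu, full_matrices=False)
codes = (Z - mu) @ Vt[:k].T                        # k-dimensional representation

# Reconstruction: invert PCA, then invert the (locally invertible) feature map.
Z_hat = codes @ Vt[:k] + mu
c = np.clip(Z_hat / np.sqrt(2.0 / D), -1.0, 1.0)   # undo the scaling, keep arccos defined
A = np.arccos(c) - b                               # invert the cosine on [0, pi]
X_hat = np.linalg.lstsq(W, A.T, rcond=None)[0].T   # solve W x = arccos(.) - b per sample

print("relative reconstruction error:",
      np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```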

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Principal component analysis, Kernel, Image reconstruction, Dimensionality reduction, Noise reduction, Electrocardiography, Toy manufacturing industry, Denoising, ECG, kernel PCA, pre-image, random Fourier features, reconstruction
National Category
Signal Processing
Identifiers
urn:nbn:se:uu:diva-507434 (URN), 10.1109/LSP.2023.3275499 (DOI), 001010346600002
Funder
Knut and Alice Wallenberg Foundation; Swedish Research Council, 202104321
Available from: 2023-07-11 Created: 2023-07-11 Last updated: 2024-04-07. Bibliographically approved
3. No Double Descent in Principal Component Regression: A High-Dimensional Analysis
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Understanding the generalization properties of large-scale models necessitates incorporating realistic data assumptions into the analysis. We therefore consider Principal Component Regression (PCR), which combines principal component analysis and linear regression, on data from a low-dimensional manifold. We present an analysis of PCR when the data is sampled from a spiked covariance model, obtaining fundamental asymptotic guarantees for the generalization risk of this model. Our analysis is based on random matrix theory and allows us to provide guarantees for high-dimensional data. We additionally present an analysis of the distribution shift between training and test data. The results allow us to disentangle the effects of (1) the number of parameters, (2) the data-generating model and (3) model misspecification on the generalization risk. The use of PCR effectively regularizes the model and prevents the interpolation peak of double descent. Our theoretical findings are empirically validated in simulation, demonstrating their practical relevance.
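
The simulation sketch below illustrates the qualitative claim: on data from a spiked covariance model, minimum-norm least squares shows an interpolation peak around p ≈ n, while PCR restricted to the spike components does not. All parameter values are illustrative and unrelated to those in the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_test, k, spike = 100, 2000, 5, 25.0             # train size, test size, spikes, spike strength

def sample(p, m):
    """Spiked covariance model: k strong directions plus isotropic noise."""
    U = np.linalg.qr(rng.normal(size=(p, k)))[0]      # orthonormal spike directions
    X = (rng.normal(size=(m, k)) * np.sqrt(spike)) @ U.T + rng.normal(size=(m, p))
    beta = U @ rng.normal(size=k)                     # the signal lives on the spikes
    y = X @ beta + 0.5 * rng.normal(size=m)
    return X, y

for p in [50, 90, 100, 110, 200, 400]:
    X, y = sample(p, n + n_test)
    Xtr, ytr, Xte, yte = X[:n], y[:n], X[n:], y[n:]

    # Minimum-norm least squares (interpolates once p >= n).
    w_ls = np.linalg.pinv(Xtr) @ ytr
    # PCR: project onto the top-k empirical principal components, then regress.
    _, _, Vt = np.linalg.svd(Xtr, full_matrices=False)
    P = Vt[:k].T
    w_pcr = P @ (np.linalg.pinv(Xtr @ P) @ ytr)

    risk = lambda w: np.mean((Xte @ w - yte) ** 2)
    print(f"p={p:4d}  min-norm LS risk={risk(w_ls):9.2f}  PCR risk={risk(w_pcr):7.2f}")
```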

National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:uu:diva-526128 (URN)
Available from: 2024-04-04 Created: 2024-04-04 Last updated: 2024-04-15. Bibliographically approved
4. Deep State Space Models for Nonlinear System Identification
2021 (English) In: IFAC PapersOnLine, Elsevier BV, 2021, Vol. 54, no. 7, p. 481-486. Conference paper, Published paper (Refereed)
Abstract [en]

Deep state space models (SSMs) are an actively researched model class for temporal modelling developed in the deep learning community, with a close connection to classic SSMs. Used as black-box identification models, deep SSMs can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the probabilistic nature of the model class allows the uncertainty of the system to be modelled. In this work, a deep SSM class and its parameter learning algorithm are explained in an effort to extend the toolbox of nonlinear identification methods with a deep learning based method. Six recent deep SSMs are evaluated in a first unified implementation on nonlinear system identification benchmarks.
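
The sketch below shows the generative structure of one simple member of this model class: a stochastic MLP transition and emission, which is what gives deep SSMs their flexibility and probabilistic nature. It is a toy stand-in written in PyTorch with arbitrary dimensions, and it does not include the variational parameter learning discussed in the paper.

```python
import torch
import torch.nn as nn

class DeepSSM(nn.Module):
    def __init__(self, state_dim=4, input_dim=1, obs_dim=1, hidden=32):
        super().__init__()
        # Transition network: mean and log-variance of the next latent state.
        self.f = nn.Sequential(nn.Linear(state_dim + input_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 2 * state_dim))
        # Emission network: mean and log-variance of the observation.
        self.g = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 2 * obs_dim))
        self.state_dim = state_dim

    def step(self, x, u):
        mu, logvar = self.f(torch.cat([x, u], dim=-1)).chunk(2, dim=-1)
        x_next = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # stochastic transition
        mu_y, logvar_y = self.g(x_next).chunk(2, dim=-1)
        y = mu_y + torch.exp(0.5 * logvar_y) * torch.randn_like(mu_y)   # stochastic emission
        return x_next, y

    def simulate(self, u_seq):
        x = torch.zeros(u_seq.shape[0], self.state_dim)
        ys = []
        for t in range(u_seq.shape[1]):
            x, y = self.step(x, u_seq[:, t])
            ys.append(y)
        return torch.stack(ys, dim=1)

model = DeepSSM()
u = torch.randn(8, 50, 1)            # batch of 8 input sequences of length 50
y = model.simulate(u)                # simulated observations, shape (8, 50, 1)
print(y.shape)
```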

Place, publisher, year, edition, pages
Elsevier BV, 2021
Keywords
Nonlinear system identification, black box modeling, deep learning
National Category
Control Engineering
Identifiers
urn:nbn:se:uu:diva-457741 (URN), 10.1016/j.ifacol.2021.08.406 (DOI), 000696396200083
Conference
19th IFAC Symposium on System Identification (SYSID), July 13-16, 2021, Padova, Italy
Funder
Knut and Alice Wallenberg Foundation; Swedish Research Council, 2016-06079; Swedish Research Council, 2019-04956; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2021-11-12 Created: 2021-11-12 Last updated: 2024-04-07. Bibliographically approved
5. First Steps Towards Self-Supervised Pretraining of the 12-Lead ECG
2021 (English) In: 2021 Computing in Cardiology (CinC), Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Self-supervised learning is a paradigm that extracts general features describing the input space by artificially generating labels from the input, without the need for explicit annotations. The learned features can then be used in transfer learning to boost the performance on a downstream task. Such methods have recently produced state-of-the-art results in natural language processing and computer vision. Here, we propose a self-supervised learning method for 12-lead electrocardiograms (ECGs). For pretraining the model, we design a task that masks out sub-segments of all channels of the input signals and predicts the actual values. As the model architecture, we use a U-ResNet containing an encoder-decoder structure. We test our method by self-supervised pretraining on the CODE dataset and then transfer the learnt features by finetuning on the PTB-XL and CPSC benchmarks to evaluate the effect of our method on the classification of 12-lead ECGs. The method provides modest improvements in performance compared to not using pretraining. In future work we will apply these ideas to smaller datasets, where we believe they can lead to larger performance gains.
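
A minimal sketch of the masking pretext task described above is given below: random sub-segments of all leads are zeroed out and a small 1D convolutional encoder-decoder is trained to predict the original values. The toy architecture and random data are stand-ins for the U-ResNet and the CODE data set used in the paper.

```python
import torch
import torch.nn as nn

def mask_segments(x, n_seg=4, seg_len=40):
    """Zero out random sub-segments across all leads of each example."""
    x = x.clone()
    for i in range(x.shape[0]):
        for _ in range(n_seg):
            start = torch.randint(0, x.shape[-1] - seg_len, (1,)).item()
            x[i, :, start:start + seg_len] = 0.0
    return x

model = nn.Sequential(                                   # toy encoder-decoder
    nn.Conv1d(12, 32, 7, stride=2, padding=3), nn.ReLU(),
    nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(),
    nn.ConvTranspose1d(64, 32, 8, stride=2, padding=3), nn.ReLU(),
    nn.ConvTranspose1d(32, 12, 8, stride=2, padding=3),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

ecg = torch.randn(16, 12, 1000)                          # random stand-in for 12-lead ECGs
for step in range(5):                                    # a few pretraining steps
    masked = mask_segments(ecg)
    loss = nn.functional.mse_loss(model(masked), ecg)    # predict the original values
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, loss.item())
```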

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
Computing in Cardiology Conference, ISSN 2325-8861, E-ISSN 2325-887X
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-481355 (URN), 10.23919/CinC53138.2021.9662748 (DOI), 000821955000067, 978-1-6654-7916-5 (ISBN)
Conference
Conference on Computing in Cardiology (CinC), September 12-15, 2021, Brno, Czech Republic
Funder
Knut and Alice Wallenberg Foundation
Available from: 2022-08-09 Created: 2022-08-09 Last updated: 2024-04-07. Bibliographically approved
6. Development and validation of deep learning ECG-based prediction of myocardial infarction in emergency department patients
2022 (English) In: Scientific Reports, E-ISSN 2045-2322, Vol. 12, article id 19615. Article in journal (Refereed), Published
Abstract [en]

Myocardial infarction diagnosis is a common challenge in the emergency department. In managed settings, deep learning-based models, and especially convolutional models, have shown promise in electrocardiogram (ECG) classification, but there is a lack of high-performing models for the diagnosis of myocardial infarction in real-world scenarios. We aimed to train and validate a deep learning model using ECGs to predict myocardial infarction in real-world emergency department patients. We studied emergency department patients in the Stockholm region between 2007 and 2016 who had an ECG obtained because of their presenting complaint. We developed a deep neural network based on convolutional layers similar to a residual network. Inputs to the model were the ECG tracing, age, and sex; outputs were the probabilities of three mutually exclusive classes: non-ST-elevation myocardial infarction (NSTEMI), ST-elevation myocardial infarction (STEMI), and control status, as registered in the SWEDEHEART and other registries. We used an ensemble of five models. Among 492,226 ECGs in 214,250 patients, 5,416 were recorded with an NSTEMI, 1,818 with a STEMI, and 485,207 without a myocardial infarction. In a random test set, our model could discriminate STEMIs/NSTEMIs from controls with a C-statistic of 0.991/0.832 and had a Brier score of 0.001/0.008. The model obtained similar performance in a temporally separated test set of the study sample, and achieved a C-statistic of 0.985 and a Brier score of 0.002 in discriminating STEMIs from controls in an external test set. We developed and validated a deep learning model with excellent performance in discriminating between control, STEMI, and NSTEMI on the presenting ECG of a real-world sample of all-comers to the emergency department. Hence, deep learning models for ECG decision support could be valuable in the emergency department.
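
As an illustration of the evaluation reported above, the sketch below averages class probabilities over an ensemble of models and computes, for each infarction class against controls, the C-statistic (area under the ROC curve) and the Brier score. The data and predictions are random placeholders, not the study's data or model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
n, n_models = 1000, 5
labels = rng.choice(["control", "NSTEMI", "STEMI"], size=n, p=[0.97, 0.02, 0.01])

# Ensemble: average predicted class probabilities over five (here random) models.
probs = rng.dirichlet(np.ones(3), size=(n_models, n)).mean(axis=0)   # columns: control, NSTEMI, STEMI

for j, cls in enumerate(["NSTEMI", "STEMI"]):
    mask = (labels == cls) | (labels == "control")                   # one class vs controls
    y = (labels[mask] == cls).astype(int)
    p = probs[mask, j + 1]
    print(cls,
          "C-statistic:", round(roc_auc_score(y, p), 3),
          "Brier score:", round(brier_score_loss(y, p), 3))
```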

Place, publisher, year, edition, pages
Springer Nature, 2022
National Category
Cardiac and Cardiovascular Systems
Identifiers
urn:nbn:se:uu:diva-489599 (URN), 10.1038/s41598-022-24254-x (DOI), 000885139000003, 36380048 (PubMedID)
Funder
Knut and Alice Wallenberg Foundation; EU, Horizon 2020, 101054643; Swedish Research Council, sens2020005; Swedish Research Council, sens2020598; Swedish Research Council, 2018-05973; Uppsala University; Kjell and Marta Beijer Foundation
Available from: 2022-12-02 Created: 2022-12-02 Last updated: 2024-04-07. Bibliographically approved

Open Access in DiVA

UUThesis_D-Gedon-2024 (1178 kB)
File name: FULLTEXT01.pdf
File size: 1178 kB
Checksum (SHA-512): b50e34bfdb1ab5c5c56d252d8fceb908a9ac96c728d79b48c6d33ebc5e29db9f5777b1069ce46c9f1a152b9a6107a1dc2dbbd45adc05451d604e7d18bf9f4c60
Type: fulltext, Mimetype: application/pdf

Authority records

Gedon, Daniel
