This study assesses the outcomes of the NTIRE 2023 Challenge on Non-Homogeneous Dehazing, wherein novel techniques were proposed and evaluated on a new image dataset called HD-NH-HAZE. The HD-NH-HAZE dataset contains 50 high-resolution pairs of real-life outdoor images featuring nonhomogeneous hazy images and corresponding haze-free images of the same scene. The nonhomogeneous haze was simulated using a professional setup that replicated real-world conditions of hazy scenarios. The competition had 246 participants and 17 teams that competed in the final testing phase, and the proposed solutions demonstrated the state of the art in image dehazing.
Recent developments within deep learning are relevant for nonlinear system identification problems. In this paper, we establish connections between the deep learning and the system identification communities. It has recently been shown that convolutional architectures are at least as capable as recurrent architectures when it comes to sequence modeling tasks. Inspired by these results, we explore the explicit relationships between the recently proposed temporal convolutional network (TCN) and two classic system identification model structures: Volterra series and block-oriented models. We end the paper with an experimental study where we provide results on two real-world problems, the well-known Silverbox dataset and a newer dataset originating from ground vibration experiments on an F-16 fighter aircraft.
We propose a model for hierarchical structured data as an extension to the stochastic temporal convolutional network. The proposed model combines an autoregressive model with a hierarchical variational autoencoder and downsampling to achieve superior computational complexity. We evaluate the proposed model on two different types of sequential data: speech and handwritten text. The results are promising with the proposed model achieving state-of-the-art performance.
When deploying machine learning algorithms in the real world, guaranteeing safety is an essential asset. Existing safe learning approaches typically consider continuous variables, i.e., regression tasks. However, in practice, robotic systems are also subject to discrete, external environmental changes, e.g., having to carry objects of certain weights or operating on frozen, wet, or dry surfaces. Such influences can be modeled as discrete context variables. In the existing literature, such contexts are, if considered, mostly assumed to be known. In this work, we drop this assumption and show how we can perform safe learning when we cannot directly measure the context variables. To achieve this, we derive frequentist guarantees for multiclass classification, allowing us to estimate the current context from measurements. Furthermore, we propose an approach for identifying contexts through experiments. We discuss under which conditions we can retain theoretical guarantees and demonstrate the applicability of our algorithm on a Furuta pendulum with camera measurements of different weights that serve as contexts.
The linear-quadratic-Gaussian (LQG) control paradigm is well known in the literature. The strategy for minimizing the cost function is available both for the case where the state is known and for the case where it is estimated through an observer. The situation is different when the cost function has an exponential discount factor, also known as a prescribed degree of stability. In this case, the optimal control strategy is only available when the state is known. This paper builds on that result, deriving an optimal control strategy when working with an estimated state. Expressions for the resulting optimal expected cost are also given.
Linear Quadratic Gaussian (LQG) systems are well-understood and methods to minimize the expected cost are readily available. Less is known about the statistical properties of the resulting cost function. The contribution of this paper is a set of analytic expressions for the mean and variance of the LQG cost function. These expressions are derived using two different methods, one using solutions to Lyapunov equations and the other using only matrix exponentials. Both the discounted and the non-discounted cost function are considered, as well as the finite-time and the infinite-time cost function. The derived expressions are successfully applied to an example system to reduce the probability of the cost exceeding a given threshold.
We present elliptical processes—a family of non-parametric probabilistic models that subsumes Gaussian processes and Student's t processes. This generalization includes a range of new heavy-tailed behaviors while retaining computational tractability. Elliptical processes are based on a representation of elliptical distributions as a continuous mixture of Gaussian distributions. We parameterize this mixture distribution as a spline normalizing flow, which we train using variational inference. The proposed form of the variational posterior enables a sparse variational elliptical process applicable to large-scale problems. We highlight advantages compared to Gaussian processes through regression and classification experiments. Elliptical processes can supersede Gaussian processes in several settings, including cases where the likelihood is non-Gaussian or when accurate tail modeling is essential.
We present an algorithm to estimate and quantify the uncertainty of the accelerometers' relative geometry in an inertial sensor array. We formulate the calibration problem as a Bayesian estimation problem and propose an algorithm that samples the accelerometer positions' posterior distribution using Markov chain Monte Carlo. By identifying linear substructures of the measurement model, the unknown linear motion parameters are analytically marginalized, and the remaining non-linear motion parameters are numerically marginalized. The numerical marginalization occurs in a low dimensional space where the gyroscopes give information about the motion. This combination of information from gyroscopes and analytical marginalization allows the user to make no assumptions of the motion before the calibration. It thus enables the user to estimate the accelerometer positions' relative geometry by simply exposing the array to arbitrary twisting motion. We show that the calibration algorithm gives good results on both simulated and experimental data, despite sampling a high dimensional space.
We present the new Bokeh Effect Transformation Dataset (BETD), and review the proposed solutions for this novel task at the NTIRE 2023 Bokeh Effect Transformation Challenge. Recent advancements in mobile photography aim to reach the visual quality of full-frame cameras. Now, a goal in computational photography is to optimize the Bokeh effect itself, which is the aesthetic quality of the blur in out-of-focus areas of an image. Photographers create this aesthetic effect by exploiting the optical properties of the lens. The aim of this work is to design a neural network capable of converting the Bokeh effect of one lens to the effect of another lens without harming the sharp foreground regions in the image. For a given input image, knowing the target lens type, we render or transform the Bokeh effect according to the lens properties. We build the BETD using two full-frame Sony cameras and diverse lens setups. To the best of our knowledge, this is the first attempt to solve this novel task, and we provide the first dataset and benchmark for it. The challenge had 99 registered participants. The submitted methods gauge the state of the art in Bokeh effect rendering and transformation.
This paper considers the problem of computing Bayesian estimates of both states and model parameters for nonlinear state-space models. Generally, this problem does not have a tractable solution and approximations must be utilised. In this work, a variational approach is used to provide an assumed density which approximates the desired, intractable, distribution. The approach is deterministic and results in an optimisation problem of a standard form. Due to the parametrisation selected for the assumed density, first- and second-order derivatives are readily available, which allows for efficient solutions. The proposed method is compared against state-of-the-art Hamiltonian Monte Carlo in two numerical examples.
This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is deterministic, tractable and can be solved using standard optimisation tools. A specialisation of this approach for systems with additive Gaussian noise is also detailed. The proposed method is examined numerically on a range of simulated and real examples focusing on the robustness to parameter initialisation; additionally, favourable comparisons are performed against state-of-the-art alternatives.
In this paper, the problem of state estimation, in the context of both filtering and smoothing, for nonlinear state-space models is considered. Due to the nonlinear nature of the models, the state estimation problem is generally intractable as it involves integrals of general nonlinear functions and the filtered and smoothed state distributions lack closed-form solutions. As such, it is common to approximate the state estimation problem. In this paper, we develop an assumed Gaussian solution based on variational inference, which offers the key advantage of a flexible, but principled, mechanism for approximating the required distributions. Our main contribution lies in a new formulation of the state estimation problem as an optimisation problem, which can then be solved using standard optimisation routines that employ exact first- and second-order derivatives. The resulting state estimation approach involves a minimal number of assumptions and applies directly to nonlinear systems with both Gaussian and non-Gaussian probabilistic models. The performance of our approach is demonstrated on several examples: a challenging scalar system, a model of a simple robotic system, and a target tracking problem using a von Mises-Fisher distribution. In these examples, it outperforms alternative assumed Gaussian approaches to state estimation.
Particle Metropolis–Hastings (PMH) allows for Bayesian parameter inference in nonlinear state space models by combining Markov chain Monte Carlo (MCMC) and particle filtering. The latter is used to estimate the intractable likelihood. In its original formulation, PMH makes use of a marginal MCMC proposal for the parameters, typically a Gaussian random walk. However, this can lead to a poor exploration of the parameter space and an inefficient use of the generated particles. We propose a number of alternative versions of PMH that incorporate gradient and Hessian information about the posterior into the proposal. This information is more or less obtained as a byproduct of the likelihood estimation. Indeed, we show how to estimate the required information using a fixed-lag particle smoother, with a computational cost growing linearly in the number of particles. We conclude that the proposed methods can: (i) decrease the length of the burn-in phase, (ii) increase the mixing of the Markov chain at the stationary phase, and (iii) make the proposal distribution scale invariant which simplifies tuning.
This tutorial provides a gentle introduction to the particle Metropolis-Hastings (PMH) algorithm for parameter inference in nonlinear state-space models together with a software implementation in the statistical programming language R. We employ a step-by-step approach to develop an implementation of the PMH algorithm (and the particle filter within) together with the reader. This final implementation is also available as the package pmhtutorial in the CRAN repository. Throughout the tutorial, we provide some intuition as to how the algorithm operates and discuss some solutions to problems that might occur in practice. To illustrate the use of PMH, we consider parameter inference in a linear Gaussian state-space model with synthetic data and a nonlinear stochastic volatility model with real-world data.
The recursive direct weight optimization method is used to solve challenging nonlinear system identification problems. This note provides a new derivation and a new interpretation of the method. The key underlying the note is to acknowledge and exploit a certain structure inherent in the problem.
The Kaczmarz algorithm (KA) is a popular method for solving a system of linear equations. In this note we derive a new exponential convergence result for the KA. The key allowing us to establish the new result is to rewrite the KA in such a way that its solution path can be interpreted as the output from a particular dynamical system. The asymptotic stability results of the corresponding dynamical system can then be leveraged to prove exponential convergence of the KA. The new bound is also compared to existing bounds.
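To make the iteration concrete, the following is a minimal sketch of the classical cyclic Kaczmarz update, in which the current iterate is projected onto the hyperplane defined by one equation at a time. It illustrates the standard algorithm only, not the new convergence bound; the function name `kaczmarz`, the sweep count, and the random test system are assumptions made for this illustration.

```python
import numpy as np


def kaczmarz(A, b, n_sweeps=200, x0=None):
    """Cyclic Kaczmarz iteration for a consistent linear system A x = b.

    Hypothetical helper for illustration; n_sweeps is an arbitrary choice.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # squared norm of each row
    for _ in range(n_sweeps):
        for i in range(m):
            if row_norms_sq[i] == 0.0:
                continue  # skip all-zero rows, which carry no information
            # Orthogonal projection of x onto the hyperplane {z : a_i^T z = b_i}
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x


# Small consistent test system (assumed example data)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true

x_hat = kaczmarz(A, b)
err = np.linalg.norm(x_hat - x_true)
print(f"distance to the true solution: {err:.2e}")
```

Each inner step depends only on the previous iterate and one row of the system, which is what allows the solution path to be read as the output of a dynamical system.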
In this paper, we propose variations of Willems' fundamental lemma that utilize second-order moments such as correlation functions in the time domain and power spectra in the frequency domain. We believe that using a formulation with estimated correlation coefficients is suitable for data compression, and possibly can reduce noise. Also, the formulations in the frequency domain can enable modeling of a system in a frequency region of interest.
In this paper, we consider data-driven control of Hammerstein systems. For such systems a common control structure is a transfer function followed by a static output nonlinearity that tries to cancel the input nonlinearity of the system, which is modeled as a polynomial or piece-wise linear function. The linear part of the controller is used to achieve desired disturbance rejection and tracking properties. To design the linear part of the controller, we propose a weighted average risk criterion with the risk being the average of the squared L_{2} tracking error. Here the average is with respect to the observations used in the controller and the weighting is with respect to how important it is to have good control for different impulse responses. This criterion corresponds to the average risk criterion leading to the Bayes estimator and we therefore call this approach Bayes control. By parametrizing the weighting function and estimating the corresponding hyperparameters we tune the weighting function to the information regarding the true impulse response contained in the data set available to the user for the control design. The numerical results show that the proposed methods result in stable controllers with performance comparable to the optimal controller, designed using the true input nonlinearity and true plant.
This letter concerns the problem of learning robust LQ-controllers, when the dynamics of the linear system are unknown. First, we propose a robust control synthesis method to minimize the worst-case LQ cost, with probability 1 - delta, given empirical observations of the system. Next, we propose an approximate dual controller that simultaneously regulates the system and reduces model uncertainty. The objective of the dual controller is to minimize the worst-case cost attained by a new robust controller, synthesized with the reduced model uncertainty. The dual controller is subject to an exploration budget in the sense that it has constraints on its worst-case cost with respect to the current model uncertainty. In our numerical experiments, we observe better performance of the proposed robust LQ regulator over the existing methods. Moreover, the dual control strategy gives promising results in comparison with the common greedy random exploration strategies.
In conventional regression analysis, predictions are typically represented as point estimates derived from covariates. The Gaussian Process (GP) offers a kernel-based framework that predicts and additionally quantifies associated uncertainties. However, kernel-based methods often underperform ensemble-based decision tree approaches in regression tasks involving tabular and categorical data. Recently, Recursive Feature Machines (RFMs) were proposed as a novel feature-learning kernel which strengthens the capabilities of kernel machines. In this study, we harness the power of RFMs in a probabilistic GP-based approach to enhance uncertainty estimation through feature extraction within kernel methods. We employ this learned kernel for in-depth uncertainty analysis. On tabular datasets, our RFM-based method surpasses other leading uncertainty estimation techniques, including NGBoost and CatBoost-ensemble. Additionally, when assessing out-of-distribution performance, we found that boosting-based methods are surpassed by our RFM-based approach.
Understanding the generalization properties of large-scale models necessitates incorporating realistic data assumptions into the analysis. Therefore, we consider Principal Component Regression (PCR), which combines principal component analysis and linear regression, on data from a low-dimensional manifold. We present an analysis of PCR when the data is sampled from a spiked covariance model, obtaining fundamental asymptotic guarantees for the generalization risk of this model. Our analysis is based on random matrix theory and allows us to provide guarantees for high-dimensional data. We additionally present an analysis of the distribution shift between training and test data. The results allow us to disentangle the effects of (1) the number of parameters, (2) the data-generating model, and (3) model misspecification on the generalization risk. The use of PCR effectively regularizes the model and prevents the interpolation peak of the double descent. Our theoretical findings are empirically validated in simulation, demonstrating their practical relevance.
Self-supervised learning is a paradigm that extracts general features which describe the input space by artificially generating labels from the input without the need for explicit annotations. The learned features can then be used by transfer learning to boost the performance on a downstream task. Such methods have recently produced state-of-the-art results in natural language processing and computer vision. Here, we propose a self-supervised learning method for 12-lead electrocardiograms (ECGs). For pretraining the model we design a task to mask out subsegments of all channels of the input signals and try to predict the actual values. As the model architecture, we use a U-ResNet containing an encoder-decoder structure. We test our method by self-supervised pretraining on the CODE dataset and then transfer the learnt features by finetuning on the PTBXL and CPSC benchmarks to evaluate the effect of our method in the classification of 12-lead ECGs. The method provides modest improvements in performance when compared to not using pretraining. In future work we will make use of these ideas on smaller datasets, where we believe they can lead to larger performance gains.
Kernel principal component analysis (kPCA) is a widely studied method to construct a low-dimensional data representation after a nonlinear transformation. The prevailing method to reconstruct the original input signal from kPCA, an important task for denoising, requires us to solve a supervised learning problem. In this paper, we present an alternative method where the reconstruction follows naturally from the compression step. We first approximate the kernel with random Fourier features. Then, we exploit the fact that the nonlinear transformation is invertible in a certain subdomain. Hence the name invertible kernel PCA (ikPCA). We experiment with different data modalities and show that ikPCA performs similarly to kPCA with supervised reconstruction on denoising tasks, making it a strong alternative.
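As an illustration of the first step only, the sketch below approximates a Gaussian kernel with random Fourier features in the standard cosine form. It does not reproduce the inversion step of ikPCA; the Gaussian kernel choice, the function names `rff_features` and `gaussian_kernel`, the lengthscale, and the number of features are all assumptions made for this example.

```python
import numpy as np


def rff_features(X, n_features, lengthscale, rng):
    """Random Fourier features z(x) = sqrt(2/D) * cos(W^T x + b).

    For frequencies W ~ N(0, I / lengthscale^2) and phases b ~ U[0, 2*pi],
    z(x)^T z(y) approximates exp(-||x - y||^2 / (2 * lengthscale^2)).
    """
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def gaussian_kernel(X, lengthscale):
    """Exact Gaussian kernel matrix, used here only as a reference."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * lengthscale**2))


# Assumed example data: 20 points in 3 dimensions
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))

Z = rff_features(X, n_features=5000, lengthscale=1.5, rng=rng)
max_abs_err = np.max(np.abs(Z @ Z.T - gaussian_kernel(X, 1.5)))
print(f"largest kernel approximation error: {max_abs_err:.3f}")
```

With the features in hand, ordinary linear PCA on `Z` plays the role of kPCA on the approximated kernel, since the feature map is now explicit and finite-dimensional.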
Deep state space models (SSMs) are an actively researched model class for temporal models developed in the deep learning community which have a close connection to classic SSMs. The use of deep SSMs as a black-box identification model can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the probabilistic nature of the model class allows the uncertainty of the system to be modelled. In this work a deep SSM class and its parameter learning algorithm are explained in an effort to extend the toolbox of nonlinear identification methods with a deep learning based method. Six recent deep SSMs are evaluated in a first unified implementation on nonlinear system identification benchmarks.
This paper addresses the problem of computing fixed-interval smoothed state estimates of a linear time-varying Gaussian stochastic system. There already exist many algorithms that perform this computation, but all of them impose certain restrictions on system matrices in order for them to be applicable. This paper develops a new forwards–backwards pass algorithm that is applicable under the mildest restrictions possible, namely that the smoothed state distributions exist in forms that can be characterised by means and covariances, for which this paper also develops a new necessary and sufficient condition.
This article addresses the problem of computing fixed-interval smoothed state estimates of a linear time-varying Gaussian stochastic system. There already exist many algorithms that perform this computation, but all of them impose certain restrictions on system matrices in order for them to be applicable, and the restrictions vary considerably between the various existing algorithms. This article establishes a new sufficient condition for the fixed-interval smoothing density to exist in a Gaussian form that can be completely characterized by associated means and covariances. It then develops an algorithm to compute these means and covariances with no further assumptions required. This results in an algorithm more generally applicable than any one of the multitude of existing algorithms available to date.
Spatiotemporal imaging has applications in e.g. cardiac diagnostics, surgical guidance, and radiotherapy monitoring. In this paper, we explain the temporal motion by identifying the underlying dynamics, based only on the sequential images. Our dynamical model maps the inputs of observed high-dimensional sequential images to a low-dimensional latent space wherein a linear relationship between a hidden state process and the lower-dimensional representation of the inputs holds. For this, we use a conditional variational auto-encoder (CVAE) to nonlinearly map the higher-dimensional image to a lower-dimensional space, wherein we model the dynamics with a linear Gaussian state-space model (LG-SSM). The model, a modified version of the Kalman variational auto-encoder, is end-to-end trainable, and the weights, both in the CVAE and LG-SSM, are simultaneously updated by maximizing the evidence lower bound of the marginal likelihood. In contrast to the original model, we explain the motion with a spatial transformation from one image to another. This results in sharper reconstructions and the possibility of transferring auxiliary information, such as segmentation, through the image sequence. Our experiments, on cardiac ultrasound time series, show that the dynamic model outperforms traditional image registration in execution time while achieving similar performance. Further, our model offers the possibility to impute and extrapolate for missing samples.
We introduce an end-to-end unsupervised (or weakly supervised) image registration method that blends conventional medical image registration with contemporary deep learning techniques from computer vision. Our method downsamples both the fixed and the moving images into multiple feature map levels where a displacement field is estimated at each level and then further refined throughout the network. We train and test our model on three different datasets. In comparison with the initial registrations we find an improved performance using our model, yet we expect it would improve further if the model was fine-tuned for each task. The implementation is publicly available (https://github.com/ngunnar/learning-a-deformable-registration-pyramid).
Accurate 3D object detection (3DOD) is crucial for safe navigation of complex environments by autonomous robots. Regressing accurate 3D bounding boxes in cluttered environments based on sparse LiDAR data is however a highly challenging problem. We address this task by exploring recent advances in conditional energy-based models (EBMs) for probabilistic regression. While methods employing EBMs for regression have demonstrated impressive performance on 2D object detection in images, these techniques are not directly applicable to 3D bounding boxes. In this work, we therefore design a differentiable pooling operator for 3D bounding boxes, serving as the core module of our EBM network. We further integrate this general approach into the state-of-the-art 3D object detector SA-SSD. On the KITTI dataset, our proposed approach consistently outperforms the SA-SSD baseline across all 3DOD metrics, demonstrating the potential of EBM-based regression for highly accurate 3DOD. Code is available at https://github.com/fregu856/ebms_3dod.
While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a neural network output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl.
Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple method to automatically learn an effective proposal distribution, which is parameterized by a separate network head. To this end, we derive a surprising result, leading to a unified training objective that jointly minimizes the KL divergence from the proposal to the EBM, and the negative log-likelihood of the EBM. At test-time, we can then employ importance sampling with the trained proposal to efficiently evaluate the learned EBM and produce standalone predictions. Furthermore, we utilize our derived training objective to learn mixture density networks (MDNs) with a jointly trained energy-based teacher, consistently outperforming conventional MDN training on four real-world regression tasks within computer vision. Code is available at https://github.com/fregu856/ebms_proposals.
Myocardial infarction diagnosis is a common challenge in the emergency department. In managed settings, deep learning-based models and especially convolutional deep models have shown promise in electrocardiogram (ECG) classification, but there is a lack of high-performing models for the diagnosis of myocardial infarction in real-world scenarios. We aimed to train and validate a deep learning model using ECGs to predict myocardial infarction in real-world emergency department patients. We studied emergency department patients in the Stockholm region between 2007 and 2016 that had an ECG obtained because of their presenting complaint. We developed a deep neural network based on convolutional layers similar to a residual network. Inputs to the model were ECG tracing, age, and sex; and outputs were the probabilities of three mutually exclusive classes: non-ST-elevation myocardial infarction (NSTEMI), ST-elevation myocardial infarction (STEMI), and control status, as registered in the SWEDEHEART and other registries. We used an ensemble of five models. Among 492,226 ECGs in 214,250 patients, 5,416 were recorded with an NSTEMI, 1,818 a STEMI, and 485,207 without a myocardial infarction. In a random test set, our model could discriminate STEMIs/NSTEMIs from controls with a C-statistic of 0.991/0.832 and had a Brier score of 0.001/0.008. The model obtained a similar performance in a temporally separated test set of the study sample, and achieved a C-statistic of 0.985 and a Brier score of 0.002 in discriminating STEMIs from controls in an external test set. We developed and validated a deep learning model with excellent performance in discriminating between control, STEMI, and NSTEMI on the presenting ECG of a real-world sample of the important population of all-comers to the emergency department. Hence, deep learning models for ECG decision support could be valuable in the emergency department.
This paper is directed towards the problem of learning nonlinear ARX models based on observed input-output data. In particular, our interest is in learning a conditional distribution of the current output based on a finite window of past inputs and outputs. To achieve this, we consider the use of so-called energy-based models, which have been developed in allied fields for learning unknown distributions based on data. This energy-based model relies on a general function to describe the distribution, and here we consider a deep neural network for this purpose. The primary benefit of this approach is that it is capable of learning both simple and highly complex noise models, which we demonstrate on simulated and experimental data.
This letter considers the problem of determining an optimal control action based on observed data. We formulate the problem assuming that the system can be modeled by a nonlinear state-space model, but where the model parameters, state and future disturbances are not known and are treated as random variables. Central to our formulation is that the joint distribution of these unknown objects is conditioned on the observed data. Crucially, as new measurements become available, this joint distribution continues to evolve so that control decisions are made accounting for uncertainty as evidenced in the data. The resulting problem is intractable which we obviate by providing approximations that result in finite dimensional deterministic optimization problems. The proposed approach is demonstrated in simulation on a nonlinear system.
Polycrystals illuminated by high-energy X-rays or neutrons produce diffraction patterns in which the measured diffraction peaks encode the individual single crystal strain states. While state of the art X-ray and neutron diffraction approaches can be used to routinely recover per grain mean strain tensors, less work has been produced on the recovery of higher order statistics of the strain distributions across the individual grains. In the setting of small deformations, we consider the problem of estimating the crystal elastic strain tensor probability distribution from diffraction data. For the special case of multivariate Gaussian strain tensor probability distributions, we show that while the mean of the distribution is well defined from measurements, the covariance of strain has a null-space. We show that there exist exactly 6 orthogonal perturbations to this covariance matrix under which the measured strain signal is invariant. In particular, we provide analytical parametrisations of these perturbations together with the set of possible maximum-likelihood estimates for a multivariate Gaussian fit to data. The parametric description of the null-space provides insights into the strain PDF modes that cannot be accurately estimated from the diffraction data. Understanding these modes prevents erroneous conclusions from being drawn based on the data. Beyond Gaussian strain tensor probability densities, we derive an iterative radial basis regression scheme in which the strain tensor probability density is estimated by a sparse finite basis expansion. This is made possible by showing that the operator mapping the strain tensor probability density onto the measured histograms of directional strain is linear, without approximation. The utility of the proposed algorithm is demonstrated by numerical simulations in the setting of single crystal monochromatic X-ray scattering. The proposed regression methods were found to robustly reject outliers and accurately predict the strain tensor probability distributions in the presence of Gaussian measurement noise.
The convolutional neural network (CNN) remains an essential tool in solving computer vision problems. Standard convolutional architectures consist of stacked layers of operations that progressively downscale the image. Aliasing is a well-known side effect of downsampling: it causes high-frequency components of the original signal to become indistinguishable from its low-frequency components. While downsampling takes place in the max-pooling layers or in the strided convolutions in these models, there is no explicit mechanism that prevents aliasing from taking place in these layers. Due to the impressive performance of these models, it is natural to suspect that they somehow implicitly deal with this distortion. The question we aim to answer in this paper is simply: "how and to what extent do CNNs counteract aliasing?" We explore the question by means of two examples. In the first, we assess the CNN's capability to distinguish oscillations at the input, showing that the redundancies in the intermediate channels play an important role in succeeding at the task. In the second, we show that an image-classifier CNN, while in principle capable of implementing anti-aliasing filters, does not prevent aliasing from taking place in the intermediate layers.
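The aliasing effect described in this abstract can be illustrated with a minimal one-dimensional sketch. The signal length, the two frequencies, and the stride are illustrative assumptions, not values from the paper; the point is only that strided subsampling without a low-pass filter maps two different oscillations to the same samples.

```python
import numpy as np

n = 64                                   # samples in the original signal (assumed)
t = np.arange(n)
low = np.cos(2 * np.pi * 3 * t / n)      # 3 cycles over the window
high = np.cos(2 * np.pi * 29 * t / n)    # 29 cycles over the window

stride = 2                               # keep every second sample, no filtering
low_ds = low[::stride]
high_ds = high[::stride]

print(np.allclose(low, high))            # False: the original signals differ
print(np.allclose(low_ds, high_ds))      # True: after downsampling they coincide
```

After subsampling, only 32 samples remain, so a 29-cycle oscillation folds back onto 32 - 29 = 3 cycles and cannot be told apart from the low-frequency signal.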
We study the error of linear regression in the face of adversarial attacks. In this framework, an adversary changes the input to the regression model in order to maximize the prediction error. We provide bounds on the prediction error in the presence of an adversary as a function of the parameter norm and the error in the absence of such an adversary. We show how these bounds make it possible to study the adversarial error using analysis from non-adversarial setups. The obtained results shed light on the robustness of overparameterized linear models to adversarial attacks. Adding features might be either a source of additional robustness or brittleness. On the one hand, we use asymptotic results to illustrate how double-descent curves can be obtained for the adversarial error. On the other hand, we derive conditions under which the adversarial error can grow to infinity as more features are added, while at the same time, the test error goes to zero. We show this behavior is caused by the fact that the norm of the parameter vector grows with the number of features. It is also established that ℓ∞- and ℓ2-adversarial attacks might behave fundamentally differently due to how the ℓ1- and ℓ2-norms of random projections concentrate. We also show how our reformulation allows for solving adversarial training as a convex optimization problem. This fact is then exploited to establish similarities between adversarial training and parameter-shrinking methods and to study how the training might affect the robustness of the estimated models.
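The link between adversarial error and the parameter norm can be made concrete with a small sketch. It relies on a standard dual-norm identity for a single linear prediction, which may differ in form from the bounds in the paper: the worst-case squared error over perturbations with p-norm at most eps equals (|y - xᵀβ| + eps·‖β‖_q)², where q is the dual of p. The function names and test values below are hypothetical.

```python
import numpy as np

def adversarial_sq_error(x, y, beta, eps, p):
    """Worst-case squared error over perturbations with ||delta||_p <= eps."""
    dual = {2: 2, np.inf: 1}[p]          # dual norm: l2 <-> l2, l_inf <-> l1
    return (abs(y - x @ beta) + eps * np.linalg.norm(beta, dual)) ** 2

def worst_case_delta(x, y, beta, eps, p):
    """A perturbation attaining the worst case (assumes beta is nonzero)."""
    s = 1.0 if y - x @ beta >= 0 else -1.0
    if p == np.inf:
        return -eps * s * np.sign(beta)
    return -eps * s * beta / np.linalg.norm(beta)

rng = np.random.default_rng(0)
x, beta = rng.normal(size=5), rng.normal(size=5)
y, eps = 0.3, 0.1
for p in (2, np.inf):
    delta = worst_case_delta(x, y, beta, eps, p)
    attained = (y - (x + delta) @ beta) ** 2
    print(p, np.isclose(attained, adversarial_sq_error(x, y, beta, eps, p)))
```

In this form, an ℓ∞ attack is governed by the ℓ1-norm of the parameter vector and an ℓ2 attack by its ℓ2-norm, which is consistent with the abstract's remark that the two attacks can behave differently.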
We shed new light on the smoothness of optimization problems arising in prediction error parameter estimation of linear and nonlinear systems. We show that for regions of the parameter space where the model is not contractive, the Lipschitz constant and β-smoothness of the objective function might blow up exponentially with the simulation length, making it hard to numerically find minima within those regions or, even, to escape from them. In addition to providing theoretical understanding of this problem, this paper also proposes the use of multiple shooting as a viable solution. The proposed method minimizes the error between a prediction model and the observed values. Rather than running the prediction model over the entire dataset, multiple shooting splits the data into smaller subsets and runs the prediction model over each subset, making the simulation length a design parameter and making it possible to solve problems that would be infeasible using a standard approach. The equivalence to the original problem is obtained by including constraints in the optimization. The new method is illustrated by estimating the parameters of nonlinear systems with chaotic or unstable behavior, as well as neural networks. We also present a comparative analysis of the proposed method with multi-step-ahead prediction error minimization.
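The multiple-shooting construction can be sketched on a toy problem. The scalar model x[k+1] = theta * x[k], the segment length, and all function names are illustrative assumptions rather than the paper's setup. Each segment gets its own free initial state, and continuity defects between segments are the constraints that recover the original problem.

```python
import numpy as np

def simulate(theta, x0, steps):
    """Free-run simulation of the hypothetical model x[k+1] = theta * x[k]."""
    x = np.empty(steps + 1)
    x[0] = x0
    for k in range(steps):
        x[k + 1] = theta * x[k]
    return x

def multiple_shooting(theta, starts, y, seg_len):
    """Return (cost, defects) for segments of seg_len samples each.

    starts[i] is the free initial state of segment i. The defects are the
    continuity constraints that must vanish to match the original problem.
    """
    cost, defects = 0.0, []
    for i, s in enumerate(starts):
        x = simulate(theta, s, seg_len)
        seg = y[i * seg_len : (i + 1) * seg_len]
        cost += np.sum((x[:seg_len] - seg) ** 2)
        if i + 1 < len(starts):
            defects.append(x[seg_len] - starts[i + 1])
    return cost, np.array(defects)

theta_true, seg_len, n_seg = 1.5, 5, 4
y = simulate(theta_true, 1.0, seg_len * n_seg - 1)   # noise-free data, 20 samples
starts = y[::seg_len]                                # true segment initial states
cost, defects = multiple_shooting(theta_true, starts, y, seg_len)
print(cost, np.abs(defects).max())
```

For this unstable toy model, the sensitivity of the final state to theta over N steps is N * theta**(N - 1) * x0, which grows exponentially in N. Splitting the data caps N at the segment length, so the segment length becomes a design parameter, as the abstract describes.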
In this paper, we propose an auxiliary-particle-filter-based two-filter smoother for Wiener state-space models. The proposed smoother exploits the model structure in order to obtain an analytical solution for the backward dynamics, which is introduced artificially in other two-filter smoothers. Furthermore, Gaussian approximations to the optimal proposal density and the adjustment multipliers are derived for both the forward and backward filters. The proposed algorithm is evaluated and compared to existing smoothing algorithms in a numerical example where it is shown that it performs similarly to the state of the art in terms of the root mean squared error at lower computational cost for large numbers of particles.
In state-space models, smoothing refers to the task of estimating a latent stochastic process given noisy measurements related to the process. We propose an unbiased estimator of smoothing expectations. The lack-of-bias property has methodological benefits: independent estimators can be generated in parallel, and confidence intervals can be constructed from the central limit theorem to quantify the approximation error. To design unbiased estimators, we combine a generic debiasing technique for Markov chains with a Markov chain Monte Carlo algorithm for smoothing. The resulting procedure is widely applicable and we show in numerical experiments that the removal of the bias comes at a manageable increase in variance. We establish the validity of the proposed estimators under mild assumptions. Numerical experiments are provided on toy models, including a setting of highly informative observations, and for a realistic Lotka-Volterra model with an intractable transition density. Supplementary materials for this article are available online.
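The use of the central limit theorem mentioned in this abstract can be sketched as follows. The estimates are synthetic stand-ins for independent runs of an unbiased estimator, not results from the paper, and the function name and the 1.96 quantile for a nominal 95% level are assumptions.

```python
import numpy as np

def clt_confidence_interval(estimates, z=1.96):
    """Approximate 95% interval for the mean of i.i.d. unbiased estimates."""
    estimates = np.asarray(estimates, dtype=float)
    mean = estimates.mean()
    half_width = z * estimates.std(ddof=1) / np.sqrt(estimates.size)
    return mean - half_width, mean + half_width

# Stand-in for R independent runs of an unbiased smoothing estimator:
# synthetic draws around 2.0 (purely illustrative, not real results).
rng = np.random.default_rng(1)
estimates = 2.0 + rng.normal(scale=0.5, size=200)
lo, hi = clt_confidence_interval(estimates)
print(lo, hi)
```

Because the runs are independent, they can be computed in parallel and pooled afterwards, and the interval width shrinks at the rate 1/sqrt(R) in the number of runs R.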
This article describes a memory-efficient method for solving large-scale optimization problems that arise when planning scanning-beam lithography processes. These processes require the identification of an exposure pattern that minimizes the difference between a desired and predicted output image, subject to constraints. The number of free variables is equal to the number of pixels, which can be on the order of millions or billions in practical applications. The proposed method splits the problem domain into a number of smaller overlapping subdomains with constrained boundary conditions, which are then solved sequentially using a constrained gradient search method (L-BFGS-B). Computational time is reduced by exploiting natural sparsity in the problem and employing the fast Fourier transform for efficient gradient calculation. Compared to previous methods, the proposed method makes a different trade-off between memory usage and computational time: the required memory is reduced by a factor of approximately the number of subdomains, at the cost of more computations. In an example problem with 30 million variables, the proposed method reduces memory requirements by 67% but increases computation time by 27%. Variations of the proposed method are expected to find applications in the planning of processes such as scanning laser lithography, scanning electron beam lithography, and focused ion beam deposition.
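The FFT-based gradient calculation can be sketched under a simplifying assumption that is not taken from the article: the predicted image is a circular convolution of the exposure pattern x with a real-valued beam profile h, and the cost is the squared difference to the desired image z. The names and the random test data are hypothetical.

```python
import numpy as np

def cost_and_grad(x, h_hat, z):
    """Cost ||h (*) x - z||^2 and its gradient, (*) being circular convolution.

    h_hat is the 2-D FFT of a real-valued beam profile h (hypothetical model).
    """
    residual = np.real(np.fft.ifft2(h_hat * np.fft.fft2(x))) - z
    # Adjoint of circular convolution with real h: multiply by conj(h_hat).
    grad = 2.0 * np.real(np.fft.ifft2(np.conj(h_hat) * np.fft.fft2(residual)))
    return np.sum(residual ** 2), grad

rng = np.random.default_rng(0)
n = 8
h_hat = np.fft.fft2(rng.random((n, n)))   # random stand-in for the beam profile
x, z = rng.random((n, n)), rng.random((n, n))
cost, grad = cost_and_grad(x, h_hat, z)

# Finite-difference check of one gradient entry.
step = 1e-6
x_pert = x.copy()
x_pert[2, 3] += step
fd = (cost_and_grad(x_pert, h_hat, z)[0] - cost) / step
print(np.isclose(fd, grad[2, 3], rtol=1e-3))
```

Each evaluation costs a few FFTs instead of a dense matrix product. A function of this shape, applied to a flattened x, is the kind of input a bound-constrained L-BFGS-B solver expects for each subdomain.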