Model misspecification and bias for inverse probability weighting estimators of average causal effects

Abstract Commonly used semiparametric estimators of causal effects specify parametric models for the propensity score (PS) and the conditional outcome. An example is an augmented inverse probability weighting (IPW) estimator, frequently referred to as a doubly robust estimator because it is consistent if at least one of the two models is correctly specified. However, in many observational studies, the role of the parametric models is often not to provide a representation of the data-generating process but rather to facilitate the adjustment for confounding, making the assumption of at least one true model unlikely to hold. In this paper, we propose a crude analytical approach to study the large-sample bias of estimators when the models are assumed to be approximations of the data-generating process, namely, when all models are misspecified. We apply our approach to three prototypical estimators of the average causal effect: two IPW estimators using a misspecified PS model, and an augmented IPW (AIPW) estimator using misspecified models for the outcome regression (OR) and the PS. For the two IPW estimators, we show that normalization, in addition to yielding a smaller variance, also offers some protection against bias due to model misspecification. To analyze the question of when the use of two misspecified models is better than one, we derive necessary and sufficient conditions for when the AIPW estimator has a smaller bias than a simple IPW estimator and when it has a smaller bias than an IPW estimator with normalized weights. If the misspecification of the outcome model is moderate, the comparisons of the biases show that the AIPW estimator has a smaller bias than the IPW estimators. However, all biases include a scaling with the PS-model error, and we suggest caution in modeling the PS whenever such a model is involved.
For numerical and finite sample illustrations, we include three simulation studies and corresponding approximations of the large‐sample biases. In a dataset from the National Health and Nutrition Examination Survey, we estimate the effect of smoking on blood lead levels.


KEYWORDS
average causal effects, comparing biases, outcome model, propensity score

INTRODUCTION

Identifying an average causal effect of a treatment with observational data requires adjustment for background variables that affect both the treatment and the outcome under study. Often parametric models are assumed for parts of the joint distribution of the treatment, outcome, and background variables (covariates), and large-sample properties of estimators are derived under the assumption that the parametric models are correctly specified.
Inverse probability weighting (IPW) estimators use the difference between the weighted means of the outcomes for the treatment groups as an estimator of the average causal effect. See, for example, the early paper by Hirano et al. (2003) for a nonparametric implementation of standard IPW estimators of the average causal effect. Under an assumption of no unmeasured confounding, IPW estimators reweight the observed outcomes to represent a full sample of potential outcomes, missing and observed, by letting each observed outcome account for itself and other individuals with similar characteristics. IPW estimators can be found in the applied literature (see Chang et al., 2017; Kwon et al., 2015 for examples) and their properties have been generalized by Robins, Rotnitzky, and others (Robins & Rotnitzky, 1994; Robins et al., 1995, 2000) to address both confounding bias in observational studies and bias due to missing data. Vansteelandt et al. (2010) and Seaman and White (2013) provide reviews.
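As a small illustration of this reweighting idea, the following Python sketch (ours, with an assumed data-generating process; not from the paper) compares the naive treated-group mean with a Horvitz-Thompson IPW mean computed with the true PS:

```python
# Illustrative sketch: IPW reweighting recovers E[Y(1)] under no unmeasured
# confounding, while the naive treated-group mean is biased by confounding.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-2, 2, n)                  # confounder
e = 1 / (1 + np.exp(-x))                   # true propensity score P(T=1|X)
t = rng.binomial(1, e)
y1 = x + rng.normal(0, 1, n)               # potential outcome Y(1), E[Y(1)] = 0

naive = y1[t == 1].mean()                  # biased: treated units have larger X
ipw = np.mean(t * y1 / e)                  # Horvitz-Thompson IPW estimate of E[Y(1)]
print(naive, ipw)
```

With the true PS, the weighted mean recovers E[Y(1)] even though treated individuals have systematically larger values of the confounder.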
To decrease reliance on the choice of a single parametric model and to increase efficiency, Robins and Rotnitzky (1994) proposed augmented IPW (AIPW) estimators, which combine regression adjustment with weighting based on the propensity score (PS). See also the review by Seaman and Vansteelandt (2018). The AIPW estimators are referred to as doubly robust (DR) estimators (Bang & Robins, 2005; Tsiatis, 2007) because they are consistent estimators of the average causal effect if either a model for the PS or the outcome regression (OR) model is correct (Scharfstein et al., 1999). The efficiency of different DR estimators is a key property, and the variances of the estimators have been described under correct specification of at least one of the models (Cao et al., 2009; Tan, 2010). When both models are correct, the estimator reaches the semiparametric efficiency bound described in Robins and Rotnitzky (1994). The large-sample properties of IPW estimators with standard, normalized, and variance-minimized weights, together with an AIPW estimator, were studied and compared in Lunceford and Davidian (2004) under correct specification of the PS and OR models. Multiply robust estimators allow several candidate models for the PS and OR, respectively. The property of multiple robustness means that the estimators are consistent for the true average treatment effect if any one of the multiple models is correctly specified (Han & Wang, 2013).
There are few studies of doubly or multiply robust estimators under misspecification of both (all) the PS and the OR models. Kang and Schafer (2007) studied and compared the performance of an AIPW estimator for missing data under misspecification of both the PS and OR model. They concluded that many DR methods perform better than simple IPW. However, a regression-based estimator under a misspecified model was not improved upon. The paper was commented on and the relevance of the results was discussed by several authors. See, for example, Tsiatis and Davidian (2007), Tan (2007), and Robins et al. (2007). In Waernbaum (2012), a matching estimator was compared to IPW and AIPW estimators under misspecification of both the PS and OR models. Here, a robustness class for the matching estimator under misspecification of the PS model was described. Formulated in the missing data framework, Tan (2010) evaluated several semiparametric estimators, including IPW and AIPW estimators. In the evaluation, additional criteria were proposed describing robustness classes of the estimators. Vermeulen and Vansteelandt (2015) proposed a bias-reduced AIPW estimator that locally minimizes the squared first-order asymptotic bias under misspecification of both working models. One of the difficulties in the estimation of PSs occurs when the treatment groups have substantially different covariate distributions resulting in some PSs being close to zero or one. This lack of overlap raises issues with respect to model specification. Parametric binary response models, such as the commonly used probit and logit models, are similar in the middle areas of their arguments. However, for probabilities closer to zero or one, they tend to differ more resulting in the specified parametric model being more influential. In Zhou et al. (2020), misspecification of the PS linked to limited overlap is investigated for causal effect estimators using balancing weights (Li et al., 2018). 
Comparing IPW estimators with and without trimming with overlap weights, matching weights, and entropy weights, they find in extensive simulations that the latter three methods outperform the former (IPW and trimmed IPW) with respect to bias, root-mean-squared error and coverage (Zhou et al., 2020).
In this paper, we describe two commonly used IPW estimators and a prototypical AIPW estimator of the average causal effect under the assumption that none of the working models is correctly specified. For this purpose, we study the difference between the probability limit of the estimator under model misspecification and the true average causal effect. The rationale for this definition of the bias is that the estimators under study converge to a well-defined limit that is not, however, necessarily equal to the true average causal effect. We study the biases of the (A)IPW estimators and compare them under the same misspecification of the PS model. The three estimators contain an error involving the ratio of the true PS and the limiting misspecified PS; however, the error affects the estimators in different ways. As the biases for the three estimators can be in different directions, we describe sufficient and necessary conditions using inequalities involving the absolute values of the biases. For a simple and a normalized IPW estimator, we show that the normalization in general moderates the bias due to the PS-model misspecification. Comparing the IPW estimators to the AIPW estimator, the biases provide a means to describe when two wrong models are better than one, which would normally be the case for a moderate misspecification of the outcome model. Three simulation studies are performed to investigate the biases for finite samples. The data-generating processes and the misspecified models from the simulation designs are also used for numerical approximations of the large-sample properties derived in the paper.
The paper proceeds as follows. Section 2 presents the model and theory together with the estimators and their properties when the working models are correctly specified. Section 3 presents a general approach and associated assumptions to study model misspecification. In Section 4, the generic biases are derived and comparisons between the estimators are performed. We present three simulation studies in Section 5 containing both finite sample properties of the estimators and numerical large-sample approximations. We apply the estimators under study on an observational dataset in Section 6 where we evaluate the effect of smoking on blood lead levels, and thereafter we conclude with a discussion.

MODEL AND THEORY
The potential outcome framework defines a causal effect as a comparison of potential outcomes that would be observed under different treatments (Rubin, 1974). Let X be a vector of pretreatment variables, referred to as covariates, and T a binary treatment, with realized value T = 1 if treated and T = 0 if control. Under SUTVA (Rubin, 1980), the causal effect of the treatment is defined as a contrast between two potential outcomes, for example, the difference Y(1) − Y(0), where Y(1) is the potential outcome under treatment and Y(0) is the potential outcome under the control treatment. The observed outcome is assumed to be the potential outcome for the received level of the treatment, Y = TY(1) + (1 − T)Y(0), so that the observed data vector is (X_i, T_i, Y_i), i = 1, …, n, assumed to be independent and identically distributed copies. In the remainder of the paper, we drop the subscript i for the random variables when not needed. Since each individual can be subject to only one treatment, either Y(1) or Y(0) will be missing. If the treatment is randomized, the difference between the sample averages of the treated and controls is an unbiased estimator of the average causal effect Δ = E[Y(1) − Y(0)], the parameter of interest. In the following, we use the notation μ1 = E[Y(1)], μ0 = E[Y(0)] for the marginal expectations and m1(X) = E(Y(1)|X), m0(X) = E(Y(0)|X) for their conditional counterparts. We denote the probability of being treated conditional on the covariates, the PS, by e(X) = P(T = 1|X). When the treatment is not assigned randomly, common identification criteria include assumptions of no unmeasured confounding and overlap, where the assumption that P(T = 1|X = x) is bounded away from zero and one guarantees the existence of a consistent estimator (Khan & Tamer, 2010).
Under Assumptions 1 and 2, we can estimate the average causal effect by weighting the observed outcomes with the inverse of the PSs, because E[TY/e(X)] = E[Y(1)] and E[(1 − T)Y/(1 − e(X))] = E[Y(0)], leading to an estimator Δ̂_IPW1 defined by

Δ̂_IPW1 = (1/n) Σ_i T_i Y_i / ê(X_i) − (1/n) Σ_i (1 − T_i) Y_i / (1 − ê(X_i)).  (1)

A common version of the simple IPW estimator in (1) is an IPW estimator Δ̂_IPW2 with normalized weights,

Δ̂_IPW2 = [Σ_i T_i Y_i / ê(X_i)] / [Σ_i T_i / ê(X_i)] − [Σ_i (1 − T_i) Y_i / (1 − ê(X_i))] / [Σ_i (1 − T_i) / (1 − ê(X_i))].  (2)

Using parametric IPW, we assume a finite-dimensional model for the PS, e(X) = e(X, β), β ∈ B. Under Assumptions 1-3, the IPW estimators are consistent estimators of the average causal effect Δ with asymptotic distribution √n(Δ̂_IPWj − Δ) → N(0, σ²_IPWj), j = 1, 2. Asymptotic properties of (1) and (2) are described in Lunceford and Davidian (2004) under an assumption of a logistic regression model for the treatment assignment.
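A minimal sketch of the two estimators in (1) and (2), assuming a known PS and a simulated data-generating process with Δ = 1 (our illustration, not the paper's code):

```python
# Simple (Horvitz-Thompson) IPW weights T_i/e(X_i) versus normalized (Hajek)
# weights rescaled to sum to one within each treatment arm.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(-2, 2, n)
e = 1 / (1 + np.exp(-x))                    # true PS, used in place of an estimate
t = rng.binomial(1, e)
y = np.where(t == 1, 1 + x, x) + rng.normal(0, 1, n)   # true effect Delta = 1

w1, w0 = t / e, (1 - t) / (1 - e)
delta_ipw1 = np.mean(w1 * y) - np.mean(w0 * y)           # estimator (1)
delta_ipw2 = (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()   # estimator (2), normalized
print(delta_ipw1, delta_ipw2)
```

With a correctly specified (here, known) PS, both estimators are close to Δ = 1; the normalized version typically has the smaller variance.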

MODEL MISSPECIFICATION
Our interest lies in the behavior of the estimators when the PS and the OR models are misspecified. For this purpose, we replace Assumptions 3 and 4 with two other assumptions defining the probability limit of the estimators under a general misspecification. The misspecifications will further be used to define a general bias of the IPW and DR estimators. When the PS model is misspecified, an estimator of its parameters, for example, a quasi-maximum likelihood estimator (QMLE), is not consistent for β in Assumption 3. However, a probability limit for an estimator under model misspecification exists under general conditions; see, for example, White (1982, Theorem 2.2) for QMLE or Wooldridge (2010, Section 12.1) and Boos and Stefanski (2013, Theorem 7.1) for estimators that can be written as the solution of an estimating equation (M-estimators).
In the following, and as alternatives to Assumptions 3 and 4, we assume that such limits exist. Below we define an estimator ê*(X) of the PS under a misspecified model e(X, β*).
Under model misspecification, the probability limit β* of the estimator is generally well defined; however, e(X, β*) is not equal to the true PS e(X). In the following, we use ê*(X) to denote the estimated PS under model misspecification and e*(X) = e(X, β*) under Assumption 5. Below we give an example of true and misspecified parametric models; however, Assumption 5 does not require the existence of a true parametric model. We use the concept of quasi-maximum likelihood, used for maximum likelihood estimators when parts of the distribution are misspecified.
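The limit e*(x) = e(x, β*) can be approximated numerically by fitting the misspecified model to a very large sample, in the spirit of the numerical approximations used later in the paper. A Python sketch under an assumed true PS with a quadratic term (the fitting routine is a plain Newton-Raphson, standing in for R's glm()):

```python
# Fitting a misspecified logistic model (quadratic term omitted) by quasi-maximum
# likelihood: the estimate converges to a well-defined limit beta*, even though
# e(x, beta*) differs from the true e(x).
import numpy as np

def fit_logit(X, t, iters=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (t - p))
    return b

rng = np.random.default_rng(2)
n = 500_000
x = rng.uniform(-2, 2, n)
e_true = 1 / (1 + np.exp(-(0.5 * x + 0.5 * x**2 - 0.5)))  # assumed true PS
t = rng.binomial(1, e_true)

X_mis = np.column_stack([np.ones(n), x])   # misspecified model: quadratic omitted
b_star = fit_logit(X_mis, t)               # approximates the QMLE limit beta*
e_star = 1 / (1 + np.exp(-X_mis @ b_star))
print(b_star)
```

The fitted probabilities e_star are a valid PS model limit (they average to the treatment frequency because the score equations include an intercept), yet they differ systematically from e_true.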
When considering the existence of true and misspecified parametric models, as illustrated in Example 1, the parameters β of the true model and the limiting parameters β* under the misspecified model need not be of the same dimension. For instance, the true model could contain higher-order terms and interactions that are not present in the estimation model.
The next assumption concerns overlap under model misspecification.
Assumptions 5 and 7 are defined for misspecified PS and OR models for the purpose of describing their influence on the estimation of Δ. The estimators (1)-(3) can be written in terms of estimating equations, where the equations solving for the PS and OR parameters are stacked below the main equation for the (A)IPW estimators; see, for example, Lunceford and Davidian (2004) and Williamson et al. (2014). Assuming parametric PS and OR models, the IPW estimators correspond to solving 2 + dim(β) estimating equations, Σ_{i=1}^n ψ(Y_i, T_i, X_i, θ) = 0, for the parameters θ_IPWj = (μ1, μ0, β), j = 1, 2, and the AIPW estimator to solving 2 + dim(β) + dim(α1) + dim(α0) estimating equations for the parameters θ_AIPW = (μ1, μ0, β, α1, α0). Using the notation for the misspecified models in Assumptions 5 and 7, the estimating equations change according to the dimensions of the parameters β* and α*_t, t = 0, 1. A key condition for Assumptions 5 and 7 to hold is that the misspecified PS and/or OR models provide estimating equations that uniquely define the parameters β* and α*_t, t = 0, 1, although, as a consequence of the misspecification, the resulting (A)IPW estimators will be biased. In the next section, we present the asymptotic biases of the (A)IPW estimators under study with general expressions involving the limits of the misspecified PS and OR models.
To assess the properties of the estimators, we assume Assumptions 1, 2, 5, 6, and 7 and regularity conditions (see Appendix A.1). We evaluate the difference between the probability limits of the estimators under model misspecification and the average causal effect Δ for the (A)IPW estimators.

Comparisons
The consequences of model misspecification for each of the estimators can be investigated further from the general biases in Equations (4)-(6).
To study the role of the model misspecification, we compare the biases in Section 4.1 for two separate parts of each estimator. The first part concerns the bias with respect to μ1 and the second the bias with respect to μ0. The motivation behind this component-wise comparison is that if each of the estimators of μ1 and μ0 is unbiased, then the resulting estimator of Δ = μ1 − μ0 is also unbiased. The inverse relationship between the model errors e(X)/e*(X) and (1 − e(X))/(1 − e*(X)) means that the contributions to the overall bias from μ1 and μ0 may, in general, be of the same sign (see Appendix A.3). We define Bias1(Δ̂*_IPW1), Bias1(Δ̂*_IPW2), and Bias1(Δ̂*_AIPW) as

Bias1(Δ̂*_IPW1) = E[(e(X)/e*(X)) m1(X)] − μ1,  (7)

Bias1(Δ̂*_IPW2) = E[(e(X)/e*(X)) m1(X)] / E[e(X)/e*(X)] − μ1,  (8)

Bias1(Δ̂*_AIPW) = E[(e(X)/e*(X) − 1)(m1(X) − m*_1(X))].  (9)

For Δ̂*_IPW1, the expression in (7) shows that the bias consists of a scaling between the model error e(X)/e*(X) and the conditional outcome m1(X). If the distribution of e(X)/e*(X) is positively skewed, resulting in overestimation of the weighted mean E[(e(X)/e*(X)) m1(X)], we see that for Δ̂*_IPW2 the bias is mitigated because E[e(X)/e*(X)] > 1. A similar effect is obtained for a negatively skewed distribution of e(X)/e*(X), where E[e(X)/e*(X)] < 1. There is no corresponding bias reduction for Bias1(Δ̂*_IPW1). The signs of the two biases in (7) and (8) depend on the covariance of the PS-model error and the conditional outcome, cov[e(X)/e*(X), m1(X)], implying that the biases can be in different directions for the same model misspecification. Here, Bias1(Δ̂*_IPW2) = cov[e(X)/e*(X), m1(X)] / E[e(X)/e*(X)]. It is not surprising that the covariance of e(X)/e*(X) and m1(X) plays a role for the bias of the estimators. If m1(X) were a constant, it could be taken out of the expectations of the first terms in (7) and (8), and the PS-model ratio e(X)/e*(X) would be canceled by the denominator E[e(X)/e*(X)] in (8). In this case, the bias of Δ̂*_IPW2 would be 0. Next, we investigate inequalities involving the absolute values of the biases in Equations (7)-(9). (All derivations are given in Appendix A.3.)
The results can be directly applied to μ0 by replacing e(X)/e*(X) with (1 − e(X))/(1 − e*(X)) and m1(X) with m0(X); see Appendix A.3.
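Under known (assumed) limits e*(x) and m*_1(x), the component-wise biases in (7)-(9) can be evaluated by numerical integration. The models below are hypothetical stand-ins, not the paper's Example 2:

```python
# Numerical evaluation of the mu_1 components of the biases in (7)-(9) under
# assumed true and limiting misspecified models, with X ~ Uniform(-2, 2)
# approximated by a dense grid.
import numpy as np

x = np.linspace(-2, 2, 400_001)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
e      = sigmoid(0.5 * x + 0.5 * x**2 - 0.5)    # assumed true PS (quadratic term)
e_star = sigmoid(0.4 * x)                       # assumed limit of misspecified PS
m1      = 1 + x + 0.5 * x**2                    # assumed true E[Y(1) | X]
m1_star = 1.2 + 1.1 * x                         # assumed limit of misspecified OR

r = e / e_star                                  # PS-model error e(X)/e*(X)
E = lambda f: f.mean()                          # expectation under Uniform(-2, 2)
bias1_ipw1 = E((r - 1) * m1)                           # (7), since E[m1(X)] = mu_1
bias1_ipw2 = (E(r * m1) - E(r) * E(m1)) / E(r)         # (8): cov(r, m1) / E[r]
bias1_aipw = E((r - 1) * (m1 - m1_star))               # (9)
print(bias1_ipw1, bias1_ipw2, bias1_aipw)
```

The three quantities satisfy the identities used in the comparisons below: (8) is the covariance of the ratio with m1(X) scaled by E[e(X)/e*(X)], and (9) equals (7) minus the same expectation with m*_1(X) in place of m1(X).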
First, to study the role of normalization for the IPW estimators, we compare Bias1(Δ̂*_IPW1) and Bias1(Δ̂*_IPW2). A sufficient and necessary condition for

|Bias1(Δ̂*_IPW2)| < |Bias1(Δ̂*_IPW1)|  (10)

is that

|cov[e(X)/e*(X), m1(X)]| < |Bias1(Δ̂*_IPW1)| E[e(X)/e*(X)].  (11)

That is, the absolute value of the covariance between the PS-model ratio and the conditional outcome is smaller than the absolute value of Bias1(Δ̂*_IPW1) scaled with the PS-model ratio. To study the issue of misspecifying two models instead of one, we investigate the difference between the biases of the IPW estimators in (7) and (8) and the bias of the AIPW estimator in (9). We give a necessary condition for the bias of the AIPW estimator to be smaller than the bias of the simple IPW estimator: if |Bias1(Δ̂*_AIPW)| < |Bias1(Δ̂*_IPW1)|, then

|E[(e(X)/e*(X) − 1) m*_1(X)]| < 2 |E[(e(X)/e*(X) − 1) m1(X)]|.  (12)

By (12), we see that if the AIPW estimator improves upon the simple IPW estimator under misspecification of both the PS and the OR model, then the absolute value of the misspecified outcome model is less than double the absolute value of the true conditional mean, under the same scaling with the PS-model error, e(X)/e*(X) − 1.
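The algebra behind the comparison of the AIPW and simple IPW biases can be sanity-checked numerically. Writing a = E[(e(X)/e*(X) − 1) m1(X)] and b = E[(e(X)/e*(X) − 1) m*_1(X)], the improvement |a − b| < |a| holds exactly when b and 2a − b share a sign, which in turn implies |b| < 2|a|; the sketch below verifies this over random scalar pairs:

```python
# Check: |a - b| < |a|  <=>  b * (2a - b) > 0  (square both sides and expand),
# and this implies |b| < 2|a| (the necessary condition).
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=100_000)
b = rng.normal(size=100_000)

improves = np.abs(a - b) < np.abs(a)            # AIPW beats simple IPW, component-wise
sufficient = b * (2 * a - b) > 0                # same-sign condition
necessary = np.abs(b) < 2 * np.abs(a)           # "less than double" condition
print(np.array_equal(improves, sufficient), np.all(necessary[improves]))
```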
A sufficient condition for the AIPW estimator to have a smaller bias than the simple IPW estimator can be expressed as a comparison between the misspecified OR model and the true conditional outcomes under the same PS-model error.
If E[(e(X)/e*(X) − 1) m*_1(X)] and E[(e(X)/e*(X) − 1)(2m1(X) − m*_1(X))] are either both positive or both negative, then |Bias1(Δ̂*_AIPW)| < |Bias1(Δ̂*_IPW1)|.  (13)

To provide a numerical example, we assume a second-order model in one variable and obtain the misspecified models' limits by omitting the second-order term in both the PS (logistic regression) and the OR model (linear model). We use numerical approximations to obtain values for the parameters in e*(X) and m*_t(X), t = 0, 1, under the given true and misspecified models.
Example 2. For a covariate X ∼ Uniform(−2, 2) and a binary treatment T ∼ Bernoulli(e(X)), we assume a logistic PS model and a linear conditional outcome model, with the misspecified models omitting the nonlinearities. In this example, the inequality in (11) holds. Based on the previous calculations, we expect that |Bias1(Δ̂*_AIPW)| < |Bias1(Δ̂*_IPW1)|: the two expectations in the sufficient condition of Equation (13) are both positive. To confirm, Bias1(Δ̂*_AIPW) = 0.09, which is smaller than 0.16.
By (14), we see that in order for the AIPW estimator to improve upon the normalized IPW estimator, the (PS-error scaled) outcome misspecification must lie within an interval defined by the true conditional outcome and the absolute value of the covariance of e(X)/e*(X) and m1(X). This means that the smaller the covariance, the greater the accuracy required of the outcome model for Δ̂*_AIPW to be less biased than Δ̂*_IPW2. The sufficient conditions can be illustrated with the data-generating process in Example 2. Summarizing the results of the comparisons of the bias conditions, we note that the expected values of the products of the PS-model error with the true and misspecified conditional outcomes play important roles; the covariances of the PS-model ratio with the true and misspecified conditional outcomes are two of their respective components. In Figure 1, we illustrate these parts with the data-generating process from Example 2. The PS-model ratio deviates from 1 for both small and large values of X, but more so for larger values of X. Since both conditional outcomes m1(X) and m*_1(X) are strictly increasing, both covariances are positive (cov[e(X)/e*(X), m1(X)] = 0.12 and cov[e(X)/e*(X), m*_1(X)] = 0.03), owing to the PS-model ratio being larger for larger values of X. The interval characterization of the described conditions implies that if the two covariances are of the same magnitude, the bias of Δ̂*_AIPW will often be smaller than the biases of Δ̂*_IPW1 and Δ̂*_IPW2.
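The two covariances discussed above can be computed directly for any assumed pair of limiting models. The sketch below uses hypothetical models, so the values differ from the 0.12 and 0.03 reported for Example 2:

```python
# Covariances of the PS-model ratio e(X)/e*(X) with the true and the (assumed)
# limiting misspecified conditional outcomes, via grid integration.
import numpy as np

x = np.linspace(-2, 2, 400_001)                 # X ~ Uniform(-2, 2)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
r = sigmoid(0.5 * x + 0.5 * x**2 - 0.5) / sigmoid(0.4 * x)   # assumed e(X)/e*(X)
m1, m1_star = 1 + x + 0.5 * x**2, 1.2 + 1.1 * x              # assumed E[Y(1)|X] pair

cov = lambda f, g: np.mean(f * g) - np.mean(f) * np.mean(g)
cov_rm1, cov_rm1s = cov(r, m1), cov(r, m1_star)
print(cov_rm1, cov_rm1s)
```

The signs and magnitudes depend entirely on the assumed models; with the paper's Example 2 models, both covariances are positive.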

Design
To investigate the asymptotic biases described in Section 4 and also the finite-sample performance of Δ̂*_IPW1, Δ̂*_IPW2, and Δ̂*_AIPW under model misspecification, we perform three simulation studies with three different designs, A-C. The first part of the simulations evaluates the finite-sample performance of the estimators and consists of 1000 replications of sample sizes 500, 1000, and 5000. In addition to the simulation results, we also give numerical approximations to the asymptotic biases by fitting the misspecified models with a large sample, n = 1,000,000. The simulations and numerical approximations are carried out using R (R Core Team, 2020). The misspecified models are fitted with the glm() function and were selected to be well-known simple models that could have been chosen in practice by a data analyst. The link functions together with the true parameter values are given in Tables 1 and 2, which also contain the details for the misspecified models.

TABLE 2 Simulation Designs A-C
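The simulations are run in R with glm(); purely to illustrate the pipeline, the sketch below implements the three estimators in Python under a Design-A-style misspecification (quadratic term omitted from both working models). All coefficients are placeholders, not the values in Table 1:

```python
# End-to-end sketch: generate confounded data, fit misspecified PS (logistic) and
# OR (linear) models, then form the IPW1, IPW2, and AIPW estimates (true Delta = 1).
import numpy as np

def fit_logit(X, t, iters=25):
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        b += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (t - p))
    return 1 / (1 + np.exp(-X @ b))

rng = np.random.default_rng(5)
n = 50_000
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
e = 1 / (1 + np.exp(-(0.5 * x1 + 0.5 * x2 + 0.5 * x1**2)))   # true PS, quadratic in x1
t = rng.binomial(1, e)
y = 1.0 * t + x1 + x2 + 0.5 * x1**2 + rng.normal(0, 1, n)     # true effect Delta = 1

X_mis = np.column_stack([np.ones(n), x1, x2])    # both working models omit x1**2
e_hat = fit_logit(X_mis, t)
w1, w0 = t / e_hat, (1 - t) / (1 - e_hat)
ipw1 = np.mean(w1 * y) - np.mean(w0 * y)
ipw2 = (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()

m1_hat = X_mis @ np.linalg.lstsq(X_mis[t == 1], y[t == 1], rcond=None)[0]  # OR, treated
m0_hat = X_mis @ np.linalg.lstsq(X_mis[t == 0], y[t == 0], rcond=None)[0]  # OR, controls
aipw = np.mean(m1_hat - m0_hat + w1 * (y - m1_hat) - w0 * (y - m0_hat))
print(ipw1, ipw2, aipw)
```

Because both working models are wrong, none of the three estimates is exactly consistent for Δ; their relative biases depend on the quantities derived in Section 4.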

Simulation 1
The covariates (X1, X2, X3) are generated as X1 ∼ Uniform(1,4), X2 ∼ Poisson(3), and X3 ∼ Bernoulli(0.4). Generalized linear models are used to generate a binary treatment and potential outcomes Y(t), t = 0, 1, with second-order terms of X1 and X2 in both the PS and OR models; see Table 1. The PS distributions for the treated and controls are bounded away from 0 and 1 under the true models and under the model misspecifications (see Figure 2). The PS and OR models (for the AIPW estimator) are stepwise misspecified in the three designs (A, B, C).
A: a quadratic term X1² is omitted in the PS and OR models; B: two quadratic terms, X1² and X2², are omitted in the PS and OR models; C: two quadratic terms are omitted and both the OR and PS link functions are misspecified.
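A data-generating sketch in the spirit of Simulation 1 (the true coefficients are in Table 1 and are not reproduced here; the values below are placeholders):

```python
# Covariates as in Simulation 1 (Uniform, Poisson, Bernoulli), with a logistic PS
# and linear ORs that include quadratic terms; coefficients are assumed, not the
# paper's Table 1 values.
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x1 = rng.uniform(1, 4, n)
x2 = rng.poisson(3, n).astype(float)
x3 = rng.binomial(1, 0.4, n).astype(float)

lin = -1.0 + 0.3 * x1 + 0.2 * x2 - 0.4 * x3 + 0.1 * x1**2   # placeholder PS predictor
e = 1 / (1 + np.exp(-lin))
t = rng.binomial(1, e)
y1 = 2 + x1 + 0.5 * x2 + x3 + 0.3 * x1**2 + rng.normal(0, 1, n)  # placeholder OR, Y(1)
y0 = x1 + 0.5 * x2 + x3 + 0.3 * x1**2 + rng.normal(0, 1, n)      # placeholder OR, Y(0)
y = np.where(t == 1, y1, y0)
print(t.mean(), y.mean())
```

Design A's misspecification would then correspond to dropping the x1**2 column when fitting the working PS and OR models.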

Simulation 2
The design is inspired by the simulation study of Funk et al. (2011). We generate the same covariates (X1, X2, X3, X4), where X1 ∼ Normal(0,1), X2 ∼ Normal(0,1), X3 ∼ Uniform(0,1), and X4 ∼ Normal(0,1). The treatment and outcomes are generated with second-order terms of X1 and X2 in both the PS and OR models given in Table 1. In Figure 4, we see that the PS distributions, under both the true and misspecified models, have poorer overlap and values that are close to 0 and 1. The PS and OR models (for the AIPW estimator) are stepwise misspecified in three designs, where: A: two quadratic terms, X1² and X2², are omitted in the PS model and X1² and X3² in the OR models; B: the same quadratic terms are omitted as in Design A, and transformations of the first-order terms, Z1 = X1 + X2 + X3 and Z2 = X1 + X3 + X4, are applied in the PS and the OR models, respectively; C: the same quadratic terms are omitted as in Design A, X3 and X4 are omitted in the PS and the OR models, respectively, and transformations of the first-order terms, Z3 = X1 + X2 and Z4 = X1 + X3, are applied in the PS and the OR models, respectively.

Simulation 3
The design replicates the covariates and PS models of Zhou et al. (2020), in the setting referred to as medium treatment prevalence, with PS distributions with good, moderate, and poor overlap (see Figure 5). In our design, the PS and OR models (for the AIPW estimator) are misspecified using variable transformations of the covariates, for example, replacing X2 by X2(1 + X1) + 10.

Results
In Tables 3 and 4, we give the simulation bias, standard error, and mean squared error (MSE) of the three estimators. Tables 5-7 give numerical approximations of Bias(Δ̂*_IPW1), Bias(Δ̂*_IPW2), and Bias(Δ̂*_AIPW) using a sample size of n = 1,000,000. When using the true models, that is, when studying the estimators Δ̂_IPW1, Δ̂_IPW2, and Δ̂_AIPW, the bias is small and decreases as the sample size increases. In Simulations 1 and 2, the standard errors follow the expected order, with the smallest for Δ̂_AIPW followed by Δ̂_IPW2 and Δ̂_IPW1 (Lunceford & Davidian, 2004). In Simulation 3, the standard errors are inflated by the poorer overlap in Designs B and C.

FIGURE 5 Overlap plots for the propensity score distributions, ê(X) and ê*(X), for treated and controls for Design A (good overlap), B (moderate overlap), and C (poor overlap) in Simulation 3

TABLE 3 Results for Simulations 1 and 2 for sample sizes 500, 1000, and 5000. In Simulation 1, Designs A and B share the same true models. In Simulation 2, the true models are the same in Designs A-C. All true models are given in Table 1.

Figure 3 gives an illustration of the suppressing effect obtained by the normalization in Δ̂*_IPW2. Under misspecification, the biases in all three simulations and all designs are close to the asymptotic approximations, at least for the largest sample size. For example, in Design C in Simulation 3 with poor overlap, the bias of Δ̂*_IPW2 is smaller than that of Δ̂*_AIPW when n = 1000, although for n = 5000 the bias of Δ̂*_AIPW is smaller than that of Δ̂*_IPW2, which is what we see in the asymptotic approximation. For the simulations with poor overlap (both in Simulation 2 and Simulation 3, Design C), the bias of Δ̂*_IPW1 is very large because of the model misspecification, and both Δ̂*_IPW2 and Δ̂*_AIPW have substantially smaller biases.
In Simulation 1, Bias(Δ̂*_IPW1) is the largest, and Bias(Δ̂*_IPW2) and Bias(Δ̂*_AIPW) are similar, although Bias(Δ̂*_AIPW) is slightly smaller in most cases. In Simulation 2, with poorer overlap, Bias(Δ̂*_IPW2) is the smallest for all three study designs, demonstrating the stabilizing effect of the normalization.

TABLE 4 Results for Simulation 3 for sample sizes 500, 1000, and 5000. The true and false models for Designs A-C are described in Table 2.

Studying the two different parts of the biases illustrates how not only the variances but also the biases are inflated by the lack of overlap in the PS distribution. In Simulation 3, in the design with poor overlap (see Table 7), we have that Bias1(Δ̂*_IPW1) is small but Bias0(Δ̂*_IPW1) is very large. This result is owing to the ratio (1 − e(X))/(1 − e*(X)) being unstable because of sparse data where 1 − e*(X) is close to 0. Correspondingly, we see that the mean of (1 − e(X))/(1 − e*(X)) is far from 1. In the previous section, we described the stabilizing properties of the normalized IPW estimator, counteracting the model misspecification error. The standard errors under misspecification follow the same pattern as under the true models, which can be expected under regularity conditions from semiparametric theory; see, for example, Boos and Stefanski (2013, Chapter 7.2). In Simulation 2, the standard errors of Δ̂*_IPW2 are the smallest, followed by Δ̂*_AIPW for Designs A and B and Δ̂*_IPW1 for Design C. In Simulation 3, the standard errors of Δ̂*_IPW2 are the smallest, followed by Δ̂*_AIPW and Δ̂*_IPW1 for Designs B and C, corresponding to moderate and poor PS distribution overlap.
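The instability of the control-group ratio under poor overlap can be illustrated directly: with the PS pushed toward 1, even a mild misspecification makes the mean of (1 − e(X))/(1 − e*(X)) drift away from 1. The scale parameter and models below are assumptions for illustration:

```python
# As the PS approaches 0 and 1 (larger scale), the control-group error ratio
# (1 - e)/(1 - e*) drifts further from mean one under the same relative misspecification.
import numpy as np

x = np.linspace(-2, 2, 400_001)               # X ~ Uniform(-2, 2)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

results = {}
for scale in (1.0, 3.0):                      # larger scale -> poorer overlap
    e = sigmoid(scale * x)                    # assumed true PS
    e_star = sigmoid(0.9 * scale * x)         # assumed limit of a mildly misspecified PS
    ratio0 = (1 - e) / (1 - e_star)           # control-group model-error ratio
    results[scale] = ratio0.mean()
print(results)
```

The deviation of the mean ratio from 1 feeds directly into Bias0 of the simple IPW estimator, matching the pattern seen in Table 7.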
To relate the simulations to the sufficient and necessary conditions derived in Section 4.2, the related expressions from the asymptotic approximations are shown in Tables 5-7. The expectations and covariances that enter the necessary and sufficient conditions in Equations (11)-(15) can be used to draw the corresponding conclusions about the estimators. As an example, we see that the necessary condition for the absolute value of Bias_t(Δ̂*_AIPW) to be smaller than the absolute value of Bias_t(Δ̂*_IPW1) holds for both t = 0 and t = 1 in Simulation 1, but the same condition is not satisfied in Simulation 2.

DATA EXAMPLE
As a motivating example, we analyze data from the National Health and Nutrition Examination Survey (NHANES, 2007-2008) for the purpose of estimating the effect of smoking on blood lead levels. Earlier studies have suggested that increased blood lead levels are associated with chronic kidney disease and peripheral arterial disease (Muntner et al., 2005). Higher blood lead levels are also associated with mortality in the general U.S. population (Menke et al., 2006). The NHANES dataset studied here is a subset of the data previously analyzed by Hsu and Small (2013), who evaluated the relationship between smoking and blood lead levels for the treated population (ATT) with a matching approach. To improve overlap for the estimation of the average causal effect, Δ, we select the study population of males, n = 1392. The covariate set is also expanded with four more covariates from the original NHANES demographic data. The treated individuals are defined as daily smokers (n1 = 386), and the controls are individuals who had smoked fewer than 100 cigarettes during their life and no cigarettes in the last 30 days (n0 = 1006). The outcome of interest is blood lead level (micrograms per deciliter, μg/dL). We control for the covariates age, army service, marriage, birth country, education, family size, and income-to-poverty level; see Table 8. For this dataset, we apply the three estimators using a logistic propensity score model, and for the AIPW estimator we additionally use a linear OR model with the same covariates. The overlap is displayed in the mirror histogram of Figure 6a together with balance diagnostics in Figure 6b. Here, we see that the balance achieved from the weighting seems satisfactory for most of the covariates, with standardized mean differences within a balance threshold of 0.10.
However, age squared, army service, the college education group, and the group born in Spanish-speaking countries other than Mexico have standardized mean differences just exceeding this threshold. Applying the simple IPW estimator, Δ̂*_IPW1, smoking increases blood lead levels by 1.10 μg/dL (95% CI: 0.63-1.56), whereas the normalized version, Δ̂*_IPW2, results in a smaller estimated effect of 0.88 μg/dL with a smaller standard error (95% CI: 0.59-1.18). The AIPW estimator, Δ̂*_AIPW, further reduces the effect estimate to 0.86 μg/dL and the standard error (95% CI: 0.57-1.14; see Table 9). Although applying the estimators to the data does not give information on possible bias due to model misspecification, our results from Section 4 provide guidance to rely on the estimates from Δ̂*_IPW2 or Δ̂*_AIPW, that is, 0.88 or 0.86 μg/dL with corresponding confidence intervals, rather than the higher value from Δ̂*_IPW1 of 1.10 μg/dL.

DISCUSSION
In this paper, we investigate biases of two IPW estimators and an AIPW estimator under model misspecification. For this purpose, we use a generic probability limit, under misspecification of the PS and OR models, which exists under general conditions. Since the PS enters the estimators in different ways for the IPW estimators under study, the consequences of the model misspecification are not the same. The bias of the IPW estimators depends on the covariance between the PS-model error and the conditional outcome in different ways, and the resulting biases can be in opposite directions. For the IPW estimators, normalization has the potential of reducing the bias because it scales the estimator in a mitigating manner.
Comparing the bias of the AIPW estimator with a simple IPW estimator, the necessary condition for the AIPW estimator to have a smaller bias is that the expectation of the outcome model under misspecification is less than twice the true conditional outcome, where the expectations include a scaling with the PS-model error. For the comparison with the normalized IPW estimator, the (PS-error scaled) misspecified outcome must lie in an interval defined by the true conditional outcome plus and minus the absolute value of the covariance between the PS-model error and the conditional outcome.

[Table 8: Summary statistics for covariates in the NHANES data. Means and SDs for lead, age, income-to-poverty level, and family size; proportions for education, missing income, race, army service, marriage indicator, and birth country.]
The biases and conditions are exemplified in three simulation studies where the fitted misspecified models fail in specifying nonlinearities, functional form (through misspecified link functions), and covariates. The simulation studies are also accompanied by numerical approximations of the large-sample biases. The third simulation study specifically compares the impact of good, moderate, and poor overlap on the bias due to model misspecification. Here, we see that it is not only the variance that gets inflated from PS values close to 0 or 1; the bias due to model misspecification also increases rapidly. The normalized IPW and AIPW estimators show a more stable performance. The bias expressions of the IPW and AIPW estimators suggest that the AIPW estimator has a smaller bias than the IPW estimators even under moderate misspecification of the outcome model. For the AIPW estimator, poor overlap and large differences between the true PS, e(x), and its misspecified limit, e*(x), are compensated for by outcome model assumptions in the area where data are sparse. However, in the simulations, the normalized IPW estimator also performs well due to the implicit stabilization from the PS-model errors. Since all biases include the PS-model error, we suggest that a researcher should be careful when modeling the PS even though an OR model is additionally involved.
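The stabilizing role of normalization can be illustrated numerically. The sketch below assumes a hypothetical data-generating process in which the misspecified PS limit e*(x) differs from the true e(x) by a constant offset on the logit scale (purely illustrative, not the probability limit of any actual fitted model), together with an assumed conditional outcome m1(x) = 10 + x. It approximates the large-sample biases of the simple and normalized IPW estimators of E[Y(1)] by Monte Carlo integration over the covariate distribution.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)      # covariate draws for Monte Carlo integration
m1 = 10 + x                         # assumed true conditional outcome E[Y(1) | X]

for a in (0.5, 1.5, 3.0):           # larger a -> PS closer to 0/1 -> poorer overlap
    e_true = sigmoid(a * x + 0.3)   # true propensity score e(x)
    e_star = sigmoid(a * x)         # assumed misspecified limit e*(x)
    r = e_true / e_star             # PS-model error ratio scaling the biases
    # Large-sample bias of the simple IPW estimator of E[Y(1)]:
    bias_simple = np.mean(r * m1) - np.mean(m1)
    # Large-sample bias of the normalized (Hajek) IPW estimator:
    bias_norm = np.mean(r * m1) / np.mean(r) - np.mean(m1)
    print(f"a={a}: simple {bias_simple:+.3f}, normalized {bias_norm:+.3f}")
```

In this setup the simple estimator's bias picks up the overall level of m1(x) through the term E[m1(X)](E[r] − 1), while normalization cancels that level term and leaves only a covariance-type term, consistent with the protection against misspecification discussed above.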

ACKNOWLEDGMENT
The authors acknowledge Professor Elena Stanghellini for valuable input. Financial support was received from the Royal Swedish Academy of Sciences and the Swedish Research Council (grant number 2016-00703).

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available at https://github.com/IngWae/Bias_AIPW

OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally shareable data necessary to reproduce the reported results. The data are available in the Supporting Information section. This article has also earned a "Reproducible Research" badge for making publicly available the code necessary to reproduce the reported results. The results reported in this article could be fully reproduced.

The convergence of Δ̂*_IPW1, Δ̂*_IPW2, and Δ̂*_AIPW to their corresponding expectations would follow directly from a weak law of large numbers (WLLN) for an iid sample of the observed data, except for the estimated parameters in the fitted PS model ê*(x) and the fitted OR models m̂*_t(x), t = 0, 1. With the estimated PS-model parameters for Δ̂*_IPW1 and Δ̂*_IPW2, and the estimated PS- and OR-model parameters for Δ̂*_AIPW, and under Assumptions 5 and 7, the consistency of these parameter estimators is ensured. Regularity conditions for the estimating function can be given; see, for example, Boos and Stefanski (2013, Theorem 7.3), who show that (A1) holds for differentiable functions with bounded derivatives (with respect to the parameters). The regularity conditions for the estimating functions of the three estimators imply conditions on the models e*(x) and m*_t(x) such that the regularity condition is satisfied. Under (A1), we can insert the limiting values of the parameters and the corresponding e*(x) and m*_t(x), t = 0, 1, when taking a WLLN.

A.3 Comparisons
To study the consequences of model misspecification for the estimators, we compare each difference involving m_1(x) and m_0(x) separately, that is, we study the bias of each estimator through its treated and control parts. Since the errors e(x)/e*(x) and (1 − e(x))/(1 − e*(x)) are inversely related, we would normally not expect the biases from the two parts to point in the same direction. Inequalities concerning the biases are stated with respect to the absolute values of the two parts separately.
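The inverse relation between the two error factors can be seen directly: whenever e*(x) underestimates e(x), the ratio e(x)/e*(x) exceeds 1 while (1 − e(x))/(1 − e*(x)) falls below 1, and vice versa. A small numeric check with made-up PS values:

```python
import numpy as np

e = np.array([0.2, 0.5, 0.8])       # true PS values (illustrative)
e_star = np.array([0.3, 0.4, 0.7])  # misspecified PS values (illustrative)

r1 = e / e_star                     # error factor entering the treated part
r0 = (1 - e) / (1 - e_star)         # error factor entering the control part

# Whenever one factor exceeds 1, the other falls below 1:
print(r1)
print(r0)
```

This is why the two parts of each bias are compared through their absolute values: the signed errors in the treated and control parts systematically oppose each other whenever e*(x) ≠ e(x).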