Nonlinear taxation of income and education in the presence of income-misreporting

We study the joint design of nonlinear income and education taxes when the government pursues redistributive objectives. A key feature of our setup is that the ability type of an agent can affect both the costs and benefits of acquiring education. Market remuneration of agents depends on both their innate ability type and their educational choices. Our focus is on the properties of constrained efficient allocations when educational choices are publicly observable at the individual level, but earned income is subject to misreporting. We find that income-misreporting (IM) affects the optimal distortions on income and education, and we shed light on why it does so and through which mechanisms. We show how and why IM strengthens the case for downward distorting the educational choices of low-ability agents. Finally, we find that IM provides another mechanism that makes commodity taxation useful.


1 | INTRODUCTION
The contributions growing out of Mirrlees' (1971) seminal paper on optimal income taxation have mostly assumed that an individual's productivity or wage rate is exogenously given. More recently, however, a comparatively small literature has analyzed optimal redistributive taxation in settings with endogenous wages. These contributions may be divided roughly into three strands. One strand maintains the assumption of perfect competition in the labor market and generates wage endogeneity by treating workers of different skill types as separate inputs that are imperfectly substitutable in the production function.¹ A second strand generates wage endogeneity by introducing frictions in the labor market that may be due either to imperfect competition or to problems of asymmetric information between workers and employers.² The third strand endogenizes wages by allowing for the possibility of investing in productivity-enhancing education.³ Within this last strand, a number of contributions assume that educational attainment is publicly observable at the individual level and thus can be taxed nonlinearly. These studies have greatly enhanced our understanding of what determines the direction of the optimal distortion on the educational choices of agents. Yet, while they differ in many aspects, they all maintain the Mirrleesian assumption of public observability of earnings. This makes nonlinear taxation of both incomes and educational expenditures possible. However, in reality, income misreporting (IM) is often a relevant phenomenon, a fact that might very well undermine the efficacy of the income tax in achieving redistribution.
The distinctive feature of our contribution lies in the recognition that agents can conceal part of their earned income for tax purposes. Our main goal is to investigate if, and how, the optimal distortion on agents' educational choices varies depending on whether or not earned income is perfectly observable by the government at the individual level. To simplify our analysis, we model IM following the so-called riskless approach pioneered by Usher (1986).
As a vehicle for our study, we set up a two-type optimal income tax model (à la Stern, 1982 and Stiglitz, 1982) where an agent's productivity depends on his type (innate ability) and on the amount of education he acquires. To attain a given amount of education, agents incur an effort cost which is type-dependent (in addition to the monetary cost of education).⁴ We characterize the properties of an informationally constrained Pareto-efficient tax policy, focusing on the so-called "normal" case where the direction of redistribution goes from the high- to the low-ability type.
Our main results can be summarized as follows. First, when an agent's productivity or wage depends only on his education level and not directly on his type (i.e., the effect of one's type is channeled only indirectly through the education level he chooses), IM does not affect the qualitative properties of an optimal tax policy. Whether earned income is perfectly observable at the individual level or not, all agents, high- and low-ability alike, face a zero marginal income tax rate. The tax treatment of educational attainment, on the other hand, differs across types: the educational choice is left undistorted for high-ability agents but is downward distorted for low-ability agents.
Second, in the more general case wherein an agent's productivity or wage depends directly on both his type and educational attainment, IM does not change the characterization and the sign of the marginal income tax rate faced by low-ability agents (which should be positive) and high-ability agents (which should be zero). It also leaves unscathed the result that education should be undistorted for high-ability agents. However, compared with a setting where earned income is perfectly observed by the government at the individual level, it strengthens the case for a downward distortion on the educational choice of low-ability agents. We will explore the various factors that are behind this later in the paper. However, it is worth pointing out here that, with or without IM, education provides another instrument for making "mimicking" less attractive (making the preferred choice of the low-ability agents less desirable to high-ability agents), thus allowing more redistribution to low-ability agents.
Third, in the general case wherein an agent's productivity depends directly on his type, IM implies that consumption taxation is no longer a redundant policy instrument. We thus reconsider our results by assuming that a linear consumption tax is used alongside a joint nonlinear tax on education and reported income. Under this scenario, it becomes desirable to let high-ability agents face a negative marginal tax on reported income and also to distort upwards their educational choice. For low-ability agents, consumption taxation exerts a moderating effect on the optimal marginal tax on reported income. It also generates a mitigating effect on the tendency, attributable to the possibility of IM, to warrant a downward distortion on the educational choice of low-ability individuals.
The paper is related to a diverse body of literature in public economics.
(i) Tax evasion. Most of the early evasion literature, following the seminal contribution by Allingham and Sandmo (1972), assumes that decisions about IM involve risk. Evasion may be detected with some probability, for instance due to random audits by the tax authorities, in which case a sanction applies (see, e.g., Cremer et al., 1990; Cremer & Gahvari, 1994, 1996; Schroyen, 1997). The riskless approach to evasion, introduced in the literature by Usher (1986), assumes that taxpayers are able to fully avoid detection by incurring a cost that depends on the amount they misreport. The cost function may be implicitly assumed to capture some of the elements from the uncertainty model, for instance to make the cost higher the more extensive is the auditing activity of the tax collector. As in the uncertainty case, there is a trade-off between the gain from lowering the tax by IM and the cost incurred, which is modeled as a pure concealment cost. Following the contribution by Usher (1986), the riskless approach has been used in a number of subsequent contributions (e.g., Mayshar, 1991; Boadway et al., 1994; Kopczuk, 2001; Slemrod, 2001; Christiansen & Tuomala, 2008; Chetty, 2009; Gahvari & Micheletto, 2014; Gerritsen, 2021).

(ii) Redistributive role of education policy. Beginning with Arrow (1971), a large body of public economics literature has investigated the redistributive role of education policy. Earlier contributions, including Arrow (1971), Green and Sheshinski (1975), and Bruno (1976), left aside asymmetric information problems and assumed that the government could observe an individual's type. The main goal of these contributions was to characterize the optimal allocation of a given amount of educational expenditure amongst a population of individuals of different ability. Considering the educational policy in isolation, or assuming an exogenously given income tax schedule, these papers could not shed light on the relative merits of tax and expenditure policy for redistributive purposes.

(iii) Interaction of income redistribution and educational policy. Ulph (1977) and Hare and Ulph (1979) developed Bruno's work on the interaction of income redistribution and educational policy by allowing both types of policies to be simultaneously optimized. However, they retained the assumption that the ability to benefit from education is observed by the education authorities. Relaxing this assumption, and assuming that a nonlinear income tax is the government's only policy instrument, Tuomala (1986) analyzed how individuals' educational choices affect the progressivity of the optimal nonlinear income tax. He did so in the context of a timeless model where private agents only differ in their ability to transform education into labor productivity. Subsequently, Brett and Weymark (2003) generalized Tuomala's model to a setting where individuals differ both in terms of their ability to transform education into labor productivity and in terms of the time needed to acquire a given amount of education. However, as in Tuomala's case, they assumed that taxes could only be set as a function of earned income. Bovenberg and Jacobs (2005) was the first paper that jointly optimized, within a Mirrleesian static framework, a nonlinear tax on income and on educational expenditure; they also considered the case where education entails for agents both a resource cost and an effort cost, but assumed away the possibility that agents differ in the effort cost of acquiring education. A famous result that they obtained was that education subsidies and income taxes are "Siamese twins," in that education subsidies should be used for the sole purpose of offsetting the distortions created by redistributive income taxation. This result was later challenged by Maldonado (2008), who showed that distorting the educational choices of agents is desirable when the education elasticity of wage varies with ability.⁵

(iv) Dynamic Mirrleesian settings. From a normative standpoint, the interaction between the taxation of income and education has also been investigated in dynamic Mirrleesian settings where commitment issues or risky properties of human capital investment have been important elements of the analysis. The differences between an optimal policy under full or limited commitment have been studied by Guo and Krause (2013) and Findeisen and Sachs (2018). Uncertainty about the return to human capital investment implies that the tax policy serves a dual purpose, achieving redistribution and providing insurance. The role of uncertainty has been analyzed in various contributions such as da Costa and Maestri (2007), da Costa and Severo (2008), Findeisen and Sachs (2016), Stantcheva (2017), and Kapicka and Neira (2019). The first three papers consider two-period models where agents make a one-shot education decision, with a one-time realization of uncertainty. Stantcheva (2017) considers an n-period model where investment in education occurs during the entire life-cycle of an agent, with progressive realization of uncertainty throughout life. Kapicka and Neira (2019) consider a two-period model with one-time realization of uncertainty but assume that human capital investments are only partially observable by the planner.
Agents differ in ability type, which is described by the parameter $\theta$. A higher ability corresponds to a higher value of $\theta$. The labor productivity of a given agent depends on his type $\theta$ and education level $e$. Attaining a given level of education entails both a resource cost and an effort cost. The resource cost is given by $p$ (per unit of education $e$), and the effort cost is captured by a function $\varphi(\theta, e)$ which in general depends on the individual's ability type. An agent of ability $\theta$ who acquires education in the amount $e$ has labor productivity $w(\theta, e)$, which means that he supplies $w(\theta, e)$ units of labor in efficiency units per unit of time. We assume that $\partial w(\theta, e)/\partial\theta \geq 0$ and $\partial w(\theta, e)/\partial e > 0$. There is a single consumption good, denoted by $c$, and produced using labor in efficiency units as the sole input. The technology exhibits constant returns to scale. The consumption good is treated as the numéraire, and we choose the units of measurement so that one unit of labor in efficiency units produces one unit of output. The labor market is perfectly competitive, so that an individual's wage is equal to the marginal (and average) product of his labor, $w(\theta, e)$. Earned income, denoted by $I$, is thus given by $I \equiv w(\theta, e)L$, where $L$ is the amount of labor supplied to the market. Agents' preferences are described by the function

$$U = u(c) - v(L) - \varphi(\theta, e). \tag{1}$$
The associated laissez-faire first-order conditions with respect to $I$ and $e$ are given by Equations (2) and (3), where the LHS of (2) represents the marginal rate of substitution between labor and consumption ($MRS_{Lc}$), and the LHS of (3) represents the marginal rate of substitution between education and consumption or, as it is called below, the marginal willingness to pay for education ($MWP_{ec}$).
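As a hedged illustration of how conditions of this kind arise, the sketch below derives laissez-faire first-order conditions under two assumptions: preferences are additively separable, $U = u(c) - v(L) - \varphi(\theta, e)$, and the agent pays the resource cost himself, so that $c = I - pe$. The normalization chosen here (dividing through by $u'(c)$) is an assumption and may differ from the one used in Equations (2) and (3).

```latex
% Sketch only. Assumed: U = u(c) - v(L) - \varphi(\theta,e),  c = I - pe,  L = I / w(\theta,e).
\begin{align*}
\max_{I,\,e}\quad & u(I - pe) - v\!\left(\frac{I}{w(\theta,e)}\right) - \varphi(\theta,e) \\[4pt]
\text{FOC w.r.t. } I:\quad
  & \frac{v'(L)}{w(\theta,e)\,u'(c)} = 1, \\[4pt]
\text{FOC w.r.t. } e:\quad
  & \frac{1}{u'(c)}\left[\, v'(L)\,\frac{L}{w(\theta,e)}\,\frac{\partial w(\theta,e)}{\partial e}
    - \frac{\partial \varphi(\theta,e)}{\partial e} \right] = p .
\end{align*}
% The bracketed term is the utility gain from a marginal unit of education:
% the labor saved at given earnings, net of the marginal effort cost.
```

In this form the left-hand side of the second condition reads as a marginal willingness to pay for education in units of consumption, which the agent equates to the resource price $p$.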

3.1 | Design
Consider a discrete setting with two types of agents, those with $\theta = \theta^{\ell}$ (low-ability) and those with $\theta = \theta^{h}$ (high-ability), with $\theta^{h} > \theta^{\ell}$. The government intends to design a Pareto-efficient tax policy that allows achieving some given revenue-raising and redistributive goals. The informational structure of the problem includes the standard Mirrleesian assumption that the government knows the distribution of types in the population but does not know "who is who." This rules out the possibility of using first-best, type-specific, lump-sum taxes/subsidies. However, in contrast to what is commonly assumed in optimal taxation models, we shall assume that earned incomes are not publicly observable either. This opens up the possibility of tax evasion through IM. The educational level achieved by a given individual, on the other hand, is assumed publicly observable and thus taxable.
To model IM, we follow the riskless approach introduced by Usher (1986); specifically, once agents have incurred a cost, they face no risk of detection. This simple structure allows the government to achieve its objectives through a general (nonlinear) tax function $T(\cdot, \cdot)$ which depends on reported income, $M$, and education level, $e$. To see this, recall that in a two-group model without IM, one need only determine the two groups' allocations. This can be done by a direct mechanism consisting of two bundles, each specifying a particular amount of income (earned equal to reported), education, and consumption. The same procedure works in our model if the two bundles, the one intended for the low-skilled agents and the other for the high-skilled ones, are specified in terms of reported income $M$, education $e$, and tax payment $T$ (equivalently, net-of-tax reported income $B \equiv M - T$). The reason is that, as we show below, an $(e, M, B)$-bundle corresponds to an $(e, L, c)$-bundle, notwithstanding the fact that reported incomes will likely differ from actual earned incomes.
Consider the optimization problem of an agent who is to choose between bundles $(e^{\ell}, M^{\ell}, B^{\ell})$ and $(e^{h}, M^{h}, B^{h})$. He will choose the bundle that maximizes his utility (1). Denote by $a$ the difference between earned and reported income. IM is costly: a taxpayer who evades $a$ incurs a (pecuniary) cost $\sigma(a)$, where $\sigma(\cdot)$ is assumed to be nonnegative, increasing in (the absolute value of) $a$, and strictly convex. We also assume $\sigma(0) = \sigma'(0) = 0$. The consumption level of a taxpayer who selects $(e, M, B)$, and subsequently misreports $a$, is equal to

$$c = B + a - \sigma(a), \tag{4}$$

where we have dropped the superscripts for ease of notation. Observe that in designing this mechanism, we have implicitly assumed that the pecuniary cost of education is paid by the government. Alternatively, one may assume the cost is incurred by the individual paying $p$ per unit of $e$. The two formulations provide identical results.⁶ With the taxpayer's true earnings being equal to $w(\theta, e)L$, we have

$$L = \frac{M + a}{w(\theta, e)}. \tag{5}$$

Substituting (4) and (5) in (1) yields

$$U = u\big(B + a - \sigma(a)\big) - v\!\left(\frac{M + a}{w(\theta, e)}\right) - \varphi(\theta, e). \tag{6}$$

Notice that any given $(e, M, B)$-bundle where the resource cost of education is paid for by the government is equivalent, from an agent's standpoint, to an $(e, M, \tilde{B})$-bundle, where $\tilde{B} \equiv B + pe$ and the resource cost of education is paid for by the agent. The two bundles, under the respective assumptions about who pays for the resource cost of education, are also equivalent in terms of net revenue collected by the government.
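The equivalence between the two ways of financing the resource cost of education can be checked in two lines. The sketch below assumes that net-of-tax reported income is $B = M - T$ and that $\tilde{B} = B + pe$; the labels "net revenue" are shorthand introduced here for illustration.

```latex
% Sketch only: who pays p e does not matter once \tilde{B} = B + p e.
\begin{align*}
\text{government pays } pe:\quad
  & c = B + a - \sigma(a),
  & \text{net revenue} &= M - B - pe, \\
\text{agent pays } pe:\quad
  & c = \tilde{B} - pe + a - \sigma(a),
  & \text{net revenue} &= M - \tilde{B}.
\end{align*}
% Substituting \tilde{B} = B + p e into the second row reproduces the first row
% term by term, for every value of a, so both the agent's problem and the
% government's budget are unchanged.
```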
The taxpayer's optimization problem consists of choosing $a$ to maximize (6). This results in the first-order condition

$$\big[1 - \sigma'(a)\big]\,u'(c) = \frac{v'(L)}{w(\theta, e)}, \tag{7}$$

which determines $a$ and subsequently $c$. Given (5), choosing $a$ is tantamount to choosing $L$.
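A closed-form example may help to see what the misreporting condition implies. The sketch below assumes a quadratic concealment cost $\sigma(a) = \kappa a^{2}/2$ with $\kappa > 0$, linear utility of consumption $u(c) = c$, and quadratic disutility of labor $v(L) = L^{2}/2$. These functional forms and the parameter $\kappa$ are hypothetical and chosen only for tractability; in particular, linear $u$ rules out the income effects discussed later in the text.

```latex
% Sketch only. Hypothetical forms: \sigma(a) = \kappa a^2 / 2,  u(c) = c,  v(L) = L^2 / 2,
% with L = (M + a) / w and w = w(\theta, e).
\begin{align*}
\text{FOC for } a:\quad & 1 - \kappa a = \frac{1}{w}\cdot\frac{M + a}{w} \\[4pt]
\Longrightarrow\quad & a^{*} = \frac{w^{2} - M}{\kappa\, w^{2} + 1}, \\[4pt]
& \frac{\partial a^{*}}{\partial (w^{2})} = \frac{1 + \kappa M}{(\kappa\, w^{2} + 1)^{2}} > 0
  \quad \text{for } M \ge 0 .
\end{align*}
% Type enters a^* only through the wage w(\theta, e): for a given bundle,
% two agents with the same wage misreport the same amount, while a higher
% wage implies more misreporting.
```

Under these assumptions, misreporting at a given bundle is type-independent when the wage is type-independent, and is larger for the agent with the higher wage otherwise.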
Having shown that a taxpayer's choice of $(e, M, B)$ determines his consumption bundle $(e, L, c)$, we now discuss how the mechanism designer determines the two $(e, M, B)$-bundles. There are two types of constraints that must be considered in this problem. One is that the bundles must satisfy the economy's resource constraint. The other, with the taxpayers being ultimately free to determine their allocations when given a tax function, is that the bundles must be incentive-compatible. This requires that agents do not behave as "mimickers," misrepresenting their ability type: each agent must weakly prefer the $(e, M, B)$-bundle intended for his ability type to that intended for the other type.
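For a given $(e, M, B)$-bundle, the (conditional) indirect utility of an agent of ability $\theta$ is given by

$$V(e, M, B; \theta) \equiv u\big(B + a^{*} - \sigma(a^{*})\big) - v\!\left(\frac{M + a^{*}}{w(\theta, e)}\right) - \varphi(\theta, e), \tag{8}$$

where $a^{*}$ represents the value of $a$ that solves the first-order condition (7). Normalizing to one the size of the total population and denoting by $\pi$ the proportion of agents of type $\ell$ (low-ability), the government's problem (hereafter, problem $\mathcal{P}_1$) can be formalized as

$$\max_{M^{\ell},\, e^{\ell},\, B^{\ell},\, M^{h},\, e^{h},\, B^{h}} \; V(e^{\ell}, M^{\ell}, B^{\ell}; \theta^{\ell})$$

subject to:

$$V(e^{h}, M^{h}, B^{h}; \theta^{h}) \geq \bar{V},$$

$$V(e^{h}, M^{h}, B^{h}; \theta^{h}) \geq V(e^{\ell}, M^{\ell}, B^{\ell}; \theta^{h}),$$

$$\pi\,(M^{\ell} - B^{\ell} - p e^{\ell}) + (1 - \pi)\,(M^{h} - B^{h} - p e^{h}) \geq \bar{R},$$

where $\bar{V}$ and $\bar{R}$ represent, respectively, an exogenous prespecified utility level for agents of type $h$ (high-ability) and the government's exogenous revenue requirement.⁷ Notice that, according to the last constraint, the resource cost of education is covered by the government. As pointed out in footnote 6, this approach is without loss of generality.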
Denoting the Lagrange multipliers associated with the first, second, and third constraint by $\delta$, $\lambda$, and $\mu$, the optimal values of $M^{\ell}, e^{\ell}, B^{\ell}, M^{h}, e^{h}, B^{h}$ are determined by the first-order conditions to this problem presented in Appendix A (Equations A1-A6). We have neglected the self-selection constraint requiring low-ability agents not to mimic high-ability agents. This approach can be justified by assuming that we focus on the so-called "normal" case, in which the intended direction of redistribution is from the high- to the low-ability agents. Put differently, we are implicitly assuming that the value for $\bar{V}$ appearing on the RHS of the first constraint is lower than the utility enjoyed by high-ability agents under laissez-faire.
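The structure of the first-order conditions can be sketched from the three constraints just described. The block below is an illustrative derivation, not a reproduction of Appendix A: the shorthand $V^{\ell}$, $V^{h}$, $V^{h\ell}$ and the ordering of the conditions are assumptions introduced here, and subscripts denote partial derivatives of the indirect utility function.

```latex
% Sketch only. Shorthand (assumed): V^{\ell} = V(e^{\ell},M^{\ell},B^{\ell};\theta^{\ell}),
% V^{h} = V(e^{h},M^{h},B^{h};\theta^{h}),  V^{h\ell} = V(e^{\ell},M^{\ell},B^{\ell};\theta^{h}).
\begin{align*}
\mathcal{L} ={}& V^{\ell} + \delta\,\big[V^{h} - \bar{V}\big] + \lambda\,\big[V^{h} - V^{h\ell}\big] \\
 & + \mu\,\big[\pi\,(M^{\ell} - B^{\ell} - p e^{\ell})
   + (1-\pi)\,(M^{h} - B^{h} - p e^{h}) - \bar{R}\big] .
\end{align*}
% Stationarity for the low-ability bundle:
\begin{align*}
B^{\ell}:\quad & V^{\ell}_{B} - \lambda V^{h\ell}_{B} - \mu\pi = 0, \\
M^{\ell}:\quad & V^{\ell}_{M} - \lambda V^{h\ell}_{M} + \mu\pi = 0, \\
e^{\ell}:\quad & V^{\ell}_{e} - \lambda V^{h\ell}_{e} - \mu\pi p = 0 .
\end{align*}
% Stationarity for the high-ability bundle:
\begin{align*}
B^{h}:\quad & (\delta + \lambda)\,V^{h}_{B} - \mu(1-\pi) = 0, \\
M^{h}:\quad & (\delta + \lambda)\,V^{h}_{M} + \mu(1-\pi) = 0, \\
e^{h}:\quad & (\delta + \lambda)\,V^{h}_{e} - \mu(1-\pi)\,p = 0 .
\end{align*}
```

Dividing the $M^{h}$ and $e^{h}$ conditions by the $B^{h}$ condition gives $-V^{h}_{M}/V^{h}_{B} = 1$ and $V^{h}_{e}/V^{h}_{B} = p$, so the mimicker's terms drop out of the high-ability bundle entirely.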

3.2 | Properties
We can now specify the properties of the tax schedule $T(M, e)$. From the first-order conditions of the above problem, we can derive the implicit characterizations (9) and (10) for the marginal tax rates $T_M(M, e)$ and $T_e(M, e)$. Notice also that, combining (7) and (9), one gets that

$$T_M(M, e) = \sigma'(a). \tag{11}$$

For later purposes, it is useful to define the marginal rate of substitution between $M$ and $B$, denoted by $MRS_{MB}$, for an agent choosing a given $(e, M, B)$-bundle. From (8), by invoking the envelope theorem, we obtain Equation (12). Based on (12), we introduce the notations $MRS^{j}_{MB}$ for the marginal rate of substitution between $M$ and $B$ of a $j$-type (with $j = h, \ell$) and $MRS^{h\ell}_{MB}$ for that of an $h$-type mimicking an $\ell$-type (i.e., a high-ability agent choosing the bundle $(e^{\ell}, M^{\ell}, B^{\ell})$), where $a^{j} \equiv a^{*}(e^{j}, M^{j}, B^{j}; \theta^{j})$, for $j = h, \ell$, and $a^{h\ell} \equiv a^{*}(e^{\ell}, M^{\ell}, B^{\ell}; \theta^{h})$. We are now in a position to give a characterization for the implicit marginal tax rates faced by high- and low-ability agents at an optimum. This is done in Proposition 1, where we denote the earnings of a low-ability agent by $I^{\ell} \equiv M^{\ell} + a^{\ell}$ and the earnings of a high-ability agent behaving as a mimicker by $I^{h\ell} \equiv M^{\ell} + a^{h\ell}$.

Proposition 1. Consider a two-group optimal income tax model with IM, wherein a worker's productivity depends directly on his ability type and educational attainment. The optimal implicit marginal tax rates faced by the high-ability agents are given by $T_M(M^{h}, e^{h}) = 0$ and $T_e(M^{h}, e^{h}) = p$, and those faced by the low-ability agents are given by Equations (16) and (17).

Proof. See Appendix A. □

The fact that $T_M(M^{h}, e^{h}) = 0$ implies that the labor supply of high-ability agents is undistorted (which also implies that $a^{h} = 0$), and the fact that $T_e(M^{h}, e^{h}) = p$ tells us that the optimal marginal tax on education is given by a purely nondistortionary term that is meant to let agents internalize the marginal resource cost of $e$.⁸ These particular results are not surprising, as they are yet another reflection of the well-known "no distortion at the top" result and an artifact of the assumption that redistribution is from high- to low-ability types. What is more interesting is the nature of the marginal tax rates faced by the low-ability types, to which we now turn.
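To see where the low-ability marginal tax rates come from, the sketch below works through the algebra under stated assumptions: utility is additively separable as $U = u(c) - v(L) - \varphi(\theta, e)$, the implicit marginal tax rates are defined by the agent's own optimality conditions when facing $T(M, e)$, and the stationarity conditions for the low-ability bundle take the form written in the first display. The resulting expressions are an illustration of the logic and are not claimed to coincide term by term with Equations (16) and (17).

```latex
% Sketch only. Assumed stationarity conditions for the low-ability bundle:
%   V^{\ell}_B = \lambda V^{h\ell}_B + \mu\pi,
%   V^{\ell}_M = \lambda V^{h\ell}_M - \mu\pi,
%   V^{\ell}_e = \lambda V^{h\ell}_e + \mu\pi p.
% Envelope derivatives of V(e,M,B;\theta):
\begin{align*}
V_B = u'(c), \qquad
V_M = -\frac{v'(L)}{w}, \qquad
V_e = v'(L)\,\frac{L}{w}\,\frac{\partial w}{\partial e} - \frac{\partial\varphi}{\partial e}.
\end{align*}
% Wedges (using the misreporting FOC for the second equality):
\begin{align*}
MRS_{MB} \equiv -\frac{V_M}{V_B} = 1 - \sigma'(a^{*}), \qquad
MWP_{ec} \equiv \frac{V_e}{V_B}, \qquad
T_M = 1 - MRS_{MB} = \sigma'(a^{*}), \qquad
T_e = MWP_{ec}.
\end{align*}
% Substituting V^{\ell}_B = \lambda V^{h\ell}_B + \mu\pi into the M^{\ell} and e^{\ell} conditions:
\begin{align*}
T_M(M^{\ell},e^{\ell})
  &= \frac{\lambda\,u'(c^{h\ell})}{\mu\pi}\,\Big[ MRS^{\ell}_{MB} - MRS^{h\ell}_{MB} \Big], \\[4pt]
T_e(M^{\ell},e^{\ell}) - p
  &= \frac{\lambda\,u'(c^{h\ell})}{\mu\pi}\,\Big[ MWP^{h\ell}_{ec} - MWP^{\ell}_{ec} \Big].
\end{align*}
```

Both wedges are proportional to the gap between the mimicker's and the low-ability agent's marginal valuations at the same bundle, which is why a distortion is useful only when the two agents value the marginal unit differently.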
To proceed on this front, we find it useful to distinguish between two cases. Case (a): One's type does not affect his productivity directly; educational attainment serves as the only vehicle for any effect that one's type may have on his productivity (through the type-dependent effort cost of achieving a given education level, namely, the function $\varphi(\theta, e)$). Algebraically, this is depicted by $\partial w(\theta, e)/\partial\theta = 0$. Case (b): One's type has a positive direct effect on productivity, separate from the effect channeled through education, so that $\partial w(\theta, e)/\partial\theta > 0$.
3.2.1 | Case (a): $\partial w(\theta, e)/\partial\theta = 0$

We start by presenting a lemma which will help in studying this case.

Lemma 1. Assume that $\partial w(\theta, e)/\partial\theta = 0$. Then:
(i) A high-ability worker, when behaving as a mimicker, will misreport the same amount as a low-ability worker. That is, $a^{h\ell} = a^{\ell}$.
(ii) For any given $(e, M, B)$-bundle we have that $MRS^{h\ell}_{MB} = MRS^{\ell}_{MB}$.

Proof. See Appendix A. □

The result in (i) is due to the fact that our riskless modeling of evasion, with a type-independent IM-cost function $\sigma(\cdot)$, ensures that an individual's choice of $a$ is independent of his type for any given $(e, M, B)$-bundle. Using the results of Lemma 1 in Equations (16)-(17) simplifies them into

$$T_M(M^{\ell}, e^{\ell}) = 0, \tag{19}$$

$$T_e(M^{\ell}, e^{\ell}) = p + \frac{\lambda}{\mu\pi}\left[\frac{\partial\varphi(\theta^{\ell}, e^{\ell})}{\partial e} - \frac{\partial\varphi(\theta^{h}, e^{\ell})}{\partial e}\right], \tag{20}$$

which implies that $T_e(M^{\ell}, e^{\ell}) > p$. The key to understanding these results is to notice that, when taxes can be conditioned on education, a high-ability mimicker is denied any information rent associated with differences in productivities. This is due to the fact that, conditional on education, productivity is not type-dependent when $\partial w(\theta, e)/\partial\theta = 0$. Absent a difference in market productivities, there is no mimicking-deterring benefit from distorting the labor supply of low-ability agents; it would serve no screening purpose given that $MRS^{h\ell}_{MB} = MRS^{\ell}_{MB}$. This observation accounts for the result provided by Equation (19). Moreover, given that from Proposition 1 we also have that $T_M(M^{h}, e^{h}) = 0$, it also follows that, once taxes can be conditioned on education, it is useless to condition the tax liability also on reported income; a nonlinear tax on education is all that the government needs. High-ability mimickers only enjoy an information rent that is associated with differences in the (effort) cost of acquiring education, and this information rent is reflected in the term within square brackets in Equation (20). This term calls for setting the marginal tax on education at a level higher than $p$, entailing a downward distortion on $e^{\ell}$ which is justified by mimicking-deterring considerations. The intuition comes from the observation that, since the marginal (effort) cost of acquiring education is lower for a mimicker than for a low-ability agent, the marginal willingness to pay for education is higher for the former than for the latter.⁹ Hence, introducing a small distortion on $e^{\ell}$ imposes a first-order utility loss on the mimicker, thereby relaxing the binding self-selection constraint, while at the same time exerting only a second-order effect on low-ability agents.

⁹ As we pointed out at the end of Section 2, the expression on the LHS of (3) represents the marginal willingness to pay for education.
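The reasoning behind the type-independent-wage case can be traced step by step. The sketch below assumes additively separable utility $U = u(c) - v(L) - \varphi(\theta, e)$, a unique solution to the misreporting first-order condition, and a wedge formula of the form $T_e - p = \frac{\lambda u'(c^{h\ell})}{\mu\pi}\,[MWP^{h\ell}_{ec} - MWP^{\ell}_{ec}]$; the last of these is an assumption adopted for illustration.

```latex
% Sketch only. Case (a): w(\theta^{h}, e) = w(\theta^{\ell}, e) \equiv w(e).
% Step 1: the misreporting FOC at a given (e, M, B) does not involve \theta:
\begin{align*}
\big[1 - \sigma'(a)\big]\,u'\big(B + a - \sigma(a)\big)
  = \frac{1}{w(e)}\,v'\!\left(\frac{M + a}{w(e)}\right)
\quad\Longrightarrow\quad a^{h\ell} = a^{\ell}.
\end{align*}
% Step 2: equal a implies equal consumption, labor supply and MRS_{MB}:
\begin{align*}
c^{h\ell} = c^{\ell}, \qquad L^{h\ell} = L^{\ell}, \qquad
MRS^{h\ell}_{MB} = MRS^{\ell}_{MB} = 1 - \sigma'(a^{\ell}).
\end{align*}
% Step 3: the only remaining difference in MWP_{ec} is the marginal effort cost:
\begin{align*}
MWP^{h\ell}_{ec} - MWP^{\ell}_{ec}
  = \frac{1}{u'(c^{\ell})}\left[
    \frac{\partial\varphi(\theta^{\ell}, e^{\ell})}{\partial e}
    - \frac{\partial\varphi(\theta^{h}, e^{\ell})}{\partial e}\right].
\end{align*}
% Step 4: inserting Step 3 into the assumed wedge formula, u'(c^{h\ell}) = u'(c^{\ell}) cancels:
\begin{align*}
T_e(M^{\ell}, e^{\ell}) - p
  = \frac{\lambda}{\mu\pi}\left[
    \frac{\partial\varphi(\theta^{\ell}, e^{\ell})}{\partial e}
    - \frac{\partial\varphi(\theta^{h}, e^{\ell})}{\partial e}\right].
\end{align*}
```

The marginal utility of consumption cancels precisely because the mimicker and the low-ability agent end up with the same consumption, which is what makes the misreporting technology irrelevant for the tax rule in this case.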
The above results, and in particular the redundancy of conditioning the tax liability also on reported income, warrant an important remark. This result hinges on the fact that, for any given $(e, M, B)$-bundle, $a^{h\ell} = a^{\ell}$ when $\partial w(\theta, e)/\partial\theta = 0$, an outcome that follows from the assumption that there is no heterogeneity in the misreporting technology available to taxpayers. With heterogeneous IM-technologies, parts (i) and (ii) of Lemma 1 would cease to be valid.¹⁰,¹¹ As a consequence, if type-$\ell$ agents had access to a cheaper IM-technology, the government would benefit from conditioning the tax liability also on reported income, and letting $T_M(M^{\ell}, e^{\ell})$ deviate from zero.¹² Intuitively, if IM is easier for those agents who are regarded as "more deserving" from a social point of view, there are redistributive gains from designing the tax schedule in such a way that transfer-recipients are induced to misreport their income. These gains arise from mimicking-deterring effects: requiring transfer-recipients to report a specific amount of income (on top of acquiring a given level of education) imposes a softer constraint on the labor supply of agents having access to a cheaper IM-technology. Therefore, if deserving agents have better IM-opportunities, requiring transfer-recipients to report a specific amount of income imposes a lighter burden on those who are the intended beneficiaries of the redistributive program than on mimickers.
Our final observation here is that the characterizations of $T_M(M^{\ell}, e^{\ell})$ and $T_e(M^{\ell}, e^{\ell})$ provided by Equations (19)-(20) are precisely the ones that emerge in a model without IM. This is quite intuitive in light of the fact that the no-IM case can be interpreted as a special case in which the condition $a^{h\ell} = a^{\ell}$ applies. Thus, if the wage conditional on education and the IM-costs are both type-independent, then IM will have no impact on the characterization of optimal marginal taxes on incomes and educational attainments.

Lemma 2. Assume that $\partial w(\theta, e)/\partial\theta > 0$. Then:
(i) A high-ability worker, when behaving as a mimicker, will misreport more than a low-ability worker. That is, $a^{h\ell} > a^{\ell}$.
(ii) For any given $(e, M, B)$-bundle we have that $MRS^{h\ell}_{MB} < MRS^{\ell}_{MB}$.

¹⁰ For example, assume that the function $\sigma(a)$ only applies to low-ability agents, whereas for high-ability agents (independently of whether they behave as mimickers or not) the cost-of-IM function is given by $k\sigma(a)$, with $k$ representing a positive constant. We would have that $a^{h\ell} = a^{\ell} = 0$, and therefore $MRS^{h\ell}_{MB} = MRS^{\ell}_{MB}$, only at particular $(e, M, B)$-bundles; at all other $(e, M, B)$-bundles we would have that $a^{h\ell} \neq a^{\ell}$. Within a different setting, heterogeneous misreporting technologies are considered by Kopczuk (2001), Blomquist et al. (2016), and Canta et al. (2021).

¹¹ The same would be true if one were to relax the assumption that the effort cost of acquiring education and the effort cost of supplying labor in the market are additively separable in the agents' utility function. For example, assuming that $U = u(c) - g\big(v(L) + \varphi(\theta, e)\big)$, with $g(\cdot)$ representing an increasing and convex function, would again imply that the results provided by parts (i) and (ii) of Lemma 1 break down. The intuition comes from observing that the reformulated utility function implies that the marginal disutility of labor supply is affected by the (effort) cost of acquiring education.
The implication of Lemma 2 for the optimal tax characterization given in Proposition 1 is that $T_M(M^{\ell}, e^{\ell}) > 0$, which is the same result as the one in the absence of IM. Indeed, $T_M(M^{\ell}, e^{\ell})$ … As a benchmark for our discussion, consider a setting where agents cannot misreport their earned income to the tax authority. In that case, we would have that $a^{h\ell} = a^{\ell} = 0$ … To make things simpler, let us further assume that the effort cost of achieving a given education level depends only on the education level and not on ability. That is, $\partial\varphi(\theta, e)/\partial\theta = 0$, so that $\partial\varphi(\theta^{h}, e)/\partial e = \partial\varphi(\theta^{\ell}, e)/\partial e \equiv \varphi'(e)$. Under this assumption, Equation (21) simplifies to Equation (22). We know that in the absence of IM, …; this includes the case in which $w(\theta, e)$ can be written as $w(\theta, e) = f(\theta)g(e)$, where $\epsilon$ takes the common value $g'(e)/g(e)$ for both types, and the one in which $w(\theta, e)$ can be written as $w(\theta, e) = f(\theta) + g(e)$, where we have (assuming … $T_e(M^{\ell}, e^{\ell})$ to exceed or fall short of $p$. To interpret this result one should consider again whether or not, for a given $(e, M, B)$-bundle, a mimicker differs from a low-ability agent in terms of marginal willingness to pay for education ($MWP_{ec}$). If such a difference exists, the government will have an incentive, for mimicking-deterring purposes, to let $T_e(M^{\ell}, e^{\ell})$ deviate from $p$. More specifically, if $MWP_{ec}$ is higher (resp.: lower) for a mimicker than for a low-ability agent, it will be desirable to set $T_e(M^{\ell}, e^{\ell})$ at a level that is higher (resp.: lower) than $p$.
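The two wage specifications mentioned above behave differently with respect to how education raises productivity across types. The sketch below assumes that $\epsilon$ denotes the proportional effect of education on the wage, $\epsilon(\theta, e) \equiv [\partial w(\theta, e)/\partial e]/w(\theta, e)$; this definition is inferred from the surviving expression $g'(e)/g(e)$ and should be read as an assumption.

```latex
% Sketch only. Assumed definition:
%   \epsilon(\theta, e) \equiv \frac{\partial w(\theta, e)/\partial e}{w(\theta, e)}.
\begin{align*}
w = f(\theta)\,g(e):\quad
  & \epsilon = \frac{f(\theta)\,g'(e)}{f(\theta)\,g(e)} = \frac{g'(e)}{g(e)},
  & \frac{\partial \epsilon}{\partial \theta} &= 0, \\[6pt]
w = f(\theta) + g(e):\quad
  & \epsilon = \frac{g'(e)}{f(\theta) + g(e)},
  & \frac{\partial \epsilon}{\partial \theta}
    &= -\frac{g'(e)\,f'(\theta)}{\big[f(\theta) + g(e)\big]^{2}} < 0
    \quad \text{if } f' > 0,\ g' > 0 .
\end{align*}
% Multiplicative wages: education raises the wage by the same proportion for
% every type. Additive wages: the proportional gain is smaller for abler types.
```

At a common education level $e^{\ell}$, the mimicker and the low-ability agent therefore share the same $\epsilon$ under the multiplicative form, whereas under the additive form the mimicker's $\epsilon$ is strictly lower.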
Without IM, both agents have the same consumption and therefore the same marginal utility of consumption, $u'(c^{h\ell}) = u'(c^{\ell})$. Thus, if they also incur the same marginal (effort) cost of acquiring education, a difference in their respective $MWP_{ec}$ can only arise from a difference in the extent to which a marginal increase in education lowers their disutility of labor supply. For a given $(e, M, B)$-bundle, this effect is given by … . If …, a mimicker's marginal willingness to pay for education will exceed that of a low-ability agent, in which case it will be desirable to set $T_e(M^{\ell}, e^{\ell}) > p$, and vice versa when the inequality is reversed. This is indeed the message provided by Equation (22).

Consider now the counterpart of (22) in the presence of IM. Assuming again that $\partial\varphi(\theta, e)/\partial\theta = 0$, we obtain Equation (24) from Equation (17). By comparing (22) and (24) we can see that IM has an ambiguous effect on the sign of $T_e(M^{\ell}, e^{\ell}) - p$.¹³ On the one hand, given that $I^{h\ell} > I^{\ell}$, the second term on the RHS of (24) is raised compared to the corresponding term in (22), and this makes it more likely that $T_e(M^{\ell}, e^{\ell}) > p$. To interpret this result, observe that a marginal increase in education, raising an agent's productivity, allows earning a given amount of income at a lower labor supply. In turn, the magnitude of this labor-saving effect is increasing in the initial labor supply of an agent ($dL = -(L/w)(\partial w/\partial e)\,de$ for a given earned income). With IM, the difference between the labor supply of a low-ability type and of a mimicker does not only depend on a difference in their wage rate but also on a difference in their earned income (for a given, common, reported income).¹⁴ That the earned income of a mimicker exceeds that of a low-ability type tends to raise $MWP^{h\ell}_{ec}$ above $MWP^{\ell}_{ec}$. This tendency is further strengthened by the fact that, as we previously remarked, IM tends to lower the value of the difference … .

On the other hand, the RHS of (24) contains a third term which takes a negative sign (under the assumption that $u''(c) < 0$) and which was absent in (22). This term, which depends on the difference between the marginal utility of consumption of a high-ability mimicker and a low-ability type, works in the direction of favoring $T_e(M^{\ell}, e^{\ell}) < p$. To interpret this result, observe that the marginal willingness to pay for education is decreasing in the ratio $[\partial\varphi(\theta, e)/\partial e]/u'(c)$ …

¹³ We must emphasize here that our discussion refers only to what the "tax rules" (22) and (24) suggest; it does not refer to the final equilibrium "levels" of the marginal tax. The common variables appearing in the tax formulas with and without IM assume different values in the two cases and do not allow a comparison between tax levels. The distinction between "tax rules" and "tax levels" was introduced by Atkinson and Stern (1974) in the context of the analysis of optimal provision of public goods in a first-best versus a second-best world.

¹⁴ Taking the derivative of the RHS of (5) with respect to $\theta$, and taking into account that both a low-ability agent and a high-ability mimicker report the same amount of income $M^{\ell}$, we have that …

| Income misreporting and distortion in educational attainment vis-à-vis labor supply
The discussion above centered around distortions in labor supply and educational attainment vis-à-vis consumption.An equally interesting perspective is to look at the distortion in educational attainment vis-à-vis labor supply.The condition for optimal educational attainment versus labor supply at a first-best allocation can be found through dividing laissez-faire condition (3) by laissez-faire condition (2).This results in where the LHS of ( 25) shows the marginal rate of substitution between education and labor supply. 15 In what follows we will rely on condition (25) to evaluate if, and how, an optimal tax policy distorts the educational attainment of an agent vis-à-vis his labor supply.More specifically, the education acquired by type-j agents (with j h = , ℓ) will be downward (resp.: upward) distorted if it is the case that, at the solution to problem 1  , we have that .
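The logic behind the no-distortion condition can be sketched as a short worked derivation. It is a reconstruction under stated assumptions, not a quotation of condition (25): preferences are taken to be additively separable, $U = u(c) - v(L) - \varphi(\theta, e)$, the output available for consumption is $w(\theta, e)L - pe$, and $w_e$ and $\varphi_e$ (notation introduced here) denote partial derivatives with respect to $e$. The published condition may be arranged differently.

```latex
% Assumption: U = u(c) - v(L) - \varphi(\theta,e), with c = w(\theta,e)L - pe held fixed.
\begin{align*}
d\,(wL - pe) = 0
  &\;\Longrightarrow\;
  dL = -\,\frac{w_e(\theta,e)\,L - p}{w(\theta,e)}\,de, \\
dU = -v'(L)\,dL - \varphi_e(\theta,e)\,de
  &= \left[\frac{w_e(\theta,e)\,L - p}{w(\theta,e)}\,v'(L) - \varphi_e(\theta,e)\right] de, \\
dU = 0
  &\;\Longleftrightarrow\;
  \frac{\varphi_e(\theta,e)}{v'(L)} = \frac{w_e(\theta,e)\,L - p}{w(\theta,e)} .
\end{align*}
```

Under these assumptions, the left-hand side of the last line is the marginal rate of substitution between education and labor supply, and the right-hand side is the rate at which labor can be reduced when education is marginally increased at unchanged consumption.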
Before proceeding further, it is worth noticing that when $T_M(M^j, e^j)$ (as defined by the RHS of (9)) is greater than zero, $e^j$ could be downward distorted even when $T_e(M^j, e^j)$ (as defined by the RHS of (10)) is smaller than $p$. In particular, $e^j$ will remain downward distorted as long as the inequality $T_e(M^j, e^j) > [1 - T_M(M^j, e^j)]\,p$ holds.16 Proposition 2 provides our main results regarding the optimal distortion on $e^j$ vis-à-vis $L^j$.
Proposition 2. Under an optimal nonlinear tax schedule T M e ( , ): (i) The amount of education acquired by high-ability agents is undistorted vis-à-vis labor supply; (ii) For the low-ability agents, e ℓ is downward (resp.: upward) distorted vis-à-vis labor supply if the following condition holds:  ) . At the most-efficient (e L , )-pair it must be that dU = 0, which is tantamount to require that condition (25) holds.Taking into account that T M e ( , ) e j j represents the implicit marginal (pecuniary) cost of education for an agent of ability j, the condition above states that, for a given ability-type and a given amount of education, the ratio between the net marginal return of labor supply and the marginal (pecuniary) cost of education should be the same as under laissez-faire.Under a linear income tax such a condition would be fulfilled when the pecuniary costs of education are fully deductible from the income tax base (since this implies that education is being subsidized at a rate that is equal to the marginal income-tax-rate).
where, denoting w θ e ( , ) ℓ ℓ by w ℓ and w θ e ( , ) h ℓ by w hℓ , The outcome in (i) is a direct consequence of the fact that there is no mimicking-deterring motive to distort the choices of high-ability agents.According to Proposition 1, at an optimum we have that T M e ( , ) = 0 . Together, these two results trivially imply that , that is, that the educational attainment is undistorted for high-ability agents.
To get an intuition for the result stated in (ii), suppose we start from an initial situation where the $(e, M, B)$-bundle offered to low-ability agents satisfies the no-distortion condition $MRS^{\ell}_{eM} = p$.17 If it is the case that $MRS^{h\ell}_{eM} > (<)\; MRS^{\ell}_{eM}$, introducing a small downward (upward) distortion on $e^{\ell}$ will entail a first-order utility loss for the high-ability mimicker, thereby relaxing the binding self-selection constraint, while having only second-order effects on the utility of low-ability agents and on the government's budget constraint. Based on the expressions for $MRS^{\ell}_{eM}$ and $MRS^{h\ell}_{eM}$ provided by (27)-(28), a downward distortion is more likely to be desirable when the education elasticity of the wage rate is increasing in ability and the marginal effort-cost of acquiring education is decreasing in ability (with the reverse directions for an upward distortion).
To evaluate the specific effects of IM, we again consider case (a) and case (b) separately.

| Case (a):
w θ e θ ( , ) = 0 ∂ ∕∂ In Section 3.2.1 we have seen that, if wage-conditional-on-education and IM-costs are both typeindependent, IM becomes irrelevant.Independently on whether agents can or cannot misreport their income, Equations ( 19)-( 20) apply (although a nonlinear tax on education would suffice for implementation purposes).This means that the inequality The marginal rate of substitution MRS eM ℓ provides a measure of the required variation in M ℓ that would leave unchanged the utility of low-ability agents when e ℓ is marginally increased.The marginal rate of substitution Notice that the sign of the expression on the LHS of ( 30) is unambiguously negative when ϵ ϵ , and v″ > 0).Accordingly, education should be upward distorted for the low-ability agents even when ϵ = ϵ ℓ .This result is reminiscent of a similar one obtained by Bovenberg and Jacobs (2005) and is driven by the assumption that education entails both a resource-and an effort-cost. 18As shown by Maldonado (2008), in a setting without IM and where education entails only a pecuniary cost, e ℓ should be left undistorted when ϵ = ϵ With IM, condition (30) changes to ( ) Comparing the first term on the LHS of (31) with the corresponding term in (30), we can see that the possibility of IM tends to favor distorting downwards e ℓ .Whereas in (30) the sign of the first term only depends on the difference ϵ − ϵ With respect to the second term in ( 30) and ( 31), notice that they are formally identical.However, whereas in (30) we have that L Given that a a > hℓ ℓ , with IM one can no longer be sure that L L > h ℓ ℓ .Notwithstanding this difference, the sign of the second term in (31) remains negative as it was the case for the second term in (30). 
20 In Appendix B, we present an illustrative example with $v(L) = L^2/2$ and $\epsilon^{h\ell}_{w,e} = \epsilon^{\ell}_{w,e} \equiv \epsilon_{w,e}$ to elaborate more on the effects of IM. There, we show that as it becomes easier for agents to engage in IM, the case for downward distorting $e^{\ell}$ is strengthened.
18 See, in particular, Equation (42) on p. 2022 of Bovenberg and Jacobs (2005). The result also hinges on the assumption that the effort cost of acquiring education and the effort cost of supplying labor in the market are additively separable in the agents' utility function.
We have thus far neglected the possibility of using a consumption tax as an additional policy instrument. In a setting without IM, a consumption tax would be redundant: with $c^{h\ell} = c^{\ell}$, the introduction of a consumption tax (subsidy) would hurt (benefit) a high-ability mimicker in the same way as it would hurt (benefit) a true low-ability type.21 With IM, however, a consumption tax ceases to be redundant given the possibility that $c^{h\ell} \neq c^{\ell}$.22 For this reason, we will now evaluate how our previous qualitative results are affected if the government supplements an optimal nonlinear tax with a consumption tax.
Denote the consumption tax rate by t so that the consumer price of c is t 1 + . 23Adding t as an additional policy instrument, the government's problem (hereafter, problem 2  ) becomes e M B e M B t , , , , , , Denote the Lagrange multipliers associated with the first, second, and third constraint by δ λ , , and μ.The optimal values of e M B e M B t , , , , , , are determined by the first-order conditions to this problem presented in Appendix C (Equations C1-C7).There, we prove the following proposition.
Proposition 3. In a setting where the government supplements the optimal nonlinear tax $T(M, e)$ with a linear consumption tax, the consumption tax rate $t$ is

21 More generally, relaxing our assumption of a single consumption good, we also know from the Atkinson-Stiglitz (1976) theorem that (uniform or nonuniform) commodity taxation would be a redundant instrument, in the absence of IM, as long as labor supply is weakly separable from the vector of consumption goods in the individuals' utility function.

22 In the context of a Mirrleesian optimal tax model this result was first highlighted by Boadway et al. (1994).

23 Given that we have normalized to 1 the producer price of $c$, $t$ can equivalently be interpreted as a specific or an ad valorem tax rate. Notice also that, since the consumer price of $c$ is $1 + t$, it must be that $t > -1$. The assumption of linear taxation is made for realism. It is justified by the idea that the tax administration has information on anonymous transactions but not on the identity of the consumers. That is, the administration does not observe who bought how much; it only observes the total sales of a commodity. While this is a common approach in the optimal tax literature, it leaves aside the possibility that consumption taxation is also vulnerable to evasion. Within an optimal tax framework, the consequences of commodity tax evasion have been investigated by, among others, Usher (1986), Kaplow (1990), and Cremer and Gahvari (1993).
Proof. See Appendix C. □
The results follow from the characterization of the optimal value of t that we provide in the proof of Proposition 3. There, we show that ( ) ( ) and we also prove that the denominator of the expression for t is negative, so that ∂ ∕∂ and t > 0. Contrary to the traditional result when there is no IM, optimal general income-education taxation benefits from being supplemented by consumption taxation.
That $t = 0$ when $\partial w(\theta, e)/\partial\theta = 0$ and $t > 0$ when $\partial w(\theta, e)/\partial\theta > 0$ comes from the effect of $t$ on the self-selection constraint faced by the government in designing the nonlinear tax $T(M, e)$. Whereas in the former case the introduction of $t$ does not relax this self-selection constraint, it does so in the latter case. To see this, consider the following perturbation. Starting from a pre-reform equilibrium where $t = 0$, raise $t$ by $dt > 0$ while at the same time adjusting $B^j$ (for $j = \ell, h$) by $dB^j = c^j\,dt$. Observe first that, by construction, the reform is welfare-neutral for low-skilled agents and for high-skilled agents (when not behaving as mimickers).
Second, the reform has no effect on the government's budget.On the one hand, the upward adjustment of B j changes tax revenue by at the pre-reform equilibrium where t = 0, given that dB c dt = j j .On the other hand, the increase in t changes revenue by Third, what is left to consider is the effect of the reform on the utility of a high-skilled agent behaving as a mimicker.This effect is given by ) at the prereform equilibrium where t = 0. Thus, if ⇒ so that the reform has a detrimental effect on a mimicker's utility.This eases the self-selection constraint faced by the government in the design of the redistributive tax policy.
The expression for $t$ in Equation (32) can thus be interpreted as providing the value of $t$ that strikes an optimal balance between the mimicking-deterring effects of a marginal compensated increase in $t$ (i.e., a marginal increase in $t$ which is accompanied by adjusting upwards $B^j$, for $j = \ell, h$, by $dB^j = c^j\,dt$) and the revenue losses due to the behavioral responses on $a^j$ (which shrink the aggregate consumption-tax base).
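The compensated perturbation described above can be illustrated numerically. The sketch below is not the paper's model in full: it assumes quasi-linear utility $u(c) = \varkappa c$, $v(L) = L^2/2$, $\sigma(a) = ka^2/2$, a budget constraint $(1+t)c = B + a - \sigma(a)$, and labor supply $L = (M + a)/w$. The parameter values and all function names are hypothetical and chosen only for illustration.

```python
"""Numerical sketch of the compensated consumption-tax reform (illustrative only).

Assumed, not taken from the paper: u(c) = KAPPA * c, v(L) = L**2 / 2,
sigma(a) = K * a**2 / 2, budget (1 + t) * c = B + a - sigma(a), L = (M + a) / w.
"""

KAPPA = 1.0  # assumed constant marginal utility of consumption
K = 0.5      # assumed IM-cost parameter


def optimal_misreporting(w, M, t):
    """Closed-form a solving KAPPA * (1 - K * a) / (1 + t) = (M + a) / w**2."""
    return (KAPPA * w**2 - (1.0 + t) * M) / ((1.0 + t) + KAPPA * K * w**2)


def consumption(w, M, B, t):
    """Consumption at the agent's optimal amount of misreporting."""
    a = optimal_misreporting(w, M, t)
    return (B + a - 0.5 * K * a**2) / (1.0 + t)


def indirect_utility(w, M, B, t):
    """Utility of an agent with wage w who is assigned reported income M and B."""
    a = optimal_misreporting(w, M, t)
    labor = (M + a) / w
    return KAPPA * consumption(w, M, B, t) - 0.5 * labor**2


def compensated_reform(w_low, w_mimic, M, B, dt):
    """Raise t from 0 to dt and raise B by c_low * dt (the low type's compensation).

    Returns pre-reform consumption of both agents and their utility changes.
    """
    c_low = consumption(w_low, M, B, 0.0)
    c_mimic = consumption(w_mimic, M, B, 0.0)
    dB = c_low * dt
    dV_low = indirect_utility(w_low, M, B + dB, dt) - indirect_utility(w_low, M, B, 0.0)
    dV_mimic = indirect_utility(w_mimic, M, B + dB, dt) - indirect_utility(w_mimic, M, B, 0.0)
    return c_low, c_mimic, dV_low, dV_mimic


if __name__ == "__main__":
    print(compensated_reform(w_low=1.0, w_mimic=1.5, M=0.5, B=0.4, dt=1e-4))
```

In this parameterization the mimicker consumes more than the low-ability agent, so the reform leaves the low-ability agent's utility unchanged to first order while lowering the mimicker's utility by approximately $\varkappa\,(c^{h\ell} - c^{\ell})\,dt$.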

| Consumption taxation and the properties of $T(M, e)$
Having shown that IM creates a role for consumption taxation when $\partial w(\theta, e)/\partial\theta > 0$, it is interesting to evaluate how such a tax affects the features of the accompanying tax $T(M, e)$. For space considerations, we will concentrate only on the subject of optimal marginal income taxes, which is widely studied in the literature. Nevertheless, we will later also discuss the effects of consumption taxation on the distortion of educational attainment vis-à-vis labor supply. Proposition 4 presents our results relating to the optimal marginal income tax rates.
Proposition 4. Assume that w θ e θ ( , ) > 0 ∂ ∕∂ and that a nonlinear tax T M e ( , ) is supplemented by a consumption tax levied at rate t.At an optimum, the implicit marginal tax rates on reported incomes are given by: where To interpret Equation (34), recall that there are no mimicking-deterring reasons to distort the bundle offered to high-ability agents.Therefore, for a given value of t, the bundle (e M B , , h h h ) should be chosen in such a way that it maximizes the revenue collected from highability agents while at the same time allowing them to achieve the utility target V set in problem 2  .This is precisely the message provided by (34).To see this, suppose to start from a supposedly optimal equilibrium where high-ability agents are offered the bundle (e M B , , and consider the effects of a small perturbation that raises with the first term on the RHS of (37) capturing the direct effect of an increase in B h and the second term capturing the behavioral effect of the reform, working through a change in a h .Therefore, denoting by R Δ the total effect of the reform on government's revenue, we have that Given that condition (11) also applies to a setting where the tax function T M e ( , ) is supplemented with a linear consumption tax (see Appendix C), Equation ( 38) can be equivalently restated as If it turns out that R Δ 0 ≠ , we would necessarily have to conclude that the initial (e M B , , bundle had been chosen suboptimally.If R Δ > 0 the suggested reform would allow the government to raise more revenue without prejudice for the utility of low-and high-ability agents, and without violating the self-selection constraint.If R Δ < 0 the same outcome would be achieved by changing the direction of the reform (dM < 0 h ).Thus, Equation (34) tells us that, if the (e M B , , chosen optimally (for given t), a small perturbation of the kind that we have considered would leave unaffected the total revenue collected from high-ability agents.Furthermore, notice 
that the perturbation that we have analyzed leads to an increase in the consumption-tax revenue collected from high-ability agents. This is because $t > 0$ at an optimum and since (36) ensures that $dc^h$, as given by (37), is greater than zero. Thus, to get $\Delta R = 0$ in (39), it must be that $T_M(M^h, e^h) < 0$: high-ability agents should face a negative marginal tax rate on reported income. This is meant to let them fully internalize the positive fiscal externality (on consumption-tax revenue) stemming from a marginal increase in $M^h$. It is worth noticing, however, that IM dampens the magnitude of the positive fiscal externality that we have discussed. This is due to the fact that, as we show in Appendix C, the second term on the RHS of (37) is negative. Recall that this term captures the behavioral response, through an adjustment in $a^h$, to the reform. Hence, while the envisaged reform has an overall positive effect on the revenue collected from high-ability agents through consumption taxation, the response along the IM-margin dampens this effect.
Turning to $T_M(M^{\ell}, e^{\ell})$, the second term on the RHS of (35) has the same structure (and sign) as the term appearing on the RHS of (34), and it admits a similar interpretation. It represents the opposite of the increase in the consumption-tax revenue that would be collected from low-ability agents if they were induced to marginally increase their reported income along an indifference curve in the $(M, B)$-space (for given $e^{\ell}$).24 Given that, from Lemma 2, the first term on the RHS of (35) is positive, the sign of $T_M(M^{\ell}, e^{\ell})$ remains ambiguous. Nevertheless, the takeaway message from (35) is that fiscal-externality considerations warrant lowering $T_M(M^{\ell}, e^{\ell})$ below the value that would be optimal based solely on self-selection considerations.

| Consumption taxation and distortion in educational attainment
We now turn attention to how consumption taxation affects the optimal distortions on e ℓ and e h .Recall from Section 3.3 that, for any given type of agent, education is distorted with respect to labor supply when L p − − 0 .The next proposition summarizes the main results about the optimal distortions on e j .
Proposition 5. Assume that w θ e θ ( , ) > 0 ∂ ∕∂ and that a nonlinear tax T M e ( , ) is supplemented by a consumption tax levied at rate t.At an optimum: (i) The amount of education acquired by high-ability agents is distorted upwards.In particular, we have that (ii) For low-ability agents, the distortion on e ℓ has, in general, an ambiguous direction.In particular, we have that ( ) + .| 699 (iii) For both types, da de MRS da dM j h + >0 , = , ℓ.To interpret Equation ( 40) we can rely on an approach that is similar to the one used in Section 4.1.Suppose the government changes the high-ability agents' initial (e M B , , via increasing e h by de h while simultaneously adjusting M h in a manner that keeps their utility unaffected.This can be achieved by setting dM MRS de = h eM h h (while keeping B h at its initial value).Notice that the reform has no impact on the binding self-selection constraint: for highability agents not behaving as mimickers the reform is welfare neutral by construction; for high-ability mimickers the reform is welfare-neutral because the (e M B , , ℓ ℓ ℓ )-bundle did not change (and for the same reason it has no impact on low-ability agents).What the reform does is to induce the high-ability agents to increase their a h as seen from ( 42), which in turn increases their consumption with the RHS of (43) capturing the behavioral effect of the reform, which is due to a change in the optimal amount of IM.Therefore, denoting by R Δ the total effect of the reform on government's revenue, we have ( ) Observe that the proposed reform has a positive impact on the revenue collected from consumption taxation.This is because t > 0 at an optimum, and since dc h , as given by ( 43), is greater than zero. 25Notice also that, if education was undistorted (vis-à-vis labor supply) at the initial (e M B , , h h h )-bundle, the extra cost for education (pde h ), paid for by the government,

would be exactly matched by the increase in revenue ($dM^h$) collected through the nonlinear tax $T(M, e)$. Thus, leaving education undistorted for high-ability agents would not be revenue-maximizing for the government. The result stated in (40) can then be interpreted in Pigouvian terms: an upward distortion on $e^h$ is justified as a way to internalize the fiscal externality (on consumption-tax revenue) associated with the type of reform that we have described.

25 In Appendix C we show that condition (11) applies also to a setting where the tax function $T(M, e)$ is supplemented with a linear consumption tax. Therefore, from (34) we can conclude that $1 - \sigma'(a^h) > 0$. Exploiting the results stated by (42), and since $t > 0$, it then follows that $dc^h$, as given by the RHS of (43), is positive.
Consider next Equation (41) pertaining to low-ability agents. The first term on its RHS reflects mimicking-deterring considerations. Its sign coincides with that of $MRS^{h\ell}_{eM} - MRS^{\ell}_{eM}$. In the absence of consumption taxation, the sign of this difference is the sole determinant of the direction of the optimal distortion on $e^{\ell}$ (see part (ii) of Proposition 2): when $MRS^{h\ell}_{eM} > MRS^{\ell}_{eM}$, $e^{\ell}$ should be downward distorted, whereas when $MRS^{h\ell}_{eM} < MRS^{\ell}_{eM}$, $e^{\ell}$ should be upward distorted.
The effect of consumption taxation manifests itself in the second term on the RHS of (41). This term, which favors an upward distortion on $e^{\ell}$, captures fiscal-externality considerations of the same kind that we have previously discussed when analyzing Equation (40). It represents the opposite of the increase in the consumption-tax revenue that would be collected from low-ability agents if they were induced to marginally increase $e^{\ell}$ along an indifference curve in the $(e, M)$-space (for given $B^{\ell}$). Hence, when mimicking-deterring considerations warrant a downward distortion on $e^{\ell}$, the fiscal-externality considerations exert a countervailing effect. On the other hand, when mimicking-deterring considerations warrant an upward distortion on $e^{\ell}$, the distortion is further magnified by the fiscal-externality considerations associated with consumption taxation.

| CONCLUDING REMARKS
Within a Mirrleesian optimal tax framework, we have studied the joint design of nonlinear income and education taxes in a two-type model where an agent's ability-type affects both the (effort) costs and the (pecuniary) benefits of acquiring education. Following several earlier contributions on this topic, we have assumed that educational choices are publicly observable at the individual level; however, we departed from the previous literature by allowing for the possibility that agents conceal part of their earned income for tax purposes.
Using the riskless approach of Usher (1986) to model IM, we have characterized the properties of an informationally constrained Pareto-efficient tax policy, focusing on the so-called "normal" case where the direction of redistribution goes from the high- to the low-ability type. We have shown that, when an agent's productivity depends only indirectly on his type (i.e., it depends on the individual's type only through the education level chosen by the agent), IM does not affect the qualitative properties of an optimal tax policy. Whether or not earned income is perfectly observable at the individual level, all agents face a zero marginal income tax rate and education is downward distorted for low-ability agents (while it is left undistorted for high-ability agents). A simple intuition for this result comes from the observation that, when the individual hidden characteristic only affects a person's cost of attaining a given education level, a nonlinear tax on education is all that the government needs to control. In particular, even if IM were not an issue, conditioning the tax liability also on earned income would serve no purpose once an optimal nonlinear tax on education is in place.
When an agent's productivity depends, at least partly, directly on his type (i.e., when an agent's productivity remains type-dependent even when keeping education fixed), the results are different. IM does not affect the results about the sign of the marginal income tax rate faced by low-ability agents, which should be positive, and high-ability agents, which should be zero. It also leaves unscathed the result that education should be undistorted for high-ability agents. However, compared with a setting where earned income is perfectly observed by the government at the individual level, the case for a downward distortion on the educational choice of low-ability agents is strengthened.
An interesting result that emerges from our study is that IM opens up an avenue for the usefulness of consumption taxation (as long as the productivity of agents depends directly on their types). The reason is that, with the agents' productivity depending directly on their types, a high-ability mimicker and a low-ability agent differ in the amount of income that they earn, despite the fact that both of them report the same amount to the tax authority. In particular, a high-ability mimicker earns more than a low-ability agent, and therefore can afford a higher consumption. As a consequence, a linear consumption tax imposes a heavier burden on a mimicker than on a low-ability type, and it can be used to soften the binding self-selection constraint faced by the government. Obviously, the same logic implies that consumption taxation would also be welfare-enhancing in a generalized version of our model with several consumption goods. But in such a generalized version we would also obtain a violation of the celebrated Atkinson-Stiglitz (1976) result. Despite the assumption of separability between labor and consumption in the individuals' utility function, differentiated commodity taxation would be a valuable policy instrument (unless the subutility of consumption is homothetic).
The imposition of a linear consumption tax changes the nature of optimal marginal taxes. Its presence implies that any reform of the nonlinear tax on education and reported income that is welfare-neutral for either agent has behavioral effects for that agent (operating via adjustments in IM and thus labor supply). In turn, these effects have an impact on the revenue collected via consumption taxation. Given that the optimal consumption tax rate is positive, to internalize these fiscal externalities, we show that it becomes desirable to let high-ability agents face a negative marginal tax on reported income and also distort upwards their educational choice. For low-ability agents, the same fiscal-externality considerations imply a moderating effect on the optimal marginal tax on reported income. They also imply a mitigating effect on the tendency, attributable to the possibility of IM, to warrant a downward distortion on the educational choice of low-ability individuals.
We conclude by pointing out that our analysis has been based on the assumption that education is a unidimensional variable and is publicly observable at the individual level. This is a strong assumption. Education has both a quality and a quantity dimension and, realistically, the government can only observe and tax the quantity dimension. In an attempt to capture the multidimensional nature of human capital investments, in a background working paper (Bastani et al., 2022) we allow for the possibility that they are not fully observable by the government. This particular modification does not change the paper's qualitative results, and especially the fact that IM strengthens the case for downward distorting the education acquired by low-ability agents. Nevertheless, our attempt is only a first step; further investigation of this question is called for.

APPENDIX A
Proof of Proposition 1.The first-order conditions of the government's problem are: a w θ e φ θ e e μπp From the first-order conditions pertaining to M e , h h and B h (Equations A4-A6) we get: which imply, from ( 9)-( 10), that T M e ( , ) = 0 from which one obtains (17) by using (10).□ Proof of Lemma 1. Part (i).For a given e M B ( , , )-bundle, the first-order condition (7) for an optimal choice of a can be rewritten as It is clear from above that if w is independent of θ, the choice of a will also be independent of θ.This implies that a a = h ℓ ℓ .
□ w θ e θ w θ e dθ Moreover, from the second-order condition of the individual optimization problem, we know that the denominator of the term on the RHS of (A10) is positive; that, It then immediately follows from (A10), (A11), and Part (ii).For any given (e M B , , )-bundle, we have ( ) Now, from ( 9) and ( 11), we have v wu . This allows us to rewrite (A14) as It then follows from (A15), (A11), and Proof of Proposition 2. Dividing (A5) by (A4), and multiplying both sides of the resulting equation by

′ ( , ) h h w θ h e h e h h h h h h h h h h M h a h w θ h e h h h
Simplifying and collecting terms gives Dividing (A2) by (A1), and multiplying both sides of the resulting equation by the RHS of (A1) gives a w θ e φ θ e e μπp ( + ) ( , ) Rearranging and collecting terms, one can rewrite the above equation as, M a w θ e ′ + ( , )
Dividing both sides by μπ gives:

An illustrative example
To elaborate more on the effects of IM, suppose that $v(L) = L^2/2$ and $\epsilon^{h\ell}_{w,e} = \epsilon^{\ell}_{w,e} \equiv \epsilon_{w,e}$. Furthermore, assume that preferences are quasi-linear in consumption, so that $u' = \varkappa > 0$ and $u'' = 0$, and that $\sigma(a) = ka^2/2$, where $k$ is a positive constant. The latter assumption allows exploring how a change in the cost of IM, modeled as a variation in $k$, affects the two terms appearing in (B1).
Start with the difference . With u′ = ϰ we have that, for a given (e M B , , ℓ ℓ ℓ )-bundle, the optimality condition for a is where the sign of the inequality follows from the fact that a a < h ℓ ℓ and the ratio Notice that, with u″ = 0 and u′ = ϰ, an agent of ability j would choose to earn w θ e [ ( , )] ϰ j ℓ 2 when the tax liability is only conditioned on the amount of acquired education and he/she is forced to acquire an amount of education e ℓ .When the tax is conditioned on both education and reported income, agents exploit IM to adjust their labor supply and fill part of the gap between w θ e [ ( , )] ϰ j ℓ 2 and M ℓ .As shown by Equation (B2), how much of this gap is optimally filled depends on k.At the limit, if IM were costless, we have from Equation (B2) that can be re-expressed as ( ) ( ) implying that, as IM becomes easier (k goes down), the difference w I w I ( ) − ( ) ∕ ∕ becomes smaller in absolute value. 28 We can then conclude that, as it becomes easier for agents to engage in IM, the first term in (B1) tends to get larger and the second term tends to become smaller in absolute value.Both of these effects strengthen the case for downward distorting e ℓ .a w θ e φ θ e e μπ p t t σ a da de At the limit, when k approaches zero, the difference w I w I ( ) − ( ) approaches zero given that both terms tend to 1 ϰ ∕ .
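The comparative statics in $k$ described in this example can be checked with a small numerical sketch. It uses the example's functional forms ($u' = \varkappa$, $v(L) = L^2/2$, $\sigma(a) = ka^2/2$) together with two assumptions that are our reading of the model rather than quoted equations: consumption equals $B + a - \sigma(a)$ and earned income is $I = M + a$, so that $L = I/w$. The quantity tracked, $w^2/I$ for each agent, is likewise our interpretation of the difference discussed above; it is chosen because it tends to $1/\varkappa$ for both agents as $k$ approaches zero. Function names and parameter values are hypothetical.

```python
"""Sketch of the illustrative example: how the IM-cost parameter k shapes w**2 / I.

Assumed (not quoted from the paper): c = B + a - k * a**2 / 2, I = M + a, L = I / w,
utility kappa * c - L**2 / 2.
"""


def earned_income(w, M, k, kappa=1.0):
    """Earned income I = M + a, with a solving kappa * (1 - k * a) = (M + a) / w**2."""
    a = (kappa * w**2 - M) / (1.0 + kappa * k * w**2)
    return M + a


def ratio_gap(w_low, w_mimic, M, k, kappa=1.0):
    """Difference between w**2 / I for the mimicker and for the low-ability agent."""
    ratio_mimic = w_mimic**2 / earned_income(w_mimic, M, k, kappa)
    ratio_low = w_low**2 / earned_income(w_low, M, k, kappa)
    return ratio_mimic - ratio_low


if __name__ == "__main__":
    # Lower k means cheaper misreporting; the gap shrinks as k falls.
    for k in (2.0, 1.0, 0.5, 0.1, 0.01):
        print(k, ratio_gap(w_low=1.0, w_mimic=1.5, M=0.5, k=k))
```

Under these assumptions the gap has the closed form $k\,[(w^{h\ell})^2 - (w^{\ell})^2]/(1 + kM)$, which is increasing in $k$ and vanishes as $k \to 0$, in line with the statement that the difference becomes smaller in absolute value as IM becomes easier.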
( ) ( ) Equations ( C1)-(C7) determine the optimal values of e M B e M B , , , , , , and t.To derive an expression for t, start with multiplying both sides of (C3) by c ℓ .This gives Then multiply both sides of (C6) by c h to get Simplifying and rearranging terms one obtains from which one gets the result stated in Equation ( 32).The properties of the tax schedule  , ).
The associated first-order conditions are: From (C10) to (C11), we can derive the following implicit characterizations for the marginal tax rates Notice also that, combining (C10) and (C12), one gets that Next define the marginal rate of substitution between M and B, denoted by MRS MB , for an agent choosing a given (e M B , , )-bundle.Since for a given (e M B , , )-bundle an agent's indirect utility is given by the agent's (conditional) indirect utility is given by where a* denotes the value for a that solves the problem (C16).By invoking the envelope theorem, we have that . Thus, we can define Based on (C17) we introduce the following notation: We now prove that the denominator of the expression for t in Equation ( 32) is negative so that sign t sign c c ( ) = ( − ) hℓ ℓ .To this end, first observe that, from (C13) and (C15), and the fact that t > −1, we have which proves that the denominator of the expression on the RHS of ( 32) is negative.
To complete the proof, we show below that Lemma 1 and Lemma 2 remain valid in this setting. We do this separately for the two lemmas, following the same steps as in Appendix A.
Lemma 1 in which w θ = 0 ∂ ∕∂ .First, for a given t and a given e M B ( , , )-bundle, an agent's optimal choice for a satisfies the first-order condition It is clear from above that if w is independent of θ, the choice of a will also be independent of θ.
From Equation (C22), substitute ( ) in above, then rearrange and simplify to get: From the second-order condition of the individual optimization problem, we know that . For any given (e M B , , )-bundle we have, from (C18) to (C19), that ( ) Rearranging and collecting terms one can rewrite the equation above as

M h h h h h h h h h h h h h h h h h h h h h
from which one obtains (34) by applying (C18).
(ii) Derivation of the expression for T M e ( , ) Rearranging and collecting terms one can rewrite the equation above as   where the second-order conditions of the individual optimization problem implies Ω > 0.
Now, unless preferences are quasi-linear in consumption ($u'' = 0$), $\Phi$ appears to be ambiguous in sign. The reason is that $\sigma'(a)$ could in principle be either positive (when the agent faces a positive marginal tax on reported income, and therefore under-reports his income) or negative (when the agent faces a negative marginal tax on reported income, and therefore over-reports his income). Nevertheless, we can safely establish that $\Phi > 0$.
It is clear from the expression for $\Phi$ in (C33), and with $u''(c) < 0$, that the only possibility under which $\Phi < 0$ is for $\sigma'(a) < 0$ to make the second term on the RHS of (C33) positive, but also larger in size than the first term. Suppose this is the case. Then, given that $t > 0$ at an optimum, the RHS of (34), which is of opposite sign to $\Phi$, will be positive so that $T_M(M^h, e^h) > 0$. The same argument tells us that if $\Phi < 0$, the second expression on the RHS of (35) is positive. Moreover, from Lemma 2, we know that the first term on the RHS of (35) is also positive. Consequently, $T_M(M^{\ell}, e^{\ell}) > 0$. This cannot happen though. With $\sigma'(a) < 0$, condition (C15) implies that $T_M(M^j, e^j) = \sigma'(a^j) < 0$, $j = h, \ell$. A contradiction. Finally, notice that even though $\Phi > 0$, we have that $\Phi < MRS_{MB}$. This is because from (C29) we have that

′( )] . h h h h h h h h h h h h h h h h h h h h h h h
Simplifying terms allows rewriting the above equation as Rearranging terms gives Equation ( 40).

′( ) .
7 can define MRS MB as ( ) these circumstances, the difference between the marginal willingness to pay of a lowability agent and of a mimicker is only related to the difference in their marginal (effort) cost of acquiring education.
26) 15 To interpret condition (25), consider all the (e L , )-pairs that allow producing a given amount of output available for consumption, that is, a given amount wL pe − .A change in e by de must be accompanied by a change in L by pe − constant.Changing e by de and L by
-ability agent and a high-ability mimicker earn the same income), the fact that I 32) implies that t = 0: consumption taxation is a redundant instrument.On the other hand, when w θ e θ zero.Hence, the increase in consumption-tax-revenue is dampened by the behavioral response along the IM-margin.BASTANI ET AL.
(ii) Deriving the expression for (41) Divide (C1) by (C2) and multiply both sides of w θ e v follows from the above equation once one takes into account that, Determining the sign of the expression in (42).Totally differentiate (C22) to get, has the same characterization as in the case without IM.What one can observe, however, is that IM tends to lower the value of the difference MRS MRS − M ℓ ℓ M ℓ ℓ compared to the case without IM.e ℓ ℓ , the interesting questions to ask is if T M e ( , ) e ℓ ℓ exceeds or falls short of p and what the role of IM in this is.
$MRS^{h\ell}_{eM}$ provides the corresponding amount for a high-ability mimicker.