Question 1
"The disease known as malaria, an affliction of mankind through recorded history, persists in tropical regions up to the present day. These same tropical areas have, generally speaking, a much lower level of economic development than that enjoyed in the temperate climates. These facts lead us to a natural question: does malaria hold back economic progress? Unfortunately, simple correlations between tropical disease and productivity cannot answer this question. Malaria might depress productivity, but the failure to eradicate malaria might equally well be a symptom of underdevelopment, itself caused by poor institutions or bad luck" (Bleakley, 2010 ,
p. 1).
Episodes of rapid progress against malaria were made possible in the U.S. and in Latin America by critical innovations in health technology: the discovery of DDT, a powerful pesticide that, when sprayed on the walls of houses, helped kill the mosquitoes involved in the transmission of malaria. These efforts to eradicate malaria led to substantial declines in malaria infection rates in a relatively short period of time.
Children may be particularly vulnerable to malaria exposure, as most of a person's human-capital and physiological development happens in childhood. However, little is known about the effects of malaria on children or whether there are any potential persistent effects later in life.
You have been provided with a new (and unique) dataset to try to answer this question with a variety of empirical methods. The data include whether an individual was ever infected by malaria, as well as the individual's year and place of birth, sex, age, wages, and mother's education for a very large and nationally representative set of individuals aged 35-45. You are asked to use this very rich dataset to estimate the effect of childhood exposure to malaria on labor market productivity.
In sum, we want to check if individuals who had malaria in their childhood have lower wages due to the hypothetical long-term consequences of the disease.
Part 1 (25 Points)
Describe an empirical strategy that uses the innovations in health technology to provide exogenous variation in the risk of malaria infection, to estimate the effect of malaria in childhood on future wages. [Hint: you create a variable $\text{Eradication campaign exposure}_i$ for each individual that is a dummy that takes the value of one when person $i$ was exposed to malaria eradication efforts during childhood and zero otherwise. How can you use this variable below?]
Your answer should start along the lines of:
To measure the impact of malaria infection on future wages, I estimate the following two-stage model:
followed by a description of the variables in each equation. Then describe the identification assumptions needed to attribute a causal interpretation to your key estimate of interest. Finally, discuss any concerns with the identification assumptions in this setting.
Notice the subtle hints in this question: it is suggesting that the Malaria infection variable is endogenous and is proposing the use of a two-stage model. In other words, the question is implicitly leading one to believe the use of Instrumental Variables is a possible solution.
Let’s analyse the question first, then proceed to answering it. If we were to use a regression in our case, we could use something along the following lines:

$$\text{Future wage}_i = \theta_0 + \theta_1 \text{Malaria Infection}_i + \gamma \mathbf{X}_i + u_i$$

where $\mathbf{X}_i$ includes information on the individual’s age, place of birth, gender and mother’s education. The issue is that the variable Malaria Infection is endogenous (as suggested by the question). In other words, it is correlated with the error term, most likely because one or more factors influencing an individual’s wage also affect their chance of having had malaria in their childhood (i.e., confounding effects).
So, $E[u_i \mid \mathbf{X}_i, \text{Malaria Infection}_i] \neq 0$. If we do not solve this problem, our estimates (thetas and gammas) will in general all be biased and inconsistent. Therefore, we won’t be able to establish a causal relationship between malaria and wages. There are a few confounding effects one could consider. For instance, access to basic education, sanitation and health could be confounding factors. One way of solving the problem is to use IVs, and the question already proposes the instrument: Eradication campaign exposure. Using the 2SLS technique, we may be able to remove the endogeneity of Malaria Infection and obtain consistent estimates of $\theta_1$ (the parameter of interest).
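The bias described above can be illustrated with a small simulation. This is only a sketch on made-up data: the confounder `poverty`, the infection probabilities and the coefficients are all assumptions chosen for illustration, not features of the dataset in the question.

```python
import random

def slope(x, y):
    """OLS slope from a bivariate regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate_confounded(n=20000, true_effect=-1.0, seed=0):
    """Hypothetical data: unobserved poverty raises malaria risk and lowers wages."""
    rng = random.Random(seed)
    malaria, wage = [], []
    for _ in range(n):
        poverty = rng.gauss(0, 1)                 # unobserved, so it ends up in u
        p_infected = 0.6 if poverty > 0 else 0.2  # assumed: poorer children are infected more often
        m = 1 if rng.random() < p_infected else 0
        w = 10 + true_effect * m - 2.0 * poverty + rng.gauss(0, 1)
        malaria.append(m)
        wage.append(w)
    return malaria, wage

malaria, wage = simulate_confounded()
naive_ols = slope(malaria, wage)
print(f"true effect: -1.0, naive OLS estimate: {naive_ols:.2f}")
```

Because the omitted confounder is positively related to infection and negatively related to wages, the naive estimate in this setup overstates the wage penalty of malaria.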
Q1 answer
So, the first stage equation (also called the reduced form equation for the endogenous variable) would be:

$$\text{Malaria Infection}_i = \alpha_0 + \alpha_1 \text{Eradication}_i + \delta \mathbf{X}_i + \epsilon_i$$

For this regression to be interesting to us, we need to satisfy two conditions:

$$\text{Cov}(\text{Malaria Infection}_i, \text{Eradication}_i) \neq 0$$

$$\text{Cov}(u_i, \text{Eradication}_i) = 0$$

The second one is called exogeneity/validity of the instrumental variable and is essential for the IV to work well. The first stage of the 2SLS is the step responsible for breaking up the suspected endogenous variable ($\text{Malaria Infection}_i$) into two parts: one completely exogenous ($\widehat{\text{Malaria Infection}}_i$) and another containing the suspected endogeneity ($\hat{\epsilon}_i$). The completely exogenous part is the one we want.
The second stage of the 2SLS is identical to the structural equation but we use the exogenous part of the endogenous variable instead:

$$\text{Future wage}_i = \beta_0 + \beta_1 \widehat{\text{Malaria Infection}}_i + \psi \mathbf{X}_i + u_i$$
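The two stages can be sketched by hand on simulated data. Everything here is an assumption made for illustration: the instrument is randomly assigned, the controls are omitted to keep the example short, and the numbers (effect sizes, probabilities) are invented.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a bivariate OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean(values):
    return sum(values) / len(values)

def simulate(n=50000, true_effect=-1.0, seed=1):
    """Hypothetical data: poverty is unobserved; campaign exposure is random."""
    rng = random.Random(seed)
    exposed, malaria, wage = [], [], []
    for _ in range(n):
        poverty = rng.gauss(0, 1)                   # unobserved confounder
        z = 1 if rng.random() < 0.5 else 0          # Eradication_i (assumed random)
        p_infected = (0.7 if poverty > 0 else 0.4) - 0.35 * z
        m = 1 if rng.random() < p_infected else 0   # Malaria Infection_i
        w = 10 + true_effect * m - 2.0 * poverty + rng.gauss(0, 1)
        exposed.append(z)
        malaria.append(m)
        wage.append(w)
    return exposed, malaria, wage

exposed, malaria, wage = simulate()

# Naive OLS of wage on infection, biased by the unobserved confounder.
_, naive_ols = ols(malaria, wage)

# Stage 1: regress infection on the instrument and keep the fitted values.
a0, a1 = ols(exposed, malaria)
malaria_hat = [a0 + a1 * z for z in exposed]

# Stage 2: regress wages on the fitted values. The slope is the 2SLS point
# estimate; standard errors from this manual second stage would not be valid.
_, beta1_2sls = ols(malaria_hat, wage)

# With one binary instrument and no covariates, 2SLS equals the Wald ratio.
w1 = mean([w for w, z in zip(wage, exposed) if z == 1])
w0 = mean([w for w, z in zip(wage, exposed) if z == 0])
m1 = mean([m for m, z in zip(malaria, exposed) if z == 1])
m0 = mean([m for m, z in zip(malaria, exposed) if z == 0])
wald = (w1 - w0) / (m1 - m0)

print(f"naive OLS: {naive_ols:.2f}, 2SLS: {beta1_2sls:.2f}, Wald: {wald:.2f}")
```

In practice one would use a dedicated IV routine rather than two manual regressions, since it reports correct standard errors and handles the control variables.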

The assumptions we need to satisfy are:
1) Independence Assumption/Validity/Exogeneity: $\text{Cov}(u_i, \text{Eradication}_i) = 0$. The IV cannot be correlated with any variable(s) in the error term. One way to guarantee this assumption holds is the random assignment of the eradication policy.
2) Strong Instrument/Relevance: $\text{Cov}(\text{Malaria Infection}_i, \text{Eradication}_i) \neq 0$. The IV has to be correlated with the suspected endogenous variable. Plus, this correlation should be large (the larger the better), i.e. close to -1 or +1.
3) Exclusion Restriction: The IV cannot directly affect the dependent variable of the structural equation. If it does, it should be an explanatory variable, not an IV. Any correlation between the IV and the dependent variable should exist only because of the endogenous variable.
Are these assumptions likely to hold?
1) If the spraying of DDT on the walls of houses was randomly applied, then we could expect no correlation with the unobserved effects inside the error term (u). But someone might hypothesise that this substance was applied in areas with high incidence of malaria. Would that cause a problem? Probably not, because we have controlled for place of birth. It is possible, however, that there is still some correlation because a person’s place of birth might not be the same as the place where they spent their childhood. It is also possible that the spraying of DDT was done in areas with poor basic infrastructure/education/health conditions (the variables which caused the endogeneity in the first place). If this is the case, then our proposed IV will be correlated with the error term, potentially compromising our identification strategy. We cannot directly test this assumption. Economic intuition is usually the way we assess the plausibility of this assumption.
2) It’s reasonable to expect that there is a negative association between the dummy of malaria infection and the spraying of DDT. If the individual had had this protection, their chances of getting malaria would have greatly diminished, so it’s likely this second condition is satisfied. This assumption can be easily verified by regressing the suspected endogenous variable on the IV, i.e. the first stage (make sure to also include all the exogenous regressors in this regression).
3) It is reasonable to assume that spraying of DDT on a person’s house will not affect their wages directly.
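The relevance check in point 2 can be sketched as a first-stage regression with an F-statistic on the instrument. The data below are simulated under assumed infection probabilities and the controls are left out for brevity. With a single instrument the F-statistic is the squared t-statistic, and a common rule of thumb treats values above 10 as a sign that the instrument is not weak.

```python
import math
import random

def first_stage(z, m):
    """Regress m on z (with intercept); return the slope and its F-statistic."""
    n = len(z)
    mean_z, mean_m = sum(z) / n, sum(m) / n
    sxx = sum((a - mean_z) ** 2 for a in z)
    slope = sum((a - mean_z) * (b - mean_m) for a, b in zip(z, m)) / sxx
    intercept = mean_m - slope * mean_z
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, m))
    se = math.sqrt(ssr / (n - 2) / sxx)   # homoskedastic standard error
    t_stat = slope / se
    return slope, t_stat ** 2             # F = t^2 with one instrument

rng = random.Random(2)
exposed, malaria = [], []
for _ in range(5000):
    z = 1 if rng.random() < 0.5 else 0    # hypothetical campaign exposure
    p_infected = 0.5 - 0.3 * z            # assumed: exposure lowers infection risk
    exposed.append(z)
    malaria.append(1 if rng.random() < p_infected else 0)

alpha1, f_stat = first_stage(exposed, malaria)
print(f"first-stage coefficient: {alpha1:.3f}, F-statistic: {f_stat:.1f}")
```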
Question 2
Part 2 (25 Points)
Figure 1 shows the differential exposure to eradication campaign efforts by child's age. Using this temporal variation, now describe an alternative strategy where you use specific year of birth information to identify the effect of malaria eradication campaigns on future wages, comparing cohorts that were born after the campaign with those who were adults when the campaign started.
Note that you are estimating an intent-to-treat estimate of the effect of complete childhood exposure to eradication campaigns on the outcome, not the effect of malaria infection on the outcome.
Your answer should start along the lines of:
To measure the impact of malaria eradication campaigns in childhood, from age 0 to age 18, on the wages of individuals 35 to 45, I estimate the following model:
followed by a description of the variables in the equation. Then describe the identification assumptions needed to attribute a causal interpretation to your key estimate of interest and ways in which you could assess these identification assumptions. Finally, discuss any concerns with the identification assumptions in this setting and how you would interpret the results. Discuss what would be the interpretation of the coefficient of interest if you use the same model but only among those who were partially exposed to the campaign (i.e., from age 0 to age 18).
Notice the subtle hints in this question: there is a clear graph showing a (potential) discontinuity, so Regression Discontinuity Designs (RDDs) will likely be the answer to this question. Notice that there are a lot of smaller questions too. Make sure to answer them all!
Let’s (again) analyse the question first, then proceed to answering it. The “standard” starting point would be a basic regression:

$$\text{Future wage}_i = \theta_0 + \theta_1 \text{Malaria Infection}_i + \gamma \mathbf{X}_i + u_i$$

We have discussed in question 1 how this would not be an ideal strategy as it’s riddled with biases. We need a way of identifying the causal effect of interest ($\theta_1$). We can utilise RDDs. First, we need to identify a discontinuity in the selection of treatment (i.e., the children who were exposed to the spraying of DDT). According to the graph in the question, the only factor determining if a child would be exposed was their age (notice that there was a period of 18 years separating those who got treated from those who didn’t).
Recall that c is the cut-off point. In our case, this point is 18 (years). We will evaluate the LATE for children (people between 0-18 years of age) in relation to adults, when the treatment was applied. Looking at the problem in this way will transform the situation into a standard problem of RDD. The running variable is year of birth relative to the start of the treatment (I erroneously indicated it as “age”).
So, a possible regression would therefore be:

$$\text{Future wage}_i = \theta_0 + \theta_1 T_i + f(c) + \gamma \mathbf{X}_i + u_i$$

where the dummy variable $T_i$ is equal to unity if the individual is a child (here defined as anyone between the ages of 0 to 18 during the spraying of the DDT) and zero otherwise. The term $f(c)$ refers to the potential nonlinear function around the cut-off point, such as:
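One common choice in RDD applications, given here only as an assumed illustration and not necessarily the form intended above, is a low-order polynomial in the running variable centred at the cut-off, with slopes allowed to differ on each side. Writing $r_i$ for the running variable (a symbol introduced here for illustration) and $c$ for the cut-off:

$$f(c) = \lambda_1 (r_i - c) + \lambda_2 (r_i - c)^2 + \lambda_3 T_i (r_i - c) + \lambda_4 T_i (r_i - c)^2$$

With this form, $\theta_1$ measures the jump in expected future wages exactly at $r_i = c$, because every polynomial term is zero at the cut-off.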
What assumptions do we need to guarantee for the RDDs identification strategy to work?
1) We need to be sure that the reason why we see a “jump” in Future Wages is solely because of the running variable, nothing else. One way to check this (rigorously speaking, one way to be less uncertain) is to check that the control variables (X) do not “jump” around the cut-off point. This could be easily determined by examining graphs, proposing basic regression analyses and/or using economic intuition. Generally, we need to check the graphs/figures we talked about in Tutorial 6.
2) Unconfoundedness: We need to make sure the assignment of treatment happens solely because of the running variable, nothing else. And preferably we are referring to compliers only. In our case, this is likely to hold because people cannot choose their age, so their year of birth couldn’t have been influenced by the spraying of DDT because they cannot time-travel (presumably).
3) Continuity in Treatment Assignment: If an individual was an adult, then they did not have their walls sprayed with DDT. If they were children, then they had their houses sprayed for sure. This may be problematic because it’s hard to guarantee the individual had the treatment even if they were eligible for it: in regions with poor infrastructure, it’s not uncommon for policies not to be implemented due to corruption/delays/etc.
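The check in point 1 can be sketched as a local linear comparison at the cut-off, applied first to a control variable and then to the outcome. The data are simulated, the running variable is normalised so that the cut-off is 0 and is treated as continuous for simplicity, and the bandwidth of 5 is an arbitrary assumption.

```python
import random

def intercept(x, y):
    """OLS intercept of y on x, i.e. the fitted value at x = 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - slope * mean_x

def local_linear_jump(running, values, cutoff, bandwidth):
    """Fit a line on each side of the cutoff and compare fitted values at the cutoff."""
    left_x, left_y, right_x, right_y = [], [], [], []
    for r, v in zip(running, values):
        d = r - cutoff
        if -bandwidth <= d < 0:
            left_x.append(d)
            left_y.append(v)
        elif 0 <= d <= bandwidth:
            right_x.append(d)
            right_y.append(v)
    return intercept(right_x, right_y) - intercept(left_x, left_y)

rng = random.Random(3)
running, mother_educ, wage = [], [], []
for _ in range(20000):
    r = rng.uniform(-10, 10)              # hypothetical birth year relative to the cut-off cohort
    treated = 1 if r >= 0 else 0          # exposed to the campaign as a child
    running.append(r)
    mother_educ.append(8 + 0.05 * r + rng.gauss(0, 2))          # smooth: no jump built in
    wage.append(10 + 0.02 * r + 1.0 * treated + rng.gauss(0, 1))  # assumed jump of 1.0

educ_jump = local_linear_jump(running, mother_educ, cutoff=0, bandwidth=5)
wage_jump = local_linear_jump(running, wage, cutoff=0, bandwidth=5)
print(f"jump in mother's education: {educ_jump:.2f}, jump in wages: {wage_jump:.2f}")
```

A control variable that shows a sizeable jump at the cut-off would suggest that something other than the treatment changes there.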
What concerns could one have with this identification strategy?
- Any issues that could threaten the validity of the RDD would suffice. For instance, is it reasonable to “bundle up” individuals aged 0-18 together and call them children? We are implicitly assuming they have been affected equally by the policy but according to the initial diagram, individuals aged closer to 0 were much more likely to be exposed to treatment than individuals closer to 18 years of age, at the time of the spraying.
- We might choose the incorrect functional form of f(c). This would lead us to believe there was a jump (or the jump was bigger/smaller than the actual jump) around the cut-off point when it was actually just caused by the nonlinear relationship between the dependent variable and running variable.
- Any violation of the assumptions we discussed in the previous slide...
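The functional-form concern can be illustrated with a simulation in which the outcome is a smooth cubic function of the running variable and there is no true jump. The cubic shape, the sample size and the bandwidth of 0.2 are assumptions chosen for illustration. A global specification with a single linear trend reports a sizeable jump, while the same specification restricted to a narrow window around the cut-off does not.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a bivariate OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Residuals from regressing y on x with an intercept."""
    a, b = ols(x, y)
    return [v - a - b * u for u, v in zip(x, y)]

def jump_with_linear_trend(r, y):
    """Coefficient on T in y = a + theta*T + lambda*r + e, with T = 1 if r >= 0.

    Uses the Frisch-Waugh-Lovell result: partial r out of both T and y,
    then regress the residuals on each other.
    """
    t = [1.0 if v >= 0 else 0.0 for v in r]
    return ols(residuals(r, t), residuals(r, y))[1]

rng = random.Random(4)
r_all = [rng.uniform(-1, 1) for _ in range(20000)]       # normalised running variable
y_all = [5 * v ** 3 + rng.gauss(0, 0.5) for v in r_all]  # smooth curve, no true jump

global_jump = jump_with_linear_trend(r_all, y_all)

near = [(v, w) for v, w in zip(r_all, y_all) if abs(v) < 0.2]
local_jump = jump_with_linear_trend([v for v, _ in near], [w for _, w in near])

print(f"global linear fit: {global_jump:.2f}, narrow window: {local_jump:.2f}")
```

Comparing estimates across bandwidths and polynomial orders is one way to see how sensitive an estimated jump is to the choice of f(c).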
Question 3
Part 3 (25 Points)
Finally, you think it may help to exploit the fact that different regions received the eradication campaign at different points in time, to help identify the effect of malaria on future wages.
For instance, Figure 2 shows that the timing in the adoption of malaria eradication efforts varied widely across U.S. states. Describe a difference-in-difference strategy that exploits both the timing of eradication campaigns as well as the regional variation in adoption of eradication campaigns across states.
Your answer should start along the lines of:
To measure the impact of malaria eradication campaigns on future wages, I estimate the following difference-in-differences model:
\text { Future wages }_{i s}=\beta_{0}+
followed by a description of the variables in the equation. Then describe the identification assumptions needed to attribute a causal interpretation to your key estimate of interest and ways in which you could assess these identification assumptions. Finally, discuss any concerns with the identification assumptions in this setting and how you would interpret the results. What would be the role of including state of birth fixed effects in the regression?
Part 4 (25 points)
Which of the above empirical strategies do you find the most credible and why? What type of effects would you expect of malaria eradication efforts on other dimensions of human capital such as cognitive ability or mental health? What about the potential mechanisms through which malaria eradication could affect future outcomes? (e.g., household income, school enrollment, child mortality).