Case Study: Effect of a Garbage Incinerator’s Location on Housing Prices

In this case study, we will be using the Differences in Differences (DiD) method to analyze the effect of a garbage incinerator’s location on housing prices. This method is a statistical technique used in econometrics that calculates the effect of a treatment (in this case, the placement of a garbage incinerator) on an outcome (here, housing prices) by comparing the average change over time in the outcome variable for the treatment group to the average change over time for the control group. You can run and extend the analysis of this case study using the Posit cloud.


library(wooldridge) # To get the data
library(tidyverse)  # For modern data science analysis
library(skimr)      # For descriptive statistics 
library(stargazer)  # For professional regression tables

1 Introduction

Kiel and McClain (1995) studied the effect of a newly built garbage incinerator on the values of residential properties in North Andover, Massachusetts. Their research used data spanning several years and a comprehensive econometric analysis.

In our case study, we aim to conduct a similar analysis, albeit with a few modifications. Instead of using data from multiple years, we will limit our scope to two specific years. Furthermore, we will simplify our approach by using less complex models for our analysis. Despite these changes, the core objective of our study aligns with that of Kiel and McClain’s research.

The timeline of the incinerator’s construction is central to our study. Rumors that a new incinerator would be built in North Andover began to circulate after 1978, and construction started in 1981. The incinerator was expected to become operational soon after construction began, but it did not start operating until 1985.

For our analysis, we will be using data on the prices of houses sold in two distinct years: 1978 and 1981. The year 1978 represents the period before the rumors of the incinerator began, while 1981 represents the year when the construction of the incinerator started.

The central hypothesis we test is that the prices of houses located near the incinerator fell relative to the prices of houses situated farther away. The rationale is that a garbage incinerator in the vicinity could devalue the surrounding properties because of the associated environmental and health concerns. We aim to test this hypothesis and to quantify the impact of the incinerator’s location on housing prices.

2 Data Collection

To analyze the effect of the incinerator’s location on housing prices, we need data on the prices of houses near the incinerator site (treatment group) and of comparable houses farther away (control group), both before and after the incinerator was rumored. We use the kielmc dataset from the wooldridge package, which records houses sold in 1978 and 1981. The dummy variable nearinc marks the houses located within 15,840 feet (3 miles) of the incinerator.


data(kielmc, package='wooldridge')


help("kielmc", package = "wooldridge")

A data.frame with 321 observations on 25 variables:

year: 1978 or 1981
age: age of house
agesq: age^2
nbh: neighborhood, 1-6
cbd: dist. to cent. bus. dstrct, ft.
intst: dist. to interstate, ft.
lintst: log(intst)
price: selling price
rooms: # rooms in house
area: square footage of house
land: square footage lot
baths: # bathrooms
dist: dist. from house to incin., ft.
ldist: log(dist)
wind: prc. time wind incin. to house
lprice: log(price)
y81: =1 if year == 1981
larea: log(area)
lland: log(land)
y81ldist: y81*ldist
lintstsq: lintst^2
nearinc: =1 if dist <= 15840
y81nrinc: y81*nearinc
rprice: price, 1978 dollars
lrprice: log(rprice)
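
Several variables in this list are described as being constructed from others. As a quick sanity check, the sketch below cross-tabulates two of the dummies against the definitions given in the help page. It assumes the wooldridge package is installed; the object names tab_y81 and tab_nearinc are only illustrative.

```r
# Load the data (assumes the wooldridge package is installed)
data(kielmc, package = "wooldridge")

# y81 should equal 1 exactly for the houses sold in 1981
tab_y81 <- table(year = kielmc$year, y81 = kielmc$y81)
print(tab_y81)

# nearinc is documented as 1 if dist <= 15840 ft (15840 / 5280 = 3 miles);
# any off-diagonal counts would flag houses that do not follow that rule
tab_nearinc <- table(nearinc = kielmc$nearinc,
                     within_3_miles = kielmc$dist <= 15840)
print(tab_nearinc)
```

Off-diagonal cells in either table would suggest that the stored dummy and the documented rule disagree for some houses.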

3 Exploratory Data Analysis


skim(kielmc[c('rprice', 'lrprice', 'nearinc', 'year')])

Data summary
Name	kielmc[c(“rprice”, “lrpri…
Number of rows	321
Number of columns	4
_______________________
Column type frequency:
numeric	4
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
rprice	1	83721.36	33118.79	26000.00	59000.00	82000.00	100230.41	300000.00	▇▇▁▁▁
lrprice	1	11.26	0.39	10.17	10.99	11.31	11.52	12.61	▁▆▇▃▁
nearinc	1	0.30	0.46	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▃
year	1	1979.33	1.49	1978.00	1978.00	1978.00	1981.00	1981.00	▇▁▁▁▆
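
The summary above pools both years and both locations. Since the analysis that follows compares four groups of houses (near or far, 1978 or 1981), a per-group summary may be more informative. A minimal sketch, assuming dplyr (part of the tidyverse) and wooldridge are installed; cell_summary is an illustrative name:

```r
library(dplyr)  # group_by(), summarise() and the %>% pipe
data(kielmc, package = "wooldridge")

# One row per (year, nearinc) cell: sample size and price summaries
cell_summary <- kielmc %>%
  group_by(year, nearinc) %>%
  summarise(
    n             = n(),
    mean_rprice   = mean(rprice),
    median_rprice = median(rprice),
    .groups = "drop"
  )
print(cell_summary)
```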

4 Basic Regression Analysis

An inexperienced analyst might only utilize the data from 1981 and estimate a rather simplistic model:

\[ \text {rprice} = \beta_{0} + \beta_{1} \text{nearinc} + e \hspace{1cm} (1) \]

In this equation, ‘nearinc’ is a dummy variable that equals one if the house is located near the incinerator, and zero if not. In this simple regression analysis with a single dummy variable, the intercept represents the average selling price for homes not located near the incinerator. The coefficient on ‘nearinc’ signifies the difference in the average selling price between homes near the incinerator and those that are not.


# Fit regression models:
model1981 <- lm(rprice ~ nearinc, data=kielmc, subset=(year==1981))
model1978 <- lm(rprice ~ nearinc, data=kielmc, subset=(year==1978))
# Get professional regression table:
stargazer(model1981, model1978, type="text")


===================================================================
                                  Dependent variable:              
                    -----------------------------------------------
                                        rprice                     
                              (1)                     (2)          
-------------------------------------------------------------------
nearinc                 -30,688.270***          -18,824.370***     
                          (5,827.709)             (4,744.594)      
                                                                   
Constant                101,307.500***           82,517.230***     
                          (3,093.027)             (2,653.790)      
                                                                   
-------------------------------------------------------------------
Observations                  142                     179          
R2                           0.165                   0.082         
Adjusted R2                  0.159                   0.076         
Residual Std. Error  31,238.040 (df = 140)   29,431.960 (df = 177) 
F Statistic         27.730*** (df = 1; 140) 15.741*** (df = 1; 177)
===================================================================
Note:                                   *p<0.1; **p<0.05; ***p<0.01

The regression results of column (1) indicate that homes closer to the incinerator were sold at a lower average price compared to those further away in 1981. The slope coefficient is highly statistically significant, allowing us to reject the hypothesis that the average home values near and far from the incinerator are identical.

However, Equation (1) does not imply that the siting of the incinerator caused the lower housing values. If we run the same regression for 1978 (before the incinerator was even rumored), the results in column (2) show the same pattern as those in column (1): the slope coefficient is also negative. Even before the incinerator was a consideration, the average value of a home near the proposed site was already $18,824.37 lower than the average value of a home not near the site ($82,517.23). This difference is statistically significant. Thus, the incinerator was built in an area where housing values were already lower.
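
To compare the two gaps on a common scale, each can be expressed as a share of the average price of homes far from the site in the same year, using the estimates in the table above:

\[ \frac{30,688.27}{101,307.50} \approx 0.303 \quad (1981), \hspace{1cm} \frac{18,824.37}{82,517.23} \approx 0.228 \quad (1978). \]

So homes near the site sold for roughly 30% less in 1981, against roughly 23% less in 1978.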

5 Differences in Differences Analysis

So, how do we determine whether the construction of a new incinerator depressed housing values? The answer lies in the change in the coefficient on ‘nearinc’ between 1978 and 1981. The difference in average housing value was much larger in 1981 than in 1978 ($30,688.27 versus $18,824.37), even when expressed as a percentage of the average value of homes not near the incinerator site. The difference between the two coefficients on ‘nearinc’ is

\[ \hat{\delta}_{1}=-30,688.27-(-18,824.37)=-11,863.9 . \]

This number is our estimate of the impact of the incinerator on the values of homes in its vicinity. In the field of empirical economics, $\hat{\delta}_{1}$ is often referred to as the difference-in-differences (DD or DID) estimator because it can be expressed as

\[ \hat{\delta}_{1}=\left(\overline{\text { rprice }}_{81, n r}-\overline{\text { rprice }}_{81, f r}\right)-\left(\overline{\text { rprice }}_{78, n r}-\overline{\text { rprice }}_{78, f r}\right), \hspace{1cm} (2) \]

where ‘nr’ denotes “near the incinerator site” and ‘fr’ denotes “farther away from the site.” In other words, $\hat{\delta}_{1}$ is the difference over time in the average difference of housing prices in the two locations.
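
Equation (2) can be evaluated directly from the four group means. The following sketch does this with base R; cell_means, gap_1981, gap_1978 and did_manual are illustrative names, and the wooldridge package is assumed to be installed.

```r
data(kielmc, package = "wooldridge")

# 2 x 2 matrix of average real prices: rows are years, columns are nearinc
cell_means <- with(kielmc,
                   tapply(rprice, list(year = year, nearinc = nearinc), mean))
print(cell_means)

# Near-minus-far gap in each year
gap_1981 <- cell_means["1981", "1"] - cell_means["1981", "0"]
gap_1978 <- cell_means["1978", "1"] - cell_means["1978", "0"]

# Difference-in-differences estimate, Equation (2)
did_manual <- gap_1981 - gap_1978
print(did_manual)
```

The result should match the difference between the two ‘nearinc’ coefficients computed earlier.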

To test whether $\hat{\delta}_{1}$ is statistically different from zero, we need to calculate its standard error using a regression analysis. Indeed, $\hat{\delta}_{1}$ can be obtained by estimating

\[ \text { rprice }=\beta_{0}+\delta_{0} y 81+\beta_{1} \text { nearinc }+\delta_{1} y 81 \cdot \text { nearinc }+u, \hspace{1cm} (3) \]

using the data pooled over both years. The parameters have the following interpretation:

The parameter $\beta_{0}$ represents the average price of a house NOT near the incinerator in 1978.
The parameter $\delta_{0}$ captures the change over time that is common to both groups. Specifically, it is the change in the average price of houses NOT near the incinerator from 1978 to 1981.
The parameter $\beta_{1}$ captures the difference between groups that is NOT due to the incinerator. Specifically, it is the difference in the average price between houses near and far from the site in 1978, before the incinerator was even rumored.
The parameter $\delta_{1}$ measures the change in housing values due to the new incinerator. Specifically, it is the additional change from 1978 to 1981 in the average price of houses near the site, relative to the change for houses farther away, provided that the two groups would otherwise have appreciated at the same rate.
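
These interpretations follow from writing out the expected price in each of the four groups implied by Equation (3), assuming the error $u$ has mean zero in every group (the labels "far" and "near" stand for nearinc equal to 0 and 1):

\[ \begin{aligned}
E(\text{rprice} \mid \text{far}, 1978) &= \beta_{0}, \\
E(\text{rprice} \mid \text{far}, 1981) &= \beta_{0}+\delta_{0}, \\
E(\text{rprice} \mid \text{near}, 1978) &= \beta_{0}+\beta_{1}, \\
E(\text{rprice} \mid \text{near}, 1981) &= \beta_{0}+\delta_{0}+\beta_{1}+\delta_{1}.
\end{aligned} \]

The near-far gap is therefore $\beta_{1}$ in 1978 and $\beta_{1}+\delta_{1}$ in 1981, so the change in the gap is $\delta_{1}$, in line with Equation (2).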

The estimates of Equation (3) are presented in the table below.


# y81 * nearinc expands to y81 + nearinc + y81:nearinc
did <- lm(rprice ~ y81 * nearinc, data = kielmc)
stargazer(did, type="text")


===============================================
                        Dependent variable:    
                    ---------------------------
                              rprice           
-----------------------------------------------
y81                        18,790.290***       
                            (4,050.065)        
                                               
nearinc                   -18,824.370***       
                            (4,875.322)        
                                               
y81:nearinc                 -11,863.900        
                            (7,456.646)        
                                               
Constant                   82,517.230***       
                            (2,726.910)        
                                               
-----------------------------------------------
Observations                    321            
R2                             0.174           
Adjusted R2                    0.166           
Residual Std. Error    30,242.900 (df = 317)   
F Statistic           22.251*** (df = 3; 317)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
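
The table reports no significance stars for the interaction term. To see how far the estimate is from conventional significance levels, one can pull its t statistic, p-value and confidence interval from the fitted model. A sketch, refitting Equation (3) so that the block runs on its own (did_fit, did_row and did_ci are illustrative names):

```r
data(kielmc, package = "wooldridge")

# Refit Equation (3); y81 * nearinc includes both main effects and the interaction
did_fit <- lm(rprice ~ y81 * nearinc, data = kielmc)

# Estimate, standard error, t statistic and p-value of the DiD term
did_row <- summary(did_fit)$coefficients["y81:nearinc", ]
print(did_row)

# 95% confidence interval for delta_1
did_ci <- confint(did_fit, "y81:nearinc", level = 0.95)
print(did_ci)
```

A confidence interval that contains zero corresponds to an estimate that is not statistically significant at the 5% level.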

The previous results can be improved. The estimate $\hat{\delta}_{1}=-11,863.90$ has a standard error of 7,456.65, which gives a t statistic of about $-1.59$, so it is not statistically significant at conventional levels. A logarithmic specification is more plausible because it implies a constant percentage effect on house values (see column (1) of the table below). In this specification the estimate of $\delta_{1}$ is $-0.063$, which is still not statistically significant.

We can also add control variables. Kiel and McClain (1995) included control variables for two reasons. First, the types of homes sold near the incinerator in 1981 might have been systematically different from those sold in the same area in 1978; if so, it is crucial to control for such characteristics. Second, even if the relevant house characteristics did not change, including them can greatly reduce the error variance, which in turn decreases the standard error of $\hat{\delta}_{1}$.

In column (2), we control for the age of the house (using a quadratic), the distance to the interstate, the lot size, the square footage, and the numbers of rooms and bathrooms. This considerably increases the $R$-squared (by reducing the residual variance). The estimate of $\delta_{1}$ is now much larger in magnitude, its standard error is lower, and as a result it is statistically significant at the 5% level. Thus, using the logarithmic form and control variables, we estimate that houses near the incinerator lost about $13.2 \%$ of their value because of the incinerator.


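# nearinc * y81 expands to nearinc + y81 + nearinc:y81
did1 <- lm(log(rprice) ~ nearinc * y81, data = kielmc)
# Add house characteristics as controls (age enters as a quadratic)
did2 <- lm(log(rprice) ~ nearinc * y81 + age + I(age^2) + log(intst) +
             log(land) + log(area) + rooms + baths, data = kielmc)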

stargazer(did1, did2, type="text")


====================================================================
                                  Dependent variable:               
                    ------------------------------------------------
                                      log(rprice)                   
                              (1)                     (2)           
--------------------------------------------------------------------
nearinc                    -0.340***                 0.032          
                            (0.055)                 (0.047)         
                                                                    
y81                        0.193***                 0.162***        
                            (0.045)                 (0.028)         
                                                                    
age                                                -0.008***        
                                                    (0.001)         
                                                                    
I(age2)                                            0.00004***       
                                                   (0.00001)        
                                                                    
log(intst)                                          -0.061*         
                                                    (0.032)         
                                                                    
log(land)                                           0.100***        
                                                    (0.024)         
                                                                    
log(area)                                           0.351***        
                                                    (0.051)         
                                                                    
rooms                                               0.047***        
                                                    (0.017)         
                                                                    
baths                                               0.094***        
                                                    (0.028)         
                                                                    
nearinc:y81                 -0.063                  -0.132**        
                            (0.083)                 (0.052)         
                                                                    
Constant                   11.285***                7.652***        
                            (0.031)                 (0.416)         
                                                                    
--------------------------------------------------------------------
Observations                  321                     321           
R2                           0.246                   0.733          
Adjusted R2                  0.239                   0.724          
Residual Std. Error    0.338 (df = 317)         0.204 (df = 310)    
F Statistic         34.470*** (df = 3; 317) 84.915*** (df = 10; 310)
====================================================================
Note:                                    *p<0.1; **p<0.05; ***p<0.01
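
Reading a coefficient of $-0.132$ as a 13.2% drop relies on the approximation that a change in log(rprice) is close to a proportional change in rprice. The exact percentage change implied by a coefficient $\hat{\delta}_{1}$ in a model with a logged outcome is $100 \cdot [\exp(\hat{\delta}_{1})-1]$. A sketch of the conversion, refitting the model from column (2) so that the block runs on its own (did2_fit, delta1, pct_approx and pct_exact are illustrative names):

```r
data(kielmc, package = "wooldridge")

# Same specification as column (2)
did2_fit <- lm(log(rprice) ~ nearinc * y81 + age + I(age^2) + log(intst) +
                 log(land) + log(area) + rooms + baths, data = kielmc)

delta1     <- coef(did2_fit)[["nearinc:y81"]]
pct_approx <- 100 * delta1              # log approximation
pct_exact  <- 100 * (exp(delta1) - 1)   # exact conversion
print(c(approx = pct_approx, exact = pct_exact))
```

With a coefficient near $-0.132$, the exact figure is about $-12.4\%$, slightly smaller in magnitude than the approximation.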

To consolidate your understanding, let us review the following video on the basics of the simple differences in differences estimator.

Source: Nicolai Kuminoff

6 Findings

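The DiD estimate is negative in every specification, which is consistent with the hypothesis that the incinerator lowered the prices of nearby houses. In levels, the estimated effect is a drop of $11,863.90 (in 1978 dollars), but it is not statistically significant. In the logarithmic specification with control variables, the estimated effect is a drop of about 13.2% and is statistically significant at the 5% level.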

7 Limitations

It’s important to note that the DiD method assumes that, in the absence of the treatment, the average outcomes for the treatment and control groups would have followed the same trend over time. This assumption, known as the parallel trends assumption, cannot be tested directly and is a potential source of bias in DiD estimates.
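
With only two years of data, trends before the treatment cannot be compared. One partial check that is available is whether the mix of houses sold in each group changed between 1978 and 1981, since such changes would make the raw comparison harder to interpret. A rough sketch using the house characteristics in the data set (composition is an illustrative name); it only describes the sample and is not a test of parallel trends:

```r
data(kielmc, package = "wooldridge")

# Average characteristics of the houses sold in each (year, nearinc) cell
composition <- aggregate(cbind(age, area, land, rooms, baths) ~ year + nearinc,
                         data = kielmc, FUN = mean)
print(composition)
```

Large shifts in these averages within a group would be one reason to prefer the specification with control variables.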

8 Conclusion

The DiD method provides a powerful tool for causal inference in observational studies. In this case, it allowed us to estimate the causal effect of a garbage incinerator’s location on housing prices, providing valuable evidence in discussions about the siting of potentially harmful facilities.