March 11, 2026
In this tutorial, we are going to review the OLS estimator, run linear regressions in R, and use them to ask whether the Solow growth model is compatible with the data.
Note
Disclaimer: the recap on OLS is inspired by Sciences Po’s Introduction to Econometrics with R. Interested students can check the full material here.
It seems that there is a linear relationship between these two variables.
There are a priori many ways to get a better “fit” (a smaller distance between observed and predicted values)
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "black") +
  geom_abline(intercept = 5, slope = 2.5, color = "blue") +
  geom_segment(aes(x = speed,
                   y = dist,
                   xend = speed,
                   yend = 5 + 2.5 * speed),
               arrow = arrow(length = unit(0.1, "inches")),
               color = "red") +
  labs(x = "Speed", y = "Stopping distance", title = "Car speed vs. stopping distance") +
  theme_minimal()

OLS minimizes the sum of squared distances (\(\sim\) errors), thus the name “ordinary least squares”
An affine function is defined as \(y = \beta_0 + \beta_1 x\) which in matrix form gives \(Y = X\beta\).
Dimension of \(X\)?
The error for each observation is \(\epsilon_i = y_i - \beta_0 - \beta_1 x_i\)
Hence, we look for \(\hat{\beta}\) to minimize: \[\min_\beta (Y - X\beta)'(Y - X\beta) = \min_\beta \epsilon'\epsilon \]
The OLS estimator is then: \[ \hat{\beta} = (X'X)^{-1}X'y \]
Question: Derive \(\hat{\beta}\) from the problem’s first order condition
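For reference, a sketch of the derivation: expanding the objective and taking the first order condition with respect to \(\beta\) gives
\[ \frac{\partial}{\partial \beta}(Y - X\beta)'(Y - X\beta) = -2X'(Y - X\beta) = 0 \]
\[ \Rightarrow X'X\hat{\beta} = X'Y \Rightarrow \hat{\beta} = (X'X)^{-1}X'Y \]
assuming \(X'X\) is invertible (no perfect collinearity).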
Under the following assumptions:
1. Linearity in parameters
2. Random sampling
3. No perfect collinearity
4. Exogeneity: \(E(\epsilon|X) = 0\)
5. Homoskedasticity: \(Var(\epsilon|X) = \sigma^2\)
The OLS estimator is BLUE: best linear unbiased estimator
Tip
Only Assumptions 1-4 are needed for unbiasedness
Francis Anscombe created four datasets with identical linear statistics: same means, variances, correlations and regression lines. Their data generating processes are, however, very different (Anscombe’s quartet)
Assumption 4 might not hold for various reasons: omitted variables, measurement error in the regressors, or simultaneity (reverse causality).
In all these cases, we might have \(E(\epsilon|X) \neq 0\) and a biased estimator \(E(\hat{\beta}) \neq \beta\).
Consider the true model: \(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\), \(E[\epsilon | X_1, X_2] = 0\). Suppose \(X_2\) is omitted from the regression. The estimated model is:
\[ Y = \alpha_0 + \alpha_1 X_1 + u \]
where the new error term \(u\) is: \(u = \beta_2 X_2 + \epsilon\). Since \(X_2\) is omitted, we express it in terms of \(X_1\) using the linear projection:
\[ X_2 = \gamma_0 + \gamma_1 X_1 + v \]
where \(v\) is the residual such that \(E[v | X_1] = 0\). Substituting the projection into the error term \(u\):
\[ u = \beta_2 \gamma_0 + \beta_2 \gamma_1 X_1 + \beta_2 v + \epsilon \]
Since \(u\) is correlated with \(X_1\), OLS on the short regression is biased. Its slope estimator converges to:
\[ \hat{\alpha}_1 = \frac{Cov(Y, X_1)}{Var(X_1)} \]
Substituting \(Y\) from the true model:
\[ \hat{\alpha}_1 = \frac{Cov(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon, X_1)}{Var(X_1)} \]
Expanding the covariance (and using \(Cov(\epsilon, X_1) = 0\)):
\[ \hat{\alpha}_1 = \beta_1 + \beta_2 \frac{Cov(X_2, X_1)}{Var(X_1)} \]
Using the projection equation, where \(\gamma_1 = Cov(X_2, X_1)/Var(X_1)\):
\[ \hat{\alpha}_1 = \beta_1 + \beta_2 \gamma_1 \]
Since \(\gamma_1 \neq 0\) if \(X_1\) and \(X_2\) are correlated, and \(\beta_2 \neq 0\) if \(X_2\) is relevant, it follows that:
\[ E[\hat{\alpha}_1] \neq \beta_1 \]
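This bias can be checked numerically. A minimal simulation sketch (the coefficients, sample size, and variable names below are illustrative, not from the slides):

```r
# Illustrative simulation of omitted variable bias
set.seed(42)
n  <- 1e5
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)              # projection slope gamma_1 = 0.8
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)   # true beta_1 = 2, beta_2 = 3

short <- lm(y ~ x1)                    # x2 omitted from the regression
coef(short)[["x1"]]                    # close to beta_1 + beta_2 * gamma_1 = 4.4
```

With a large sample, the estimated slope lands near \(\beta_1 + \beta_2\gamma_1 = 4.4\), not the true \(\beta_1 = 2\).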
The main R function for OLS regression is lm. For a linear model, the syntax is lm(y ~ x). Calling summary() on the fitted model yields:
Call:
lm(formula = cars$dist ~ cars$speed)
Residuals:
Min 1Q Median 3Q Max
-29.069 -9.525 -2.272 9.215 43.201
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791 6.7584 -2.601 0.0123 *
cars$speed 3.9324 0.4155 9.464 1.49e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
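The table above can be reproduced with R’s built-in cars dataset:

```r
# Fit the regression of stopping distance on speed
model <- lm(dist ~ speed, data = cars)

summary(model)   # prints the table shown above
coef(model)      # (Intercept) -17.58, speed 3.93
```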
Assuming the Gauss-Markov assumptions hold (plus normally distributed errors), the OLS estimator has the following Normal distribution:
\[ \frac{\hat{\beta}_k-\beta_k}{\sqrt{\sigma^2 \left[(X' X)^{-1}\right]_{kk}}} \sim N\left(0, 1\right) \]
Since \(\sigma^2\) must be estimated, inference relies on the t statistic:
\[ \hat{t} = \frac{\hat{\beta}_k-\beta_k}{se(\hat{\beta}_k)} \sim t(N-k) \]
For \(H_0: \beta_k = 0\), we reject at the 5% level when:
\[ p_{value}(\hat{t}) = p\big((T < -|\hat{t}|)\cup(T > |\hat{t}|)\big) = 2p(T>|\hat{t}|) < 0.05 \]
\[ \Leftrightarrow \left|\frac{\hat{\beta}_k-0}{se(\hat{\beta}_k)}\right| > F^{-1}(1- 0.05/2) \approx 1.96 \]
where \(F\) is the CDF of the \(t(N-k)\) distribution.
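As a sketch, the t statistic and p-value reported for speed can be recomputed by hand from the fitted model:

```r
model <- lm(dist ~ speed, data = cars)
cf    <- summary(model)$coefficients

# t statistic for H0: beta_speed = 0
t_hat <- cf["speed", "Estimate"] / cf["speed", "Std. Error"]

# two-sided p-value with N - k = 48 degrees of freedom
p_val <- 2 * pt(abs(t_hat), df = df.residual(model), lower.tail = FALSE)

c(t_hat, p_val)   # matches the t value and Pr(>|t|) columns above
```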
The sample variance of \(y\) (SST - sum of squares total) can be decomposed into the explained (SSR - sum of squares regression) and residual variances (SSE - sum of squares errors):
\[ \frac{1}{n-1}\sum\limits_{i=1}^n (y_i-\overline{y})^2 = \frac{1}{n-1}\sum\limits_{i=1}^n (\hat{y}_i-\overline{y})^2 + \frac{1}{n-1}\sum\limits_{i=1}^n (\hat{u}_i)^2 \]
The R-squared is the share of the variance explained by the model: \[ R^2 = \frac{SSR}{SST} = 1- \frac{SSE }{ SST} \]
Tip
Proof comes from expanding
\[ (y_i- \overline{y} )^2 = ((y_i- \hat{y}_i)+(\hat{y}_i - \overline{y}))^2 \]
Noticing that \(y_i-\hat{y}_i= \hat{u}_i\) and \(\hat{y}_i -\overline{y} = \hat{\beta}_1(x_i - \overline{x})\), the cross term sums to zero.
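The decomposition can be verified numerically on the cars regression (a quick sketch):

```r
model <- lm(dist ~ speed, data = cars)
y   <- cars$dist
SST <- sum((y - mean(y))^2)              # total sum of squares
SSR <- sum((fitted(model) - mean(y))^2)  # explained (regression)
SSE <- sum(residuals(model)^2)           # residual (errors)

all.equal(SST, SSR + SSE)  # decomposition holds
SSR / SST                  # R-squared reported by summary(model)
```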
Recall that the Solow model assumes the following GDP production function: \[ Y(t) = K(t)^{\alpha} (L(t)A(t))^{1-\alpha}, 0<\alpha<1 \]
In continuous time, exogenous and constant population and technology growth can be modelled by:
\[ L(t) = L(0)e^{nt} \]
\[ A(t) = A(0)e^{gt} \]
Mankiw et al. (1992) ask: is the Solow model compatible with the data?
Recall from the class that output per effective unit reaches a steady state:
\[ \frac{Y_t}{A_tL_t} = \left(\frac{s}{n+g+\delta}\right)^{\frac{\alpha}{1-\alpha}} \]
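Multiplying both sides of this steady-state expression by \(A_t\) gives output per worker:
\[ \frac{Y_t}{L_t} = A_t\left(\frac{s}{n+g+\delta}\right)^{\frac{\alpha}{1-\alpha}} \]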
Main issue for empirical estimation: the model is not linear. However, taking the logs yields:
\[ \ln\left(\frac{Y_t}{L_t}\right) = \ln(A_t) + \frac{\alpha}{1-\alpha} \ln(s) - \frac{\alpha}{1-\alpha} \ln(n+g+\delta) \]
The authors write \(\ln(A_t) = \ln(A(0)) + gt\) and model \(\ln(A(0)) = a + \epsilon\), where \(\epsilon\) is a “country-specific shock”, so that \(\ln(A_t) = a + gt + \epsilon\). This allows for OLS estimation:
\[ \ln\left(\frac{Y_t}{L_t}\right) = \beta_0 + \beta_1 \ln(s) + \beta_2 \ln(n+g+\delta) +\epsilon \]
Tip
Strong assumptions for estimation: \(n\) and \(s\) are independent of \(\epsilon\), and \(g+\delta = 0.05\) is the same for every country. Discuss…
Mankiw et al. focus on a variable omitted from the baseline Solow model: human capital (half of total capital in the US in 1976). The authors assume an alternative production function:
\[ Y(t) = K(t)^{\alpha}H(t)^\beta (L(t)A(t))^{1-\alpha-\beta}, 0<\alpha + \beta<1 \]
Human capital accumulates in the same way (same cost of one unit of consumption) at saving rate \(s_h\) and depreciates at same rate \(\delta\). That implies at the steady state per effective unit:
\[ k^* = \left(\frac{s_k^{1-\beta} s_h^\beta}{n+g+\delta}\right)^{\frac{1}{1-\alpha - \beta}} \]
\[ h^* = \left(\frac{s_k^\alpha s_h^{1-\alpha}}{n+g+\delta}\right)^{\frac{1}{1-\alpha - \beta}} \]
And after log linearizing output per effective unit (show it!):
\[ \ln\left(\frac{Y_t}{L_t}\right) = \ln(A_0)+gt + \frac{\alpha}{1-\alpha-\beta} \ln(s_k) - \frac{\alpha+\beta}{1-\alpha-\beta} \ln(n+g+\delta) +\frac{\beta}{1-\alpha-\beta} \ln(s_h) \]