An Introduction To Econometrics: Part 1- Ordinary Least Squares Regression

I took a PhD level econometrics course this semester in the Applied Economics and Management department here at Cornell and I thought I’d share some of what I learned. Overall, I enjoyed the course and learned a great deal. It was very math and theory heavy, but the Professor Shanjun Li did a nice job keeping the class lively and interesting. I would recommend the class to future EWRS students who may be looking for some grounding in econometrics, provided they’ve taken some basic statics and linear algebra courses.

So lets start with the basics, what does the term “econometrics” even mean? Hansen (2010) defined econometrics as “the unified study of economic models, mathematical statistics and economic data”. After taking this introductory course, I’m inclined to add my own definition: econometrics is “a study of the problems with regression using Ordinary Least Squares (OLS) and how to solve them”. This is obviously a gross oversimplification of the field, however, regression through OLS was the primary tool used for finding insights and patterns within data, and we spent the vast majority of the course examining it. In this post I’ll briefly summarize OLS mechanics and classical OLS assumptions. In my next post, I’ll detail methods for dealing with violations of OLS assumptions. My hope is that reading this may help you understand some key terminology and the reasoning behind why certain econometric tools are employed.

OLS mechanics

Our primary interest when creating an econometric model is to estimate some dependent variable, y, using a observations from a set of independent variables, X. Usually y is a vector of length n, where n is the number of observations, and X is a matrix of size (n x k) where k is the number of explanatory variables (you can think of X as a table of observations, where each column contains a different variable and each row represents an observation of that variable). The goal of OLS regression is to estimate the coefficients, beta, for the model:

y = X\beta+\epsilon

Where beta is a k by 1 vector of coefficients on X and epsilon is a k by 1 vector of error terms.

OLS regression estimates beta by minimizing the sum of the square error term (hence the name “least squares”). Put in matrix notation, OLS estimates beta using the equation:

\hat{\beta} = argmin_{\beta} SSE_N(\beta) = \epsilon ' \epsilon

The optimal beta estimate can be found through the following equations:

\epsilon = y-X\hat{\beta}

\epsilon ' \epsilon =  (y-X\hat{\beta})'(y-X\hat{\beta})

Taking the derivative and setting it equal to zero:

2X'y+2X'X\hat{\beta} = 0

Then solving for the beta estimate:

\hat{\beta} = (X'X)^{-1}X'y


Estimation of y using OLS regression can be visualized as the orthogonal projection of the vector y onto the column space of X. The estimated error term, epsilon, is the orthogonal distance between the projection and the true vector y.  Figure 1 shows this projection for a y that is regressed on two explanatory variables, X1 and X2.


Figure 1: OLS regression as an orthogonal projection of vector y onto the column space of matrix X. The error term, \hat{\epsilon}, is the orthogonal distance between y and X\hat{\beta}. (Image source: Wikipedia commons)

 Assumptions and properties of OLS regression

The Gauss-Markov Theorem states that under a certain set of assumptions, the OLS estimator is the Best Linear Unbiased Estimator (BLUE) for vector y.

To understand the full meaning of the Gauss-Markov theorem, it’s important to define two fundamental properties that can be used to describe estimators, consistency and efficiency. An estimator is consistent if its value will converge to the true parameter value as the number of observations goes to infinity. An estimator is efficient if its asymptotic variance is no larger than the asymptotic variance of any other possible consistent estimator for the parameter. In light of these definitions, the Gauss-Markov Theorem can be restated as: estimators found using OLS will be the most efficient consistent estimator for beta as long as the classical OLS assumptions hold. The remainder of this post will be devoted to describing the necessary assumptions for the OLS estimator to be the BLUE and detailing fixes for when these assumptions are violated.

The four classical assumptions for OLS to be the BLUE are:

  1. Linearity: The relationship between X and y is linear, following the functional form:

y = X\beta+\epsilon.

2. Strict exogeneity: The error \epsilon terms should be independent of the value of the explanatory variables, X. Put in equation form, this assumption requires:

E(\epsilon_i|X) = 0

E(\epsilon_i) =0

3.  No perfect multicollinearity: columns of X should not be correlated with each other (see my earlier post on dealing with mulitcollinearity for fixes for violations of this assumption).

4. Spherical Error: Error terms should be homoskedastic, meaning they are evenly distributed around the X values. Put in equation form:

E(\epsilon_i^2|X) =\sigma^2

Where \sigma^2 is a constant value.

E(\epsilon_i \epsilon_j|X)=0

Using assumption 4, we can define the variance of \hat{\beta} as:

var(\hat{\beta}_{OLS}) = \sigma^2(X'X)^{-1}

If assumptions 1-4 hold, then the OLS estimate for beta is the BLUE, if however, any of the assumptions are broken, we must employ other methods for estimating our regression coefficients.

In my next post I’ll detail the methods econometricians use when these assumptions are violated.


Hansen, Bruce. “Econometrics”. 2010. University of Wisconsin