Plotting Probability Ellipses for Bivariate Normal Distributions

Plotting probability ellipses can be a useful way to visualize bivariate normal distributions. A probability ellipse represents a contour of constant probability within which a certain percentage of the distribution lies. The width and orientation of probability ellipses can yield information about the correlation between the two data points of interest.

To plot probability ellipses of a bivariate normal distribution, you need to have a vector containing the means of both data sets of interest as well as the covariance matrix for the two data sets. Each probability ellipse is centered around the means of the two data sets and oriented in the direction of the first eigenvector of the covariance matrix. The length of the primary axis of each ellipse is proportional to the value of the percentile of the Chi Squared distribution for the given percentile the ellipse represents.

Below, I’ve coded an example (using Matlab) that is presented in the very informative textbook Statistical Methods in the Atmospheric Sciences by Dan Wilks. The example uses one month of daily minimum temperature data from the towns of Ithaca and Canandaigua, New York. Notice that the 50% ellipse in the center of the plot encloses roughly half the data points, indicating that the contours were plotted correctly.

blog_80

Probability Ellipses for Ithaca and Canandaigua Minimum Temperatures

 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Plotting  probability ellipses of the bivariate normal distribution
% By: D. Gold
% Created: 10/28/16
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This script will plot the multivariate cumulative probability contours 
% of two data sets that are fit to a multivariate normal distribution
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%
clear all; close all; clc;

%%% Create Data Sets %%%
Ith_MinT = [19, 25, 22, -1, 4, 14, 21, 22, 23, 27, 29, 25, 29, 15, 29, 24, 0, 2, 26, 17, 19, 9, 20, -6, -13, -13, -11, -4, -4, 11, 23];

Can_MinT = [28, 28, 26, 19, 16, 24, 26, 24, 24, 29, 29, 27, 31, 26, 38, 23, 13, 14, 28, 19, 19, 17, 22, 2, 4, 5, 7, 8, 14, 14, 23];
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Calculate moments of the data sets %%%

% Observed
Ith_mean = mean(Ith_MinT); % T mean
Can_mean= mean(Can_MinT); % Td mean
CV = cov(Ith_MinT,Can_MinT); % covariance of T and Td
[Evec, Eval]=eig(CV); % Eigen values and vectors of covariance matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%% Plot observed multivariate contours %%

% Observed data
xCenter = Ith_mean; % ellipses centered at sample averages
yCenter = Can_mean;
theta = 0 : 0.01 : 2*pi; % angles used for plotting ellipses

% compute angle for rotation of ellipse
% rotation angle will be angle between x axis and first eigenvector
x_vec= [1 ; 0]; % vector along x-axis
cosrotation =dot(x_vec,Evec(:,1))/(norm(x_vec)*norm(Evec(:,1))); 
rotation =pi/2-acos(cosrotation); % rotation angle
R  = [sin(rotation) cos(rotation); ...
      -cos(rotation)  sin(rotation)]; % create a rotation matrix

% create chi squared vector
chisq = [1.368 4.605 3.2188  5.991]; % percentiles of chi^2 dist df=2

% size ellipses for each quantile
for i = 1:length(chisq)
    % calculate the radius of the ellipse
    xRadius(i)=(chisq(i)*Eval(1,1))^.5; % primary
    yRadius(i)=(chisq(i)*Eval(2,2))^.5; % secondary
    % lines for plotting ellipse
    x{i} = xRadius(i)* cos(theta);
    y{i} = yRadius(i) * sin(theta);
    % rotate ellipse
    rotated_Coords{i} = R*[x{i} ; y{i}];
    % center ellipse
    x_plot{i}=rotated_Coords{i}(1,:)'+xCenter;
    y_plot{i}=rotated_Coords{i}(2,:)'+yCenter;
end

% Set up plot
figure
xlabel('Ithaca Minimum Temperature (F)')
ylabel('Canandagua Minimum Temperature (F)(F)')
hold on

% Plot data points
plot(Ith_MinT,Can_MinT,'o');

% Plot contours
for j = 1:length(chisq)
    plot(x_plot{j},y_plot{j})
end
legend('Data points','50%', '80%', '90%', '99%')