Plotting probability ellipses can be a useful way to visualize bivariate normal distributions. A probability ellipse represents a contour of constant probability within which a certain percentage of the distribution lies. The width and orientation of probability ellipses can yield information about the correlation between the two data points of interest.

To plot probability ellipses of a bivariate normal distribution, you need to have a vector containing the means of both data sets of interest as well as the covariance matrix for the two data sets. Each probability ellipse is centered around the means of the two data sets and oriented in the direction of the first eigenvector of the covariance matrix. The length of the primary axis of each ellipse is proportional to the value of the percentile of the Chi Squared distribution for the given percentile the ellipse represents.

Below, I’ve coded an example (using Matlab) that is presented in the very informative textbook *Statistical Methods in the Atmospheric Sciences* by Dan Wilks. The example uses one month of daily minimum temperature data from the towns of Ithaca and Canandaigua, New York. Notice that the 50% ellipse in the center of the plot encloses roughly half the data points, indicating that the contours were plotted correctly.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Plotting probability ellipses of the bivariate normal distribution % By: D. Gold % Created: 10/28/16 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This script will plot the multivariate cumulative probability contours % of two data sets that are fit to a multivariate normal distribution %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% clear all; close all; clc; %%% Create Data Sets %%% Ith_MinT = [19, 25, 22, -1, 4, 14, 21, 22, 23, 27, 29, 25, 29, 15, 29, 24, 0, 2, 26, 17, 19, 9, 20, -6, -13, -13, -11, -4, -4, 11, 23]; Can_MinT = [28, 28, 26, 19, 16, 24, 26, 24, 24, 29, 29, 27, 31, 26, 38, 23, 13, 14, 28, 19, 19, 17, 22, 2, 4, 5, 7, 8, 14, 14, 23]; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Calculate moments of the data sets %%% % Observed Ith_mean = mean(Ith_MinT); % T mean Can_mean= mean(Can_MinT); % Td mean CV = cov(Ith_MinT,Can_MinT); % covariance of T and Td [Evec, Eval]=eig(CV); % Eigen values and vectors of covariance matrix %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Plot observed multivariate contours %% % Observed data xCenter = Ith_mean; % ellipses centered at sample averages yCenter = Can_mean; theta = 0 : 0.01 : 2*pi; % angles used for plotting ellipses % compute angle for rotation of ellipse % rotation angle will be angle between x axis and first eigenvector x_vec= [1 ; 0]; % vector along x-axis cosrotation =dot(x_vec,Evec(:,1))/(norm(x_vec)*norm(Evec(:,1))); rotation =pi/2-acos(cosrotation); % rotation angle R = [sin(rotation) cos(rotation); ... -cos(rotation) sin(rotation)]; % create a rotation matrix % create chi squared vector chisq = [1.368 4.605 3.2188 5.991]; % percentiles of chi^2 dist df=2 % size ellipses for each quantile for i = 1:length(chisq) % calculate the radius of the ellipse xRadius(i)=(chisq(i)*Eval(1,1))^.5; % primary yRadius(i)=(chisq(i)*Eval(2,2))^.5; % secondary % lines for plotting ellipse x{i} = xRadius(i)* cos(theta); y{i} = yRadius(i) * sin(theta); % rotate ellipse rotated_Coords{i} = R*[x{i} ; y{i}]; % center ellipse x_plot{i}=rotated_Coords{i}(1,:)'+xCenter; y_plot{i}=rotated_Coords{i}(2,:)'+yCenter; end % Set up plot figure xlabel('Ithaca Minimum Temperature (F)') ylabel('Canandagua Minimum Temperature (F)(F)') hold on % Plot data points plot(Ith_MinT,Can_MinT,'o'); % Plot contours for j = 1:length(chisq) plot(x_plot{j},y_plot{j}) end legend('Data points','50%', '80%', '90%', '99%')

Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog