This post is about power spectral density estimation (PSDE) using Matlab. PSDE is often used in signal processing and fluid mechanics, so you may or may not be familiar with it. The idea is to determine the frequency content of sampled time series data. This means that we can determine the amount of the variance in the time series that is contained at different frequencies (or periodicities). Let’s take a look at a simple example.
Suppose we are sampling a signal (in this case a sine wave) for about 3 minutes at a frequency of 100 hertz, meaning we take 100 measurements every second. Let's generate our signal and plot our observations using Matlab.
x = (0:0.01:60*pi);
y = sin(x);
figure(1)
plot(x,y)
title('Signal with no Noise')
xlabel('Seconds')
ylabel('Observed Signal')
ylim([-1.5,1.5])
From observing the data directly it's pretty clear what the frequency and period of the data are here (1/(2*pi) Hz and 2*pi seconds, respectively): no need for PSDE. However, real data is rarely so well behaved. To simulate this, let's add random noise to our original signal. To make it interesting, let's add a lot of noise: normally distributed noise with a standard deviation 10 times larger than the magnitude of the signal!
for i = 1:max(size(x))
    y(i) = y(i) + 10*norminv(rand);  % standard normal draw via inverse CDF
end
figure(2)
plot(x,y)
title('Signal with Noise')
xlabel('Seconds')
ylabel('Observed Signal')
Now the noise completely washes out the signal, and it is not at all clear what the period or frequency of the underlying data are. So let's compute and plot the power spectrum of the data. I leave it to the reader to review all of the math underlying the computation, but I'll provide a very brief description of what we are doing here. PSDE involves taking the discrete Fourier transform of the data, which moves us from the temporal (or spatial) domain to the frequency domain. The power spectral density (PSD) function is simply the squared magnitude of the Fourier-transformed data (often scaled). It tells us how much of the variance in our signal is explained in each discrete frequency band; a spike in the PSD function tells us that a lot of variance is explained in that frequency band. The frequency resolution of our analysis (the width of the frequency bands) is dictated by the length of our recorded observations. Interestingly, if we integrate the PSD function over the frequency domain, we obtain the sample variance of our data. Now, let's look at the code.
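As a quick sanity check on that last claim, here is a short Python/NumPy sketch (an assumed translation of the Matlab workflow, using the same signal parameters) showing that integrating the PSD over frequency recovers the sample variance. The mean is removed first so the DC bin doesn't contribute:

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for reproducibility
fs = 100.0                       # sampling frequency (Hz)
t = np.arange(0, 60*np.pi, 1/fs)
y = np.sin(t) + 10*rng.standard_normal(t.size)  # noisy sine, as in the post

N = t.size
yd = y - y.mean()                # remove the mean so the integral gives variance
Y = np.fft.fft(yd)
G = np.abs(Y)**2 / (fs*N)        # PSD, same scaling as the Matlab snippet
df = fs / N                      # width of each frequency bin

integral = G.sum() * df          # integrate the PSD over frequency
variance = np.mean(yd**2)        # (biased) sample variance
```

By Parseval's theorem the two quantities agree to floating-point precision.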
fs = (0.01)^-1;
N = size(y,2);
Y = fft(y);
f = fs*(0:N-1)/N;
G = abs(Y).^2*(1/(fs*N));
figure(3)
loglog(f(2:floor(N/2)),G(2:floor(N/2)))
title('Amplitude of Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('S(f)')
The first two lines define the sampling frequency (fs) and the number of observations (N). Next we use Matlab's fft command to take the discrete Fourier transform of the data. We then compute the frequency bins we are considering, and finally we compute and plot the PSD function. Let's take a look at the result.
There is a lot of noise, but there is a distinct peak at 0.1592 Hz, which is 1/(2*pi) and indicates the frequency of the underlying signal. This is neat: even though the signal contains a huge amount of noise, we've recovered the correct frequency (and hence the period) of the signal!
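If you'd rather pull that peak out programmatically than read it off the plot, a minimal Python/NumPy sketch (mirroring the Matlab computation above, with an assumed random seed) might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, for reproducibility
fs = 100.0
t = np.arange(0, 60*np.pi, 1/fs)
y = np.sin(t) + 10*rng.standard_normal(t.size)

N = t.size
Y = np.fft.fft(y)
f = fs * np.arange(N) / N        # frequency bins (Hz)
G = np.abs(Y)**2 / (fs*N)        # PSD, same scaling as the post

half = N // 2
k = 1 + np.argmax(G[1:half])     # skip the DC bin, search the one-sided spectrum
f_peak = f[k]                    # should land within one bin (~0.005 Hz) of 1/(2*pi)
```

Despite a noise standard deviation 10 times the signal amplitude, the argmax of the spectrum lands on the sine's frequency.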
So how might we as systems analysts use this?
In my PhD thesis we used PSDE to determine the critical time dynamics of a hydropower system's operation, and it turned out to be a valuable diagnostic tool. In the hydropower system I considered, both inflows and energy prices are stochastic, so it can be difficult to determine whether the derived 'optimal' policy makes sense by observing simulation results alone: the release in any one time step (here 6 hours) can appear noisy and erratic in response to price or hydrologic shocks. The power of PSDE is its ability to cut through the noise and reveal the underlying signal.
As an example, I've derived an implicit 'optimal' policy using sampling stochastic dynamic programming (SSDP). To evaluate the quality of the derived policy, I've simulated the system operation with this policy over the range of historical inflow series. For simplicity, we'll assume that the energy price profile (ranging from high on-peak to low off-peak prices) is identical on every day. Now, let's look at the PSD function for the reservoir release.
We can immediately pick out elements of the PSD function that make sense. First, there is a lot of variability at very low frequencies, corresponding to periodicity on the order of months or seasons. This is likely due to seasonal hydrologic variability between summer, spring, and winter inflows: releases are higher in the spring and early summer because more water is available, and generally lower in the winter and late summer when inflows to the reservoir are lower. Similarly, the peak at a frequency of 1/day makes sense, because we still have a daily price cycle that the system is responding to. But we are harder pressed to explain the other peaks between frequencies of 0.1 and 1 (roughly corresponding to weekly and within-week periodicities). We have fixed the price profile, and hydrology shouldn't cycle within the week, so what is going on?
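To make this kind of interpretation concrete, here is a purely synthetic Python/NumPy illustration (a hypothetical release series, not the thesis data): a 6-hourly series built from a daily cycle, an artificial weekly cycle standing in for the unexplained behavior, and noise. The PSD then shows peaks at exactly 1 and 1/7 cycles per day, which is how spurious weekly behavior announces itself in the spectrum:

```python
import numpy as np

rng = np.random.default_rng(1)     # assumed seed, for reproducibility
fs = 4.0                           # samples per day (6-hr timestep)
days = 364                         # 52 whole weeks, so both cycles hit exact bins
t = np.arange(days * 4) / fs       # time in days

# hypothetical release series: daily price cycle + spurious weekly cycle + noise
release = (2.0 * np.sin(2*np.pi * t)        # 1 cycle/day
           + 1.0 * np.sin(2*np.pi * t / 7)  # 1 cycle/week
           + 0.5 * rng.standard_normal(t.size))

N = t.size
G = np.abs(np.fft.fft(release))**2 / (fs*N)
f = fs * np.arange(N) / N          # frequency in cycles per day

half = N // 2
# frequencies of the two largest peaks in the one-sided spectrum (excluding DC)
top = f[1:half][np.argsort(G[1:half])[-2:]]
```

The two dominant peaks sit at 1 and 1/7 cycles per day; in a real diagnostic you would ask, as above, whether each peak has a physical or economic explanation.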
As it turns out, I had a bug in my algorithm (bad ending conditions for the SSDP) which was causing the 'optimal' policy to behave a bit myopically on a weekly basis. This error was propagating through the model, but was not easy to see by simply observing the simulated results (see chapter 6 of the thesis). Correcting the problem, I got the following spectrum. The unexplainable spikes have gone away, and we now see that daily price cycling and seasonal hydrology explain the majority of the system's variability (as we expect).
For our Cornell-based readers, our new CEE faculty member Gregory McLaskey is offering a new class on time series analysis (including PSDE), which is highly recommended. PSDE is also used extensively (with real data) in Todd Cowen's experimental fluid mechanics class, which I also recommend because you learn a lot and get to play with lasers which, let's be honest, is why we all became engineers.