Survival Function Plots in R

A survival function (aka survivor function or reliability function) is a function often used in risk management for visualizing system failure points. For example, it can be used to show the frequency of a coastal defense structure failure (such as a breach in a levee) in a future state of the world.

The function itself is quite simple. For a distribution of events, the survival function (SF) is 1-CDF where CDF is the cumulative distribution function. If you’re deriving the distribution empirically, you can substitute the CDF with the cumulative frequency. It is often plotted on a semi-log scale which makes tail-area analysis easier.

I’ve written some R code that creates a primitive Survival Function plot from a vector of data.  Below is the function (Note: You can find the code and an example of its usage on bitbucket https://bitbucket.org/ggg121/r_survival_function.git)

plot.sf <- function(x, xlab=deparse(substitute(x)), left.tail=F,
  ylab=ifelse(left.tail, "SF [Cum. Freq.]", "SF  [1 - Cum. Freq.]"),
  make.plot=T, ...)
{
  num.x <- length(x)
  num.ytics <- floor(log10(num.x))
  sf <- seq(1,1/num.x,by=-1/num.x)
  
  if(left.tail){
    order.x <- order(x, decreasing=T)
    order.sf <- sf[order(order.x)]
    
  }  else {
    order.x <- order(x)
    order.sf <- sf[order(order.x)]
  }
  
  if(make.plot) {
    plot(x[order.x], sf, log="y", xlab=xlab, ylab=ylab, yaxt="n", ...)
    axis(2, at=10^(-num.ytics:0), 
         label=parse(text=paste("10^", -num.ytics:0, sep="")), las=1)
  }
  invisible(order.sf)
}

Download and source the code at the start of your R script and you’re good to go. The function, by default, creates a plot in the current plotting device and invisibly returns the survival function values corresponding to the vector of data provided. The parameter left.tail sets the focus on the left-tail of the distribution (or essentially plots the CDF on a semi-log scale). By default, the function puts the focus on the right tail (left.tail = FALSE). The make.plot parameter allows you to toggle plotting of the survival function (default is on or make.plot=TRUE. This is useful when you simply need the survival function values for further calculations or custom plots. Additional parameters are passed to the plot() function. Below is an example (which is also available in the repository).

# Source the function
source("plot_sf.r")

# Set the seed
set.seed(1234)

# Generate some data to use
my.norm <- rnorm(10000, 10, 2)
my.unif <- runif(10000)
my.weib <- rweibull(10000, 20, 5)
my.lnorm <- rlnorm(10000, 1, 0.5)


# Make the plots ----------------------
par(mfrow=c(2,2), mar=c(5,4,1,1)+0.1)

# Default plot settings
plot.sf(my.norm)

# Function wraps the standard "plot" function, so you can pass
# the standard "plot" parameters to the function
plot.sf(my.unif, type="l", lwd=2, col="blue", bty="l",
        ylab="Survival", xlab="Uniform Distribution")

# If the parameter "left.tail" is true, the plot turns into 
# a cumulative frequency plot (kind of like a CDF) that's plotted
# on a log scale.  This is good for when your data exhibits a left or
# negative skew.
plot.sf(my.weib, type="l", left.tail=T, xlab="Left-tailed Weibull Dist.")

# The function invisibly returns the survival function value.
lnorm.sf <- plot.sf(my.lnorm, type="l")
points(my.lnorm, lnorm.sf, col="red")
legend("topright", bty="n", 
       legend=c("Function Call", "Using returned values"), 
       lty=c(1,NA), pch=c(NA,1), col=c("black", "red") )

# The 'make.plot' parameter toggles plotting.
# Useful if you just want the survival function values.
norm.sf <- plot.sf(my.norm, make.plot=F)

And here’s the resulting figure from this example:

survival_function_plot_example

Now you can easily show, for example, tail-area frequency of events. For example, below is a survival function plot of a normal distribution:

survival_plot_norm_didactic

For this example, we can imagine this as a distribution of flood heights (x-axis would be flood height – note that a real distribution of flood heights would likely look drastically different from a normal distribution). With this visualization, we can easily depict the “1 in 10” or the “1 in 1,000” flood height by following the appropriate survival function value over to the corresponding flood height on the plot. Alternatively, you can determine the return period of a given flood height by following the flood height up to the plot and reading off the survival function value. Comparing multiple distributions together on a single plot (think deep uncertainty) can produce interesting decision-relevant discussion about changes in return periods for a given event or the range of possible events for a given return period.

I hope this post is useful. Survival function plots are incredibly versatile and informative…and I’ve only started to scratch the surface!

Advertisements

One thought on “Survival Function Plots in R

  1. Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s