A survival function (aka survivor function or reliability function) is a function often used in risk management for visualizing system failure points. For example, it can be used to show the frequency of a coastal defense structure failure (such as a breach in a levee) in a future state of the world.

The function itself is quite simple. For a distribution of events, the survival function (*SF*) is *1-CDF*, where *CDF* is the cumulative distribution function; in other words, it gives the probability that an event exceeds a given value. If you’re deriving the distribution empirically, you can substitute the cumulative frequency for the *CDF*. The survival function is often plotted on a semi-log scale, which makes tail-area analysis easier.
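As a quick illustration of the *1-CDF* definition, here is a minimal sketch in R. It assumes a synthetic normal sample, and the names `emp.cdf`, `emp.sf`, and `sf.compare` are made up for this example. It builds an empirical survival function from the cumulative frequency and compares it with the theoretical value from `pnorm()`.

```r
# Minimal sketch: empirical survival function as 1 - empirical CDF.
# The data are synthetic and only serve as an illustration.
set.seed(42)
x <- rnorm(10000, mean = 10, sd = 2)

emp.cdf <- ecdf(x)                      # fraction of samples <= q
emp.sf  <- function(q) 1 - emp.cdf(q)   # fraction of samples > q

# Compare with the theoretical survival function at a few points
q <- c(8, 10, 12, 14)
theory.sf <- pnorm(q, mean = 10, sd = 2, lower.tail = FALSE)

sf.compare <- data.frame(q = q,
                         empirical = emp.sf(q),
                         theoretical = theory.sf)
print(sf.compare)
```

Note that `1 - ecdf(x)` evaluates to exactly 0 at the largest observation, which cannot be drawn on a log axis. Plotting code therefore often shifts the values by one step of `1/n` so that the smallest plotted value is `1/n` rather than 0.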

I’ve written some R code that creates a primitive survival function plot from a vector of data. The function is below. (Note: you can find the code and an example of its usage on Bitbucket: https://bitbucket.org/ggg121/r_survival_function.git)

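    plot.sf <- function(x, xlab=deparse(substitute(x)),
                        left.tail=FALSE,
                        ylab=ifelse(left.tail, "SF [Cum. Freq.]", "SF [1 - Cum. Freq.]"),
                        make.plot=TRUE, ...)
    {
      num.x <- length(x)
      # Number of decades needed on the log-scaled y-axis
      num.ytics <- floor(log10(num.x))
      # Survival function values, from 1 down to 1/num.x
      sf <- seq(1, 1/num.x, by=-1/num.x)

      if(left.tail){
        # Largest value gets SF = 1 (cumulative frequency)
        order.x <- order(x, decreasing=TRUE)
        order.sf <- sf[order(order.x)]
      } else {
        # Smallest value gets SF = 1 (1 - cumulative frequency)
        order.x <- order(x)
        order.sf <- sf[order(order.x)]
      }

      if(make.plot) {
        plot(x[order.x], sf, log="y", xlab=xlab, ylab=ylab, yaxt="n", ...)
        # Label the y-axis with powers of ten
        axis(2, at=10^(-num.ytics:0),
             labels=parse(text=paste("10^", -num.ytics:0, sep="")), las=1)
      }

      # Return the SF values in the same order as the input vector
      invisible(order.sf)
    }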

Download and source the code at the start of your R script and you’re good to go. By default, the function creates a plot in the current plotting device and invisibly returns the survival function values corresponding to the vector of data provided. The `left.tail` parameter sets the focus on the left tail of the distribution (essentially plotting the *CDF* on a semi-log scale). By default, the function puts the focus on the right tail (`left.tail = FALSE`). The `make.plot` parameter lets you toggle plotting of the survival function (the default is on, or `make.plot=TRUE`). This is useful when you simply need the survival function values for further calculations or custom plots. Additional parameters are passed to the `plot()` function. Below is an example (which is also available in the repository).

    # Source the function
    source("plot_sf.r")

    # Set the seed
    set.seed(1234)

    # Generate some data to use
    my.norm <- rnorm(10000, 10, 2)
    my.unif <- runif(10000)
    my.weib <- rweibull(10000, 20, 5)
    my.lnorm <- rlnorm(10000, 1, 0.5)

    # Make the plots ----------------------
    par(mfrow=c(2,2), mar=c(5,4,1,1)+0.1)

    # Default plot settings
    plot.sf(my.norm)

    # Function wraps the standard "plot" function, so you can pass
    # the standard "plot" parameters to the function
    plot.sf(my.unif, type="l", lwd=2, col="blue", bty="l",
            ylab="Survival", xlab="Uniform Distribution")

    # If the parameter "left.tail" is true, the plot turns into
    # a cumulative frequency plot (kind of like a CDF) that's plotted
    # on a log scale. This is good for when your data exhibits a left or
    # negative skew.
    plot.sf(my.weib, type="l", left.tail=T, xlab="Left-tailed Weibull Dist.")

    # The function invisibly returns the survival function value.
    lnorm.sf <- plot.sf(my.lnorm, type="l")
    points(my.lnorm, lnorm.sf, col="red")
    legend("topright", bty="n",
           legend=c("Function Call", "Using returned values"),
           lty=c(1,NA), pch=c(NA,1), col=c("black", "red") )

    # The 'make.plot' parameter toggles plotting.
    # Useful if you just want the survival function values.
    norm.sf <- plot.sf(my.norm, make.plot=F)

And here’s the resulting figure from this example:

Now you can easily show the tail-area frequency of events. For example, below is a survival function plot of a normal distribution:

For this example, we can imagine this as a distribution of flood heights (the x-axis would be flood height; note that a real distribution of flood heights would likely look drastically different from a normal distribution). With this visualization, we can easily depict the “1 in 10” or the “1 in 1,000” flood height by following the appropriate survival function value (0.1 or 0.001) over to the corresponding flood height on the plot. Alternatively, you can determine the return period of a given flood height by following the flood height up to the plot and reading off the survival function value; the return period is its reciprocal. Comparing multiple distributions together on a single plot (think deep uncertainty) can produce interesting decision-relevant discussion about changes in return periods for a given event or the range of possible events for a given return period.
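The two readings described above can also be done numerically. The following is a rough sketch rather than part of the original code: `return.level()` and `return.period()` are hypothetical helper names, and the normal sample is only a stand-in for flood heights, as in the example above. It treats “1 in N” as an empirical exceedance frequency of 1/N.

```r
# Stand-in sample of "flood heights" (synthetic, not real flood data)
set.seed(1234)
flood <- rnorm(10000, 10, 2)

# Hypothetical helper: the height exceeded with frequency 1/n.events,
# i.e. the "1 in n.events" level, taken from the empirical quantiles.
return.level <- function(x, n.events) {
  quantile(x, probs = 1 - 1 / n.events, names = FALSE)
}

# Hypothetical helper: the "1 in N" value for a given height, computed as
# the reciprocal of the empirical survival function (exceedance frequency).
# Returns Inf if no sample exceeds the height.
return.period <- function(x, height) {
  1 / mean(x > height)
}

h10   <- return.level(flood, 10)     # "1 in 10" height
h1000 <- return.level(flood, 1000)   # "1 in 1,000" height
rp    <- return.period(flood, h10)   # should come back close to 10

cat("1 in 10 height:   ", h10, "\n")
cat("1 in 1,000 height:", h1000, "\n")
cat("Return period of the 1 in 10 height:", rp, "\n")
```

Keep in mind that estimates deep in the tail rest on very few samples (roughly ten points for the “1 in 1,000” level here), so they are much noisier than estimates near the center of the distribution.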

I hope this post is useful. Survival function plots are incredibly versatile and informative…and I’ve only started to scratch the surface!
