In this post, I will go over some simple tools which can be used to create more efficient and concise R and Python scripts. First, I will explain the apply function in R and Python. Then, I will briefly go over anonymous functions in Python.
The Apply Function in R
The apply function is used to manipulate matrices and data frames (a data frame is coerced to a matrix internally); variants such as lapply handle lists. It takes a data object and a function as inputs, and applies that function to each row or column of the data object. In essence, the apply function is an alternative to “for” loops.
The apply function has three main inputs: a data object, a margin variable, and a function. As mentioned earlier, the data object can have different formats. The margin variable specifies whether the function applies to rows (MARGIN=1) or columns (MARGIN=2). The function can either be a built-in R function (e.g., sum or max) or a function that the user defines. A user-defined function can be written either in-line, inside the call to apply, or separately, outside of it.
Here I will define a simple problem as our test case. The task is to find the maximum of each column and divide all the elements of that column by the maximum. We will use the iris data set, because it is available in R and in Python’s seaborn package.
# Load the iris data set
data(iris)

# Assign first four columns of the iris data set to a data frame
iris_df <- as.data.frame(iris[, 1:4])

# Use the apply function to do the calculations of the example problem
output_max <- as.data.frame(apply(iris_df, MARGIN = 2, FUN = function(x) x / max(x)))
Sometimes there are other, easier ways to do these calculations. However, when what you want to do is more complicated, this method comes in handy. The apply function has some other variants such as lapply, sapply, and mapply. Refer to this post (here) for more information about these functions.
The Apply Function in Python
The pandas package for Python also has a function called apply, which works much like its R counterpart; the following code illustrates how to use it. In pandas, axis=0 applies the function to each column and axis=1 applies it to each row. Note that in this example I have defined a function outside of the apply call, and passed it in to calculate the maximum and the ratio-to-maximum. In the next section, I will present an alternative way of defining in-line functions in Python.
# The iris data set is available in the seaborn package in python
import seaborn as sns
import pandas

# The following script loads the iris data set into a data frame
iris = sns.load_dataset('iris')

# Define an external function to calculate the ratio-to-maximum
def ratio_to_max(data):
    maximum = max(data)
    print(maximum)
    ratio = data / maximum
    return ratio

# Use the built-in apply function in Python to calculate the ratio-to-maximum for all columns
output_df = iris.iloc[:, 0:4].apply(ratio_to_max, axis=0)
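To make the axis argument more concrete, here is a minimal sketch on a tiny data frame. The column names and values are made up for illustration (they are not from iris), and I use the Series method max() rather than the built-in max, which is just a stylistic choice.

```python
import pandas as pd

# Toy data frame with hypothetical values, standing in for the iris measurements
df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 5.0, 20.0]})


def ratio_to_max(values):
    """Divide every element of a Series by its largest element."""
    return values / values.max()


# axis=0: the function receives each column as a Series
by_column = df.apply(ratio_to_max, axis=0)

# axis=1: the function receives each row as a Series
by_row = df.apply(ratio_to_max, axis=1)

print(by_column)
print(by_row)
```

With axis=0 every column ends up with a maximum of 1, while with axis=1 every row does, so the choice of axis changes which values are compared against each other.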
Anonymous Functions in Python
Python provides an easy alternative to external functions like the one used above: the anonymous, or “lambda”, function. Like a regular function, a lambda carries out a specific task on a data object; however, it is written in-line as a single expression, right where it is used, and doesn’t need to be assigned a name. Therefore, in many cases, lambdas offer a cleaner and more concise alternative to regular functions. A history of the lambda function can be found in this post (here), which also provides a comprehensive list of lambda’s functionalities. Here is an example of the lambda function used instead of the regular function defined before:
# The iris data set is available in the seaborn package in python
import seaborn as sns
import pandas

# The following script loads the iris data set into a data frame
iris = sns.load_dataset('iris')

# Here we use lambda to create an anonymous function and use that within pandas' apply function
output_df = iris.iloc[:, 0:4].apply(lambda x: x / max(x), axis=0)
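As a quick sanity check, the sketch below compares the named function and the lambda on a small, made-up data frame (hypothetical values, so that it runs without downloading iris). It is only meant to show that the two forms give the same result.

```python
import pandas as pd

# Toy data frame with hypothetical values
df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 5.0, 20.0]})


# Named function, defined outside of the apply call
def ratio_to_max(data):
    return data / max(data)


# The same calculation, once with the named function and once with a lambda
named = df.apply(ratio_to_max, axis=0)
anonymous = df.apply(lambda x: x / max(x), axis=0)

# DataFrame.equals checks that both results have the same shape and values
same_result = named.equals(anonymous)
print(same_result)
```

The lambda saves a few lines here because the calculation fits in one expression; for anything longer, a named function is usually easier to read and to test.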
Note that, although R does not have a lambda keyword, it does provide a way of defining anonymous functions, such as the one defined within the apply function above. Also, there are other widely used Python functions which work nicely with lambdas. For example, the built-in map and filter functions, and the reduce function (which lives in the functools module in Python 3), can take advantage of lambda’s simplicity in complex data mining tasks. You can refer to here and here for more information about these functions.
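Here is a minimal sketch of how lambdas pair with map, filter, and reduce. The list of values is made up for illustration, and in Python 3 reduce has to be imported from functools.

```python
from functools import reduce

# Hypothetical measurements, used only for illustration
values = [5.1, 4.9, 4.7, 4.6, 5.0]

# map: apply a function to every element (here, the ratio-to-maximum)
largest = max(values)
ratios = list(map(lambda v: v / largest, values))

# filter: keep only the elements for which the function returns True
at_least_five = list(filter(lambda v: v >= 5.0, values))

# reduce: fold the list into a single value (here, a running sum)
total = reduce(lambda acc, v: acc + v, values, 0.0)

print(ratios)
print(at_least_five)
print(total)
```

In everyday Python, a list comprehension or the built-in sum often reads more naturally than map, filter, or reduce, so treat this as an illustration of the pattern rather than a recommendation.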