# C++ Training: Exercise 1

(Updated 1/30/12)

This exercise requires some data, which you can download here: inputdata.  Open the file in Excel and use “Save As” to save it as a .csv file.

Write a C++ program that reads data from the csv file and then calculates the mean, the standard deviation, the 10th percentile, and the 90th percentile of each column of data.  Output these statistics to one or more new files.  You may create one file for each statistic, e.g.

means-of-data.csv

col1, col2, col3
mean-of-col-1, mean-of-col-2, mean-of-col-3

Or put each statistic in its own row in a single file.
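As a rough sketch of the single-file layout, the helper below formats one statistic as one CSV row. The function name `formatStatRow` and the choice to put a label in the first field are assumptions made for illustration; the exercise does not prescribe either.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds one CSV row: a label followed by one value per column.
// The label-first layout is an illustrative choice, not a requirement.
std::string formatStatRow(const std::string& label,
                          const std::vector<double>& values)
{
    std::ostringstream row;
    row << label;
    for (std::size_t i = 0; i < values.size(); ++i) {
        row << "," << values[i];
    }
    row << "\n";
    return row.str();
}
```

With an open `std::ofstream out("statistics.csv");` (a hypothetical filename), each statistic could then be written with `out << formatStatRow("mean", means);`.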

For this exercise, you may need the following functions, libraries, or features.  Please look them up on cplusplus.com and make liberal use of the example code there!

math: pow, sqrt, and operators such as +, -, *, and /

input/output of data: please use ifstream and ofstream.  You’ll want to use the << and >> operators, and the getline function

manipulating text streams: the C++ string library and stringstream may be helpful.  You may need to look up how to handle csv files, and how to separate the data from the commas

data: sort

control structures: if, else, while, for
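One common way to separate the data from the commas is to wrap a line in a stringstream and call getline with ',' as the delimiter. The sketch below assumes simple CSV with no quoted fields or embedded commas; the name `splitCsvLine` is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line of comma-separated text into its fields.
// Assumes simple CSV: no quoted fields and no commas inside a field.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    // getline with a third argument stops at that character instead of '\n'.
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}
```

Each field comes back as text, so it still needs to be converted to a number before any statistics are calculated.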

One piece of advice: if you want to convert a C++ style string to an integer or vice versa, you may find the following functions handy:

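```
#include <cstdlib>   // atoi, strtod
#include <sstream>   // stringstream
#include <string>

using namespace std;

// Convert an integer to its decimal string representation.
string intToString(int input_int)
{
    stringstream out;
    out << input_int;
    return out.str();
}

// Convert a string to an int; atoi returns 0 if no number can be parsed.
int stringToInt(string input_string)
{
    return atoi(input_string.c_str());
}

// Convert a string to a double; strtod returns 0.0 if no number can be parsed.
double stringToDouble(string input_string)
{
    return strtod(input_string.c_str(), NULL);
}
```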

Unit 1: Getting the Program Working

1. Write a standard deviation function that can take the standard deviation of an arbitrary number of values. Verify that the function works either with a calculator, Matlab, or Excel. Is the precision the same? What could cause differences between your answer and the calculator’s?
2. Learn how to use the input/output streams.  You’ll want to set up one stream to read the file, and another one to write output files. To test this, you can simply “copy” the file you read in directly into another file. Are there ways to do this without using too much system memory?
3. The easiest statistics to calculate are mean and standard deviation, since they can be done without storing much information.  The 10th and 90th percentile are more difficult, because you must sort the values first.  Don’t try to do these until you have the other two tasks completed.
4. Once you have your results, verify them using Matlab or Excel.  Are your statistics correct?
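The statistics in this unit could be sketched as follows. This is one possible approach, not a reference solution, and the function names are invented for the example. Two conventions are assumed and worth checking against whatever tool you verify with: the standard deviation divides by n - 1 (the sample form, which is the default for Matlab’s std and Excel’s STDEV), and the percentile interpolates linearly between the two nearest sorted values. Tools differ on the percentile convention, so a small mismatch there does not necessarily mean a bug.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean. Assumes v is not empty.
double mean(const std::vector<double>& v)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum / v.size();
}

// Sample standard deviation (divides by n - 1). Assumes v.size() >= 2.
double sampleStdDev(const std::vector<double>& v)
{
    const double m = mean(v);
    double sumSq = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sumSq += (v[i] - m) * (v[i] - m);
    }
    return std::sqrt(sumSq / (v.size() - 1));
}

// Percentile p (0 to 100) using linear interpolation between the two
// nearest sorted values. Assumes v is not empty.
// v is passed by value so that sorting does not reorder the caller's data.
double percentile(std::vector<double> v, double p)
{
    std::sort(v.begin(), v.end());
    const double pos = (p / 100.0) * (v.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, v.size() - 1);
    const double frac = pos - lo;
    return v[lo] + frac * (v[hi] - v[lo]);
}
```

For the values 1, 2, 3, 4, 5, this convention gives a 10th percentile of 1.4 and a 90th percentile of 4.6.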

Unit 2: Intermediate Steps

After your program is working, please try the following procedures:

1. Initially, you can write all the code necessary to do this task in one file, maybe even all in the main() function. This is fine for small tasks, but as you continue, you’ll need to have more complicated programs that span multiple files and use many more functions. A makefile tells the make utility how to build a project from its source files, and is a common way to compile large projects on Linux. Set up a makefile for your program, and try separating the code into multiple files. There’s a short blog post to help you (log in to see it). You can also use a development environment such as Eclipse to create a makefile for you, or Visual Studio to create a project file without needing a makefile.
2. The first time you do this example, you may use (or have used) regular C-style arrays to store the data from the CSV files. This is fine, as long as you know how many rows and columns are in the file! Arrays also require you to keep track of pointers and memory management, which can make operations like sorting more painful than they need to be. C++’s answer to these problems is vectors. Vectors are like ArrayLists in Java; they can be sized and appended dynamically, and they know their own size. Consider implementing the same program using vectors for an input file with an unknown number of rows and columns.
3. Use gprof to profile your code, to determine how much time is being spent in each function. This can identify “bottlenecks” in your code. See a blog post here for more info.
4. Use valgrind to identify memory leaks in your code. This is very important, since memory issues can actually make your code fail to run. The blog post here explains.
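For the vector-based version in step 2, a sketch like the one below can read a file without knowing its dimensions in advance. It makes several assumptions that may not hold for your data: there is no header row, every row has the same number of fields, and no field is quoted. The function name `readColumns` is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads comma-separated numbers from a stream; result[c] holds column c.
// Assumes no header row, equal-length rows, and no quoted fields.
std::vector<std::vector<double> > readColumns(std::istream& in)
{
    std::vector<std::vector<double> > columns;
    std::string line;
    while (std::getline(in, line)) {
        // Files saved on Windows end lines with "\r\n"; drop the stray '\r'.
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        if (line.empty()) {
            continue;  // skip blank lines
        }
        std::stringstream ss(line);
        std::string field;
        std::size_t col = 0;
        while (std::getline(ss, field, ',')) {
            // Grow the set of columns the first time each index is seen.
            if (col == columns.size()) {
                columns.push_back(std::vector<double>());
            }
            columns[col].push_back(std::strtod(field.c_str(), NULL));
            ++col;
        }
    }
    return columns;
}
```

Because the function takes a std::istream, it works with an std::ifstream opened on your csv file as well as with a stringstream in a quick test. Storing the data by column keeps each column ready to pass to a sort or a statistics function.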

Unit 3: Pulling Everything Together

1. The last step will be to combine everything you’ve done so far to create your own Matrix class. You should be able to find some good websites explaining basic C++ classes, and if you need more advice, come talk to us. Your Matrix class should use vectors to create a matrix behind the scenes, so that you can use it easily in your main() function. The class should have some helpful functions to simplify your tasks … for example, you may want to have a Matrix.readFile(…) function, a Matrix.sortColumns() function, or whatever else you want to include!
2. You may want to implement your class using a separate header (.h) and source (.cpp) file. To compile a project with multiple source files, you’ll need to learn how to use basic makefiles, which are explained in Joe’s previous post (the link is given in Unit 2, step 1).
3. Re-implement your main() function using your new Matrix class. Make sure your output is the same as in the previous exercises. Once you do this, you will have written a general, re-usable class that could (potentially) save you lots of time in the future.
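To give a sense of the shape such a class might take, here is a minimal sketch. The `readFile` and `sortColumns` names come from the suggestions above; everything else (the stream-based `read`, the accessors, and the column-oriented storage) is an assumption made for this example, and the whole class is kept in one block for brevity rather than split into .h and .cpp files. It assumes no header row and no quoted fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// A minimal column-oriented matrix of doubles backed by vectors.
class Matrix {
public:
    // Parses comma-separated numbers from any input stream,
    // replacing whatever the matrix held before.
    void read(std::istream& in)
    {
        columns_.clear();
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty() && line[line.size() - 1] == '\r') {
                line.erase(line.size() - 1);  // strip Windows line ending
            }
            std::stringstream ss(line);
            std::string field;
            std::size_t col = 0;
            while (std::getline(ss, field, ',')) {
                if (col == columns_.size()) {
                    columns_.push_back(std::vector<double>());
                }
                columns_[col].push_back(std::strtod(field.c_str(), NULL));
                ++col;
            }
        }
    }

    // Opens the named file and parses it; returns false if it cannot be opened.
    bool readFile(const std::string& filename)
    {
        std::ifstream in(filename.c_str());
        if (!in) {
            return false;
        }
        read(in);
        return true;
    }

    // Sorts each column independently, in ascending order.
    void sortColumns()
    {
        for (std::size_t c = 0; c < columns_.size(); ++c) {
            std::sort(columns_[c].begin(), columns_[c].end());
        }
    }

    std::size_t numCols() const { return columns_.size(); }
    std::size_t numRows() const { return columns_.empty() ? 0 : columns_[0].size(); }
    double at(std::size_t row, std::size_t col) const { return columns_[col][row]; }

private:
    std::vector<std::vector<double> > columns_;  // indexed as columns_[col][row]
};
```

Note that sorting each column independently destroys the correspondence between values in the same row. That is fine for percentiles, but you may want to sort a copy if the original row order matters later.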

If you have questions, please ask them, and if the answers are general enough, we will add them to this post.  Good luck!

–Jon and Joe

## 3 thoughts on “C++ Training: Exercise 1”

1. alisharfernandez

General question to the group:

How do you prefer to read in a file, such as a .csv file, beyond stringstream? I’ve used that to complete this task, but I am interested in learning different ways.

Thanks
-Alisha

• Joseph

Can you please share the working code for reading a CSV file with me? I am learning C++ and would like to get some hands-on practice.