Using Linux “cut”

The following code takes a file that has 16 columns and outputs a file with 5 of those columns.  Some notes:

  • Don’t use PATH as a variable name.  PATH is the system variable the shell uses to find programs, so overwriting it means the script can no longer find commands like cat and cut!
  • Note the C++ style syntax of the loop.  Bash 3.0 and later allow you to use curly brackets, like: for i in {1..50}.  But brace expansion happens before variables are expanded, so when you want to use variables inside the range, you have to do something else, such as my example.  Others are discussed here.
  • The star of this script is the ‘cut’ command.  The -d option sets the delimiter that separates fields.  The -f option lists the fields you want to keep.
  • Then there are some simple commands around cut.  ‘cat’ displays the contents of the file.  Then, the | operator pipes the output of cat into the next command, which is cut.  Finally, you then use the > operator to direct the output of this command into a new file.
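  • Save this file on the cluster or a Linux system as myFileNameHere.sh.  Then, to run the code, simply type “bash myFileNameHere.sh”.  (The C-style loop is a bash feature, and on some systems “sh” is a different shell that does not support it.)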
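
As a quick check of the loop note above, here is a minimal sketch of why the curly-bracket range fails with a variable, plus two alternatives to the C-style loop.  NUM_SEEDS is set to 3 here only to keep the output short, and the *_RESULT variables are names I made up for the demo.

```shell
# Hypothetical seed count, kept small for the demo.
NUM_SEEDS=3

# Brace expansion is attempted before $NUM_SEEDS is substituted, so the
# loop body runs once with the literal text {1..3}.
BRACE_RESULT=""
for I in {1..$NUM_SEEDS}; do
    BRACE_RESULT="${BRACE_RESULT}${I} "
done
echo "brace expansion: ${BRACE_RESULT}"

# Alternative 1: seq is an external command, so it sees the expanded variable.
SEQ_RESULT=""
for I in $(seq 1 "$NUM_SEEDS"); do
    SEQ_RESULT="${SEQ_RESULT}${I} "
done
echo "seq: ${SEQ_RESULT}"

# Alternative 2: a while loop with shell arithmetic.
WHILE_RESULT=""
I=1
while [ "$I" -le "$NUM_SEEDS" ]; do
    WHILE_RESULT="${WHILE_RESULT}${I} "
    I=$((I + 1))
done
echo "while: ${WHILE_RESULT}"
```

The first loop prints the literal text {1..3}, while the other two count from 1 to 3.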

#!/bin/bash

# Cut out only the objective function values from the CBorg output files.

MYPATH=./output/
INPUT_NAME_BASE=CBorg_LRGV_
OUTPUT_NAME_ADDENDUM=_ObjOnly
EXTENSION=.out
START_COLUMN=9
FINISH_COLUMN=13
NUM_SEEDS=50

echo "Beginning..."
for ((I=1; I<=$NUM_SEEDS; I++));
do
 echo "Processing $I"
 # Quote the expansions so paths containing spaces are handled correctly.
 cat "${MYPATH}${INPUT_NAME_BASE}${I}${EXTENSION}" | cut -d ' ' -f "${START_COLUMN}-${FINISH_COLUMN}" > "${MYPATH}${INPUT_NAME_BASE}${I}${OUTPUT_NAME_ADDENDUM}${EXTENSION}"
done
echo "Totally done."
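
To see what cut does without needing any real output files, here is a small sketch that runs it on a single made-up line.  The sample line and the *_RESULT variable names are my own inventions for illustration; only the cut options come from the script above.

```shell
# A made-up line with 16 space-separated fields, standing in for one row of a file.
SAMPLE_LINE="c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16"

# -d ' ' splits on single spaces; -f 9-13 keeps fields 9 through 13.
RANGE_RESULT=$(echo "$SAMPLE_LINE" | cut -d ' ' -f 9-13)
echo "$RANGE_RESULT"

# Fields can also be listed individually, separated by commas.
LIST_RESULT=$(echo "$SAMPLE_LINE" | cut -d ' ' -f 1,16)
echo "$LIST_RESULT"

# cut treats every delimiter as a field boundary, so two spaces in a row
# create an empty field: here field 2 is empty and "b" is field 3.
DOUBLE_SPACE_RESULT=$(echo "a  b" | cut -d ' ' -f 3)
echo "$DOUBLE_SPACE_RESULT"
```

The last case is worth keeping in mind: if your columns are padded with multiple spaces, the field numbers you pass to -f will not match the column numbers you see by eye.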

Using gdb, and notes from the book “Beginning Linux Programming”

I just started thumbing through Beginning Linux Programming by Matthew and Stones.  It covers in great detail many of the topics that come up often on this blog: how to debug code, how to program using the bash shell, file input/output, different development environments, and writing makefiles.

Using gdb

One tool I haven’t talked about much is the debugger, gdb.  It is a text-based debugging tool, but it gives you some powerful ways to step through a program.  You can set a breakpoint, and then define rules for which variables are displayed each time that breakpoint is hit.

Let’s say you have a file, myCode.c that you compile into an executable, myCode.  Compile using the -g flag, and then start gdb on your code by entering “gdb ./myCode”.  If your code has command line arguments, you need to specify an argument to gdb like this:

gdb --args ./myCode -a myArgument1 -b myArgument2

The important phrase here is “--args”, two dashes and the word args, that appears after gdb.  That lets gdb know that your ./myCode program itself has arguments.

You can also set a breakpoint inside gdb (you’d need to do that before you actually run the code).  To do this, say at line 10, simply type “break 10”.  This will be breakpoint 1.  To create rules to display data each time the breakpoint is hit, type “commands”.  gdb will then ask what commands you’d like to run.  For example, to display 5 values of an array, enter “display array[0]@5”, then “cont” to continue, and “end” to end the list.

After setting up your breakpoints, simply type “run” to run the code.
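
The same steps can also be saved in a file and replayed, which is handy when you are debugging the same spot over and over.  Below is a rough sketch under a few assumptions: the tiny C program, the function name total, and the file names are all hypothetical, gcc and gdb must be installed for the last step to do anything, and I break on the function name rather than a line number so the example does not depend on line counts.  I also use “print” instead of “display” inside the command list, so the array is printed once per stop.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
WORKDIR=$(mktemp -d)

# A tiny hypothetical C program to debug; it sums a five-element array.
cat > "$WORKDIR/myCode.c" <<'EOF'
#include <stdio.h>

static double total(const double *array, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += array[i];
    return sum;
}

int main(void)
{
    double values[5] = {1.0, 2.0, 3.0, 4.0, 5.0};
    printf("%g\n", total(values, 5));
    return 0;
}
EOF

# The same commands you would type at the (gdb) prompt, saved to a file.
cat > "$WORKDIR/session.gdb" <<'EOF'
break total
commands 1
print array[0]@5
cont
end
run
EOF

# Only compile and debug if both tools are installed on this machine.
if command -v gcc >/dev/null 2>&1 && command -v gdb >/dev/null 2>&1; then
    gcc -g -o "$WORKDIR/myCode" "$WORKDIR/myCode.c"
    gdb --batch -x "$WORKDIR/session.gdb" "$WORKDIR/myCode" || true
else
    echo "gcc or gdb not found; files written to $WORKDIR"
fi
```

The --batch option tells gdb to exit when the command file is finished, and -x names the file to read commands from.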

If your program has a segmentation fault, it will let you know what line the segmentation fault occurred at, and using “backtrace” you can see what functions called that line.

If you have a segfault and the program is halted, the nice thing is that the program’s memory can still be inspected, so you can see the values of variables at the moment of the crash.  To see the value of a variable, say “print myVariableName”.  It is quite informative.  For example, if a variable prints as “nan”, you know there may be something wrong with that variable that could cause an error somewhere else.

Here’s one example of a possible problem in pseudocode:

levelA = 0;
levelB = 0;
myLevel = 0.5;
myFrac = myLevel / (levelA + levelB);

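The fourth line there looks innocuous enough, but it divides by zero given the levelA and levelB values.  In C, if these are floating-point variables the program typically keeps running and myFrac quietly becomes “inf”, which then causes trouble somewhere else.  If they are integers, the program typically stops on the fourth line with an arithmetic exception (SIGFPE) rather than a segfault.  Either way, in gdb a simple “print levelA” and “print levelB” will help you solve the problem.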

Here’s a short guide that explains the basics of gdb in more detail.

Other notes

Also interesting are several C preprocessor macros that tell you the current line and file of the source code, and the date and time at which the code was compiled.  Predictably, these are __LINE__, __FILE__, __DATE__, and __TIME__ (that’s two underscores on each side).
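
Here is a quick way to try the macros, sketched as a shell snippet that writes a small C file and compiles it if a compiler named cc is available.  The file name macros.c and the MACRO_* variable names are just placeholders I picked for the demo.

```shell
# Scratch directory for the generated source file and executable.
MACRO_DIR=$(mktemp -d)

cat > "$MACRO_DIR/macros.c" <<'EOF'
#include <stdio.h>

int main(void)
{
    /* __FILE__ and __LINE__ describe this spot in the source file. */
    printf("source: %s, line %d\n", __FILE__, __LINE__);
    /* __DATE__ and __TIME__ are fixed when the file is compiled. */
    printf("compiled: %s %s\n", __DATE__, __TIME__);
    return 0;
}
EOF

# Compile and run only if a C compiler is installed.
MACRO_OUTPUT=""
if command -v cc >/dev/null 2>&1; then
    cc -o "$MACRO_DIR/macros" "$MACRO_DIR/macros.c" && MACRO_OUTPUT=$("$MACRO_DIR/macros")
    echo "$MACRO_OUTPUT"
else
    echo "no C compiler found; source written to $MACRO_DIR/macros.c"
fi
```

Running the executable again later prints the same date and time, because those two values are baked in at compile time rather than looked up when the program runs.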

I also like the bash scripting examples that are contained in the book.  They taught me about some Linux utilities like “cut” that are very helpful, and covered elsewhere on this blog.

Any additional tips and tricks are welcome in the comments!