Using linux “split”

Today I’d like to quickly talk about the Linux command “split”. I like writing about simple new Linux commands, as evidenced here and here.

I often write custom C++ programs to manipulate large data files. There’s obviously a time and place for this, since you get full control over every aspect of how your data looks going in and coming out. We’ve written about this before, and I think string processing is an important skill no matter what language you use. There’s a post about MATLAB (and another one here), some sample bash scripting, and a post about Python, among other things. You should also see Matt’s series on Python data analysis, since I’m doing some shameless plugging!

Anyway… little did I know that something that takes real effort in C++ can be done easily in Linux/Unix with “split”!

To split a large file into smaller files of, say, 100 lines each, you use: “split -l 100 myLargerFile.txt”. By default the pieces are written to the current directory and named xaa, xab, xac, and so on. There are also options to change the names of the output files, and so forth.
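Here is a minimal sketch of that command in action, assuming a POSIX-style “split”. The sample file is generated on the spot, and the “chunk_” prefix and the scratch folder are made-up names for illustration only.

```shell
# Work in a scratch folder so the pieces do not clutter anything (hypothetical setup).
workdir=$(mktemp -d)
cd "$workdir"

# Build a 250-line stand-in for the large data file.
awk 'BEGIN { for (i = 1; i <= 250; i++) print "record " i }' > myLargerFile.txt

# Split into pieces of 100 lines each; the last argument is the
# prefix for the output names (chunk_aa, chunk_ab, chunk_ac).
split -l 100 myLargerFile.txt chunk_

# Show how many lines landed in each piece.
wc -l chunk_a?

# Gluing the pieces back together in order should reproduce the original.
cat chunk_a? | cmp - myLargerFile.txt && echo "pieces match the original"
```

With 250 lines and a piece size of 100, the last piece simply holds the 50 lines left over. Adding “-a 3” would lengthen the suffix to three letters if you expect more pieces than two letters can name.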

Read the man page for split, and check out forum posts here and here to get on your way!

grep allows you to find an expression in one or more files in a folder on Linux.  I find it useful for programming.  Say, for example, I want to look for the string “nrec” in a set of source code and header files.  Maybe “nrec” is a variable and I forgot where I declared it (if this sounds a little too specific to be merely an example, you’re right. This is what I’m having to do right this second!).  The grep command is:

grep -in "nrec" *.*

What this means is: search for the “nrec” expression in every file in the folder whose name contains a dot (that is what the *.* pattern matches; a plain * would cover every non-hidden file). There are two useful flags set here as well. “i” means that the search is case insensitive (that is, NREC and NrEc and nrec are all treated as equal). “n” means that grep shows the line number of each occurrence of my phrase. There are other options that I’m not using, including “v” to invert the search and show every line that does NOT contain the phrase, “h” to suppress the file name, and “l” to show only the names of the files that contain a match.

If you were curious, here’s a sample of the output:

iras.h:144: int num_flow_datapoints; //originally: NRec
SimSysClass.cpp:806: flowrecs=sysstat(nrec)

(To explain: the first instance is in a header file, on line 144. I’m translating this code from one language to another, and originally the variable was called “nrec”. So in the header file I made a note that my variable is now called something else. In the second instance, I had copied the original code into my file as a placeholder, so now I know that I need to use my new name in its place. Also, the “i” flag in grep is helpful since Fortran is not case-sensitive, and here you can see two different case styles for this variable even in our simple example.)
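If you want to try the flags without touching real code, here is a small sketch. The file names echo the sample output above, but the file contents, the extra notes.txt file, and the “other_variable” name are invented for illustration.

```shell
# Scratch folder with three tiny stand-in files (hypothetical contents).
workdir=$(mktemp -d)
cd "$workdir"
printf 'int num_flow_datapoints; //originally: NRec\nint other_variable;\n' > iras.h
printf 'flowrecs=sysstat(nrec)\n' > SimSysClass.cpp
printf 'nothing relevant here\n' > notes.txt

# -i: ignore case, -n: show line numbers (the command from the post).
grep -in "nrec" *.*

# -l: list only the names of the files that contain a match.
grep -il "nrec" *.*

# -c: count the matching lines in each file instead of printing them.
grep -ic "nrec" *.*

# -v: invert the search and print the lines that do NOT match.
grep -iv "nrec" iras.h
```

Note that when grep is given a single file, as in the last command, it leaves the file name off the output, since there is nothing to tell apart.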

For more info, consult a casual reference such as this excellent post about Linux command line utilities, a similar blog post about grep, and of course the Linux man page for the command. Also look at 15 grep tips. As usual, remember that “man [insert command here]” gives you the low-down on any command you’d like to learn.

Thanks for reading and please comment with additional tips or questions!

4 thoughts on “Using linux “split””

  1. Thanks for bringing up grep, one of my all-time favorite commands! Grep is incredibly powerful: your search pattern can be a regular expression, and you can use it in a pipeline. One of my favorite ways to use grep is to check how many jobs I have running or in queue on the cluster:

    qstat -umjw5407 | grep " R " | wc -l
    qstat -umjw5407 | grep "Q" | wc -l

  2. Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

  3. Pingback: Simple Bash shell scripts that have made my life easier – Water Programming: A Collaborative Research Blog

  4. Pingback: More Terminal Schooling – Water Programming: A Collaborative Research Blog
