Useful Linux commands to handle text files and speed up work

Most of us, nice and normal human beings, tend to prefer programs with GUIs over typing commands on a command prompt because the former looks more “real” and is more visual than the latter. However, one thing we don’t know (or, sometimes, don’t want to know) is that learning a few terminal commands can dramatically increase productivity. These commands can save us a lot of time by sparing us from opening and closing programs, navigating through menus and windows, moving the mouse around, as well as moving the hand back and forth from the mouse to the keyboard.

This post will mention and briefly describe some useful “intermediate level” Linux commands (some basic commands are presented in this post by Jon Herman), which can be called from a Linux OS, Cygwin (mostly), or Mac. Among the numerous tedious tasks these commands can greatly simplify is the particularly interesting chore of handling text files, be they scripts or data files. Commands for other tasks are covered as well. Keep in mind that the symbol * is a wild card (character that can mean any string of characters when searching for files), which is really useful when the goal is to repeatedly apply one command to multiple files. For all commands listed here skip the “$” character.

DELIMITER SEPARATED FILES HANDLING

  • Remove columns 30 to 36 (starting from 0) from a comma separated file and export the output to another file.
    $ cut -d',' -f1-30,36 input.file >> output.file

    (More information on the post by Joe Kasprzyk)

  • Print only columns 2 and 4 (starting from 1) of a comma separated file.
    $ awk -F "," '{print $2,$4}' input.file >> output.file
  • Count number of columns in a file separated either by spaces or commas:
    $ head -1 input.file | sed 's/[^, ]//g' | wc -c
    or:
    $ awk -F "[, ]" 'END{print NF}' input.file
  • Print lines of a comma separated file in which the value in the 2nd column is lower than 100 and the value in the 5th column is higher than 0.3:
    $ awk -F "," '$2<100 && $5>0.3' input.file >> output.file
  • Print lines between 10th and 20th lines (not inclusive) of a file:
    $ awk 'NR>10 && NR<20' input.file >> output.file
  • Add a string to the end of multiple files:
    $ echo "your string" | tee -a *.set
  • Add a string to the end of one file:
    $ echo "your string" >> file

FILE SEARCHING

  • Find all text files in a folder that contain a certain string:
    $ grep -rn './folder' -e your_string
  • Find files recursively (* is a wildcard):
    $ find -type f -name name_of*.file

FILES INFO

  • See the contents of a zip/tar file without extracting it. Press q to quit.
    $ less file.tar
  • Count number of lines in a file:
    $ wc -l your.file
  • List all files with a certain extension in a directory:
    $ ls *.ext
  • Print files and folders in tree fashion:
    $ tree
  • Print the size of all subfolders and files in (sub)folders to a certain max depth in the folder hierarchy:
    $ du -h -a --max-depth=2

IMAGE HANDLING

  • Convert svg files to png (you need to have Inkscape installed):
    $ inkscape input.svg -d 300 -e output.png
  • Convert svg files to pdf-latex (you need to have Inkscape installed):
    $ inkscape input.svg --export-pdf output.pdf --export-latex
  • Rotate a picture:
    $ convert Fig6_prim.png -rotate 90 Fig6_prim_rotated.png

MISCELLANEOUS

  • See the history of commands you have typed:
    $ history
  • See a calendar (month and year optional):
    $ cal [month] [year]
  • Combine pdf files into one (you need to have pdftk installed):
    $ pdftk file1.pdf file2.pdf file3.pdf cat output newfile.pdf
    or, to merge all pdf files in a directory:
    $ pdftk *.pdf cat output newfile.pdf

    In order to see how to combine only certain pagers of pdf files, as well as how to splits all pages into separate pdf files, see this page.

  • See the manual of a command:
    $ man command

Another useful idea is that of piping outputs of a command to another command. For example, if you want print the number of files in a directory, you can pipe the output of the ls command (list all files in a directory) to the wc -l command (count the number of lines in a string). For this, use the “|” character:

$ ls | wc -l

However, you may want instead to check the number of lines in all the 20 files in a directory at once, which can also be achieved by combining the ls and wc commands with the command xargs. The command then would look like:

$ ls | xargs wc -l

The command xargs breaks down the output from ls into one string per line and then calls wc -l for each line of the output of ls.

Hope all this saves you some time!

Advertisements

4 thoughts on “Useful Linux commands to handle text files and speed up work

  1. Thanks for this, I learned things. What’s the difference between xargs and tee? They both look like good ways to avoid for loops.

    • I’m glad the post was useful! Tee -a appends a string to the end of one or multiple files, while xargs breaks down the output of a command being piped to another command.

  2. Pingback: Water Programming Blog Guide (Part I) – Water Programming: A Collaborative Research Blog

  3. Pingback: Enhance your (Windows) remote terminal experience with MobaXterm – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s