Shell utilities

Objectives

At the end of this self-learning lab, you should be able to:

Understand the Unix Standard I/O
Use |, >, <, >() and <() in Bash
Use the commands grep, tee, wc, cut, sort, uniq, diff, head, tail
Understand process exit codes and use && and || for chaining in Bash
Understand environment variables and use $ and $() in Bash

Processes often produce a large amount of output. To extract useful information from that output, we use text-processing commands in the terminal.

Stdin, stdout and stderr

Every process has one input stream called "stdin" and two output streams called "stdout" and "stderr". When a process is run directly from the command line, the shell displays both stdout and stderr in the same terminal, so the two appear merged.

Pipe from/to file

You can pipe the stdout or stderr of a command to a file.

The following syntax writes the stdout of cmd to the file at path file.txt. The shell will only display the stderr output, and the stdout output is written to file.txt.

$ cmd >file.txt

Alternatively, write the stderr of cmd to error.txt:

$ cmd 2>error.txt

What does 2 mean here?

By convention, the numbers 0, 1 and 2 refer to the stdin, stdout and stderr of any process. 2>error.txt means to write stream 2 (the stderr stream) to error.txt.

In Linux, there is a special file called /dev/null. It is a black hole: the file always remains empty no matter how much data you write into it, because all writes are accepted successfully but silently discarded. To ignore some output, we can pipe it to /dev/null.

Similarly, you can pipe a file into the stdin of a command using the < operator:

$ cmd <file.txt
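
As a small sketch of the three redirections used together (the function name report and the scratch files are made up for this illustration):

```shell
#!/bin/bash
# Hypothetical helper that writes one line to each output stream.
report() {
    echo "normal output"
    echo "error output" >&2
}

workdir="$(mktemp -d)"    # scratch directory for this demo

# stdout and stderr go to two separate files.
report >"$workdir/out.txt" 2>"$workdir/err.txt"

# Discard stderr entirely, print only stdout.
report 2>/dev/null

# Feed a file into stdin; wc -c counts the bytes it reads from stdin.
wc -c <"$workdir/out.txt"
```

Nothing from the first report call reaches the terminal, because both streams were redirected to files.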

Try it yourself

To begin with, let's create a command that outputs both stdout and stderr.

Run nano count.sh, which opens an interactive CLI text editor for the file count.sh. Then copy the following:

#!/bin/bash
i=1
while [ $i -le 3 ]; do
    echo This line writes $i to stdout
    echo This line writes $i to stderr >&2
    ((i++))
done

Press ^X (Ctrl+X), then Y, then Enter to save the file and exit. You can see hints for these keybindings at the bottom of the nano screen:

^G Get Help    ^O Write Out   ^W Where Is    ^K Cut Text    ^J Justify
^X Exit        ^R Read File   ^\ Replace     ^U Paste Text  ^T To Spell

Let's try making the file executable and run it:

$ chmod +x count.sh
$ ./count.sh
This line writes 1 to stdout
This line writes 1 to stderr
This line writes 2 to stdout
This line writes 2 to stderr
This line writes 3 to stdout
This line writes 3 to stderr

Let's verify whether the claims about writing to stdout/stderr are true:

$ ./count.sh >/dev/null
This line writes 1 to stderr
This line writes 2 to stderr
This line writes 3 to stderr
$ ./count.sh 2>/dev/null
This line writes 1 to stdout
This line writes 2 to stdout
This line writes 3 to stdout

Let's write the stderr to another file called error.txt:

$ ./count.sh 2>error.txt
This line writes 1 to stdout
This line writes 2 to stdout
This line writes 3 to stdout
$ cat error.txt
This line writes 1 to stderr
This line writes 2 to stderr
This line writes 3 to stderr

Pipe to process

We can pipe the stdout of one process to the stdin of another process using the | operator.

$ command1 | command2

This will result in the following data flow:

  +-------------+                        +--------------+
  | shell input |        +---------------> shell output |
  +-------------+        |               +--------------+
         |               |                   ^   ^
         |         stderr|           stderr  |   |
         |    +----------+      +----------+-+   |
         +--->| command1 +----->| command2 |-----+
              +----------+      +----------+
            stdin     stdout  stdin      stdout
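
The diagram can be checked with a short sketch. The function name speak is hypothetical; tr is used only as a convenient command2 that visibly changes whatever passes through the pipe:

```shell
#!/bin/bash
# Hypothetical command1 that writes one line to each output stream.
speak() {
    echo "hello from stdout"
    echo "hello from stderr" >&2
}

# Only stdout travels through the pipe, so only that line is upper-cased.
# The stderr line skips tr and goes straight to the terminal.
speak | tr 'a-z' 'A-Z'

# Silencing stderr of command1 leaves just the piped line.
speak 2>/dev/null | tr 'a-z' 'A-Z'
```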

`tee`

The tee command, as its name suggests, pipes data in a T shape.

This is what the command tee file.txt does:

+-------+     +-----+     +--------+
| stdin |---->| tee |---->| stdout |
+-------+     +-----+     +--------+
                 |
                 |       +----------+
                 +------>| file.txt |
                         +----------+

In modern versions of Bash, you can use the >(another cmd) syntax in place of a file name. Bash replaces it with the path of a temporary pipe (such as /dev/fd/63), and any data written to that path is fed into the stdin of another cmd.
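
A minimal sketch of both forms, assuming Bash and a scratch file created with mktemp that stands in for file.txt:

```shell
#!/bin/bash
copy="$(mktemp)"    # hypothetical scratch file standing in for file.txt

# tee writes the three lines to "$copy" and also passes them on to grep.
printf 'apple\nbanana\ncherry\n' | tee "$copy" | grep an

# The file holds the full, unfiltered input.
cat "$copy"

# With >(...), the "file" that tee writes to is really the stdin of another
# command. That command runs asynchronously, so its output may arrive late.
printf 'apple\nbanana\ncherry\n' | tee >(grep -c a >&2) >/dev/null
```

The first pipeline prints only banana, while the file keeps all three lines.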

`grep`

grep means Global Regular Expression Print.

... Is that confusing? Maybe just memorize it as "grep grabs occurrences of a search". It filters the input and outputs only the lines that match the search (or only the lines that don't, if you provide the -v flag).

You can use grep to find all occurrences of a word:

$ grep robot /usr/share/dict/words
robot
robot's
robotic
robotics
robotics's
robots

What is /usr/share/dict/words?

/usr/share/dict/words is a file where each line contains a valid English word. See man 5 american-english for more information.

grep treats the first argument as a "pattern", which by default is interpreted as a basic regular expression.

If you want grep to treat your argument as-is (so that it does not treat symbols like . differently), use grep -F.
If you want grep to treat your argument as a PCRE ("Perl-compatible Regular Expression"), which is close to the regular expression flavour used by Python's re module, use grep -P.

There are many great online tutorials about regular expressions, such as Wikipedia and regular-expressions.info. (Remember that we are usually using PCRE.)

A few useful flags in grep to pay attention to:

-v: invert selection
-i: case insensitive search
-r: search all files in a directory recursively
- Very useful for checking how other people are using a certain function when working on an unfamiliar project!
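
The flags above, together with -F, can be tried on a small sample. The file and its contents are invented for this sketch:

```shell
#!/bin/bash
sample="$(mktemp)"    # hypothetical sample file for this demo
printf 'Robot\nrobotics\ncar\na.b\naxb\n' >"$sample"

grep robot "$sample"          # case-sensitive: matches only "robotics"
grep -i robot "$sample"       # also matches "Robot"
grep -v -i robot "$sample"    # every line that does NOT mention robot
grep 'a.b' "$sample"          # "." matches any character: "a.b" and "axb"
grep -F 'a.b' "$sample"       # literal dot: only "a.b"
```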

Hint

Remember to quote your regex pattern when you run grep in a shell. In particular, '' is a great choice to wrap your argument since backslashes inside '' do not get processed.

Info

In Git repositories, you can also use git grep instead of grep -r to grep all tracked files.

Other text-processing commands

These commands all have man pages. Read the man page to see their precise usage!

wc: count the number of lines, words and bytes in the input
cut: extracts parts of each line
- -c: select specific character columns
- -d and -f: split the line by a character (only one character!) and get a specific field
sort: sorts the input
uniq: removes adjacent duplicate lines (sort it before using uniq!)
- -c: count number of duplicate lines
diff: compare two files
- You can also use colordiff to get deleted lines in red and added lines in green.
head and tail: take only the first/last lines of the input
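
A short sketch that combines several of these commands. The log file and its user:action format are made up for illustration:

```shell
#!/bin/bash
log="$(mktemp)"    # hypothetical log with one "user:action" entry per line
printf 'alice:login\nbob:login\nalice:logout\nalice:login\n' >"$log"

# Field 1 of each line, split on ":".
cut -d: -f1 <"$log"

# Count how often each user appears. sort comes first because uniq only
# merges duplicate lines that are adjacent.
cut -d: -f1 <"$log" | sort | uniq -c

# Number of lines in the file.
wc -l <"$log"

# First two characters of the first two lines.
cut -c1-2 <"$log" | head -n2
```

Here uniq -c reports 3 for alice and 1 for bob; without the sort, alice would show up as two separate groups.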

Hint

Many commands accept a file as an argument to read from, but if no file is provided, they read from stdin instead. That's why head file.txt and head <file.txt do the same thing.

Checkpoint: How to read lines 5-8 from the file /usr/share/dict/words?

$ head -n8 </usr/share/dict/words | tail -n4

First take the first 8 lines, then keep the last 4 of those 8. This works as long as the file is at least 8 lines long.

Info

If you want a reliable approach that works properly even when the file has fewer than 8 lines, have a look at the awk/sed commands, which are more complex.
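
As one possible sketch of the sed approach, the following compares it with the head/tail pipeline on a long and a short input. seq is used here only to generate numbered test lines:

```shell
#!/bin/bash
seq 1 10 | head -n8 | tail -n4    # 5 6 7 8, as intended
seq 1 6  | head -n8 | tail -n4    # 3 4 5 6: the wrong lines for a short input

# sed -n suppresses the default output; '5,8p' prints only lines 5 to 8.
seq 1 10 | sed -n '5,8p'          # 5 6 7 8
seq 1 6  | sed -n '5,8p'          # 5 6
```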

Exit code

Every command and process exits with an "exit code", which is a (usually small) integer. A command that terminated successfully exits with the exit code 0. A non-zero exit code indicates an error in the process.
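
To look at an exit code directly, the shell keeps the code of the most recent command in the special parameter $?. This parameter is not covered in the text above, so treat the following as a small illustrative sketch:

```shell
#!/bin/bash
# grep -q prints nothing and only reports through its exit code.
echo "haystack with a needle" | grep -q needle
echo "match:    $?"    # 0, a matching line was found

echo "haystack only" | grep -q needle
echo "no match: $?"    # 1, no line matched
```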

Info

Shell builtins and functions like cd also have exit codes, but they run inside the shell itself rather than as separate processes, so this is just an emulation at the shell level.

`&&` and `||`, `true` and `false`

Consider this sentence in English:

Find a line with the word "needle" or exit.

What does this mean? This means you should exit if you can't find a line with the word "needle". That's equivalent to the following line:

$ grep needle <file || exit

In many programming languages, || means "or" and && means "and". If the first command is false (exits with a non-zero code), the command behind || is run. Similarly, if the first command is true (exits with 0), the command behind && is run.

The command true always exits with 0, and the command false always exits with 1. They are useful dummy commands when you want to coerce a certain exit code. For example, some other command || true coerces the exit code to 0 regardless of whether some other command succeeds.
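
A minimal sketch of the three cases, using a made-up scratch file as the search target:

```shell
#!/bin/bash
notes="$(mktemp)"    # hypothetical file to search
echo "a needle in a haystack" >"$notes"

# grep exits with 0, so the command after && runs.
grep -q needle "$notes" && echo "found needle"

# grep exits with 1, so the command after || runs.
grep -q thread "$notes" || echo "no thread here"

# "|| true" turns the failure into a success.
grep -q thread "$notes" || true
echo "exit code after '|| true': $?"    # 0
```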

Environment variables

Environment variables (env var) are used to pass string data to child processes. Env vars set in a process will be inherited by the child process (unless otherwise specified). This allows optionally passing certain settings at a global level.

In a shell, environment variables can be added/updated by running the export shell-command:

export ARG_NAME="arg value"

You can set variables local to the shell (not inherited by child processes) if you omit the export keyword. In this case, it is a shell variable rather than an environment variable.

In Bash, you can pass a variable as an argument to a command using the "$VAR_NAME" syntax:

ARG_NAME="arg value"
echo "$ARG_NAME"

Note that the variable should be wrapped in ""; otherwise the shell splits the expanded value on spaces, and the pieces are passed as separate arguments.

You can also temporarily set env vars for a single command using ARG_NAME="arg value" cmd line, which runs the command cmd line with the env var ARG_NAME set to arg value.
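
The three ways of setting a variable can be compared by asking a child process what it sees. The variable names are invented for this sketch, and sh -c is used only as a convenient child process:

```shell
#!/bin/bash
# Hypothetical variable names, assumed not to be set already.
SHELL_ONLY="not exported"
export SHARED="exported"

# A child process sees only the exported variable.
sh -c 'echo "child sees SHELL_ONLY=[$SHELL_ONLY] SHARED=[$SHARED]"'

# Prefix form: TEMP_SETTING exists only for this one command.
TEMP_SETTING="just once" sh -c 'echo "child sees TEMP_SETTING=[$TEMP_SETTING]"'
echo "parent sees TEMP_SETTING=[$TEMP_SETTING]"
```

The single quotes around the sh -c argument matter: they stop the parent shell from expanding the variables before the child runs.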

Try it yourself

Use the env command to see the environment variables in your current shell. You can grep FOO_BAR to see that there is no env var called FOO_BAR.

Run FOO_BAR="qux" in the shell (do not use export yet). Now run env | grep FOO_BAR again. You can see that FOO_BAR is still not an env var for env.

Now run export FOO_BAR="qux" and run env | grep FOO_BAR again. This time there is a line FOO_BAR=qux.

Now let's check the behaviour of quoted variables. Try setting FOO="bar qux" and run mkdir $FOO. You can ls to see that two directories bar and qux are created. rmdir them and try mkdir "$FOO" instead. Now there is just one directory called bar qux.

Typically, we put env vars that we always want predefined in the .bashrc file, which is run automatically when you start an interactive Bash shell.

Interpolating command output

The output of a command can be embedded in another command using the $() operator. As with environment variables, remember to wrap the syntax with "" to preserve spaces.
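
A small sketch of $() with and without quotes. dirname and printf are used here only because their output is predictable:

```shell
#!/bin/bash
# dirname prints the directory part of a path.
dir="$(dirname /usr/share/dict/words)"
echo "The word list lives in $dir"

# Quoting matters: unquoted output is split on whitespace, and echo then
# joins the pieces with single spaces.
echo $(printf 'bar   qux')      # bar qux
echo "$(printf 'bar   qux')"    # bar   qux
```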

Try it yourself

Contrast <() and $():

## This command shows the codename of your Ubuntu version
$ lsb_release -cs
focal

## Let's try passing it to echo
$ echo Ubuntu "$(lsb_release -cs)"
Ubuntu focal

## What if we used `<()` instead?
$ echo Ubuntu <(lsb_release -cs)
Ubuntu /dev/fd/63

echo only prints the path that <() resolves to and never reads from it. The output of lsb_release therefore goes somewhere that nobody reads, so you may get a "Broken pipe" error.
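
In contrast, <() fits commands that expect file paths as arguments, such as diff. The following is a sketch under that assumption, with invented input lists:

```shell
#!/bin/bash
# diff expects two file paths; <(...) supplies a path for each command's output.
# Each list is sorted first so that only real differences are reported.
diff <(printf 'pear\napple\n' | sort) <(printf 'apple\nplum\n' | sort)
echo "diff exit code: $?"    # 1 means the two inputs differ
```

This avoids writing the two sorted lists to temporary files by hand.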
