Shell utilities
Objectives
At the end of this self-learning lab, you should be able to:
- Understand the Unix Standard I/O
- Use `|`, `>`, `<`, `>()` and `<()` in Bash
- Use the commands `grep`, `tee`, `wc`, `cut`, `sort`, `uniq`, `diff`, `head`, `tail`
- Understand process exit codes and use `&&` and `||` for chaining in Bash
- Understand environment variables and use `$` and `$()` in Bash
Very often, processes produce a large amount of output. To extract useful information from this output, we use text-processing commands in the terminal.
Stdin, stdout and stderr
All processes have one input called "stdin" and two outputs called "stdout" and "stderr". When a process is run directly from the command line in a shell, stdout and stderr are merged together and displayed in the shell.
Pipe from/to file
You can redirect (pipe) the stdout or stderr of a command to a file.
The following syntax writes the stdout of `cmd` to the file at path `file.txt`. The shell will only display the stderr output; the stdout output is written to `file.txt`.
$ cmd >file.txt
Alternatively, write the stderr of `cmd` to `error.txt`:
$ cmd 2>error.txt
What does `2` mean here? It is a standard that `0`, `1` and `2` refer to the stdin, stdout and stderr of any process. `2>error.txt` means to write stream `2` (the stderr stream) to `error.txt`.
In Linux, there is a special file called `/dev/null`. It is a black hole: the file always remains empty no matter how much data you write into it; all file writes are accepted successfully but silently discarded. To indicate that we want to "ignore" some output, we can pipe it to `/dev/null`.
Similarly, you can pipe a file into the stdin of a command using the `<` operator:
$ cmd <file.txt
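All three redirections can appear in one command line. A minimal sketch (the file names here are made up for illustration):

```shell
# Create a small sample file, then sort it:
# stdin comes from input.txt, stdout goes to sorted.txt,
# and any error messages would go to err.txt.
printf 'banana\napple\ncherry\n' >input.txt
sort <input.txt >sorted.txt 2>err.txt
cat sorted.txt    # apple, banana, cherry -- one per line
```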
Try it yourself
To begin with, let's create a command that outputs both stdout and stderr.
Run `nano count.sh`, which opens an interactive CLI text editor for the file `count.sh`. Then copy the following:
#!/bin/bash
i=1
while [ $i -le 3 ]; do
echo This line writes $i to stdout
echo This line writes $i to stderr >&2
((i++))
done
Use ^X, then Y, then Enter to save the file and exit.
You can see hints for these keybindings at the bottom of the nano screen:
^G Get Help ^O Write Out ^W Where Is ^K Cut Text ^J Justify
^X Exit ^R Read File ^\ Replace ^U Paste Text ^T To Spell
Let's try making the file executable and run it:
$ chmod +x count.sh
$ ./count.sh
This line writes 1 to stdout
This line writes 1 to stderr
This line writes 2 to stdout
This line writes 2 to stderr
This line writes 3 to stdout
This line writes 3 to stderr
Let's verify whether the claims about writing to stdout/stderr are true:
$ ./count.sh >/dev/null
This line writes 1 to stderr
This line writes 2 to stderr
This line writes 3 to stderr
$ ./count.sh 2>/dev/null
This line writes 1 to stdout
This line writes 2 to stdout
This line writes 3 to stdout
Let's write the stderr to another file called `error.txt`:
$ ./count.sh 2>error.txt
This line writes 1 to stdout
This line writes 2 to stdout
This line writes 3 to stdout
$ cat error.txt
This line writes 1 to stderr
This line writes 2 to stderr
This line writes 3 to stderr
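A related operator is `2>&1`, which redirects stderr into wherever stdout currently points (the script above already used its cousin `>&2`). A minimal sketch, using a throwaway shell function instead of count.sh:

```shell
# A function that writes one line to each stream:
emit() { echo out; echo err >&2; }

emit >/dev/null 2>&1   # prints nothing: stdout goes to /dev/null,
                       # then stderr is pointed at the same place
emit 2>&1 >/dev/null   # prints "err": stderr was duplicated onto the
                       # terminal before stdout was redirected away
```

Note that the order of the redirections matters, as the comments show.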
Pipe to process
We can pipe the stdout of one process to the stdin of another process using the `|` operator.
$ command1 | command2
This will result in the following data flow:
+-------------+ +--------------+
| shell input | +---------------> shell output |
+-------------+ | +--------------+
| | ^ ^
| stderr| stderr | |
| +----------+ +----------+-+ |
+--->| command1 +----->| command2 |-----+
+----------+ +----------+
stdin stdout stdin stdout
tee
The `tee` command, as its name suggests, pipes data in a T (τ) shape. This is what the command `tee file.txt` does:
+-------+ +-----+ +--------+
| stdin |---->| tee |---->| stdout |
+-------+ +-----+ +--------+
|
| +----------+
+------>| file.txt |
+----------+
In modern versions of Bash, you can use the `>(another cmd)` syntax, which is resolved into a temporary file that writes data into the stdin of `another cmd`.
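For example (a sketch; the file names are arbitrary):

```shell
# Count lines while also keeping a copy of the stream in copy.txt:
printf 'a\nb\nc\n' | tee copy.txt | wc -l    # prints 3
# The same idea with process substitution: tee writes the stream
# both to copy.txt and into the stdin of "wc -l".
printf 'a\nb\nc\n' | tee copy.txt >(wc -l) >/dev/null
```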
grep
`grep` means Global Regular Expression Print. ... Is that confusing? Maybe just memorize it as "`grep` grabs occurrences of a search". It filters the input and only outputs the lines that match the search (or the lines that don't, if you provide the `-v` flag).
You can use `grep` to find all occurrences of a word:
$ grep robot /usr/share/dict/words
robot
robot's
robotic
robotics
robotics's
robots
What is `/usr/share/dict/words`?
`/usr/share/dict/words` is a file where each line contains a valid English word. See `man 5 american-english` for more information.
`grep` treats the first argument as a "pattern", which can be interpreted under various rules.

- If you want `grep` to treat your argument as-is (so that it does not treat symbols like `.` differently), use `grep -F`.
- If you want `grep` to treat your argument as a PCRE ("Perl-Compatible Regular Expression", which is the regular expression flavour used by Python's `re` module), use `grep -P`.
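To see the difference, compare a default pattern with `-F` on a file where `.` could match literally or as a wildcard (the sample file is made up):

```shell
printf 'a.c\nabc\n' >sample.txt
grep 'a.c' sample.txt     # "." matches any character: prints both lines
grep -F 'a.c' sample.txt  # fixed string: prints only "a.c"
```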
There are many great online tutorials about regular expressions, such as Wikipedia and regular-expressions.info. (Remember that we are usually using PCRE.)
A few useful flags in `grep` to pay attention to:

- `-v`: invert selection
- `-i`: case-insensitive search
- `-r`: search all files in a directory
  - Very useful for checking how other people are using a certain function when working on an unfamiliar project!
Hint
Remember to escape your regex pattern if you run `grep` on a shell. In particular, single quotes (`'...'`) are a great choice to wrap your argument, since backslashes inside single quotes do not get processed by the shell.
Info
In Git repositories, you can also use `git grep` instead of `grep -r` to search all tracked files.
Other text-processing commands
These commands all have their own man pages. Read the man pages to see their precise usage!
- `wc`: count the number of lines, words and bytes in the input
- `cut`: truncates each line
  - `-c`: keep only specific columns
  - `-d` and `-f`: split the line by a character (only one character!) and get a specific field
- `sort`: sorts the input
- `uniq`: removes adjacent duplicate lines (`sort` it before using `uniq`!)
  - `-c`: count the number of duplicate lines
- `diff`: compare two files
  - You can also use `colordiff` to get deleted lines in red and added lines in green.
- `head` and `tail`: take only the first/last lines of the input
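These commands compose well in pipelines. For example, a common idiom for counting occurrences (the sample data is made up):

```shell
# sort groups identical lines together, uniq -c counts each group,
# and sort -rn puts the most frequent first:
printf 'apple\nbanana\napple\napple\n' | sort | uniq -c | sort -rn
```

The first output line shows the most frequent value (here, `apple` with a count of 3).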
Hint
Many commands accept a file as an argument to read from, but if no file is provided, they use stdin as the source. That's why `head file.txt` and `head <file.txt` do the same thing.
Checkpoint: how do you read lines 5-8 from the file `/usr/share/dict/words`?
$ head -n8 </usr/share/dict/words | tail -n4
First take the first 8 lines, then keep the last 4 of those 8. This works as long as the file is at least 8 lines long.
Info
If you want a reliable approach that works properly even when the file has fewer than 8 lines, have a look at the `awk`/`sed` commands, which are more complex.
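As a small taste of `sed`: it can address line ranges directly, so an empty result falls out naturally when the file is too short. A minimal sketch (the file names are made up):

```shell
seq 1 10 >nums.txt
sed -n '5,8p' nums.txt   # -n suppresses default output; "5,8p" prints
                         # only lines 5 to 8, one number per line
seq 1 3 >short.txt
sed -n '5,8p' short.txt  # prints nothing: the file has no line 5
```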
Exit code
Every command and process exits with an "exit code", which is a (usually small) integer.
A command that terminated successfully exits with the exit code 0
.
A non-zero exit code indicates an error in the process.
Info
Shell commands and functions like `cd` also have exit codes, but this is just an emulation at the shell level.
`&&` and `||`, `true` and `false`
Consider this sentence in English:
Find a line with the word "needle" or exit.
What does this mean? This means you should exit if you can't find a line with the word "needle". That's equivalent to the following line:
$ grep needle <file || exit
In many programming languages, `||` means "or" and `&&` means "and". If the first command is false (exits with a non-zero code), the command after `||` is run. Similarly, if the first command is true (exits with 0), the command after `&&` is run.
The command `true` always exits with 0, and the command `false` always exits with 1. They are useful dummy commands when you want to coerce a certain exit code. For example, `some other command || true` coerces the exit code to success no matter whether `some other command` is successful or not.
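You can inspect exit codes directly through the special variable `$?`, which holds the exit code of the last command:

```shell
grep -q needle /dev/null; echo $?   # 1: grep found no match
true;  echo $?                      # 0
false; echo $?                      # 1
false || true; echo $?              # 0: "|| true" coerced success
```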
Environment variables
Environment variables (env var) are used to pass string data to child processes. Env vars set in a process will be inherited by the child process (unless otherwise specified). This allows optionally passing certain settings at a global level.
In a shell, environment variables can be added/updated by running the `export` shell command:
export ARG_NAME="arg value"
You can set variables local to the shell (not inherited by child processes) by omitting the `export` keyword. In this case, it is a shell variable rather than an environment variable.
In Bash, you can pass a variable as an argument to a command using the `"$VAR_NAME"` syntax:
ARG_NAME="arg value"
echo "$ARG_NAME"
Note that the variable must be wrapped in `""`; otherwise the argument is expanded directly and spaces in the variable lead to separate arguments.
You can also temporarily set env vars for a single command using `ARG_NAME="arg value" cmd line`, which runs the command `cmd line` with the env var `ARG_NAME` set to `arg value`.
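You can verify the per-command form with a child shell; `FOO_TMP` here is just a made-up name:

```shell
# The assignment applies only to the child bash process:
FOO_TMP="temporary" bash -c 'echo "$FOO_TMP"'   # prints: temporary
echo "${FOO_TMP-unset}"                         # prints: unset
```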
Try it yourself
Use the `env` command to see the environment variables in your current shell. You can `env | grep FOO_BAR` to see that there is no env var called `FOO_BAR`.

Run `FOO_BAR="qux"` in the shell (do not use `export` yet). Now run `env | grep FOO_BAR` again. You can see that `FOO_BAR` is still not an env var for `env`.

Now run `export FOO_BAR="qux"` and grep `env` again. There is now a line `FOO_BAR=qux`.
Now let's check the behaviour of quoted variables. Try setting `FOO="bar qux"` and run `mkdir $FOO`. You can `ls` to see that two directories `bar` and `qux` were created. `rmdir` them and try `mkdir "$FOO"` instead. Now there is just one directory called `bar qux`.
Typically, we put env vars we want to always predefine in the `.bashrc` file, which Bash runs automatically when you start an interactive shell.
Interpolating command output
The output of a command can be embedded in another command using the `$()` operator. As with environment variables, remember to wrap the syntax in `""` to encapsulate spaces.
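A minimal sketch of why the quotes matter, using `printf '%s\n'`, which prints each of its arguments on its own line:

```shell
WORDS="$(echo two words)"
printf '%s\n' "$WORDS"   # one argument: "two words" on a single line
printf '%s\n' $WORDS     # unquoted: split into two arguments, two lines
```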
Try it yourself
Contrast `<()` and `$()`:
## This command shows your Ubuntu version
$ lsb_release -cs
focal
## Let's try passing it to echo
$ echo Ubuntu "$(lsb_release -cs)"
Ubuntu focal
## What if we used `<()` instead?
$ echo Ubuntu <(lsb_release -cs)
Ubuntu /dev/fd/63
We are passing the output of `lsb_release` to somewhere that does not actually read its contents, so you may get a "Broken pipe" error.