When you log in to a Unix system, a program called a shell is started for you.
The shell is a command interpreter that provides you with an interface to the operating system.
A shell script is just a file of commands, normally executed by a shell process that
was spawned to run the script. The contents of the script file can just be ordinary commands as would
be entered at the command prompt, but all standard command interpreters also support a scripting
language to provide control flow and other capabilities analogous to those of high-level languages.
Programming shell scripts for the Korn Shell is the subject of this discussion.
More than one shell is available on most Unix systems. The most popular ones in use are the
- Bourne shell (sh) - Unix System V (developed by Steve Bourne at Bell Laboratories; the "grandfather of all UNIX shells")
- C shell (csh) - Berkeley Unix; so named because its syntax is C-like (created as an alternative to the "bare bones" provided by the Bourne shell)
- Korn shell (ksh) - extends the Bourne shell (developed by David Korn at Bell Laboratories, sites often have the 1988 version rather than the 1993 version)
- GNU Bourne-Again SHell (bash) - extends the Bourne shell and also has features from the Korn shell
- Z shell (zsh) - designed by Paul Falstad of Princeton; a superset of the Korn Shell, with added C shell features
- Enhanced C shell (tcsh) - an extension of the C shell [note: there are strong dissents regarding the C shell ("Csh Programming Considered Harmful")]
You can determine what shell you are running by executing the ps
(process status) command;
if you are using the Korn shell, the status information will be flagged by ksh. Regardless of
your login shell, if your system supports the Korn shell, any Korn shell script can be run from the command
line by entering
- ksh <script-name>
although it is probably advisable to learn the script language for the shell you normally use.
You will find that there are more similarities than differences among the commonly used shells.
While it's certainly not something to do casually, the chsh
command changes the login shell for your username. The chsh command
tells you the current login shell and then prompts for the new one. The new login shell must be one of the
approved shells listed in the /etc/shells file
(if you have superuser privileges, you can of course use some shell of your own creation).
You can run any of these shells from within your login shell; e.g., entering
- csh
puts you into the C shell. Usually the exit
command will terminate the shell you are in, regardless of flavor.
The command prompt string is typically set up to indicate which shell you are using. For the C shell the
indicator is a % symbol; for the Korn shell it is a
$ symbol. If you set your own command prompt string via a shell
startup script, it is advisable to append the shell's default prompt symbol to the end of the string as a
visual reference for which shell is active, since each shell has its own characteristics.
It is important to remember that the shell is a program that accepts lines of ASCII text one at a time and
interprets them, whether entered one by one from the command prompt or one by one from a script file.
More to the point, this means that each line of a script file must have an interpretation.
Shell variables serve as text repositories, which means that numeric processing requires special
handling.
The shell allows complex constructions that can be difficult to repeat accurately, which makes them
natural candidates for simple scripts. These kinds of constructions typically come up in the context of
redirection, pipes, and filters.
- Redirecting Terminal I/O
Regardless of what other files are opened, whenever a shell command runs, three standard
file streams are opened:
- "standard in" for input;
- "standard out" and
"standard error" for output.
[for C programming these are referenced as
stdin,
stdout, and
stderr, respectively].
Unless directed otherwise, the shell assigns these streams to the user terminal.
The shell feature known as redirection is used to assign files
other than the user terminal to a standard stream. The formats for redirection are:
- <command> < <file-name>   [<command> is to get its standard input from <file-name> rather than the terminal]
- <command> > <file-name>   [<command> is to send its standard output to <file-name> rather than the terminal]
- <command> >> <file-name>   [<command> is to append its standard output to <file-name>]
To illustrate simple redirection, recall that the shell command
sort with no argument takes its input from standard input, and otherwise
combines and sorts the list of files presented to it. So if you just enter
- sort
on the command line, each subsequent line input from the terminal is fed to
sort until you enter an EOF
(<Ctrl>d) to signal the
end of standard input. The command then proceeds to sort your input lines and sends the result to
standard output. With redirection, you can send the standard output to a file; e.g.,
- sort > mysortedfile
captures the sorted result in a file named mysortedfile
(created if it doesn't exist, and overwritten if it does). Similarly,
- sort file1 file2 > mysortedfile
will combine the two files, file1 and
file2, and send the sorted result to mysortedfile.
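The three redirection forms can be tried out safely in a scratch directory. This is a minimal sketch, written for the Korn shell (it also runs under bash); it assumes the widely available mktemp -d, and the file contents are made up for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'pear\napple\n' > file1     # > creates (or truncates) file1
printf 'fig\n' > file2
sort file1 file2 > mysortedfile    # combine and sort both files into mysortedfile
sort < file2 >> mysortedfile       # < supplies standard input; >> appends

cat mysortedfile                   # apple, fig, pear, then the appended fig
```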
There are three standard streams. The construction 0<
(or <) will redirect standard input,
1>
(or >) will redirect standard output,
and 2> will redirect standard error.
For example, the command
- sort 0<filein 1>mysortedfile 2>errlist
redirects each stream for the command. The stream number cannot be followed by a space; however, the
stream number is not needed for stream 0 (standard input) or stream 1 (standard output). Since the
sort command uses standard input only if no files are provided,
redirection of standard input is unnecessary, and the above command is equivalent to
- sort filein > mysortedfile 2> errlist
While few users would bother with stream numbers 0 (standard input) and 1 (standard output) when
entering commands at the command prompt, a command inside a script might well use them for an action such
as choosing between sending something to standard output or to standard error. Standard error can
be merged with standard output (or vice versa) by using
>&; e.g., the command
- sort filein >mysortedfile 2>&1
sends standard error to the same place as standard output, so all output that could come to the terminal ends up
in mysortedfile. The order matters: redirections are processed from left to right, so 2>&1 must come after
>mysortedfile; written the other way around, standard error would still go to the terminal.
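The left-to-right rule can be checked directly. In this minimal sketch (again assuming mktemp -d, and using ls on a deliberately missing file to produce an error message), only the first ordering captures the error in the file.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# ls reports a missing file on standard error (stream 2)
ls no-such-file > both.txt 2>&1       # stream 1 goes to both.txt, then stream 2 follows it
ls no-such-file 2>&1 > only-out.txt   # stream 2 copied the terminal before stream 1 moved

[ -s both.txt ] && echo "both.txt holds the error message"
[ -s only-out.txt ] || echo "only-out.txt is empty"
```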
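Two commands can be executed from a single line by separating them with a semicolon; e.g.,
- <command-1>; <command-2>
Parentheses can be used to redirect the standard output from both of the commands; e.g.,
- (<command-1>; <command-2>) > <file-name>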
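As a concrete sketch of the two forms (echo stands in for arbitrary commands, and mktemp -d is assumed as before), note that without parentheses a redirection applies only to the last command.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

echo one; echo two                   # two commands on one line, both to the terminal
(echo one; echo two) > grouped.txt   # one redirection captures both outputs
echo one; echo two > lastonly.txt    # without parentheses only "echo two" is redirected
```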
- Pipes
Redirection involves providing a shell command with an alternative to the user terminal for standard input, standard output,
or standard error. Redirection alone cannot take the standard output of one command and turn it directly into standard input for another
command without going through an intermediate file; for example, "sort < who" would make sort read a file named who rather than
the output of the who command, so the intent would need to be accomplished by
- who > <temp-file>
- sort < <temp-file>
To get around this need to pass data through temporary files, the shell provides a means of taking
standard output from one command and making it standard input for another. This is called a
pipe. The notation
- <command-1> | <command-2>
is used to denote that standard output from <command-1> is to be
piped to <command-2> as standard input.
For the above example, you would simply enter
- who | sort
or more practically, perhaps
- who | sort | more
(standard output from who is piped to
sort as standard input from which standard output is piped to
more as standard input from which standard output is the terminal).
You can of course do things such as
- sort <file-name> | more
which would allow you to examine a sorted version of the file without creating a permanent, sorted version
of your data.
If for some reason you want to capture intermediate information flowing through a pipe,
the tee utility is provided for this purpose. In particular,
- who | sort | tee whosorted | more
does the same thing as the earlier construction, except the sorted output is
"teed" into whosorted as well as to
standard output. tee copies its standard input both to the named file and to its standard output,
which is why it is normally found in the middle of a pipeline.
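A pipeline with tee in the middle can be sketched without depending on who being logged in: a small file of made-up user names stands in for who's output (a hypothetical substitute), and mktemp -d is assumed.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Stand-in for the output of who: hypothetical user names, one per line
printf 'carol\nalice\nbob\n' > names

sort < names | tee namessorted | wc -l   # tee keeps a copy of the sorted list; wc counts its lines
```

The count appears on the terminal, while namessorted holds the sorted names that flowed through the pipe.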
- Filters
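sort is an example of a type of shell command called a
filter. A filter is a command that reads data from the files named as its arguments (or from standard input if none are named),
performs some simple transformation on it, and writes the result to standard output, from where it can be redirected to a file or piped to another command.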
Examples:
- sort - sort files
- grep (and its derivatives) - search for keyword information in the file
- head - output lines from the front end of the file
- tail - output lines from the tail end of the file
- wc - count words, lines, and/or characters in the file
- crypt - encrypt the file
(use with caution!)
We've already looked at sort.
grep (global regular expression - print) is a shell command
that matches patterns as represented by limited regular expressions against the input character stream.
The related command, egrep (invoked as grep -E on current systems), allows for the full range of regular
expressions (regular expressions are covered in detail in the study of compilers).
grep is limited to the same regular expressions as the basic
Unix editor, ed. The story (as told by Kernighan and Pike) is that
grep was actually created in an evening by doing a little surgery
on ed!
The most fundamental use for grep is
to locate occurrences of a single word; e.g.,
- grep -n 'symbval' pass2.c
gives the line numbers and prints the lines containing the specific symbol
"symbval".
Regular expressions use metacharacters to represent complex
patterns; e.g.,
- ^ for the beginning of a line; e.g.,
'^t' = lines beginning with "t".
- $ for the end of a line; e.g.,
't$' = lines ending with "t".
- . matches any single character;
'^.t' = lines with 1st character anything and 2nd character "t".
- * goes with the preceding character to
represent 0 or more repetitions of the character.
- + is like *,
but is for 1 or more repetitions (egrep only).
- \ turns off any special meaning for the character
that follows it.
Construction rules for regular expressions provide means for using simpler regular expressions to define
more complex regular expressions; e.g.,
- [...] is a regular expression that matches any one of the characters listed; ranges such
as a-x are allowed.
- [^...] is a regular expression that matches any one character not listed;
ranges are also allowed.
- <r1><r2> juxtaposition of regular expression <r1>
and regular expression <r2> is a regular expression
- <r1>|<r2> or of
regular expression <r1> with
regular expression <r2> is a regular expression (egrep only).
- (<r>) a nested regular expression is a regular expression
(egrep only).
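The metacharacters and construction rules can be exercised on a tiny word list. This is a sketch with made-up input; it uses grep -E in place of egrep for the alternation and grouping, which POSIX grep supports.

```shell
# Hypothetical sample input, one word per line
words='tot
that
cat
at
t'

printf '%s\n' "$words" | grep '^t'             # starts with t: tot, that, t
printf '%s\n' "$words" | grep '^.t'            # any 1st character, t as 2nd: at
printf '%s\n' "$words" | grep '^[^t]'          # 1st character not t: cat, at
printf '%s\n' "$words" | grep -E '^(c|th)at$'  # alternation and grouping: that, cat
```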
You could almost go to school on grep. For example, here's a typical
entry from the system password file /etc/passwd
- imauser:x:121:101:Ima User:/home/imauser:/bin/ksh
The following shell command searches this file for users without passwords:
- grep -n '^[^:]*::' /etc/passwd
The pattern ^[^:]*:: is a juxtaposition of regular
expressions and is interpreted as follows:
- The pattern match requires a beginning (^) made up
of characters other than ":"
([^:]), 0 or more of these
(*), followed by two ":"
characters in succession (::).
In other words, to match this pattern, the entry can have any number of characters prior to the
first ":", after which there is immediately a second
":". When a password is present, it is stored in encrypted form between these two separators
(on systems that use a separate, protected shadow password file such as /etc/shadow, as the x in the sample
entry indicates, the encrypted password is kept there instead).
Remark: it is highly unlikely that system administration will allow accounts without passwords;
note that while the password file is completely readable (it has to be), encryption protects the passwords.
If two users should be using the same password, the encryption routine will encrypt them differently
(a random "salt" value is mixed in).
Since there are techniques capable of breaking passwords if given enough time,
the best advice is to change your password regularly.
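The same pattern can be tried without touching the real /etc/passwd. In this sketch the two entries are hypothetical, and the second one has an empty password field.

```shell
# Hypothetical password-file entries; the second has an empty password field
entries='imauser:x:121:101:Ima User:/home/imauser:/bin/ksh
nopass::122:101:No Password:/home/nopass:/bin/ksh'

printf '%s\n' "$entries" | grep -n '^[^:]*::'   # only line 2 matches
```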
The filter construction
- head -n 15 myfile
lists the first 15 lines of myfile and
- tail -n 15 myfile
lists the last 15 lines.
If no number is specified, the default is 10.
The filter construction
- wc -lwc myfile
counts lines, words, and characters in the file.
If any one of l, w,
or c is omitted, that count is not provided (with no options at all, all three counts are given).
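A quick way to see head, tail, and wc at work is on a generated file whose contents are known. This sketch (assuming mktemp -d) builds a 20-line file and combines the filters in pipes.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Build a 20-line sample file: line1 ... line20
i=1
while [[ $i -le 20 ]]; do echo "line$i"; i=$((i+1)); done > myfile

head -n 15 myfile | tail -n 1   # last of the first 15 lines: line15
tail -n 15 myfile | head -n 1   # first of the last 15 lines: line6
wc -l < myfile                  # line count only
```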
The filter construction
- crypt <password> < myfile > cryptedmyfile
inputs myfile and encrypts it using the
"<password>" supplied, storing the encrypted
file in cryptedmyfile.
- crypt <password> < cryptedmyfile
reverses the encryption.
In either case, if no password is given (i.e., you don't want it visible), the user is prompted to enter one.
There are further generalizations of the filter grep,
most notably awk, that are considered to be
"programmable filters", because the transformation is constructed via a
program in a simple language. awk is named after its authors,
Aho, Kernighan, and Weinberger, all of Bell Labs.
One of the programmable filters commonly used in shell scripts is
sed, the streaming version of the basic Unix text editor,
ed. Editor commands that do not process multiple lines or look
backward will generally work with sed.
sed simply makes a sequential pass through the input stream.
The editor command script passed to sed selectively processes input
lines (selection is by position or by pattern match). After a selected line is transformed it is passed
to standard output. By default, lines not selected are also passed on to standard output.
For example,
- sed '/./s/^/<tab>/' <file-name>
where <tab> stands for a literal tab character, modifies lines from the file by indenting non-empty lines
with a tab (the results being sent to standard output). Its command script
('/./s/^/<tab>/')
uses a pattern match to select lines.
An explanation for how the script works is as follows:
- the initial "/./" is a regular expression that matches any single
character, so an empty line is not selected; otherwise,
ed's substitute command
(s) is applied,
replacing the null string at the front of the line, given by the regular expression
^, with a tab.
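If sed is invoked with the
-n option,
its automatic output is suppressed, so only lines that are explicitly printed reach standard output; adding the
p (print) flag to the substitute command, as in sed -n '/./s/^/<tab>/p' <file-name>, prints each selected line
after it has been transformed. For the
example above, the effect would be to purge the empty lines while indenting the rest.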
In contrast, the command script for
- sed 3q <file-name> (or equivalently sed '3q' <file-name>)
selects only line 3. The effect, however, is to send the first 3 lines of the file to standard output, because
sed passes lines 1 and 2 through (printing them by default) on its way to line 3. When
line 3 is processed, it is printed and the quit command for sed is issued, ending
the output with line 3.
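The three sed behaviors can be compared side by side. In this sketch a tab is produced with printf and substituted for the <tab> placeholder in the text, since a literal tab is hard to see in a script; the sample text is made up.

```shell
tab=$(printf '\t')     # a literal tab character for the <tab> placeholder
text='first

second
third'

printf '%s\n' "$text" | sed "/./s/^/$tab/"       # indents non-empty lines, keeps the blank one
printf '%s\n' "$text" | sed -n "/./s/^/$tab/p"   # -n plus p: only the indented lines
printf '%s\n' "$text" | sed 3q                   # first three lines: first, (blank), second
```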
Note: There are other useful shell commands for processing files that are not classified as filters.
The shell command find is one of these; for example,
- find . -name '*.doc' -print
searches the directory tree down from the current level (.)
looking for file names that end in ".doc".
WARNING: Unfortunately, the use of
* in this context (a file-name wildcard matching any string) is quite different from
* as used with grep (0 or more repetitions of the preceding character).
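The find search can be tried on a small directory tree built for the purpose (assuming mktemp -d; the file names are made up). The quotes around '*.doc' matter: without them the shell would expand the wildcard itself against the current directory before find ever saw it.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p notes/old
touch report.doc notes/plan.doc notes/old/memo.doc notes/readme.txt

# Quoted, so find (not the shell) interprets the wildcard at every level of the tree
find . -name '*.doc' -print
```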
- Shell Programming
The shell is actually much more than just a user interface, since it incorporates a programming language.
The programming language for the shell is complex enough to handle many kinds of things that might
otherwise be done in a language such as C, but in contrast it is interpreted rather than compiled.
The basic virtues are that
- shell programs provide the means for encapsulating groups of shell commands that need to be executed;
- systems administration has many functions best described using the decision logic of a programming
language (moreover, interpreted code ports easily since it is just text, whereas compiled code does not);
- it is often easier to accomplish a systems function in the shell using the shell programming
capability (after all, it is part of the operating system) than it is to write a high-level
language program for the same purpose.
At its simplest, a shell script (a program written in the shell programming language) is just one or more
shell commands as you would enter them at the Unix prompt. For example, if you build a file named
users whose contents consist of a single command line such as
- who | sort
then executing
- ksh users
will execute the shell script, running the command line as if it had come from the keyboard.
In this sense, it is like the keystroke "macros" common to many application products.
For the Korn shell, if the first line of users is the comment line
- #!/usr/bin/ksh
and execute permission for users has been enabled via the shell command
- chmod +x users
then the script can be executed just by entering
- users
at the command prompt (or ./users if the current directory is not in your search path). This will work for
other shells so long as the first line identifies the shell to use.
Shell scripts are often used to capture in simpler syntax the result of something hard to remember.
For this circumstance, it is useful to be able to pass in arguments to the shell script. These are
referenced inside the script as the positional parameters
$1, $2,
$3, . . . .
For example, we might want to capture the sed procedure for
indenting a file in this manner; i.e., the shell script indent
might simply consist of
#!/usr/bin/ksh
sed '/./s/^/<tab>/' "$1"
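Positional parameters behave the same way inside a shell function as inside a script, which makes them easy to experiment with. In this sketch showargs is a hypothetical function; it shows $#, $1, and "$@", the last of which keeps each argument intact even when it contains spaces.

```shell
# showargs: a hypothetical helper that reports the arguments it receives
showargs() {
    echo "count: $#"
    echo "first: $1"
    for arg in "$@"; do        # "$@" expands to each argument as a separate word
        echo "arg: $arg"
    done
}

showargs prog.sic data1.sic "file with spaces"
```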
As with any programming language, the power of a shell programming language lies in the ability
to use decision logic for defining a shell script's behavior. The constructions used for this
purpose are analogous to those of standard languages such as C. They include "if-then-else",
"case", and the loop controls "for", "while" and "until". The control
structures are one area where there are significant differences among shells, although there are
also many similarities. Our focus is on the Korn Shell, which is an extension of the Bourne Shell
and which provides capabilities that go beyond the C Shell as well.
- if-then, if-then-else, if-then-elif-then-else Constructions
The if-then-elif-then-else statement for the Korn Shell has the basic syntax:
if <condition>
then
<if-part>
elif <condition>
then
<elif-part>
else
<else-part>
fi
The elif part can be omitted if a strictly if-then-else
structure is needed. The else part can also be
omitted to obtain a strictly if-then construction.
Example of an if-then-elif-then-else construction:
echo "Enter: \c"; read name
if [[ $name = "ifcase" ]]
then echo "in if case"
elif [[ $name = "elifcase" ]]
then echo "in elif case"
else
echo "in else case"
fi
The "[[" is a Korn Shell keyword that initiates a condition
test (which is concluded by "]]"). Just as commands are separated from their arguments
by one or more spaces, a space must follow
"[[" and precede "]]". In the comparison
$name = "ifcase", the quote marks are not actually needed for this
particular example, but are included because they are needed in more complex
text comparisons. Spaces are needed around "=" since without them
$name="ifcase" would be read as a single string, which [[ ]] treats as a test for a
non-empty string (and so is always true here). The comparison operators
"<" and ">" compare text in dictionary (collating) order. The "not" operator is given by
"!". Less than or equal is handled by "not >" and
greater than or equal by "not <"; e.g.,
- [[ ! $name > "s" ]]
is a less than or equal condition check. Compound condition checks
can be created by using parentheses with AND/OR combinations
where "&&" is AND and "||" is OR;
e.g.,
- [[ ( ! $name = "x" ) && ( $name > "s" ) ]]
forms an "AND" test of two conditions. There are other condition tests (numeric, files),
which will be employed later, but not covered in detail.
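The if-then-elif-then-else form and the [[ ]] text comparisons can be combined in one small function. This is a sketch: classify is a hypothetical name, it sets a global variable name as the examples above do, and the single-letter boundaries compare the same way in any locale.

```shell
# classify: a hypothetical function exercising the [[ ]] text comparisons
classify() {
    name=$1
    if [[ $name = "x" ]]
    then echo "exactly x"
    elif [[ ( ! $name = "x" ) && ( $name > "s" ) ]]
    then echo "after s"
    elif [[ ! $name > "m" ]]          # "not greater than" means less than or equal
    then echo "m or before"
    else
        echo "after m, up to s"
    fi
}

classify t        # after s
classify apple    # m or before
```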
- case Construction
The case statement for the Korn Shell has the basic syntax:
case <test-value> in
<case1-val>)
<commands>
;;
<case2-val>)
<commands>
;;
. . .
*)    # case that catches anything else
<commands>
;;
esac
Example of a case construction:
echo "Enter: \c"; read name
case "$name" in
Bill)
echo "case Bill"
;;
Alice)
echo "case Alice"
;;
*)
echo "no case for this one"
;;
esac
Each case label is terminated by ")" and the commands for each case end when a
double ";" (;;) is encountered.
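Case labels are patterns, so they can use the same wildcards as file names, and "|" separates alternatives within one label. A minimal sketch with a hypothetical function greet:

```shell
# greet: a hypothetical function showing case patterns
greet() {
    case "$1" in
    Bill)
        echo "case Bill"
        ;;
    Alice|alice)        # | separates alternative patterns
        echo "case Alice"
        ;;
    *.sic)              # file-name style wildcards are allowed
        echo "a SIC source file"
        ;;
    *)
        echo "no case for this one"
        ;;
    esac
}

greet alice
greet prog.sic
```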
- Loop Constructions
- for loop
The for loop for the Korn Shell has the basic syntax:
for <item>
in <list>
do
<loop-body>
done
Example of a for loop construction:
list="item1 item2 item3"
for name in $list
do
echo "item: $name"
done
If the in clause is omitted, the list used is the parameter
list passed in to the script.
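Both forms of the for loop can be seen in a short sketch; printargs is a hypothetical function, used because a function's arguments play the same role as a script's parameter list.

```shell
list="item1 item2 item3"
for name in $list          # unquoted, so $list splits into three words
do
    echo "item: $name"
done

# printargs: a hypothetical function; with no "in" clause the loop walks its arguments
printargs() {
    for name
    do
        echo "arg: $name"
    done
}

printargs a "b c"
```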
- while loop
The while loop for the Korn Shell has the basic syntax:
while <condition>
do
<loop-body>
done
Example of a while loop construction:
ans="init"
while [[ ! $ans = "quit" ]]
do
echo "current: $ans"
echo "enter new: \c"
read ans
done
The "\c" on the prompt line causes
echo to suppress the newline that it would otherwise print
at the end of its output.
The Korn Shell also supports until loops, which differ from
while loops only in that the loop-body is repeated as long as the condition is false (that is,
until it becomes true); as with while, the condition is tested before each pass through the loop-body.
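The until loop's test-first behavior is easy to confirm. In this sketch count_up is a hypothetical function; when the condition is already true on entry, the body never runs at all.

```shell
# count_up: a hypothetical function printing "pass N" until N reaches its argument
count_up() {
    i=0
    until [[ $i -ge $1 ]]     # tested before every pass, including the first
    do
        echo "pass $i"
        i=$((i+1))
    done
}

count_up 3    # pass 0, pass 1, pass 2
count_up 0    # condition already true: the body never runs
```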
The only other notable control structure for the Korn Shell is an extension of the case
structure called the select structure (a loop that presents a numbered menu), which we won't cover here.
So long as you remember what constitutes shell commands, you can put these structures on a single line by using
";" to terminate command lines (a case statement also fits on one line, since each case already ends with ";;"); for example,
- if <condition>; then <command>; fi
is the same as the script
if <condition>
then
<command>
fi
except that it is all on one line.
- Example: Developing a Useful Script
Suppose that you have a machine named SIC/XE for which you have a simulator program named
sicsim and a cross-assembler named sicasm.
The process of assembling a program for the SIC/XE machine and running on the simulator is
tedious, so you want a shell script to automate this 2-step process by
- assembling a SIC/XE program
- passing the object file into the SIC simulator as its DEVF1 input device.
As a first attempt you might have:
#!/usr/bin/ksh
echo ">>>executing sicasm"
sicasm $1
echo ">>>copying $1.obj to DEVF1"
cp $1.obj DEVF1
echo ">>>launching the SIC simulator"
echo
sicsim
This has the evident weakness of not checking to see if the user has entered a parameter.
The shell variable $# holds the number of positional parameters,
so it can be checked to see if it is non-zero. A 2nd version of the script that takes advantage of this
might be:
#!/usr/bin/ksh
if [[ $# = 0 ]]
then echo "Usage: sicasmrun <src>"
exit 1
fi
echo ">>>executing sicasm"
sicasm $1
echo ">>>copying $1.obj to DEVF1"
cp $1.obj DEVF1
echo ">>>launching the SIC simulator"
echo
sicsim
The next improvement would be to determine whether the file supplied actually exists.
The Bourne Shell has a command "test" that can check conditions
regarding files. Under the Korn shell, "test" can be replaced
by the command structure
[[ <condition> ]]
that was used earlier. The criterion
-a <file-name> is true if a file exists (newer versions of the Korn shell prefer the equivalent -e).
A 3rd version of the script which uses this capability is given by:
#!/usr/bin/ksh
if [[ $# = 0 ]]
then echo "Usage: sicasmrun <src>"
exit 1
fi
if [[ ! -a $1 ]]
then echo "can't find file $1"
exit 1
else
echo ">>>executing sicasm"
sicasm $1
echo ">>>copying $1.obj to DEVF1"
cp $1.obj DEVF1
echo ">>>launching the SIC simulator"
echo
sicsim
fi
The whitespace around the elements used in the if test should be noted.
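The two argument checks can be tried without sicasm by moving them into a function. This is a sketch: checkargs is a hypothetical name, it uses return instead of exit so that the calling shell keeps running, and it uses -e (equivalent to -a) for the existence test.

```shell
# checkargs: a hypothetical function holding the script's two checks
checkargs() {
    if [[ $# = 0 ]]
    then echo "Usage: sicasmrun <src>"
         return 1
    fi
    if [[ ! -e $1 ]]
    then echo "can't find file $1"
         return 1
    fi
    echo "ready to assemble $1"
}

checkargs                    # prints the usage line
checkargs /no/such/file      # prints the can't-find message
checkargs /dev/null          # /dev/null always exists
```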
What if the assembly fails? It would be desirable for the script to test for this and halt if it
should happen. As might be expected, if the assembly fails, sicasm
returns a non-zero exit code.
The shell variable $? has the exit code for the most recently
completed command.
A 4th enhancement to the script illustrates how to take advantage of this information:
#!/usr/bin/ksh
if [[ $# = 0 ]]
then echo "Usage: sicasmrun <src>"
exit 1
fi
if [[ ! -a $1 ]]
then echo "can't find file $1"
exit 1
else
echo ">>>executing sicasm"
sicasm $1
if [[ ! $? = 0 ]]
then echo "*** Assembly failed ***"
exit 1
fi
echo ">>>copying $1.obj to DEVF1"
cp $1.obj DEVF1
echo ">>>launching the SIC simulator"
echo
sicsim
fi
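The $? check can be packaged so that it works for any command. In this sketch run_step is a hypothetical helper, and true and false stand in for a succeeding and a failing sicasm; the exit code is saved immediately, because the very next command overwrites $?.

```shell
# run_step: a hypothetical helper that runs a command and reports a non-zero exit code
run_step() {
    "$@"
    rc=$?                          # save $? before any other command replaces it
    if [[ ! $rc = 0 ]]
    then echo "*** $1 failed with exit code $rc ***"
         return 1
    fi
}

run_step true && echo "true succeeded"
run_step false || echo "stopping here"
```

A more compact equivalent of the script's test is to put the command in the condition itself, as in if ! sicasm $1; then ... fi.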
A 5th (and final) improvement is to accommodate a syntax such as
- sicasmrun prog.sic data1.sic data2.sic
where "sicasmrun" is the name of the shell script,
prog.sic is the assembler source file,
and the remainder are data input files. This information is sufficient to bypass the commands
used to configure the simulator, so the script also illustrates a technique for sending the first 3 commands
to the simulator from the shell script and then letting the user take over.
In this case, the added feature is to sequentially copy any data files given on the command line to other
file names (DEVF2, DEVF3, ...) recognized by the simulator.
#!/usr/bin/ksh
if [[ $# = 0 ]]
then echo "Usage: sicasmrun <src> [<data>]"
exit 1
fi
if [[ ! -a $1 ]]
then echo "can't find file $1"
exit 1
else
echo ">>>executing sicasm"
sicasm $1
if [[ ! $? = 0 ]]
then echo "*** Assembly failed ***"
exit 1
fi
echo ">>>copying $1.obj to DEVF1"
cp $1.obj DEVF1
cnt=2
# shift moves the argv pointer to the
# next parameter (it can't be "unshifted")
while [[ ! $# = 1 ]]
do shift
if [[ ! -a $1 ]]
then echo "*** can't find file $1 \
to copy to DEVF$cnt ***"
else
echo ">>>copying $1 to DEVF$cnt"
cp $1 DEVF$cnt
fi
let cnt=cnt+1
done
echo ">>>launching the SIC simulator"
echo ">>>enter <Ctrl>d to quit"
echo
(echo a; echo s; echo "h 9999"; cat 2>/dev/null) | sicsim
fi
Commands and constructions that haven't come up before are part of the additions to
the script. First the shell command
shift has the effect of changing the
address used to access the argv array, in effect repointing
$1 to the next parameter each time it is invoked. Note that
$# correspondingly decrements by 1 with each shift.
shift has to be used with care, since there is no "unshift".
It provides a way to access the parameter list in a variable fashion, and in the Bourne shell it is the means for accessing
more than 9 positional parameters, since $10 expands as
$1 concatenated with 0 (the Korn shell also accepts the braced forms ${10}, ${11}, and so on).
Second, the shell command let provides a means for
performing arithmetic on shell variables whose values can be interpreted as integers. An
equivalent arithmetic construction to the let statement used above,
- cnt=$((cnt+1))
employs a more versatile arithmetic form for computation, one for which the result
($((cnt+1))) could equally be used for a purpose
other than variable assignment (it could be echoed, for example).
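The shift loop and the arithmetic can be tried without any SIC/XE files. In this sketch copydemo is a hypothetical stand-in that only prints the cp commands it would run; it tests $# -gt 1 rather than ! $# = 1 so that calling it with no arguments cannot loop forever.

```shell
# copydemo: a hypothetical stand-in for the data-file loop; prints instead of copying
copydemo() {
    echo "source: $1"
    cnt=2
    while [[ $# -gt 1 ]]
    do
        shift                      # $2 becomes $1, and $# drops by one
        echo "cp $1 DEVF$cnt"
        cnt=$((cnt+1))             # same effect as: let cnt=cnt+1
    done
}

copydemo prog.sic data1.sic data2.sic

set -- a b c d e f g h i j k       # eleven positional parameters
echo "$10 versus ${10}"            # $10 is $1 followed by 0; ${10} is the tenth parameter
```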
Finally, the construction
- (echo a; echo s; echo "h 9999"; cat 2>/dev/null) | sicsim
takes advantage of the fact that when cat has no argument
it draws from standard input. The effect is that everything entered is sent immediately to standard
output. Recall from the discussion of redirection that when two commands are enclosed in parentheses
their standard output is combined, so what happens is that the output of the 3
echo commands is combined with that of
cat and the result piped on to
sicsim as standard input. The effect
is to drive the sicsim interface from
cat so that it continues to behave as it would normally, except
that echo issues the first 3 commands for
sicsim rather than the user. The other difference is that
when the user enters <Ctrl>d, cat reaches end-of-file and exits, closing its end of the pipe;
sicsim then sees end-of-file on its standard input, which presumably causes it to terminate as well.
If sicsim terminates on its own
(presumably under user control), the pipe will be broken, and anything cat subsequently tries to
write will have no reader, causing an error and command
termination. For this reason, the standard error for cat
has been redirected so that if this situation does occur, it doesn't cause heartburn. For all
practical purposes, the specified file /dev/null is
equivalent to oblivion and is what is typically used for this kind of action.
The only other way to accomplish this task would have been to modify the program for
sicsim, which would have compromised its conceptual
integrity. This illustration should help explain why Unix users employ shell scripts
for accomplishing much of their work.
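The technique of prefixing scripted commands to the user's input can be seen without sicsim. In this sketch fakesim is a hypothetical stand-in that echoes each command it reads, and printf supplies what a user would otherwise type at the terminal for cat to pass along.

```shell
# fakesim: a hypothetical stand-in for sicsim that reports each command it reads
fakesim() {
    while read -r cmd
    do
        echo "sim got: $cmd"
    done
    echo "sim saw end of input"
}

# The 3 scripted commands go first, then whatever cat passes on from standard input
printf 'run\n' | (echo a; echo s; echo "h 9999"; cat 2>/dev/null) | fakesim
```

The simulator stand-in receives a, s, and h 9999 before the "user" line run, and stops once cat reaches end-of-file.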
- Script Debugging
A script with multiple lines brings with it the same kinds of debugging issues that occur in
programming languages. Under the Korn Shell, the set command
is used to turn on (or off) the command trace feature used for script debugging. The feature is
enabled from within the script, by simply adding the line
- set -x
to start tracing. While trace is in effect, every command encountered is echoed to standard error
before it is executed. Output from tracing is identified by being preceded by the value of the
keyword variable "PS4" (which is
"+ " by default). Trace is turned off when the script ends or
when
- set +x
(or, in the Korn shell, set -) is executed.
It is generally useful to establish PS4 in the profile script
(.profile) or in the environment file named by the
ENV variable. For example,
- PS4='+ On line $LINENO: '
- export PS4
will provide the line numbers for each traced command.
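Tracing can be captured to a file to see what it produces. In this sketch traced is a hypothetical function and mktemp -d is assumed; PS4 is single-quoted so that $LINENO is expanded freshly for every traced command rather than once at assignment.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

PS4='+ On line $LINENO: '      # expanded each time a trace line is printed
export PS4

# traced: a hypothetical function that turns tracing on and off around two commands
traced() {
    set -x                     # start tracing
    greeting="hello"
    echo "$greeting"
    set +x                     # stop tracing (this command is itself traced)
}

traced 2> trace.log            # trace output goes to standard error
cat trace.log
```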