Unix Shell Programming

When you log in to a Unix system, a program called a shell is run for you. A shell process is a command interpreter that provides you with an interface to the operating system. A shell script is just a file of commands, normally executed by a shell process that was spawned to run the script. The contents of the script file can be just ordinary commands as would be entered at the command prompt, but all standard command interpreters also support a scripting language that provides control flow and other capabilities analogous to those of high-level languages. Programming shell scripts for the Korn Shell is the subject of this discussion.

There is more than one shell process available on most Unix systems. The most popular ones in use are the

  • Bourne shell (sh) - Unix System V (developed by Steve Bourne at Bell Laboratories; the "grandfather of all UNIX shells")
  • C shell (csh) - Berkeley Unix; so known because its commands are C-like (created as an alternative to the "bare bones" provided by the Bourne shell)
  • Korn shell (ksh) - extends the Bourne shell (developed by David Korn at Bell Laboratories; sites often have the 1988 version rather than the 1993 version)
  • GNU Bourne-Again SHell (bash) - extends the Bourne shell and also has features from the Korn shell
  • Z shell (zsh) - designed by Paul Falstad of Princeton; a superset of the Korn Shell, with added C shell features
  • Enhanced C shell (tcsh) - an extension of the C shell [note: there are strong dissents regarding the C shell ("Csh Programming Considered Harmful")]
You can determine what shell you are running by executing the

ps
(process status) command; if you are using the Korn shell, the status information will be flagged by ksh. Regardless of shell, if your system supports the Korn shell, any Korn shell script can be run from the command line by entering

ksh <script-name>
although it is probably advisable to learn the script language for the shell you want to normally use. You will find that there are more similarities than differences among the commonly used shells.

While it's certainly not something to do casually, the

chsh
command changes the login shell for your username. The chsh command tells you the current login shell and then prompts for the new one. The new login shell must be one of the approved shells listed in the /etc/shells file (if you have superuser privileges, you can of course use some shell of your own creation).

You can run any of these shells from within your login shell; e.g.,

csh
puts you into the C shell. Usually the

exit
command will terminate the shell you are in, regardless of flavor.

The command prompt string is typically set up to indicate which shell you are using. For the C shell the indicator is a % symbol; for the Korn shell it is a $ symbol. If you set your own command prompt string via a shell startup script, it is advisable to append the shell's default prompt symbol to the end of the string as a visual reference for which shell is active, since each shell has its own characteristics.

It is important to remember that the shell is a program that accepts lines of ASCII text one at a time and interprets them, whether entered one by one from the command prompt or one by one from a script file. More to the point, this means that each line of a script file must have an interpretation. Shell variables serve as text repositories, which means that numeric processing requires special handling.

The shell allows complex constructions that can be difficult to repeat accurately, which makes them natural candidates for simple scripts. These kinds of constructions typically come up in the context of redirection, pipes, and filters.

  1. Redirecting Terminal I/O

    Regardless of what other files are opened, whenever a shell command runs, three standard file streams are opened:
    • "standard in" for input;
    • "standard out" and "standard error" for output.
    [for C programming these are referenced as stdin, stdout, and stderr, respectively]. Unless directed otherwise, the shell assigns these streams to the user terminal.

    The shell feature known as redirection is used to assign files other than the user terminal to a standard stream. The formats for redirection are:
    <command> < <file-name> [<command> is to get its standard input from
      <file-name> rather than the terminal]
    <command> > <file-name> [<command> is to send its standard output to
      <file-name> rather than the terminal]
    <command> >> <file-name> [<command> is to append its standard output to
      <file-name>]

    To illustrate simple redirection, recall that the shell command sort with no argument takes its input from standard input, and otherwise combines and sorts the list of files presented to it. So if you just enter
    sort
    on the command line, each subsequent line input from the terminal is fed to sort until you enter an EOF (<Ctrl> d) to signal the end of standard input. The command then proceeds to sort your input lines and sends the result to standard output. With redirection, you can send the standard output to a file; e.g.,
    sort > mysortedfile
    captures the sorted result in a file named mysortedfile (created if it doesn't exist, and overwritten if it does).
    sort file1 file2 > mysortedfile
    will combine the two files, file1 and file2, and send the sorted result to mysortedfile.

    There are three standard streams. The construction 0< (or <) will redirect standard input, 1> (or >) will redirect standard output, and 2> will redirect standard error. For example, the command
    sort 0<filein 1>mysortedfile 2>errlist
    redirects each stream for the command. The stream number cannot be followed by a space; however, the stream number is not needed for stream 0 (standard input) or stream 1 (standard output). Since the sort command uses standard input only if no files are provided, redirection of standard input is unnecessary, and the above command is equivalent to
    sort filein > mysortedfile 2> errlist
    While few users would bother with stream numbers 0 (standard input) and 1 (standard output) when entering commands at the command prompt, a script-derived command might well use them for an action such as choosing between sending something to standard output or to standard error. Standard error can be merged into standard output (or vice versa) by using >&; e.g., the command
    sort filein >mysortedfile 2>&1
    sends standard error to the same destination as standard output, so all output that could come to the terminal ends up in mysortedfile. Redirections are processed from left to right, so 2>&1 must follow the redirection of standard output to have this effect.
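
    The redirection forms above can be tried safely in a scratch directory. The following is a minimal sketch; the file contents and the name nosuchfile are made up for the illustration, and mktemp is assumed to be available.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# A small unsorted input file (contents are made up for the example).
printf 'pear\napple\nfig\n' > "$workdir/filein"

# Standard input from a file, standard output to a file (overwrites).
sort < "$workdir/filein" > "$workdir/mysortedfile"

# Append a second sorted batch to the same file.
printf 'kiwi\nbanana\n' | sort >> "$workdir/mysortedfile"

# A missing input file makes sort complain on standard error;
# 2> captures the complaint separately from standard output.
sort "$workdir/nosuchfile" > "$workdir/sorted2" 2> "$workdir/errlist" ||
  echo "sort failed; see errlist"

# Send both streams to one file: redirect standard output first, then 2>&1.
sort "$workdir/filein" "$workdir/nosuchfile" > "$workdir/both" 2>&1 ||
  echo "sort failed; messages are in both"

cat "$workdir/mysortedfile"
```

    The final cat shows the first batch in sorted order followed by the appended second batch.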

    Two commands can be executed from a single line; e.g.,
    date; cal
    Parentheses can be used to redirect the standard output from both of the commands; e.g.,
    (date; cal) > myfile
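
    The effect of the parentheses is easiest to see by comparing the two forms side by side. In this sketch echo stands in for date and cal so that the result is predictable; the file names are arbitrary.

```shell
workdir=$(mktemp -d)

# Without parentheses the redirection applies only to the last command,
# so "first" still goes to the terminal.
echo first; echo second > "$workdir/last_only"

# With parentheses both commands share the redirected standard output.
(echo first; echo second) > "$workdir/both"

cat "$workdir/both"
```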

  2. Pipes

    Redirection involves providing a shell command with an alternative to the user terminal for standard input, standard output, or standard error. You can't take standard output and turn it directly into standard input for another command without going through an intermediate file; for example, the intent of doing "sort < who" would need to be accomplished by
    who > temp
    sort < temp

    To get around this need to pass data through temporary files, the shell provides a means of taking standard output from one command and making it standard input for another. This is called a pipe. The notation
    <command-1> | <command-2>
    is used to denote that standard output from <command-1> is to be piped to <command-2> as standard input.

    For the above example, you would simply enter
    who | sort
    or more practically, perhaps
    who | sort | more
    (standard output from who is piped to sort as its standard input; standard output from sort is in turn piped to more as its standard input; standard output from more goes to the terminal).

    You can of course do things such as
    sort myfile | more
    which would allow you to examine a sorted version of the file without creating a permanent, sorted version of your data.

    If for some reason you want to capture intermediate information flowing through a pipe, there is a utility provided for this purpose (it is not a shell command). In particular,
    who | sort | tee whosorted | more
    does the same thing as the earlier construction, except the sorted output is "teed" into whosorted as well as to standard output. tee is not a shell command since it is only used on the receiving end of a pipe.
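
    A pipeline with tee can be tried without depending on who is logged in. In this sketch list_users is a hypothetical stand-in for who that prints three made-up lines, and tail takes the place of more so that nothing waits for the keyboard.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for "who": three made-up login lines.
list_users() { printf 'zoe  tty1\nadam tty2\nmia  tty3\n'; }

# Sort the lines, save a copy of the sorted stream with tee,
# and pass the same stream on to the next command in the pipe.
last=$(list_users | sort | tee "$workdir/whosorted" | tail -n 1)

echo "last entry: $last"
cat "$workdir/whosorted"
```

    The file whosorted holds all three sorted lines even though only the last one reaches the end of the pipe.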

  3. Filters

    sort is an example of a type of shell command called a filter. A filter is a command that takes data from a file and performs some simple transformation on it, the result of which is sent on to some other file.

    Examples:
    • sort - sort files
    • grep (and its derivatives) - search for keyword information in the file
    • head - output lines from the front end of the file
    • tail - output lines from the tail end of the file
    • wc - count words, lines, and/or characters in the file
    • crypt - encrypt the file (use with caution!)
     

    We've already looked at sort.

    grep (global regular expression - print) is a shell command that matches patterns as represented by limited regular expressions against the input character stream. The related command, egrep, allows for the full range of regular expressions (regular expressions are covered in detail in the study of compilers). grep is limited to the same regular expressions as the basic Unix editor, ed. The story (as told by Kernighan and Pike) is that grep was actually created in an evening by doing a little surgery on ed!

    The most fundamental use for grep is to locate occurrences of a single word; e.g.,
    grep -n 'symbval' pass2.c
    gives the line numbers and prints the lines containing the specific symbol "symbval".

    Metacharacters are used for the mechanisms employed by regular expressions to represent complex patterns; e.g.
    ^ for the beginning of a line; e.g., '^t' = lines beginning with "t".
    $ for the end of a line; e.g., 't$' = lines ending with "t".
    . matches any single character; '^.t' = lines with 1st character anything and 2nd character "t".
    * goes with the preceding character to represent 0 or more repetitions of the character.
    + is like *, but is for 1 or more repetitions (egrep only).
    \ turns off any special meaning for the character that follows it.
     

    Construction rules for regular expressions provide means for using simpler regular expressions to define more complex regular expressions; e.g.,
    [...] match is a regular expression: match is to any character listed; allows ranges such as a-x.
    [^...] not match is a regular expression: match is to any character not listed; also allows ranges.
    <r1><r2> juxtaposition of regular expression <r1> and regular expression <r2> is a regular expression
    <r1>|<r2> or of regular expression <r1> with regular expression <r2> is a regular expression (egrep only).
    (<r>) a nested regular expression is a regular expression (egrep only).
     

    You could almost go to school on grep. For example, here's a typical entry from the system password file /etc/passwd
    imauser:x:121:101:Ima User:/home/imauser:/bin/ksh
    The following shell command searches this file for users without passwords:
    grep -n '^[^:]*::' /etc/passwd
    The pattern ^[^:]*:: is a juxtaposition of regular expressions and is interpreted as follows:
    The pattern match requires a beginning (^) comprised of characters not the ":" character ([^:]), 0 or more of these (*), followed by two ":" characters in succession (::).
     
    In other words, to match this pattern, the entry can have any number of characters prior to the first ":", after which there is immediately a second ":". When a password is present, it is encrypted between these two separators.
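
    The pattern can be tested against a small file in the same format rather than the real /etc/passwd. The entries below are made up; the third one has an empty field later in the line to show why the ^ anchor matters.

```shell
workdir=$(mktemp -d)

# Three made-up entries in /etc/passwd format.
cat > "$workdir/passwd.sample" <<'EOF'
imauser:x:121:101:Ima User:/home/imauser:/bin/ksh
nopass::122:101:No Password:/home/nopass:/bin/ksh
guest:x:123:101::/home/guest:/bin/sh
EOF

# Anchored pattern: only an empty SECOND field matches.
anchored=$(grep -n '^[^:]*::' "$workdir/passwd.sample")

# Unanchored "::" also matches the empty comment field of guest.
loose=$(grep -c '::' "$workdir/passwd.sample")

echo "$anchored"
echo "lines containing :: anywhere: $loose"
```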

    Remark: it is highly unlikely that system administration will allow accounts without passwords; note that while the password file is completely accessible (it has to be), encryption protects the passwords. If two users should be using the same password, the encryption routine will encrypt them differently. Since there are decryption techniques capable of breaking passwords if given enough time, the best advice is to change your password regularly.

    The filter construction
    head -15 myfile
    lists the first 15 lines of myfile and
    tail -15 myfile
    lists the last 15 lines.

    If no number is specified, the default is 10.

    The filter construction
    wc -lwc myfile
    counts lines, words, and characters in the file.

    If any one of l, w, or c is omitted, that count is not provided.
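
    The three filters can be checked against a file whose contents are known in advance. This sketch builds a 20-line file and uses the -n spelling of the line count (head -n 3), which is the form current systems document; the -15 style used above is the older spelling of the same option.

```shell
workdir=$(mktemp -d)
myfile="$workdir/myfile"

# Build a 20-line file: "line 1" through "line 20".
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$myfile"

first3=$(head -n 3 "$myfile")
last2=$(tail -n 2 "$myfile")

# Reading from standard input keeps the file name out of wc's output.
lines=$(wc -l < "$myfile")
words=$(wc -w < "$myfile")
chars=$(wc -c < "$myfile")

echo "$first3"
echo "$last2"
echo "lines=$lines words=$words chars=$chars"
```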

    The filter construction
    crypt <password> < myfile > cryptedmyfile
    inputs myfile and encrypts it using the "<password>" supplied, storing the encrypted file in cryptedmyfile.
    crypt <password> < cryptedmyfile
    reverses the encryption.

    In either case, if no password is given (i.e., you don't want it visible), the user is prompted to enter one.

    There are further generalizations of the filter grep, most notably awk, that are considered to be "programmable filters" because the transformation is constructed via a program in a simple language. awk is named after its authors, Aho, Weinberger, and Kernighan, all of Bell Labs.

    One of the programmable filters commonly used in shell scripts is sed, the streaming version of the basic Unix text editor, ed. Editor commands that do not process multiple lines or look backward will generally work with sed. sed simply makes a sequential pass through the input stream. The editor command script passed to sed selectively processes input lines (selection is by position or by pattern match). After a selected line is transformed it is passed to standard output. By default, lines not selected are also passed on to standard output.

    For example,
    sed '/./s/^/<tab>/' <file-name>
    modifies lines from the file by indenting non-empty lines with a tab (the results being sent to standard output). Its command script ('/./s/^/<tab>/') uses a pattern match to select lines.

    An explanation for how the script works is as follows:
    the initial "/./" is a regular expression for any character, so if the line has none, it is not selected; otherwise, ed's search and replace (s) is applied, replacing the null front of the line given by the regular expression /^/ with a tab.
     

    If sed is invoked with the -n option, nothing is sent to standard output unless the command script explicitly prints it, for instance with the p flag of the s command. For the example above, sed -n '/./s/^/<tab>/p' <file-name> would purge the empty lines while indenting the rest.

    In contrast, the command script for
    sed 3q <file-name> (or equivalently sed '3q' <file-name>)
    selects only line 3. The effect, however, is to send the 1st 3 lines of the file to standard out because sed must pass through lines 1 and 2 to select line 3. When line 3 is processed, the quit command for sed is issued ending the output with line 3.
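
    The sed behaviors described above can be compared on a few lines of made-up input. In this sketch the visible prefix "> " stands in for the tab, and the -n variant uses the p flag so that the transformed lines are printed.

```shell
# Three input lines, the middle one empty.
sample() { printf 'alpha\n\nbeta\n'; }

# Prefix non-empty lines; the unselected empty line passes through unchanged.
indented=$(sample | sed '/./s/^/> /')

# With -n nothing is printed automatically; the p flag prints each line
# on which the substitution was made, so the empty line disappears.
purged=$(sample | sed -n '/./s/^/> /p')

# Quit at line 2: lines 1 and 2 are printed, the rest is never processed.
first2=$(printf 'one\ntwo\nthree\nfour\n' | sed 2q)

echo "$indented"
echo "$purged"
echo "$first2"
```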

    Note: There are other useful shell commands for processing files, but which are not classified as filters. The shell command find is one of these; for example,
    find . -name "*.doc"
    searches the directory tree down from the current level (.) looking for file names that end in ".doc".

    WARNING: Unfortunately, the use of * in this context is quite different from * as used with grep.
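
    The two meanings of * can be seen side by side. This sketch creates a few empty files with made-up names for find, then feeds three made-up lines to grep.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
touch "$workdir/a.doc" "$workdir/sub/b.doc" "$workdir/notes.txt"

# For find, * is a file-name wildcard meaning "any run of characters".
# The quotes keep the shell from expanding *.doc before find sees it.
found=$(cd "$workdir" && find . -name '*.doc' | sort)
echo "$found"

# For grep, * repeats the preceding character, so 'ab*c' matches
# "ac" and "abbc" but not "abxc".
matches=$(printf 'ac\nabbc\nabxc\n' | grep -c 'ab*c')
echo "grep matched $matches lines"
```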

  4. Shell Programming

    The shell is actually much more than just a user interface, since it incorporates a programming language. The programming language for the shell is complex enough to handle many kinds of things that might otherwise be done in a language such as C, but in contrast it is interpreted rather than compiled. The basic virtues are that
    • shell programs provide the means for encapsulating groups of shell commands that need to be executed;
    • systems administration has many functions best described using the decision logic of a programming language (moreover, interpretive code will port (since it's just text), whereas compiled code will not);
    • it is often easier to accomplish a systems function in the shell using the shell programming capability (after all, it is part of the operating system) than it is to write a high-level language program for the same purpose.
     

    At its simplest, a shell script (a program written in the shell programming language) is just one or more shell commands as you would enter them at the Unix prompt. For example, if you build a file named users whose contents consist of
    who | sort | more
    then executing
    sh users
    will execute the shell script, running the command line as if it had come from the keyboard. In this sense, it is like the keystroke "macros" common to many application products.

    For the Korn shell, if the first line of users is the comment line
    #!/usr/bin/ksh
    and execute permission for users has been enabled via the shell command chmod then the script can be executed just by entering
    users
    at the command prompt (or ./users if the current directory is not on your PATH). This will work for other shells so long as the first line identifies the shell to use.

    Shell scripts are often used to capture in simpler syntax the result of something hard to remember. For this circumstance, it is useful to be able to pass in arguments to the shell script. These are referenced inside the script as the positional parameters $1, $2, $3, . . . . For example, we might want to capture the sed procedure for indenting a file in this manner; i.e., the shell script indent might simply consist of
    #!/usr/bin/ksh
    sed '/./s/^/<tab>/' $1
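
    The whole cycle of creating a script, enabling execute permission, and passing it a parameter can be sketched as follows. The assumptions here are that /bin/sh is used as the interpreter so the sketch runs where ksh is not installed, that "> " replaces the tab so the indentation is visible, and that the scratch directory permits execution.

```shell
workdir=$(mktemp -d)

# Write the script; the quoted 'EOF' keeps $1 from being expanded now.
cat > "$workdir/indent" <<'EOF'
#!/bin/sh
# Prefix each non-empty line of the file named by $1 with "> ".
sed '/./s/^/> /' "$1"
EOF

# Enable execute permission so the script can be run by name.
chmod +x "$workdir/indent"

# A made-up input file, passed to the script as positional parameter $1.
printf 'one\n\ntwo\n' > "$workdir/sample.txt"
result=$("$workdir/indent" "$workdir/sample.txt")
echo "$result"
```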

    As with any programming language, the power of a shell programming language lies in the ability to use decision logic for defining a shell script's behavior. The constructions used for this purpose are analogous to those of standard languages such as C. They include "if-then-else", "case", and the loop controls "for", "while" and "until". Control structures are one area where there are significant differences among shells, although there is also significant similarity. Our focus is on the Korn Shell, which is an extension of the Bourne Shell and which provides capabilities that go beyond the C Shell as well.

  5. if-then, if-then-else, if-then-elif-then-else Constructions

    The if-then-elif-then-else statement for the Korn Shell has the basic syntax:
    if <condition>
      then
      <if-part>
    elif <condition>
      then
      <elif-part>
    else
      <else-part>
    fi
    The elif part can be omitted if a strictly if-then-else structure is needed. The else part can also be omitted to obtain a strictly if-then construction.

    Example of an if-then-elif-then-else construction:
    echo "Enter: \c"; read name
    if [[ $name = "ifcase" ]]
      then echo "in if case"
    elif [[ $name = "elifcase" ]]
      then echo "in elif case"
    else
      echo "in else case"
    fi

    The "[[" is a Korn Shell command that initiates a condition test (which is concluded by "]]"). Note that commands have to be separated from their arguments by one or more spaces, so a space must follow "[[" and precede "]]". In the comparison $name = "ifcase", the quote marks are not actually needed for this particular example, but are included because they are needed in more complex cases that are text comparisons. Spaces are needed around "=" since its interpretation otherwise would not fit this context. The comparison operators "<" and ">" have the expected meanings for text. The "not" operator is given by "!". Less than or equal is handled by "not >" and greater than or equal by "not <"; e.g.,
    [[ ! $name > "ifcase" ]]
    is a less than or equal condition check. Compound condition checks can be created by using parentheses with AND/OR combinations where "&&" is AND and "||" is OR; e.g.,
    [[ (! $name = "x") && ($name > "s") ]]
    forms an "AND" test of two conditions. There are other condition tests (numeric, files), which will be employed later, but not covered in detail.
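
    The text comparisons above can be exercised by wrapping them in a function and calling it with different values. classify is a hypothetical name chosen for this sketch, and the sample values are arbitrary lowercase words; the [[ ]] form needs ksh or another shell that supports it.

```shell
# classify is a hypothetical helper built from the tests discussed above.
classify() {
  name=$1
  if [[ $name = "ifcase" ]]
    then echo "in if case"
  elif [[ (! $name = "x") && ($name > "s") ]]
    then echo "after s and not x"
  elif [[ ! $name > "m" ]]
    then echo "at or before m"     # "not >" acts as less than or equal
  else
    echo "in else case"
  fi
}

classify ifcase
classify tea
classify apple
classify pear
```

    Each call falls through the tests in order, so "pear" (after "m" but not after "s") reaches the else part.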

  6. case Construction

    The case statement for the Korn Shell has the basic syntax:
    case <test-value> in
      <case1-val>)
        <commands>
        ;;
      <case2-val>)
        <commands>
        ;;
      . . .
      *)      <-- case that catches anything else
        <commands>
        ;;
    esac

    Example of a case construction:
    echo "Enter: \c"; read name
    case "$name" in
      Bill)
        echo "case Bill"
        ;;
      Alice)
        echo "case Alice"
        ;;
      *)
        echo "no case for this one"
        ;;
    esac

    By convention the identifier for each case is placed on its own line, and each case terminates when a double ";" is encountered.
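
    Wrapping a case statement in a function makes it easy to try several values. greet is a hypothetical name for this sketch; it also shows two details of case patterns, namely that "|" separates alternatives and that patterns use file-name style wildcards rather than grep style regular expressions.

```shell
# greet is a hypothetical helper used only for this illustration.
greet() {
  case "$1" in
    Bill)
      echo "case Bill"
      ;;
    Alice|alice)
      # "|" lets one branch accept several alternatives
      echo "case Alice"
      ;;
    *.txt)
      # wildcard pattern: any value ending in .txt
      echo "a text file name"
      ;;
    *)
      echo "no case for this one"
      ;;
  esac
}

greet Bill
greet alice
greet notes.txt
greet Carol
```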

  7. Loop Constructions

    • for loop

      The for loop for the Korn Shell has the basic syntax:
      for <item> in <list>
        do
          <loop-body>
        done

      Example of a for loop construction:
      list="item1 item2 item3"
      for name in $list
        do
          echo "item: $name"
        done

      If the in clause is omitted, the list used is the parameter list passed in to the script.
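
      Both forms of the for loop can be tried from a function, since inside a function the positional parameters are the function's own arguments. show_list and show_args are hypothetical names for this sketch.

```shell
# Loop over the words of a variable.
show_list() {
  list="item1 item2 item3"
  for name in $list
    do
      echo "item: $name"
    done
}

# No "in" clause: the loop runs over the positional parameters.
show_args() {
  for name
    do
      echo "arg: $name"
    done
}

show_list
show_args alpha "two words" omega
```

      Note that the quoted argument "two words" stays together as one item in the second loop.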

    • while loop

      The while loop for the Korn Shell has the basic syntax:
      while <condition>
        do
          <loop-body>
        done

      Example of a while loop construction:
      ans="init"
      while [[ ! $ans = "quit" ]]
        do
          echo "current: $ans"
          echo "enter new: \c"
          read ans
        done

      The "\c" on the prompt line causes echo to suppress the newline that it would otherwise issue on its return.

      The Korn Shell also supports until loops, which differ from while loops only in that the loop-body is repeated as long as the condition test fails rather than as long as it succeeds. As with while, the test is performed before each pass through the loop-body.
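
      A short counting example shows how while and until relate. This sketch uses the numeric comparisons -lt and -ge, which are among the numeric condition tests not covered in detail here; the variable names are arbitrary.

```shell
# while: the body runs as long as the test succeeds.
n=0
while [[ $n -lt 3 ]]
  do
    n=$((n + 1))
  done

# until: the body runs as long as the test fails.  The test is made
# before each pass, so a condition that is already true skips the body.
passes=0
until [[ $n -ge 3 ]]
  do
    passes=$((passes + 1))
  done

# until used to count down to zero.
m=3
until [[ $m -eq 0 ]]
  do
    m=$((m - 1))
  done

echo "n=$n passes=$passes m=$m"
```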

    The only other notable decision structure for the Korn Shell is an extension of the case structure called the select structure, which we won't cover here.

    So long as you remember what constitutes shell commands, you can put these structures on a single line by using ";" to terminate command lines; for example,
    if <condition>; then <command>; fi
    is the same as the script
    if <condition>
      then
        <command>
    fi
    except that it is all on one line.

  8. Example: Developing a Useful Script

    Suppose that you have a machine named SIC/XE for which you have a simulator program named sicsim and a cross-assembler named sicasm. The process of assembling a program for the SIC/XE machine and running it on the simulator is tedious, so you want a shell script to automate this 2-step process by
    1. assembling a SIC/XE program
    2. passing the object file into the SIC simulator as its DEVF1 input device.

    As a first attempt you might have:

    #!/usr/bin/ksh

    echo ">>>executing sicasm"
    sicasm $1
    echo ">>>copying $1.obj to DEVF1"
    cp $1.obj DEVF1
    echo ">>>launching the SIC simulator"
    echo
    sicsim

    This has the evident weakness of not checking to see if the user has entered a parameter. The shell variable $# holds the number of positional parameters, so it can be checked to see if it is non-zero. A 2nd version of the script to take advantage of this might be:

    #!/usr/bin/ksh

    if [[ $# = 0 ]]
      then echo "Usage: sicasmrun <src>"
      exit 1
    fi

    echo ">>>executing sicasm"
    sicasm $1
    echo ">>>copying $1.obj to DEVF1"
    cp $1.obj DEVF1
    echo ">>>launching the SIC simulator"
    echo
    sicsim

    The next improvement would be to determine whether the file supplied actually exists. The Bourne Shell has a command "test" that can check conditions regarding files. Under the Korn shell, "test" can be replaced by the command structure [[ <condition> ]] that was used earlier. The criterion -a <file-name> is true if the file exists.

    A 3rd version of the script which uses this capability is given by:

    #!/usr/bin/ksh

    if [[ $# = 0 ]]
      then echo "Usage: sicasmrun <src>"
      exit 1
    fi

    if [[ ! -a $1 ]]
      then echo "can't find file $1"
      exit 1
    else

      echo ">>>executing sicasm"
      sicasm $1
      echo ">>>copying $1.obj to DEVF1"
      cp $1.obj DEVF1
      echo ">>>launching the SIC simulator"
      echo
      sicsim

    fi

    The whitespace around the elements used in the if test should be noted.

    What if the assembly fails? It would be desirable for the script to test for this and halt if it should happen. As might be expected, if the assembly fails, sicasm returns a non-zero exit code. The shell variable $? has the exit code for the most recently completed command.

    A 4th enhancement to the script illustrates how to take advantage of this information:

    #!/usr/bin/ksh

    if [[ $# = 0 ]]
      then echo "Usage: sicasmrun <src>"
      exit 1
    fi
    if [[ ! -a $1 ]]
      then echo "can't find file $1"
      exit 1
    else
      echo ">>>executing sicasm"
      sicasm $1

      if [[ ! $? = 0 ]]
        then echo "*** Assembly failed ***"
        exit 1
      fi

      echo ">>>copying $1.obj to DEVF1"
      cp $1.obj DEVF1
      echo ">>>launching the SIC simulator"
      echo
      sicsim
    fi
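
    The exit-code check can also be tried in isolation, without sicasm. In this sketch run_step is a hypothetical helper that runs whatever command it is given and then examines $? in the same way as the script above.

```shell
# run_step is a hypothetical helper: it runs the command passed to it
# and reports on that command's exit code.
run_step() {
  "$@"
  status=$?          # save it at once: the next command overwrites $?
  if [[ ! $status = 0 ]]
    then echo "*** step failed with exit code $status ***"
    return 1
  fi
  echo "step ok"
}

# "true" always returns exit code 0.
run_step true
```

    Calling run_step with a command that fails, such as run_step sh -c 'exit 3', reports the non-zero code instead.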

    A 5th (and final) improvement is to accommodate a syntax such as
    sicasmrun prog.sic data1.sic data2.sic
    where "sicasmrun" is the name of the shell script, prog.sic is the assembler source file, and the remainder are data input files. This information is sufficient to bypass the commands used to configure the simulator, so a technique is also illustrated for sending the first 3 commands for the simulator in via the shell script, and then letting the user take over.

    In this case, the added feature is to sequentially copy any data files given on the command line to other file names (DEVF2, DEVF3, ...) recognized by the simulator.

    #!/usr/bin/ksh

    if [[ $# = 0 ]]
      then echo "Usage: sicasmrun <src> [<data>]"
      exit 1
    fi
    if [[ ! -a $1 ]]
      then echo "can't find file $1"
      exit 1
    else
      echo ">>>executing sicasm"
      sicasm $1
      if [[ ! $? = 0 ]]
        then echo "*** Assembly failed ***"
        exit 1
      fi
      echo ">>>copying $1.obj to DEVF1"
      cp $1.obj DEVF1

      cnt=2
      # shift moves the argv pointer to the
      # next parameter (can't be "unshifted")
      while [[ ! $# = 1 ]]
        do shift
          if [[ ! -a $1 ]]
            then echo "*** can't find file $1 to copy to DEVF$cnt ***"
          else
            echo ">>>copying $1 to DEVF$cnt"
            cp $1 DEVF$cnt
          fi
          let cnt=cnt+1
        done

      echo ">>>launching the SIC simulator"

      echo ">>>enter <Ctrl>d to quit"

      echo

      (echo a; echo s; echo "h 9999"; cat 2>/dev/null) | sicsim

    fi

    Commands and constructions that haven't come up before are part of the additions to the script. First the shell command shift has the effect of changing the address used to access the argv array, in effect repointing $1 to the next parameter each time invoked. Note that $# correspondingly decrements by 1 with each shift.

    shift has to be used with care, since there is no "unshift". It provides a way to access the parameter list in a variable fashion, and is the means for accessing more than 9 positional parameters since $10 expands as $1 concatenated with 0.
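
    The behavior of shift and of $10 can be observed with two small functions, since a function has its own positional parameters. walk_args and tenth are hypothetical names for this sketch; shift 9 is simply nine shifts in one step.

```shell
# walk_args consumes its arguments one at a time with shift.
walk_args() {
  while [[ ! $# = 0 ]]
    do
      echo "first=$1 remaining=$#"
      shift
    done
}

# tenth shows why $10 does not reach the tenth parameter.
tenth() {
  naive="$10"        # read as $1 followed by a literal 0
  shift 9            # discard the first nine parameters
  echo "naive=$naive shifted=$1"
}

walk_args a b c
tenth a b c d e f g h i j
```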

    Second, the shell command let provides a means for performing arithmetic on shell variables whose values can be interpreted as integers. An equivalent arithmetic construction to the let statement used above,
    cnt=$((cnt+1))
    employs a more versatile arithmetic form for computation, one for which the result [$((cnt+1))] could equally be used for a purpose other than variable assignment (it could be echoed, for example).
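
    The two arithmetic forms can be compared directly. This sketch assumes a shell that provides let (ksh does); the variable names other than cnt are made up.

```shell
cnt=2

# let performs integer arithmetic and assigns the result.
let cnt=cnt+1

# $(( )) computes the same thing; here the result is assigned ...
cnt=$((cnt + 1))

# ... and here it is used directly inside a string without assignment.
next="DEVF$((cnt + 1))"

# Shell variables hold text, but text that looks like an integer
# can take part in arithmetic.
text="7"
sum=$((text + 5))

echo "cnt=$cnt next=$next sum=$sum"
```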

    Finally, the construction
    (echo a; echo s; echo "h 9999"; cat 2>/dev/null) | sicsim
    takes advantage of the fact that when cat has no argument it draws from standard input. The effect is that everything entered is sent immediately to standard output. Recall from the discussion of redirection that when two commands are enclosed in parentheses their standard output is combined, so what happens is that the output of the 3 echo commands is combined with that of cat and the result piped on to sicsim as standard input. The effect is to drive the sicsim interface from cat so that it continues to behave as it would normally, except that echo issues the first 3 commands for sicsim rather than the user. The other difference is that cat will close the pipe on receiving <Ctrl>d, which will kill sicsim as well since its standard input will no longer be valid. If sicsim terminates on its own (presumably under user control), the other end of the pipe will be broken and anything else from cat will find a non-existent target, causing an error and command termination. For this reason, the standard error for cat has been redirected so that if this situation does occur, it doesn't cause heartburn. For all practical purposes, the specified file /dev/null is equivalent to oblivion and is what is typically used for this kind of action.
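
    The technique can be tried without the simulator. In this sketch fake_sim is a hypothetical stand-in for sicsim that just numbers each command it reads, and printf plays the part of the user at the keyboard, with the end of its output acting like <Ctrl>d.

```shell
# fake_sim is a hypothetical stand-in for sicsim: it reads commands
# from standard input and echoes each one with a sequence number.
fake_sim() {
  n=0
  while read -r cmd
    do
      n=$((n + 1))
      echo "command $n: $cmd"
    done
}

# The three echo commands supply the canned start-up commands; cat then
# copies whatever arrives on standard input (the "user") into the pipe.
result=$(printf 'r\nq\n' |
  (echo a; echo s; echo "h 9999"; cat 2>/dev/null) | fake_sim)

echo "$result"
```

    The canned commands arrive first, followed by the two "typed" commands, all on the single standard input that fake_sim reads.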

    The only other way to accomplish this task would have been to modify the program for sicsim, which would have compromised its conceptual integrity. This illustration should help explain why Unix users employ shell scripts for accomplishing much of their work.

  9. Script Debugging

    A script with multiple lines brings with it the same kinds of debugging issues that occur in programming languages. Under the Korn Shell, the set command is used to turn on (or off) the command trace feature used for script debugging. The feature is enabled from within the script, by simply adding the line

    set -x
    to start tracing. While trace is in effect, every command encountered is echoed to standard error before it is executed. Output from tracing is identified by being preceded by the value of the keyword variable "PS4" (which is "+ " by default). Trace is turned off when the script ends or when
    set +x
    is executed (the older form set - has the same effect).

    It is generally useful to establish PS4 in the profile script (.profile) or in the startup file named by the environment variable ENV. For example,

    PS4='+ On line $LINENO: '
    export PS4
    will provide the line numbers for each traced command.
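
    Tracing can be observed by running a short script and capturing standard error separately from standard output. This sketch makes several assumptions: the script is run with sh so that it works where ksh is not installed, the file names are made up, and the exact trace text varies from shell to shell.

```shell
workdir=$(mktemp -d)

# A small script that turns tracing on for two commands, then off.
# The quoted 'EOF' keeps the $ signs from being expanded here.
cat > "$workdir/traced" <<'EOF'
PS4='+ line $LINENO: '
set -x
greeting="hello"
echo "$greeting"
set +x
echo "tracing is off"
EOF

# Normal output goes to one file, the trace (standard error) to another.
sh "$workdir/traced" > "$workdir/out" 2> "$workdir/trace"

cat "$workdir/out"
cat "$workdir/trace"
```

    The trace file shows each traced command prefixed by the PS4 string, while the normal output of the script is unaffected.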

cwinton: 9/1/2006

Copyright © 2007 University of North Florida, School of Computing - All rights reserved.