Course  
  Menu  
  Course  
 Description 
  Course  
  Outline  
 Assignments 
 & Doc Specs 
  Software  
  Summary  
  Class  
  Notes  
 Pseudocode 
  Modules  
  SIC/XE  
  Reference  
 Course  
 Supplement 
CNW Home
Fall Term, 2006  .

Assignment #4

Programming exercises: Due by noon on Friday, October 27, 2006.

  1. An assembler employs several types of symbol tables, some of which are static (e.g., the op code table) and some of which are created dynamically as source code is processed (e.g., the symbol table of statement labels). If care is taken to structure all tables in the same format, table processing needs can be taken care of by a common set of subroutines. The first of these, findlast, was constructed in the prior assignment. For dynamically creating tables an insert subroutine is needed. The first part of this assignment is to construct such a subroutine. The subroutine is to insert new entries in the table so as to maintain sort order on the symbols. For a duplicate symbol case, the item is to be installed so that all items with this same symbol are in ascending order of value, any further ties to be broken first by casenmbr, and then by otherinfo. The item is to be rejected if all 4 fields are the same.

    Subroutine specifications:

    Structures
       #ifndef SYMBSIZE
         #define SYMBSIZE 10
       #endif
       typedef struct tabinfo
         {
           int  val;
           int  type;
           int  info;
         } tabdata;
       typedef struct tablemem
         {
           char    symbol[SYMBSIZE];
           tabdata symbdata;
         } tabletype;
       extern int findlast(); // previously defined
       int insert(newentry, tabl, tabsizep)
         struct tablemem newentry;
         struct tablemem tabl[];
         int             *tabsizep;
    Parameters
    1. Item to insert
      1. newentry < === > a structure containing item data
    2. Table characteristics
      1. tabl     < === > table name
      2. tabsizep < === > the address of the variable holding the number of entries currently in the table
    Returned value
    Return the value of the index at which the new item is stored unless the item is a complete duplicate; ie., all 4 fields are the same, in which case return -1 (to flag that the symbol table is now invalid).
    File name
    insert.c

    Note that table entries are compatible with your findlast routine.

    Tables to be dynamically created usually have an initial table size of 0, signifying an empty table. The table size is passed by reference via the tabsizep parameter, a pointer variable. Note that it is the responsibility of the insert routine to increment the table size whenever a new item is added.

    Procedure:

    • Design and construct an insert subroutine that observes the requirements of the above specifications. The basic idea is to add the newentry to the table at the index given by the table size, then pair-swap it into its sort position, increase *tabsizep by 1 and return the index at which the item was stored. Other symbol table strategies (such as a hash table) require a different table organization than the one being employed, and so do not have to be considered.

      Although duplicate labels are not allowed in a program section, working tables used by the assembler may have entries with duplicate symbol fields. In particular, if multiple independent sections are supported, then the same symbol may appear in different sections, with a value entry being used to keep track of the sections the symbol is used in. Be careful to install duplicate symbol entries according to the stipulation given above.

    • A driver COP3601-4.o for this assignment has been provided in the assgn-4 subdirectory of the class directory. It serves as a test mechanism for your programs for both part A and part B. It also will provide you with the experience of linking a program that you have written to one written by someone else. The driver provides means for invoking your insert routine to add an item, means for dumping the table as currently constituted, and means for searching the table for an item (using your findlast subroutine).

    • Devise a make file named <student-id>-4.makefile to compile your insert routine and link it with the supplied driver, executable named <student-id>-4. Since the makefile is to support both part A and part B, you can supply a program stub for the part B program (breakup.c) until you are ready to work on it. The basic command lines needed are:
           cc -c insert.c -o insert.o
           cc -c breakup.c -o breakup.o
           cc COP3601-4.o insert.o breakup.o findlast.o -o <student-id>-4
      
      The hlib functions are linked in with the supplied driver COP3601-4.o.

      Note that a viable stub for breakup can be as simple as a breakup.c file whose contents consist of a "nil" function
           void breakup() {}
      to provide cc with something that will compile and whose .o file will link with the driver.

    • Devise a test plan for your subroutine that fully employs the driver's capabilities. At a minimum, you should plan to
      1. add at least 6 items to the table, including some duplicate symbol entries
      2. dump the table contents to verify contents, particularly that the spec'd order is present
      3. search the table to verify findlast is working according to spec.

      When you are satisfied that you have an effective search plan, remove hlib.log and then run your tests. When finished, move hlib.log to a file named <student-id>-4.A.log for use in your Part A documentation in a manner analogous to that for prior assignments.

  2. In processing source lines, the assembler needs to break them into constituent pieces for analysis. The task for this part of the assignment is to produce a breakup subroutine to decompose a line of source into 4 components: label, op code, operand 1, and operand 2, and identify the n,i,x, and e bit settings.

    Example:
    Components and bits for the following 4 examples of source lines are as follows:
    1. EXAMPLE +LDA STUFF,X .MAIN label: "EXAMPLE" op code: "LDA" operand 1: "STUFF" operand 2: "X" <Note: the assembler doesn't need to look beyond the operands> nixbpe: 111??1
    2. RSUB NO OP label: "" op code: "RSUB" operand 1: "NO" <Note: for RSUB, the assembler won't access operands> operand 2: "OP" nixbpe: 110??0
    3. CMT BYTE @C'RESERVED BLOCK',XYZ SET ASIDE label: "CMT" op code: "BYTE" operand 1: "C'RESERVED BLOCK'" operand 2: "XYZ" <Note: X as the 1st character of the 2nd operand causes the x bit to be set> nixbpe: 10l??0
    4. RMO A,X label: "" op code: "RMO" operand 1: "A" operand 2: "X" nixbpe: 10l??0 <Note: the assembler will ignore nixbpe values for 2-byte>
    nixbpe bits marked "?" remain as received from the calling program; the prefix for the op code determines the e-bit; the prefix for the 1st operand determines the n and i bits; the first character of the 2nd operand determines the x bit. The b and p bits are set elsewhere. The default values for a component is the empty string. Default for the "ni??pe" bits is "00??00". Optionally, you may treat "*" as an op code prefix (signaling that n and i bits are to be 0; ie., a simple SIC instruction).

    Subroutine specifications:

    Structures
       typedef struct toklist
         {
           char  *lbl;
           char  *opc;
           char  *op1;
           char  *op2;
         } component_addresses;
       typedef struct bits
         {
           int   *ni;
           int   *xbpe;
         } nixbpe_addresses;
    
       void breakup(sourceline, comptrs, qptrs)
         char sourceline[];
         component_addresses comptrs;
         nixbpe_addresses qptrs;
         
    Parameters
    1. Source code line to process
      1. sourceline < === > a string of source elements separated by white space
    2. Calling program variables corresponding to source line elements
      1. comptrs.lbl  < === > address of label variable
      2. comptrs.opc  < === > address of opcode variable
      3. comptrs.op1  < === > address of operand1 variable
      4. comptrs.op2  < === > address of operand2 variable
      5. qptrs.ni   < === > address of the variable holding the ni values
        (note that the variable is in the range 0-3 with default value of 3; you may set ni to 0 if the opcode is prefixed with "*")
      6. qptrs.xbpe < === > address of the variable holding the xbpe values (note that the variable is in the range 0-15).
    Returned value
    none
    File name
    breakup.c

    You are advised to take advantage of the string library function strtok (discussed below) as a primary means for extracting elements of the source line being processed.

    Procedure

    • Design and construct a breakup subroutine that observes the requirements of the above specifications. Example: calling program setup
        component_addresses comptrs; nixbpe_addresses qptrs;
        char label[81], opcode[81], operand[81], comment[81];
        int ni, xbpe;
      
        comptrs.lbl = label;
        comptrs.opc = opcode;
        comptrs.opr = operand1;
        comptrs.cmt = operand2;
        qptrs.ni   = &ni;
        qptrs.xbpe = &xbpe; 
      To copy into a component involves code such as
      strcpy(comptrs.lbl, "hello");
      which copies "hello" into the calling program's string variable named label just as surely as
      strcpy(label, "hello");
      would.

      Setting one of the "nixbpe" bits involves code such as

      xbpe += 8;
      which sets the value of the "x" bit (think in terms of the xbpe bits as being at the 8's, 4's, 2's, and digit's positions of xbpe).

    • Devise a test plan for your subroutine. Using the driver already discussed in part A plan to decompose lines of various types, including things such as
            +STUPID   #C'ONE TWO',X    THREE FOUR    
      
        label:         ""
        op code:       "STUPID"
        operand 1:     "C'ONE TWO'"
        operand 2:     X
      
        nixbpe:        011??1

      Be sure that your test entries fully exercise your breakup code. When you are satisfied that you have an effective search plan, remove hlib.log and then run your tests. When finished, move hlib.log to a file named <student-id>-4.B.log for use in your Part B documentation in a manner analogous to that for prior assignments.

Construct Word documentation as specified on the assignments page. Remember that your documentation file is to have 2 distinct parts, a write-up for Part A and a write-up for Part B, each with its own executive summary and set of appendices as noted above. Use the submit shell script from /usr/public/cop3601/cwinton to "shar" your

  • documentation (<student-id>-4.doc)
  • insert source code (insert.c)
  • breakup source code (breakup.c)
  • make file (<student-id>-4.makefile)
  • Part A log file (<student-id>-4.A.log)
  • Part B log file (<student-id>-4.B.log)
  • findlast source code (findlast.c)
and "turnin" by noon on Friday, October 27 to cnw.3601.4. Late submissions can be submitted until the due date for the next assignment via submit to cnw.3601.4L (late points assessed as stipulated on the assignments page).


strtok

For writing programs which must decompose strings into their various components, it is very useful to become thoroughly acquainted with the string tokenizing function strtok. For example, suppose the text between the single quote marks for the following needs to be extracted:

MESSAGE BYTE C'THIS IS A MESSAGE'
Assuming the string variable sourceline holds the text, the solution using strtok is
/* char sourceline[81], workingstring[81], *ptr; */
strcpy(workingstring, sourceline);
ptr = strtok(workingstring, "'");
ptr = strtok(NULL, "'");
printf("\nThe text is %s", ptr);
Some explanation is obviously required.

In general for a string of delimiter characters in variable "delim"
strtok(s, delim)
  • scans the string "s" to the 1st character not in string "delim"
  • sets the return value to the address of this character
  • then continues to scan until a character in "delim" is found, replacing it with '\0'
strtok(NULL, delim) scans from the string immediately following the most recent strtok replacement. The delimiter string does not have to remain the same.
Hence, for the above example, The 1st call to strtok locates the 1st character not a single quote mark, which is 'M' setting the return value as a pointer to this character, then scans on until a single quote mark is found, replacing it with '\0'. The effect is to extract and point to the string
"MESSAGE BYTE C"
When strtok is called the 2nd time against a NULL pointer rather than "workingstring", it resumes the previous scan from the point at which it left off. For this reason the 2nd call to strtok points to string
"THIS IS A MESSAGE" ( this time it is the final single quote character that gets replaced by '\0')

Note that strtok actually modifies the contents of the character string "workingstring", replacing the single quote characters found in the two calls with the '\0' character. For this reason, as illustrated by the example, strtok should typically be used on a working string rather than actual data.

strtok will not cross the string terminator character '\0' (the end of the original string). Once strtok reaches the end of the string, any further calls against NULL simply return a NULL pointer (this is an important consideration, since a NULL pointer is quite different from a pointer to an empty string - Warning: since the pointer returned by strtok is typically passed to strcpy, this returned pointer from a call against NULL typically needs to be checked to be sure it isn't a NULL pointer (calling strcpy with a NULL will cause it to crash).

Next consider the need to decompose a line of text such as

LOOP LDA TABLE,X
into tokens, where tokens are delimited by spaces and/or commas. Assuming that the string variable "sourceline" holds the text, a solution using strtok is
/* char sourceline[81], workingstring[81], *ptr;
   char token1[81], token2[81], token3[81], token4[81]; */
strcpy(workingstring, sourceline);
token1[0] = '\0'; token2[0] = '\0'; token3[0] = '\0'; token4[0] = '\0';
if (workingstring[0] == ' ')
  {
    if ((ptr = strtok(workingstring," ,")) != NULL)
    strcpy(token2,ptr);
  }
else
  {
    if ((ptr = strtok(workingstring," ,")) != NULL)
    strcpy(token1, ptr);
    if ((ptr = strtok(NULL," ,")) != NULL)
    strcpy(token2, ptr);
  }
if ((ptr = strtok(NULL," ,")) != NULL) strcpy(token3, ptr);
if ((ptr = strtok(NULL," ,")) != NULL) strcpy(token4, ptr);
If the assembly language statement has no label, token1 is set to the empty string, so regardless, token2, token3, and token4 receive the op code, operand, and qualifier elements of the line of assembly code being decomposed. Large sizes are allocated to token variables to avoid a strcpy overrun if the text being scanned is not a normal source line. Note that the pointer returned by a call against NULL is checked before it is passed to strcpy. Most (but not all) of the elements necessary to construct a viable breakup subroutine are captured in this illustration.

The constructions employed for other forms of source lines will require variations of the strategies exhibited here. Note that when a space is part of the delimiter string, simply including the white space characters (e.g., '\t') in the delimter string will cause all white space to be handled as if it were comprised of spaces.

Programming tip: when reading sourcelines from the typical text file using fgets, the newline character '\n' is included at the end of the input string. You can remove it using strtok by the simple construction
   strtok(sourceline,"\n");