Fall Term, 2006

Term project

The project is due by noon on Friday, December 8, 2006. No late submissions will be accepted after midnight on Wednesday, December 13, 2006.

The term project is to be a (cross) assembler for (a subset of) SIC/XE, written in C, producing code for the absolute loader used in the SIC programming assignments.

Specifications

  1. The assembler is to execute by entering
    assemble <source-file-name>
    and is to be constructed using the subroutines developed as exercises (specifically, findlast, insert, breakup, opcodeincr, storageincr, getsource, and pass1) supplemented by the two subroutines doinstruct and dostorage specified below.

  2. The source file for the main program is to be named assemble.c.

  3. The assembler is to accept syntax equivalent to that of the assembly language described in the course textbook.

  4. The assembler is to be capable of handling source lines that are instructions, storage declaratives, comments, and assembler directives (a directive that is not implemented should be ignored, possibly with a warning).

    1. For instructions, the assembler is to minimally be capable of decoding 2, 3 and 4-byte instructions as follows:
      1. 2-byte with 1 symbolic register reference (e.g., TIXR A)
      2. RSUB (ignoring any operand or perhaps issuing a warning)
      3. 3-byte PC-relative with symbolic operand to include immediate, indirect, and indexed addressing
      4. 3-byte absolute with non-symbolic operand to include immediate, indirect, and indexed addressing
      5. 4-byte absolute with symbolic or non-symbolic operand to include immediate, indirect, and indexed addressing

    2. The assembler is to handle all storage directives (BYTE, WORD, RESW, and RESB) and should generate multiple object modules where RESW and RESB imply them, so that large reservations can be handled.

  5. Any text following the operand component on a line of source is assumed to be a comment.

  6. Pass 2 errors are to be flagged in the same "errors" list used in pass 1 (you may abort pass 2 if there are pass 1 errors, so no conflicting error conditions should arise).

  7. The assembler is to output the symbol table at the end of pass 1.

  8. The assembler is to produce a report at the end of pass 2 (or prior to quitting if pass 2 is aborted). Pass 1 or pass 2 errors should be included as part of the assembler report, exhibiting both the offending line of source code and the error (from the "errors" list). The assembler report (at the end of pass 2) is to show in parallel columns the location counter, any generated object code (which could be blank if pass 2 were aborted), and source code in the precise column format as indicated in the following example:

    Example format for the assembler report

    Column:
         0       8                   29
         |       |                    |
         Loc     Object code          Source code
         ---     -----------          -----------
         0003A0                       TERMPROJ  START   3A0   COP 3601
         0003A0                       . THIS IS A COMMENT
         0003A0  4C4F4E47205445585420           BYTE    C'LONG TEXT IS TRUNCATED'
         0003B6                       DEV       BYTE    X'FG'
         ******* ERROR 7: Bad operand for BYTE/WORD directive
         0003B7                                 RESB    4
         0003BB  03200F               TOP       LDA     ZERO
         0003BE  0520A1                         LDX    #INDX
                         . . .
    
    Note that the object code field is sufficiently wide for all generated code except possibly that produced by storage directives. Where a storage directive produces more than 20 hex digits, truncate to 20 characters on the report (but include all of it on the object file for the SIC/XE simulator!).
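The column layout and the 20-character truncation rule can be sketched with a single snprintf call. This is only an illustration under assumptions: the name report_line and its parameters are hypothetical and are not part of the required interface.

```c
#include <stdio.h>

/* Hypothetical helper (not a required routine): formats one report line.
   Columns follow the example above: Loc at 0, object code at 8, source at 29. */
void report_line(char out[], size_t outsize, int locctr,
                 const char objcode[], const char source[])
{
    /* %06X       6-digit hex location counter
       %-20.20s   left-justify, pad to 20 columns, truncate beyond 20 */
    snprintf(out, outsize, "%06X  %-20.20s %s",
             (unsigned)locctr, objcode, source);
}
```

Only the report copy is truncated here; the full object code string would still be written unchanged to the object file.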

  9. If there are no errors, the object code (including loader information) is to be written to a file named according to the convention "<source-file-name>.obj" for subsequent input to the simulator "sicsim". Your project write-up is to include at least one case which shows both the code produced by your assembler and the result of executing it under sicsim.

  10. General documentation standards as employed in course assignments remain in effect but will require slight modification since this submission is your term project. Organize as follows:
    • Cover page as per the documentation template (Template.doc)
    • Table of contents
    • Executive summary:
      1. comprehensive overview of your assembler as a software product (don't indulge in a blow-by-blow of the bits and pieces of incrementally building the product).
      2. List of features implemented (including degree of completion); delineate each bonus feature prominently as a separate entry and point to its test listing in the appendix
      3. Constructions employed - overview of what was built, including a hierarchical organization chart exhibiting the relationship among the various program modules.
      4. Testing plan, summary and discussion - nature of EACH test conducted (what it tested) and what it showed
      5. Wrap-up assessing overall functionality of the product as shown by your tests plus qualitative issues (i.e., degree of error reporting, robustness, etc).
    • Appendices:
      1. Source code listings arranged according to the hierarchy given in your executive summary (commented in-line or by wordprocessor annotation features)
      2. Test files listing/results, beginning from the representative selection of SIC/XE test files installed in the
        /usr/public/cop3601/cwinton/assgn-P/project.test.files directory.
        For the first of these (proj-t1), run your object file through the SIC simulator (using the supplied file proj-t1.input for the input file DEVF2). Include the SIC log file from this run in your documentation, commented and high-lighted as appropriate, and renamed as <student-id>-P.proj-t1.log .

        For each bonus feature test, specify the feature tested and why the test was passed. These should be grouped together rather than strung out among other tests.

        Supplement the provided test routines with at least one test module (named <student-id>-P.test) of your own design that will assemble and which fully exercises the features implemented in your assembler (other than error diagnostics). You do not need to run the object module through the SIC simulator (i.e., your test program does not need to have viable semantics). Include your assembler report from this file in your documentation, commented and highlighted as appropriate, and renamed as <student-id>-P.test.lst.

General course documentation standards still apply. Your written documentation is to be prepared as a Word document file named <student-id>-P.doc. Please remember that your documentation is to be a complete, coherent description of your project work.

Use a current version of the submit shell script from /usr/public/cop3601/cwinton for turning in your work. The file /usr/public/cop3601/cwinton/assgn-P/spec lists names of all required files (omitting any <student-id> prefix). The submit script provides the opportunity to include any additional files required by your particular implementation. In particular, if you opt to have a pass2.c file, you must use this feature to manually include it with your submission. Usage is submit P

Bonuses: If you do any of these, note their presence prominently in the executive summary. Provide highlighting and commentary in your test listings and be careful to identify the bonus category each test goes with.
Warning: If you don't claim a bonus feature I won't test for it. If you do claim it and it tests poorly, you will lose points you wouldn't have otherwise.

(1 point each on final average)

  1. Base/displacement addressing (including BASE, NOBASE directives)
  2. 2-byte instructions that have two operands (e.g., SHIFTL)
  3. Simple SIC capability where an opcode prefixed with "*" signals that the instruction is to be decoded in 3-byte simple SIC format
  4. Robustness of error reporting (including any bonuses that are present: points vary from 0 to 1 depending on quality - this includes things like over-long symbols, warnings, and memory overrun)

(2 points on the final average)

  1. Literals (including LTORG)
    =C'<ascii-text>', =X'<hex-text>', and =<decimal-value> forms.
  2. USE blocks
  3. EQU to include simple (A <op> B) operand arithmetic, where <op> is one of +,-,*,/ and no spaces surround the operation; e.g., A+B.
  4. Relocation dictionary (RLD) and External Symbol Dictionary (ESD); append these to the end of your object module using the format:
    <object module>
    @
    <op>XXXXXX           each RLD entry is a 6 hex character location
    ...                  preceded by an arithmetic operation
    <op>XXXXXX
    @
    XXXXXXAAAAAAA        each EXTDEF entry is:
    ...                  6 hex character location followed by
    XXXXXXAAAAAAA        a symbol of up to 7 ASCII characters
    @
    <op>XXXXXXAAAAAAA    each EXTREF entry is:
    ...                  an arithmetic operation followed by
    <op>XXXXXXAAAAAAA    a 6 hex character location followed by
    !                    a symbol of up to 7 ASCII characters
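
For the EQU bonus (simple A<op>B arithmetic with no surrounding spaces), one possible approach is to split the operand at the operator and then combine the two resolved values. The sketch below is an assumption-laden illustration: split_equ and apply_equ are hypothetical names, and looking up each piece in the symbol table (or converting it from a number) is left to the caller.

```c
#include <string.h>

/* Hypothetical helper: splits an EQU operand of the form A<op>B.
   Returns the operator character, or 0 if the operand has no operator
   (or a piece would not fit in a buffer of the given size). */
char split_equ(const char operand[], char left[], char right[], size_t size)
{
    size_t pos;

    if (operand[0] == '\0')
        return 0;
    /* Start the search at index 1 so a lone "*" or a leading sign
       is not mistaken for a binary operator. */
    pos = 1 + strcspn(operand + 1, "+-*/");
    if (operand[pos] == '\0')
        return 0;
    if (pos >= size || strlen(operand + pos + 1) >= size)
        return 0;
    memcpy(left, operand, pos);
    left[pos] = '\0';
    strcpy(right, operand + pos + 1);
    return operand[pos];
}

/* Applies the operator to two already-resolved values.
   Returns 0 on success, -1 for an unknown operator or division by zero. */
int apply_equ(char op, int a, int b, int *result)
{
    switch (op) {
    case '+': *result = a + b; return 0;
    case '-': *result = a - b; return 0;
    case '*': *result = a * b; return 0;
    case '/': if (b == 0) return -1;
              *result = a / b; return 0;
    default:  return -1;
    }
}
```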

(5 points on the final average)

Construct a macro preprocessor that handles simple macro expansions at the level of the PUTC and PUTMSG macros discussed in class, including the "$" system variable but not conditional assembly or SET variables.

Required new subroutines for use in pass 2.

  1. Devise a C function named doinstruct (source file named doinstruct.c) to produce a line of object code for an instruction line. Have your function return 0 if successful and -1 (or suitable error code) if the operand is in error.

    Subroutine syntax:

      #ifndef SYMBSIZE
        #define SYMBSIZE 10
      #endif
      typedef struct tabinfo
        {
          int  val;
          int  type;
          int  info;
        } tabdata;
      typedef struct tablemem
        {
          char    symbol[SYMBSIZE];
          tabdata symbdata;
        } tabletype;
    /*
    
    add in any extern references you need for routines to be linked in separately
    */
      extern tabletype symbtab[];
      extern int symbtabsize;
      extern tabletype codetable[];
      extern int codetabsize;
      extern char objline[];
      int doinstruct(locctr, opcode, operand, ni, xbpe)
            int locctr, ni, xbpe;
            char opcode[], operand[];
    
    External variables are the same as in earlier homework; "locctr" is the location counter (needed for the PC-relative computation). Your routine should first locate the op code in codetable and determine its type.
    1. For the 2-byte case
      • convert operand to a register number
      • use sprintf to format the op code and register number in hex character form and concatenate them together in objline.
    2. For the 3-byte case
      • add the ni value to the opcode value
      • use findlast to locate operand in symbtab
      • if the xbpe value is even, compute the displacement of the operand value from locctr and check to see if it is in range (if it is not, this is where BASE addressing would be handled). Set the b and p bits for xbpe.
      • if the xbpe value is odd (the 4 byte case), you simply use the operand value from the symbol table as the displacement. Set the b and p bits for xbpe to 0 in this case.
      • appropriately format the opcode, xbpe, and displacement values using sprintf and concatenate them together in objline. This requires a little maneuvering, since a negative displacement does not format to 3 hex digits under the "%03X" format specification of sprintf (with 32-bit integers it formats to 8).
        By making the field wide enough in the format phrase; e.g.,
        sprintf(hex,"%011X",x);
        the location of the desired hex digits can be predicted. For this example, hex always formats with 11 hex characters, so the substrings starting at hex[8], hex[6], and hex[5] give the hex representations of the 12-bit, 20-bit, and 24-bit 2's complement forms of the integer x.
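
The 3-byte PC-relative case described above might be sketched as follows. This is not the required doinstruct interface: format3_pcrel and its parameters are hypothetical, and the sketch assumes 32-bit integers, that opval already has the ni value added, and that locctr is the address of the instruction being assembled (so the program counter is locctr + 3 when the instruction executes).

```c
#include <stdio.h>

/* Hypothetical helper: writes 6 hex characters for a 3-byte PC-relative
   instruction into objline. Returns 0 on success, -1 if the displacement
   does not fit in 12 bits (where BASE addressing would be tried). */
int format3_pcrel(char objline[], int opval, int x, int target, int locctr)
{
    char hex[12];                        /* 11 hex characters plus NUL */
    int disp = target - (locctr + 3);    /* PC points past this instruction */
    int xbpe = (x ? 8 : 0) | 2;          /* p = 1, b = 0, e = 0 */

    if (disp < -2048 || disp > 2047)
        return -1;

    /* With 32-bit integers this always yields 11 characters, so the
       12-bit 2's complement form starts at hex[8]. */
    sprintf(hex, "%011X", (unsigned)disp);
    sprintf(objline, "%02X%1X%s", (unsigned)opval, (unsigned)xbpe, &hex[8]);
    return 0;
}
```

For instance, if ZERO were at address 3CD (an assumed value), the call format3_pcrel(objline, 0x03, 0, 0x3CD, 0x3BB) would produce 03200F, which is consistent with the LDA ZERO line of the example report.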

  2. Devise a C function named dostorage (source file name dostorage.c) to produce a line of object code for a source line representing a storage directive. Have your function return 0 if successful and -1 (or suitable error code) if some error is detected in the operand field.

    Subroutine syntax:

      int dostorage(opcode, operand)
            char opcode[], operand[];
    
    Parameters are as discussed previously. As for doinstruct, if successful, the function should place the generated storage in the string "objline".

    The discussion below covers both the generation of the ASCII representation of text (needed for BYTE C'<text>') and type checking for hex characters (needed for BYTE X'<hex-text>').

    The WORD directive can be handled by applying "sscanf" to the operand followed by applying "sprintf" with "%011X" formatting to the derived result. As discussed above, this provides means for obtaining the 24 bit 2's complement representation in its 6 hex character form.
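
A minimal sketch of the WORD handling just described, assuming 32-bit integers and a decimal operand. The name word_to_hex is hypothetical, and the range check (signed 24-bit) is an assumption rather than something the specification states.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: converts a decimal WORD operand to its 6 hex
   character, 24-bit 2's complement form. Returns 0 on success, -1 on error. */
int word_to_hex(char objline[], const char operand[])
{
    char hex[12];
    char extra;
    int value;

    /* Exactly one integer must convert; trailing junk makes sscanf return 2. */
    if (sscanf(operand, "%d %c", &value, &extra) != 1)
        return -1;
    if (value < -8388608 || value > 8388607)
        return -1;                       /* does not fit in 24 bits (assumed rule) */

    sprintf(hex, "%011X", (unsigned)value);
    strcpy(objline, &hex[5]);            /* last 6 of the 11 hex characters */
    return 0;
}
```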

    For RESW and RESB, the printf format "%0<n>d" applied to the value 0 will produce a string of <n> '0' characters, an easy way to generate a string of legitimate hex characters to provide "garbage" bytes that reserve the specified storage if you aren't doing multiple modules.
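
The zero-fill idea can be sketched with the "*" width form of the format, which takes <n> as an argument instead of embedding it in the format string. The name reserve_zeros is hypothetical; for RESW the caller would pass three times the word count.

```c
#include <stdio.h>

/* Hypothetical helper: fills objline with 2 * nbytes '0' characters
   (two hex digits per reserved byte). Returns 0 on success, -1 if the
   count is not positive or the result would not fit in objline. */
int reserve_zeros(char objline[], size_t size, int nbytes)
{
    if (nbytes <= 0 || size == 0 || (size_t)nbytes > (size - 1) / 2)
        return -1;
    /* "%0*d" pads the value 0 with leading zeros to the given width */
    sprintf(objline, "%0*d", 2 * nbytes, 0);
    return 0;
}
```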

Reminder: From the Course Supplement

From Section 2.3 (page 18): Converting text to ASCII

If a character value is assigned to an integer variable, it is converted to binary as an integer with leading 0's and its ASCII code in the rightmost byte. This "pure binary" form can then be converted to text using "sprintf" to generate the hex representation of the ASCII code ("sprintf" is discussed further in Section 2.4); for example,

   /* int ic; char s[81]; */
   ic = 'a';
   sprintf(s, "%X", ic);
places the text "61" (the ASCII representation of 'a' in hex) in the string "s". In fact, the abbreviated form
   sprintf(s, "%X", 'a');
has the same effect. By using this tactic in a loop, a text string can be easily converted to show its ASCII representation in hex; for example,
   /* int i, j;
      char s[81], t[161];  (t needs two hex digits per character of s) */
   for (i=0, j=0; i < strlen(s); i++, j+=2)   /* advance j by 2 per character */
     {
       sprintf(&t[j], "%02X", s[i]);
     }
places in the string "t" the ASCII representation in hex of the characters in the string "s". This is exactly what is needed to resolve an assembly language storage instruction such as
   C2EX     BYTE     C'This is a message'
into object code; i.e., the ASCII form of the text "This is a message" can be generated using the above strategy.
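
Applied to a complete BYTE operand, the same strategy might look like the sketch below. The name byte_c_to_hex is hypothetical, and the sketch assumes the operand arrives as a single string of the form C'<text>' with nothing after the closing quote.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: converts an operand of the form C'<text>' to the
   hex representation of its ASCII codes. Returns 0 on success, -1 if the
   operand is malformed or objline is too small. */
int byte_c_to_hex(char objline[], size_t size, const char operand[])
{
    const char *open, *close;
    size_t i, n;

    if (operand[0] != 'C' || operand[1] != '\'')
        return -1;
    open = operand + 2;
    close = strrchr(open, '\'');
    if (close == NULL || close[1] != '\0' || close == open)
        return -1;       /* no closing quote, trailing junk, or empty text */

    n = (size_t)(close - open);
    if (size < 2 * n + 1)
        return -1;
    for (i = 0; i < n; i++)
        sprintf(&objline[2 * i], "%02X", (unsigned)(unsigned char)open[i]);
    return 0;
}
```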

From Section 2.4 (page 30): Type checking

It may be desirable for an assembler source statement such as

   DEV         BYTE    X'F1'
to check the data between the quote marks as to its type. This is accomplished by using "isxdigit" character by character on the data; e.g.,
   i=0;
   while (isxdigit(data[i])) i++;
   if (strlen(data) != i) ...
The type checking function "isdigit" is similarly used to test if a character is a decimal digit (both functions are declared in <ctype.h>).
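
Wrapped as a function, the isxdigit check might be sketched as follows. The name valid_hex_data is hypothetical, and requiring a non-empty, even number of digits (whole bytes) is an added assumption, not something the supplement states.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if data (the text between the quote
   marks of X'...') consists only of hex digits, is non-empty, and has an
   even length; otherwise returns 0. */
int valid_hex_data(const char data[])
{
    size_t i = 0;

    /* The cast avoids undefined behavior for negative char values. */
    while (isxdigit((unsigned char)data[i]))
        i++;
    return i > 0 && data[i] == '\0' && i % 2 == 0;
}
```

With this check, the data FG from the example report is rejected because the scan stops at 'G' before reaching the end of the string.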

From Section 2.3 (page 20): Passing command line arguments to "main"

Suppose you wish to execute a program for assembling code in the file "prog.sic" by using a Unix command line as follows:

   assemble prog.sic
In general, when a program is invoked, Unix implicitly provides the "main" module with two arguments via which command line arguments can be determined (they are ignored unless the programmer establishes code to utilize them). The first argument is a count (at least 1) of the number of tokens on the command line and the second is an array of string pointers, one for each token. Each of these pointers addresses a string of characters which comprise the token. For example, suppose the main module is
   #include <string.h>
   main(argc, argv)
     int argc;
     char *argv[];
       {
         char sourcefl[15];
         if (argc < 2)
           {
             askuser(sourcefl);
           }
         else
           {
             strcpy(sourcefl, argv[1]);
           }
         sicassemble(sourcefl);
       }
stored in the Unix file "myassembler.c" and compiled by
   cc myassembler.c askuser.o sicassemble.o -o assemble
Recall that the option "-o assemble" names the execute module "assemble" instead of "a.out". Then the command
   assemble prog.sic
will generate "argc" with value 2 and "argv[0]" pointing to the string "assemble" (the 1st token) and "argv[1]" pointing to the string "prog.sic" (the 2nd token). In this case the "else" portion of the above code will be executed.

If the user simply enters

   assemble
then "argc" has value 1 and the routine "askuser" in the "if" portion of the code is executed, presumably to interactively obtain the name of the source file from the user.