The project is due by noon on Friday, December 8, 2006.
No late submissions will be accepted after midnight on Wednesday, December 13, 2006.
The term project is to be a (cross) assembler for (a subset of) SIC/XE, written in C,
producing code for the absolute loader used in the SIC programming assignments.
Specifications
- The assembler is to execute by entering
  assemble <source-file-name>
and is to be constructed using the subroutines developed as exercises (specifically,
findlast, insert, breakup, opcodeincr, storageincr, getsource, and
pass1) supplemented by the two subroutines
doinstruct and
dostorage specified below.
- The source file for the main program is to be named
assemble.c.
- The assembler is to accept syntax equivalent to that of the assembly language described
in the course textbook.
- The assembler is to be capable of handling source lines that are instructions,
storage declaratives, comments, and assembler directives
(a directive that is not implemented should be ignored, possibly with a warning).
- For instructions, the assembler must at minimum be able to decode 2-, 3-, and 4-byte instructions as follows:
- 2-byte with 1 symbolic register reference (e.g., TIXR A)
- RSUB (ignoring any operand or perhaps issuing a warning)
- 3-byte PC-relative with symbolic operand to include immediate, indirect, and indexed addressing
- 3-byte absolute with non-symbolic operand to include immediate, indirect, and indexed addressing
- 4-byte absolute with symbolic or non-symbolic operand to include immediate, indirect, and indexed addressing
- The assembler is to handle all storage directives (BYTE,
WORD, RESW,
and RESB) and should generate multiple
modules as implied by RESW and RESB to handle large RESB and RESW directives.
- Any text following the operand component on a line of source is assumed to be a comment.
- Pass 2 errors are to be flagged in the same "errors" list used
in pass 1 (you may abort pass 2 if there are pass 1 errors, so no conflicting error conditions
should arise).
- The assembler is to output the symbol table at the end of pass 1.
- The assembler is to produce a report at the end of pass 2 (or prior to quitting if pass 2 is
aborted). Pass 1 or pass 2 errors should be included as part of the assembler report,
exhibiting both the offending line of source code and the error (from
the "errors" list). The assembler report (at the end of pass 2) is
to show in parallel columns the location counter, any generated object code (which could be
blank if pass 2 were aborted), and source code in the precise column format as indicated in the
following example:
Example format for the assembler report
Column:
0       8                    29
|       |                    |
Loc     Object code          Source code
---     -----------          -----------
0003A0                       TERMPROJ START   3A0      COP 3601
0003A0                       . THIS IS A COMMENT
0003A0  4C4F4E47205445585420          BYTE    C'LONG TEXT IS TRUNCATED'
0003B6                       DEV      BYTE    X'FG'
******* ERROR 7: Bad operand for BYTE/WORD directive
0003B7                                RESB    4
0003BB  03200F               TOP      LDA     ZERO
0003BE  0520A1                        LDX     #INDX
. . .
Note that the object code field is sufficiently wide for all generated code except possibly
that produced by storage directives. Where a storage directive produces more than 20 hex digits,
truncate to 20 characters on the report (but include all of it on the object file for the SIC/XE
simulator!).
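The column layout and the 20-character truncation can be sketched with a single sprintf call. This is only an illustration: the function name report_line and its parameters are hypothetical, and the caller is assumed to supply an output buffer large enough for 29 characters plus the source text.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the required subroutines): build one
   report line with the location counter at column 0, the object code at
   column 8 (cut to at most 20 characters), and the source at column 29. */
static void report_line(int locctr, const char *objcode,
                        const char *source, char *out)
{
    /* "%06X"     -> 6 hex digits for the location counter (columns 0-5)
       two spaces -> columns 6-7
       "%-20.20s" -> left-justified, padded to 20 and truncated at 20
       one space  -> column 28, so the source text starts at column 29 */
    sprintf(out, "%06X  %-20.20s %s", (unsigned int)locctr, objcode, source);
}
```

Only the report is truncated this way; the full object code string would still go to the object file.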
- If there are no errors, the object code (including loader information) is to be written to a
file named according to the convention "<source-file-name>.obj"
for subsequent input to the simulator "sicsim". Your project write-up
is to include at least one case which shows both the code produced by your assembler and the
result of executing it under sicsim.
- General documentation standards as employed in course assignments remain in effect but will
require slight modification since this submission is your term project. Organize as follows:
- Cover page as per the documentation template
(Template.doc)
- Table of contents
- Executive summary:
- comprehensive overview of your assembler as a software product
(don't indulge in a blow-by-blow of the bits and pieces of incrementally building the product).
- List of features implemented (including degree of completion);
delineate each bonus feature prominently as a separate entry and point to its test listing in the
appendix
- Constructions employed - overview of what was built, including a
hierarchical organization chart exhibiting the relationship among the various program modules.
- Testing plan, summary and discussion - nature of EACH test conducted (what it tested) and what it showed
- Wrap-up assessing overall functionality of the product as shown by your tests plus qualitative
issues (i.e., degree of error reporting, robustness, etc).
- Appendices:
- Source code listings arranged according to the hierarchy given in your executive summary
(commented in-line or by wordprocessor annotation features)
- Test files listing/results, beginning from the representative selection of SIC/XE test files
installed in the
/usr/public/cop3601/cwinton/assgn-P/project.test.files
directory.
For the first of these (proj-t1), run your object file
through the SIC simulator
(using the supplied file proj-t1.input for
the input file DEVF2). Include the SIC log file from this run in your documentation,
commented and highlighted as appropriate, and renamed as
<student-id>-P.proj-t1.log.
For each bonus feature test, specify the feature tested and why the test was passed. These should
be grouped together rather than strung out among other tests.
Supplement the provided test routines with at least one test module
(named <student-id>-P.test) of your own design that
will assemble and which fully exercises the features implemented in your assembler (other
than error diagnostics). You do not need to run the object module through the SIC
simulator (i.e., your test program does not need to be semantically meaningful). Include your assembler
report from this file in your documentation, commented and highlighted as appropriate, and renamed
as <student-id>-P.test.lst.
General course documentation standards still apply. Your written documentation is to be prepared
as a Word document file named
<student-id>-P.doc. Please remember that your documentation
is to be a complete, coherent description of your project work.
Use a current version of the submit shell script
from /usr/public/cop3601/cwinton for turning in your
work. The file
/usr/public/cop3601/cwinton/assgn-P/spec lists
names of all required files (omitting any <student-id> prefix).
The submit script provides the opportunity to include
any additional files required by your particular implementation. In particular, if you
opt to have a pass2.c file, you must use this feature to manually include it with your
submission. Usage is: submit P
Bonuses: If you do any of these, notate their presence prominently in the executive
summary. Provide highlighting and commentary in your test listings and be careful to identify
the bonus category the test goes with.
Warning: If you don't claim a bonus feature I won't test for it.
If you do claim it and it tests poorly, you will lose points you wouldn't have otherwise.
(1 point each on final average)
- Base/displacement addressing (including BASE, NOBASE directives)
- 2-byte instructions that have two operands (e.g., SHIFTL)
- Simple SIC capability where an opcode prefixed with "*" signals that the instruction is to be decoded in 3-byte simple SIC format
- Robustness of error reporting (including any bonuses that are present: points vary from 0 to 1 depending on quality - this includes things like over-long symbols, warnings, and memory overrun)
(2 points on the final average)
- Literals (including LTORG)
- =C'<ascii-text>', =X'<hex-text>', and =<decimal-value> forms.
- USE blocks
- EQU to include simple (A <op> B) operand arithmetic,
where <op> is one of +, -, *, / and no spaces surround the operation;
e.g., A+B.
- Relocation dictionary (RLD) and External Symbol Dictionary (ESD);
append these to the end of your object module using the format:
<object module>
@
<op>XXXXXX          each RLD entry is a 6 hex character location
...                 preceded by an arithmetic operation
<op>XXXXXX
@
XXXXXXAAAAAAA       each EXTDEF entry is:
...                 - 6 hex character location followed by
XXXXXXAAAAAAA       - a symbol of up to 7 ASCII characters
@
<op>XXXXXXAAAAAAA   each EXTREF entry is:
...                 - an arithmetic operation followed by
<op>XXXXXXAAAAAAA   - a 6 hex character location followed by
!                   - a symbol of up to 7 ASCII characters
(5 points on the final average)
- Construct a macro preprocessor that handles simple macro expansions at the level of the PUTC and PUTMSG macros discussed in class, including the "$" system variable but not conditional assembly or SET
variables.
Required new subroutines for use in pass 2.
- Devise a C function named doinstruct
(source file named doinstruct.c) to produce a line
of object code for an instruction line. Have your function return
0 if successful and -1 (or suitable
error code) if the operand is in error.
Subroutine syntax:
#ifndef SYMBSIZE
#define SYMBSIZE 10
#endif
typedef struct tabinfo
{
int val;
int type;
int info;
} tabdata;
typedef struct tablemem
{
char symbol[SYMBSIZE];
tabdata symbdata;
} tabletype;
/*
add in any extern references you need
for routines to be linked in separately
*/
extern tabletype symbtab[];
extern int symbtabsize;
extern tabletype codetable[];
extern int codetabsize;
extern char objline[];
int doinstruct(locctr, opcode, operand, ni, xbpe)
int locctr, ni, xbpe;
char opcode[], operand[];
external variables are the same as in earlier homework;
"locctr" is the location counter (needed
for the PC relative computation). Your routine should first
locate the op code in codetable and determine its type.
- For the 2-byte case
- convert operand to a register number
- use sprintf to format the op code
and register number in hex character form and concatenate them together in
objline.
- For the 3-byte case
- add the ni value to the opcode value
- use findlast to locate
operand in symbtab
- if the xbpe value is even, compute the displacement of the operand value from
locctr and check to see if it is in
range (if not this is where the BASE would get handled).
Set the b and
p bits for xbpe.
- if the xbpe value is odd (the 4 byte case), you simply use the operand value
from the symbol table as the displacement. Set the
b and p bits for
xbpe to 0 in this case.
- appropriately format the opcode,
xbpe, and displacement values using
sprintf and
concatenate them together in objline. This requires a little
bit of maneuvering, since a negative displacement doesn't format to 3 hex digit form
under the "%03X" format specification of
sprintf.
By adding enough space to the format phrase; e.g.,
sprintf(hex, "%011X", x);
the location of the desired hex digits can be predicted. For this example (with a 32-bit int),
hex always formats with 11 hex characters (so the strings starting at
hex[8], hex[6], and hex[5]
give the hex representation of the 12 bit 2's complement, the 20 bit 2's complement, and the
24 bit 2's complement forms for the integer x).
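The 2-byte and 3-byte PC-relative steps above might be sketched as follows. All function names here (low_hex, encode_2byte, encode_pcrel) are hypothetical, the sketch writes into a caller-supplied buffer rather than objline, and it assumes a 32-bit int so that "%011X" always yields 11 characters. The register numbers are the standard SIC/XE ones (PC and SW are omitted for brevity), and the x bit is assumed to arrive in the 8's place of xbpe.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the low `digits` hex digits of x into out.
   Assumes a 32-bit int, so "%011X" always produces exactly 11 characters. */
static void low_hex(int x, int digits, char *out)
{
    char hex[12];                              /* 11 characters plus '\0' */
    sprintf(hex, "%011X", (unsigned int)x);
    strcpy(out, &hex[11 - digits]);            /* digits = 3 starts at hex[8] */
}

/* Hypothetical 2-byte encoder for one symbolic register (e.g., TIXR T).
   Returns 0 on success, -1 if the register name is not recognized. */
static int encode_2byte(int opval, const char *reg, char *out)
{
    static const char *names[] = {"A", "X", "L", "B", "S", "T", "F"};
    int r;
    for (r = 0; r < 7; r++) {
        if (strcmp(reg, names[r]) == 0) {
            sprintf(out, "%02X%X0", opval, r); /* second register nibble is 0 */
            return 0;
        }
    }
    return -1;
}

/* Hypothetical 3-byte PC-relative encoder.
   opval  : opcode value from the code table
   ni     : 0..3, added to the opcode value
   xbpe   : incoming flags; only the x bit (8) is kept here
   target : operand value from the symbol table
   locctr : address of this instruction
   Returns 0 on success, -1 if the displacement does not fit in 12 bits. */
static int encode_pcrel(int opval, int ni, int xbpe,
                        int target, int locctr, char *out)
{
    int disp = target - (locctr + 3);          /* PC is already past this instruction */
    if (disp < -2048 || disp > 2047)
        return -1;                             /* where BASE handling would go */
    xbpe = (xbpe & 0x8) | 0x2;                 /* keep x, set p, clear b and e */
    sprintf(out, "%02X%X", opval + ni, xbpe);  /* 3 hex characters */
    low_hex(disp, 3, out + 3);                 /* 12-bit two's complement */
    return 0;
}
```

With these assumptions, an LDA (opcode 00, ni = 3) at location 3BB whose operand is at 3CD encodes as 03200F, matching the sample report line.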
- Devise a C function named dostorage (source
file name dostorage.c) to produce a line of object
code for a source line representing a storage directive. Have your function
return 0 if successful and -1 (or
suitable error code) if some error is detected in the operand field.
Subroutine syntax:
int dostorage(opcode, operand)
char opcode[], operand[];
Parameters are as discussed previously. As for part A, if successful, the function should
place the generated storage in the string "objline".
The discussion below covers both the generation of the ASCII representation of text (needed
for BYTE C'<text>') and
type checking for hex characters (needed
for BYTE X'<hex-text>').
The WORD directive can be handled by applying
"sscanf" to the operand followed by applying
"sprintf" with "%011X" formatting
to the derived result. As discussed above, this provides means for
obtaining the 24 bit 2's complement representation in its 6 hex character form.
For RESW and RESB
the printf format
"%0<n>d"
will produce a string of <n> '0' characters, an easy way to
generate a string of legitimate hex characters to provide "garbage" bytes
that reserve the specified storage if you aren't doing multiple modules.
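The WORD and RESW/RESB handling just described could look roughly like this. The names word_hex and reserve_hex are hypothetical, the output goes to a caller-supplied buffer instead of objline, a 32-bit int is assumed, and the byte count passed to reserve_hex is assumed to be at least 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for WORD: read a decimal operand and produce its
   24-bit two's complement as 6 hex characters.
   Returns 0 on success, -1 if the operand is not a decimal integer. */
static int word_hex(const char *operand, char *out)
{
    int value;
    char hex[12];                              /* 11 characters plus '\0' */
    if (sscanf(operand, "%d", &value) != 1)
        return -1;
    sprintf(hex, "%011X", (unsigned int)value);
    strcpy(out, &hex[5]);                      /* last 6 of the 11 characters */
    return 0;
}

/* Hypothetical helper for RESB/RESW without multiple modules: fill the
   reserved storage with '0' characters, two hex digits per byte. */
static void reserve_hex(int nbytes, char *out)
{
    sprintf(out, "%0*d", 2 * nbytes, 0);       /* '*' takes the width from the argument */
}
```

For RESW the caller would pass three times the word count as nbytes, since a SIC word is 3 bytes.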
Reminder: From the Course Supplement
From Section 2.3 (page 18): Converting text to ASCII
If a character value is assigned to an integer variable, it is converted to binary as an integer with leading 0's and its ASCII code in the
rightmost byte. This "pure binary" form can then be converted to text using "sprintf" to
generate the hex representation of the ASCII code ("sprintf" is discussed further in
Section 2.4); for example,
/* int ic; char s[81]; */
ic = 'a';
sprintf(s, "%X", ic);
places the text "61" (the ASCII representation of 'a' in hex) in the string "s".
In fact, the abbreviated form
sprintf(s, "%X", 'a');
has the same effect. By using this tactic in a loop, a text string can be easily converted to show its ASCII representation in hex; for
example,
/* int i, j;
char s[81], t[161]; */
/* each character yields two hex digits, so j advances by 2 */
for (i=0, j=0; i < strlen(s); i++, j+=2)
{
sprintf(&t[j], "%02X", s[i]);
}
places in the string "t" the ASCII representation in hex of the characters in the string
"s". This is exactly what is needed to resolve an assembly language storage instruction
such as
C2EX BYTE C'This is a message'
into object code; i.e., the ASCII form of the text "This is a message" can be generated
using the above strategy.
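Applied to a complete BYTE operand, the same loop might be wrapped as below. The function name byte_c_hex is hypothetical, and the sketch assumes the operand has the exact form C'<text>' with at least one character between the quotes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand an operand of the form C'text' into the hex
   representation of its ASCII codes. The caller supplies `out`, which needs
   room for two characters per text character plus the terminator.
   Returns 0 on success, -1 if the operand is not of the form C'...'. */
static int byte_c_hex(const char *operand, char *out)
{
    size_t len = strlen(operand), i;
    if (len < 4 || operand[0] != 'C' || operand[1] != '\'' ||
        operand[len - 1] != '\'')
        return -1;
    /* skip the leading C' and stop before the closing quote */
    for (i = 2; i < len - 1; i++, out += 2)
        sprintf(out, "%02X", (unsigned char)operand[i]);
    return 0;
}
```

For example, the operand C'EOF' would produce the string 454F46.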
From Section 2.4 (page 30): Type checking
It may be desirable for an assembler source statement such as
DEV BYTE X'F1'
to check the data between the quote marks as to its type. This is accomplished by using "isxdigit" character by character on the data; e.g.,
i=0;
while (isxdigit(data[i])) i++;
if (strlen(data) != i) ...
The type-checking function "isdigit" is similarly used to test whether a character is a
decimal digit.
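A sketch of this check applied to a full BYTE X'...' operand follows. The name byte_x_hex is hypothetical, and rejecting an odd number of hex digits is an assumption of this sketch (so that the result is a whole number of bytes), not something the text above requires.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: validate an operand of the form X'hex-text' and copy
   the hex digits (upper-cased) into out.
   Returns 0 on success, -1 on a malformed operand or a non-hex character.
   On failure the contents of out are unspecified. */
static int byte_x_hex(const char *operand, char *out)
{
    size_t len = strlen(operand), i;
    if (len < 5 || operand[0] != 'X' || operand[1] != '\'' ||
        operand[len - 1] != '\'')
        return -1;
    if ((len - 3) % 2 != 0)                    /* assumption: whole bytes only */
        return -1;
    for (i = 2; i < len - 1; i++) {
        if (!isxdigit((unsigned char)operand[i]))
            return -1;
        out[i - 2] = (char)toupper((unsigned char)operand[i]);
    }
    out[len - 3] = '\0';
    return 0;
}
```

With this check, X'F1' yields F1, while X'FG' is rejected, which is the condition reported as an error in the sample assembler report.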
From Section 2.3 (page 20): Passing command line arguments to "main"
Suppose you wish to execute a program for assembling code in the file "prog.sic" by using a
Unix command line as follows:
assemble prog.sic
In general, when a program is invoked, Unix implicitly provides the "main" module with two
arguments via which command line arguments can be determined (they are ignored unless the programmer establishes code to utilize them). The
first argument is a count (at least 1) of the number of tokens on the command line and the second is an array of string pointers, one for
each token. Each of these pointers addresses a string of characters which comprise the token.
For example, suppose the main module is
#include <string.h>
main(argc, argv)
int argc;
char *argv[];
{
char sourcefl[15];
if (argc < 2)
{
askuser(sourcefl);
}
else
{
strcpy(sourcefl, argv[1]);
}
sicassemble(sourcefl);
}
stored in the Unix file "myassembler.c" and
compiled by
cc myassembler.c askuser.o sicassemble.o -o assemble
Recall that the option "-o assemble" names the execute module "assemble" instead of "a.out".
Then the command
assemble prog.sic
will generate "argc" with value 2 and "argv[0]" pointing to the string "assemble" (the 1st token) and "argv[1]" pointing to the string "prog.sic" (the 2nd
token). In this case the "else" portion of the above code will be executed.
If the user simply enters
assemble
then "argc" has value 1 and the routine "askuser" in the "if" portion of the code is executed, presumably to
interactively obtain the name of the source file from the user.
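The argument handling can be combined with the "<source-file-name>.obj" naming convention in a small bounds-checked helper. This is a sketch only: the name name_files and its parameters are hypothetical, it uses prototype-style declarations rather than the older style shown above, and it reports failure instead of prompting the user.

```c
#include <stdio.h>

/* Hypothetical helper: take the source file name from the command line and
   derive the object file name by appending ".obj". Both output buffers are
   assumed to hold `size` characters.
   Returns 0 on success, -1 if no file name was given or a name would not
   fit (the caller could then fall back to asking the user). */
static int name_files(int argc, char *argv[],
                      char *src, char *obj, size_t size)
{
    if (argc < 2)
        return -1;                             /* only the program name was entered */
    /* snprintf never writes past `size`; a return value >= size means truncation */
    if (snprintf(src, size, "%s", argv[1]) >= (int)size)
        return -1;
    if (snprintf(obj, size, "%s.obj", argv[1]) >= (int)size)
        return -1;
    return 0;
}
```

Called from main with the command "assemble prog.sic", this would give the source name prog.sic and the object file name prog.sic.obj.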