Spring Term, 2002 .
Assignment #4
|
Programming exercises: Due by noon on Friday, March 15, 2002.
An assembler employs several types of symbol tables,
some of which are static (e.g., the op code table) and some of which are created dynamically
as source code is processed (e.g., the symbol table of statement labels). If care is taken to
structure all tables in the same format, table processing needs can be taken care of by
a common set of subroutines. The first of these, findfirst, was constructed in the prior
assignment. For dynamically creating tables an insert subroutine is needed. The first
part of this assignment is to construct such a subroutine. The subroutine is to insert new entries
in the table so as to maintain sort order on the symbols. For a duplicate symbol case, the
item is to be installed so that all items with this same symbol are in ascending order of
value, any further ties to be broken first by casenmbr, and then
by otherinfo. The item is to be rejected if all 4 fields are the same.
Subroutine specifications:
- Structures
#ifndef SYMBSIZE
#define SYMBSIZE 10
#endif
typedef struct tablemem
{
char symbol[SYMBSIZE];
int value;
int casenmbr;
int otherinfo;
} tabletype;
int findfirst (tabl, n, x)
tabletype tabl[];
int n;
char x[];
int insert(newentry, tabl, tabsizep)
struct tablemem newentry;
struct tablemem tabl[];
int *tabsizep;
- Parameters
- Item to insert
- newentry < === > a structure containing item data
- Table characteristics
- tabl < === > table name
- tabsizep < === > the address of the variable holding
the number of entries currently in the table
- Returned value
- Return the value of the index at which the new item is stored; return -1 if all 4 fields are the same.
- File name
- insert.c
Note that table entries are compatible with your findfirst routine.
Tables to be dynamically created usually have an initial table size of 0, signifying an empty
table. The table size is passed by reference via the tabsizep parameter, a pointer variable.
Note that it is the responsibility of the insert routine to
increment the table size whenever a new item is added.
Procedure:
- Design and construct an insert subroutine that observes the requirements of the above
specifications. The basic idea is to add the newentry to the table at the index
given by the table size, then pair-swap it into its sort position, increase
*tabsizep by 1 and return the index at which the item was stored. Other symbol table strategies
(such as a hash table) require a different table organization than the one being employed, and
so do not have to be considered.
Although duplicate labels are not allowed in a program section, working tables used by the
assembler may have entries with duplicate symbol fields. In particular, if multiple independent
sections are supported, then the same symbol may appear in different sections, with a
value entry being used to keep track of the sections the symbol is used in.
Be careful to install duplicate symbol entries according to the stipulation
given above.
- A driver COP3601-4.o for this
assignment has been provided in the assgn-4 subdirectory of the
class directory. It serves as a test mechanism for your programs for both part A and part B. It also will provide
you with the experience of linking a program that you
have written to one written by someone else. The driver provides means for invoking your
insert routine to add an item, means for dumping the table as currently constituted,
and means for searching the table for an item (using your findfirst subroutine).
- Devise a make file named <student-id>-4.makefile to compile
your insert routine and link it with the supplied driver, executable named
<student-id>-4.
Since the makefile is to support both part A and part B, you can supply a program stub for the part B program
(breakup.c) until you are ready to work on it. The basic command lines needed are:
- cc -c insert.c -o insert.o
- cc -c breakup.c -o breakup.o
- cc COP3601-4.o
insert.o breakup.o findfirst.o -o <student-id>-4
The hlib functions are linked in with the supplied driver
COP3601-4.o.
Note that a viable stub for breakup can be as simple as a
breakup.c file whose contents consist of a "nil" function
void breakup() {}
to provide cc with something that will compile and whose .o file will link with the driver.
- Devise a test plan for your subroutine that fully employs the driver's capabilities. At a
minimum, you should plan to
- add at least 6 items to the table, including some duplicate symbol entries
- dump the table contents to verify contents, particularly that the spec'd order is present
- search the table to verify findfirst is working according to spec.
When you are satisfied that you have an effective search plan, remove hlib.log and then
run your tests. When finished, move hlib.log to a file named
<student-id>-4.A.log for use in your Part A
documentation in a manner analogous to that for prior assignments.
In processing source lines, the assembler needs to break them
into constituent pieces for analysis. The task for this part of the assignment
is to produce a breakup subroutine to decompose a line of source into
4 components: label, op code,
operand, and comment,
and identify the n,i,x,
and e bit settings.
Example:
- Components and bits for the following 3 examples of source lines are as follows:
- EXAMPLE +LDA STUFF,X .MAIN
label: "EXAMPLE"
op code: "LDA"
operand: "STUFF"
comment: ".MAIN"
nixbpe: 111??1
- RSUB NO OP
label: "" label: ""
op code: "RSUB" ..OR.. op code: "RSUB"
operand: "" operand: "NO"
comment: "NO OP" comment: "OP"
nixbpe: 110??0
- CMT BYTE @C'RESERVED BLOCK',X SET ASIDE
label: "CMT"
op code: "BYTE"
operand: "C'RESERVED BLOCK'"
comment: "SET ASIDE"
nixbpe: 10l??0
nixbpe bits marked "?" remain as
received from the calling program; the prefix for the op code determines the
e-bit; the prefix for the operand determines
the n and i bits;
the suffix for the operand determines the x bit.
The b and p bits are set elsewhere.
No semantic checking is expected in this routine (so in particular, both interpretations of the
RSUB statement above are OK). Default values for components is the empty string.
Default for the "ni??pe" bits is "00??00".
Subroutine specifications:
- Structures
struct toklist
{
char *lbl;
char *opc;
char *opr;
char *cmt;
int *ni
int *xbpe;
} ;
void breakup(sourceline, components)
char sourceline[];
struct toklist components;
- Parameters
- Source code line to process
- sourceline < === > a string of source elements separated by white space
- Calling program variables corresponding to source line elements
- components.lbl < === > address of label variable
- components.opc < === > address of opcode variable
- components.opr < === > address of operand variable
- components.cmt < === > address of comment variable
- components.ni < === > address of the variable holding the ni values (note that the variable is 0, 1, 2, or 3)
- components.xbpe < === > address of the variable holding the xbpe values (note that the variable is in the range 0-15).
- Returned value
- none
- File name
- breakup.c
You are advised to take advantage of the string library function strtok
(discussed below) as a primary means for extracting elements of the source line being processed.
Procedure
- Design and construct a breakup subroutine that observes the requirements of the above
specifications.
Example: calling program setup
struct toklist components;
char label[81], opcode[81], operand[81], comment[81];
int ni, xbpe;
components.lbl = label;
components.opc = opcode;
components.opr = operand;
components.cmt = comment;
components.ni = ∋
components.xbpe = &xbpe;
To copy into a component involves code such as
- strcpy(components.lbl, "hello");
which copies "hello" into the calling program's string
variable named label just as surely as
- strcpy(label, "hello");
would.
Setting one of the "nixbpe" bits involves code such as
- xbpe += 8;
which sets the value of the "x" bit (think in terms of the
xbpe bits as
being at the 8's, 4's, 2's, and digit's positions of xbpe).
- Devise a test plan for your subroutine. Using the driver already discussed in part A
plan to decompose lines of various types, including things such as
+STUPID #C'ONE TWO',X THREE FOUR
label: ""
op code: "STUPID"
operand: "C'ONE TWO'"
comment: "THREE FOUR"
nixbpe: 011??1
Be sure that your test entries fully exercise your breakup code.
When you are satisfied that you have an effective search plan, remove hlib.log and then
run your tests. When finished, move hlib.log to a file named
<student-id>-4.B.log for use in your Part B
documentation in a manner analogous to that for prior assignments.
Construct Word documentation as specified on the assignments page.
Remember that your documentation file is to have 2 distinct parts, a write-up for Part A and
a write-up for Part B, each with its own executive summary and set of appendices as noted above.
Use the submit shell script
from /usr/public/cop3601/cwinton to "shar" your
- documentation (<student-id>-4.doc)
- insert source code (insert.c)
- breakup source code (breakup.c)
- make file (<student-id>-4.makefile)
- Part A log file (<student-id>-4.A.log)
- Part B log file (<student-id>-4.B.log)
- findfirst source code (findfirst.c)
and "turnin" by noon on Friday, March 15 to cnw.3601.4.
Late submissions can be submitted until the due date for the next assignment via submit
to cnw.3601.4L (late points
assessed as stipulated on the assignments page).
|
|
strtok
For writing programs which must decompose strings into their various components, it is very
useful to become thoroughly acquainted with the string tokenizing function
strtok.
For example, suppose the text between the single quote marks for the following needs to be
extracted:
- MESSAGE BYTE C'THIS IS A MESSAGE'
Assuming the string variable sourceline holds the text, the
solution using strtok is
-
/* char sourceline[81], workingstring[81], *ptr; */
strcpy(workingstring, sourceline);
ptr = strtok(workingstring, "'");
ptr = strtok(NULL, "'");
printf("\nThe text is %s", ptr);
Some explanation is obviously required.
In general for a string of delimiter characters in variable "delim"
-
strtok(s, delim)
- scans the string "s" to the 1st character not in string "delim"
- sets the return value to the address of this character
- then continues to scan until a character in "delim" is
found, replacing it with '\0'
strtok(NULL, delim) scans from the string immediately
following the most recent strtok replacement. The delimiter string does
not have to remain the same.
|
Hence, for the above example,
The 1st call to strtok locates the 1st character not
a single quote mark, which is 'M' setting
the return value as a pointer to this character, then scans on until a
single quote mark is found, replacing it with '\0'.
The effect is to extract and point to the string
- "MESSAGE BYTE C"
When strtok is called the 2nd time against a NULL
pointer rather than "workingstring", it resumes the previous scan from the point
at which it left off.
For this reason the 2nd call to strtok points to string
- "THIS IS A MESSAGE" ( this time it is the final single
quote character that gets replaced by '\0')
Note that strtok actually modifies the contents
of the character string "workingstring", replacing the single quote characters found in the
two calls with the '\0' character.
For this reason, as illustrated by the example,
strtok should typically be used on a working string rather than actual data.
strtok will not cross the string terminator character
'\0' (the end of the original string).
Once strtok reaches the end of the string, any further
calls against NULL simply return a NULL
pointer (this is an important consideration, since a NULL pointer
is quite different from a pointer to an empty string - Warning:
since the pointer returned by strtok is typically passed to strcpy,
this returned pointer from a call against NULL
typically needs to be checked to be sure it isn't a NULL pointer
(calling strcpy with a NULL will cause it to crash).
Next consider the need to decompose a line of text such as
- LOOP LDA TABLE,X
into tokens, where tokens are delimited by spaces and/or commas.
Assuming that the string variable "sourceline" holds the text, a solution
using strtok is
/* char sourceline[81], workingstring[81], *ptr;
char token1[81], token2[81], token3[81], token4[81]; */
strcpy(workingstring, sourceline);
token1[0] = '\0'; token2[0] = '\0'; token3[0] = '\0'; token4[0] = '\0';
if (workingstring[0] == ' ')
{
if ((ptr = strtok(workingstring," ,")) != NULL)
strcpy(token2,ptr);
}
else
{
if ((ptr = strtok(workingstring," ,")) != NULL)
strcpy(token1, ptr);
if ((ptr = strtok(NULL," ,")) != NULL)
strcpy(token2, ptr);
}
if ((ptr = strtok(NULL," ,")) != NULL) strcpy(token3, ptr);
if ((ptr = strtok(NULL," ,")) != NULL) strcpy(token4, ptr);
If the assembly language statement has no label, token1 is set
to the empty string, so regardless, token2,
token3, and token4
receive the op code, operand, and qualifier elements
of the line of assembly code being decomposed. Large sizes are allocated to token variables
to avoid a strcpy overrun if the text being scanned is not a normal source line.
Note that the pointer returned by a call against NULL is checked
before it is passed to strcpy. Most (but not all) of the elements necessary
to construct a viable breakup subroutine are captured in this illustration.
The constructions employed for other forms of source lines will require variations of the
strategies exhibited here.
Note that when a space is part of the delimiter string, simply including the white space
characters (e.g., '\t') in the delimter string will cause all white
space to be handled as if it were comprised of spaces.
Programming tip: when reading sourcelines from the typical text file using fgets,
the newline character '\n' is included at the end of the input string.
You can remove it using strtok by the simple construction
strtok(sourceline,"\n");
|
|