Mite’s object format
6th November 2000
Mite’s assembler writes object modules in the format given below, which is a direct encoding of the concrete syntax.
Hexadecimal numbers are followed by an “h”; for example, 100=64h. Binary numbers are similarly suffixed “b”.
The encoding is presented diagrammatically. Boxes representing bit fields are concatenated to make bytes or larger words:
These in turn are listed vertically. The contents of a bit field are given either as literal binary digits (01101001) or as a name. Fields are usually labelled with their width:
The most significant bit is at the left-hand end of the word, and the least significant at the right-hand end. Multi-byte words are stored with their bytes in little-endian order.
Boxes labelled in ordinary type (box) represent units which themselves have internal structure. Boxes labelled in italics (scatola) are numbers (see section 3). Optional units are shown as dashed boxes:
Lists are denoted by a stack:
Unsigned numbers are encoded as follows:
- A list of 7-bit words is formed by repeatedly removing the least significant seven bits of the number until none of the remaining bits is set.
- The 7-bit words are turned into bytes by adding a bit at the most significant end; this bit is zero for every word except the first one generated.
- The bytes are stored in the reverse order to that in which they were generated, so the most significant word comes first and the byte with its top bit set comes last.
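The three steps above can be sketched in Python as follows. This is an illustrative reading of the rules, not code from Mite itself: the function names are hypothetical, "the first" is taken to mean the first 7-bit word generated (the least significant one), and zero is assumed to produce a single word.

```python
from typing import Tuple


def encode_unsigned(n: int) -> bytes:
    """Encode a non-negative integer using the three steps listed above."""
    if n < 0:
        raise ValueError("unsigned numbers must be non-negative")
    groups = []
    while True:
        groups.append(n & 0x7F)  # remove the least significant seven bits
        n >>= 7
        if n == 0:               # none of the remaining bits is set
            break
    # Assumption: the extra top bit is set only on the first word generated.
    groups[0] |= 0x80
    # Store the bytes in the reverse of the order in which they were generated.
    return bytes(reversed(groups))


def decode_unsigned(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one number starting at pos; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:          # flagged byte is stored last
            return value, pos
```

Under this reading, 100 (64h) fits in one byte and is stored as E4h, while 300 is stored as the two bytes 02h ACh.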
Signed numbers are encoded in the same way, except that the list is formed by repeatedly removing the least significant seven bits of the number until all the remaining bits are the same as the most significant bit of the previous 7-bit word, that is, until only sign extension remains. Three-component numbers are encoded as three consecutive numbers.
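The signed rule can be sketched in the same style. Again this is only an interpretation under the same assumptions (hypothetical function names, top bit set on the first word generated, which is stored last); Python's arithmetic right shift on negative integers supplies the sign extension.

```python
from typing import Tuple


def encode_signed(n: int) -> bytes:
    """Encode a signed integer; stop once only sign extension remains."""
    groups = []
    while True:
        group = n & 0x7F
        n >>= 7                              # arithmetic shift keeps the sign
        groups.append(group)
        # Remaining bits must all equal bit 6 of the word just removed.
        sign_fill = -1 if group & 0x40 else 0
        if n == sign_fill:
            break
    groups[0] |= 0x80                        # assumption: flag the first word generated
    return bytes(reversed(groups))


def decode_signed(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one signed number at pos; return (value, next position)."""
    # Sign-extend from bit 6 of the first stored (most significant) byte.
    value = -1 if data[pos] & 0x40 else 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, pos
```

Note that 63 fits in one byte (BFh) but 64 needs two (00h C0h), because bit 6 of a lone 40h word would otherwise be read as a sign bit.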
Widths of quantities are encoded as
All strings are ASCII-encoded, preceded by a number giving their length.
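As a minimal sketch of the string rule, assuming the length prefix is an unsigned number in the encoding of section 3 (the text does not say whether it is signed) and that the flagged byte of a number is the last one stored; the function names are hypothetical.

```python
def encode_unsigned(n: int) -> bytes:
    """Unsigned number encoding as read from section 3 (an interpretation)."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups[0] |= 0x80            # flag on the first word generated
    return bytes(reversed(groups))


def encode_string(s: str) -> bytes:
    """Length (assumed unsigned) followed by the ASCII bytes of the string."""
    body = s.encode("ascii")     # raises UnicodeEncodeError for non-ASCII input
    return encode_unsigned(len(body)) + body
```

For example, the four-character string "main" would be stored as 84h followed by the bytes of "main".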
Stack items are encoded as a number (see section 3).
Local addresses give the number of a label (see section 10), and are stored as a number. External addresses are stored as an identifier.
Address types are encoded as
The type of a manifest quantity is given by an op type field, which is encoded as
|Operand type|op type|
|local label plus offset|101b|
|external label plus offset|111b|
Constants are encoded as a single byte; for ashift the byte is 00h. A label expression is represented as the number or name of the label followed by the offset, which is a three-component number (see section 3).
The list elements are stored consecutively. The length is encoded as a number directly before the list elements.
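A reader for such a list might look like the sketch below. It assumes that the length is the number of elements (rather than a byte count) and that it is an unsigned number in the encoding of section 3; `decode_list` and its `decode_element` callback are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple


def decode_unsigned(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Unsigned number decoding as read from section 3 (an interpretation)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:          # flagged byte ends the number
            return value, pos


def decode_list(
    data: bytes,
    pos: int,
    decode_element: Callable[[bytes, int], Tuple[object, int]],
) -> Tuple[List[object], int]:
    """Read the element count, then that many consecutive elements."""
    count, pos = decode_unsigned(data, pos)   # assumption: count of elements
    items = []
    for _ in range(count):
        item, pos = decode_element(data, pos)
        items.append(item)
    return items, pos
```

With unsigned numbers as elements, the bytes 83h 81h 82h 02h ACh would decode to the list 1, 2, 300.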
Instructions are encoded as the opcode followed by the operands, encoded in order from left to right. The operands are encoded as in the preceding sections; lists of items and types enclosed in brackets are stored as lists.
9.1 MOV and DEF
A MOV instruction whose second operand is a register is encoded as 0000 0000b. DEF and MOV with a manifest second operand are encoded as
where the inst bit is clear for MOV and set for DEF, and the op type field gives the type of the value, encoded as in section 7.
9.2 Data processing
The inst field indicates the instruction:
The dest bit is set if the destination is present.
The inst field indicates the instruction:
The quot bit is set if the first destination is present, and the rem bit if the second destination is present.
The inst bit is clear for LD and set for ST. The off bit is set if a third (offset) register is given. The width field gives the width of the quantities being transferred, encoded as in section 3.1.
The adr field encodes the address type as in section 6. The condition is encoded as
9.5 Call and return
CALL is encoded as
where the f bit is set for CALLF, the c bit is set if the C modifier is used, and the v bit is set if the V modifier is used. The adr field encodes the address type as in section 6. The argument types are alternately one-component and three-component numbers.
RET is encoded as
where the f bit is set for RETF.
SYNC is encoded as a separate instruction immediately following the instruction to which it is attached. Its opcode is 0001 0101b. It is not counted as a separate directive in the count in the module header (see section 12).
If the c bit is set, a chunk is being declared.
The width field gives the width of the literals. The op type field gives the literal type, encoded as in section 7. The list of manifests follows.
The zero bit is set if the space is zero-initialised; the width field gives the width of the words being reserved.
9.9 Other instructions
The remaining instructions are encoded thus:
Labellings, handlers and subroutines are encoded thus:
The lab type field gives the type of label, encoded as
|Label type|lab type|
If the ind bit is set the label is indirectable. A public label has an identifier after the opcode byte, starting with an underscore. (This means that the encoding is ambiguous, as the underscore could also represent part of another instruction.)
The labels are numbered consecutively from one.
Functions are encoded
where the l bit is set if the function is a leaf, the v bit is set if it is variadic, the c bit is set if it returns a chunk, and the pub bit if the label is public. A public function has an identifier after the opcode byte.
If the ro bit is set the following data is read-only; otherwise it is read-write. If the pub bit is set it is public, otherwise it is private. A public data label has an identifier after the opcode byte.
A module consists of a header and a list of directives.
The header starts with a magic number. Next comes a byte containing the version number of the encoding. The current version number is 0. Next comes the length of the module in bytes excluding the header, and finally the number of labels.
Last updated 2006/06/02