Mite’s object format
Reuben Thomas
6th November 2000
1 Introduction
Mite’s assembler writes object modules in the format given below, which is a direct encoding of the concrete syntax.
2 Presentation
Hexadecimal numbers are followed by an “h”; for example, 100=64h. Binary numbers are similarly suffixed “b”.
The encoding is presented diagrammatically. Boxes representing bit fields are concatenated to make bytes or larger words:
These in turn are listed vertically. The contents of a bit field are given either as literal binary digits (01101001) or as a name. Fields are usually labelled with their width:
The most significant bit is at the left-hand end of the word, and the least significant at the right-hand end. Multi-byte words are stored with their bytes in little-endian order.
Boxes labelled in ordinary type (box) represent units which themselves have internal structure. Boxes labelled in italics (scatola) are numbers (see section 3). Optional units are shown as dashed boxes:
Lists are denoted by a stack:
3 Number
Unsigned numbers are encoded as follows:
- A list of 7-bit words is formed by repeatedly removing the least significant seven bits of the number until none of the remaining bits is set.
- The 7-bit words are turned into bytes by the addition of a bit at the most significant end, which is set for the first word in the list and zero for all the others.
- The bytes are stored in the reverse order to that in which they were generated.
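The three steps above can be sketched in Python as follows. This is one reading of the prose, not code taken from Mite: the function names `encode_unsigned` and `decode_unsigned` are hypothetical, zero is assumed to encode as a single byte, and "the first" word is taken to mean the first 7-bit word generated (the least significant one), so that the flagged byte ends up last in the stored order.

```python
def encode_unsigned(n: int) -> bytes:
    """Encode a non-negative integer following the three steps above."""
    if n < 0:
        raise ValueError("unsigned numbers must be non-negative")
    words = []
    while True:
        words.append(n & 0x7F)   # remove the least significant seven bits
        n >>= 7
        if n == 0:               # none of the remaining bits is set
            break
    # Assumption: the extra top bit is set only on the first word generated.
    words[0] |= 0x80
    # Store the bytes in the reverse of the order in which they were generated.
    return bytes(reversed(words))


def decode_unsigned(data: bytes, pos: int = 0):
    """Return (value, next position); stops at the byte whose top bit is set."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, pos
```

Under this reading, 100 (64h) is stored as the single byte E4h, and 300 as the two bytes 02h ACh.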
Signed numbers are encoded in the same way except that the list is formed by repeatedly removing the least significant seven bits of the number until all the remaining bits are the same as the most significant bit of the previous 7-bit word. Three-component numbers are encoded as three consecutive numbers.
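The signed rule can be sketched in Python in the same spirit. The names `encode_signed` and `decode_signed` are hypothetical, and the sketch assumes that the extra top bit is set only on the first 7-bit word generated (the least significant one), which is stored last.

```python
def encode_signed(n: int) -> bytes:
    """Encode a signed integer using the termination rule described above."""
    words = []
    while True:
        word = n & 0x7F
        n >>= 7                  # arithmetic shift: the sign is preserved
        words.append(word)
        sign_bit = word >> 6     # most significant bit of the word just removed
        # Stop once every remaining bit equals that sign bit.
        if (n == 0 and sign_bit == 0) or (n == -1 and sign_bit == 1):
            break
    words[0] |= 0x80             # assumption: flag the first word generated
    return bytes(reversed(words))


def decode_signed(data: bytes, pos: int = 0):
    """Return (value, next position), sign-extending from the first stored byte."""
    byte = data[pos]
    pos += 1
    # The first stored byte holds the most significant word; bit 6 is its sign.
    value = (byte & 0x7F) - ((byte & 0x40) << 1)
    while not byte & 0x80:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
    return value, pos
```

With these assumptions 63 fits in one byte (BFh), whereas 64 needs two (00h C0h), because bit 6 of a lone 7-bit word would otherwise be read as a sign.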
3.1 Width
Widths of quantities are encoded as
Width | Code |
1 | 00b |
2 | 01b |
4 | 10b |
a | 11b |
4 Identifier
All strings are ASCII-encoded, preceded by a number giving their length.
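As an illustration, a length-prefixed identifier could be produced as below. The sketch assumes that the length is an unsigned number in the encoding of section 3, that it counts bytes, and that the flagged byte of a number is the last one stored; `encode_unsigned` and `encode_identifier` are hypothetical names.

```python
def encode_unsigned(n: int) -> bytes:
    """Unsigned number of section 3 (assumed: top bit flags the last stored byte)."""
    words = []
    while True:
        words.append(n & 0x7F)
        n >>= 7
        if n == 0:
            break
    words[0] |= 0x80
    return bytes(reversed(words))


def encode_identifier(name: str) -> bytes:
    """Length as a number, then the ASCII bytes of the string."""
    raw = name.encode("ascii")   # raises UnicodeEncodeError for non-ASCII input
    return encode_unsigned(len(raw)) + raw
```

For example, `encode_identifier("_main")` gives the length byte 85h followed by the five ASCII characters.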
5 Item
Stack items are encoded as a number (see section 3).
6 Address
Local addresses give the number of a label (see section 10), and are stored as a number. External addresses are stored as an identifier.
Address types are encoded as
Type | Code |
Register | 00b |
Local label | 01b |
Global label | 10b |
7 Manifest
The type of a manifest quantity is given by an op type field, which is encoded as
Operand type | op type |
number | 000b |
3-component number | 001b |
constant | 010b |
local label | 100b |
local label plus offset | 101b |
external label | 110b |
external label plus offset | 111b |
Constants are encoded as a single byte; for ashift the byte is 00h. A label expression is represented as the number or name of the label followed by the offset, which is a three-component number (see section 3).
8 Lists
The list elements are stored consecutively. The length is encoded as a number directly before the list elements.
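A list can therefore be built by concatenation, as in the following sketch. It assumes that the length counts elements rather than bytes and that it is an unsigned number as in section 3, with the flagged byte stored last; `encode_unsigned` and `encode_list` are hypothetical names, and each element is expected to be already encoded.

```python
def encode_unsigned(n: int) -> bytes:
    """Unsigned number of section 3 (assumed: top bit flags the last stored byte)."""
    words = []
    while True:
        words.append(n & 0x7F)
        n >>= 7
        if n == 0:
            break
    words[0] |= 0x80
    return bytes(reversed(words))


def encode_list(elements) -> bytes:
    """Element count as a number, then the already-encoded elements in order."""
    elements = list(elements)
    return encode_unsigned(len(elements)) + b"".join(elements)


# A list of the three stack items 1, 2 and 3, each encoded as a number.
items = encode_list(encode_unsigned(i) for i in (1, 2, 3))
```

Here `items` is the four bytes 83h 81h 82h 83h: the count, then the three items.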
9 Instruction
Instructions are encoded as the opcode followed by the operands, encoded in order from left to right. The operands are encoded as in the preceding sections; lists of items and types enclosed in brackets are stored as lists.
9.1 MOV and DEF
A MOV instruction whose second operand is a register is encoded as 0000 0000b. DEF and MOV with a manifest second operand are encoded as
where the inst bit is clear for MOV and set for DEF, and the op type field gives the type of the value, encoded as in section 7.
9.2 Data processing
9.2.1 Three-operand
The inst field indicates the instruction:
Instruction | inst |
ADD | 000b |
SUB | 001b |
AND | 010b |
OR | 100b |
XOR | 101b |
The dest bit is set if the destination is present.
9.2.2 Four-operand
The inst field indicates the instruction:
Instruction | inst |
DIV | 00b |
DIVS | 01b |
DIVSZ | 10b |
The quot bit is set if the first destination is present, and the rem bit if the second destination is present.
9.3 Memory
The inst bit is clear for LD and set for ST. The off bit is set if a third (offset) register is given. The width field gives the width of the quantities being transferred, encoded as in section 3.1.
9.4 Branch
The adr field encodes the address type as in section 6. The condition is encoded as
Condition | condition |
AL | 0001b |
EQ | 0010b |
NE | 0011b |
MI | 0100b |
PL | 0101b |
CS | 0110b |
CC | 0111b |
VS | 1000b |
VC | 1001b |
HI | 1010b |
LS | 1011b |
LT | 1100b |
GE | 1101b |
LE | 1110b |
GT | 1111b |
9.5 Call and return
CALL is encoded as
where the f bit is set for CALLF, the c bit is set if the C modifier is used, and the v bit if the V modifier is used. The adr field encodes the address type as in section 6. The argument types are alternately 1 and 3-component numbers.
RET is encoded as
where the f bit is set for RETF.
9.6 SYNC
SYNC is encoded as a separate instruction immediately following the instruction to which it is attached. Its opcode is 0001 0101b. It is not counted as a separate directive in the count in the module header (see section 12).
9.7 NEW
If the c bit is set, a chunk is being declared.
9.8 Datum
9.8.1 Literal
[Diagram: literal datum encoding, followed by a list of manifest units]
The width field gives the width of the literals. The op type field gives the literal type, encoded as in section 7. The list of manifests follows.
9.8.2 Space
The zero bit is set if the space is zero-initialised; the width field gives the width of the words being reserved.
9.9 Other instructions
The remaining instructions are encoded thus:
Instruction | inst |
UNDEF | 0000 0001b |
SWAP | 0000 0010b |
NEG | 0000 0100b |
NOT | 0000 0101b |
MUL | 0000 1000b |
SL | 0000 1001b |
SRL | 0000 1010b |
SRA | 0001 0000b |
COPY | 0001 0001b |
CATCH | 0001 0010b |
THROW | 0001 0100b |
KILL | 0010 0000b |
RANK | 0010 0001b |
REBIND | 0010 0010b |
ESC | 0010 0100b |
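For tools that read object modules, the table above can be transcribed directly into a lookup. The sketch below is only a transcription: the names `OTHER_OPCODES` and `mnemonic` are hypothetical, and opcode bytes belonging to the instruction classes of the earlier sections are not covered, so they simply yield `None`.

```python
# Opcode bytes for the "other" instructions, copied from the table above.
OTHER_OPCODES = {
    0b0000_0001: "UNDEF",
    0b0000_0010: "SWAP",
    0b0000_0100: "NEG",
    0b0000_0101: "NOT",
    0b0000_1000: "MUL",
    0b0000_1001: "SL",
    0b0000_1010: "SRL",
    0b0001_0000: "SRA",
    0b0001_0001: "COPY",
    0b0001_0010: "CATCH",
    0b0001_0100: "THROW",
    0b0010_0000: "KILL",
    0b0010_0001: "RANK",
    0b0010_0010: "REBIND",
    0b0010_0100: "ESC",
}


def mnemonic(opcode: int):
    """Return the mnemonic for an opcode byte in the table, or None otherwise."""
    return OTHER_OPCODES.get(opcode)
```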
10 Location
Labellings, handlers and subroutines are encoded thus:
[Diagram: opcode byte, optionally followed by the name]
The lab type field gives the type of label, encoded as
Label type | lab type |
ordinary | 00b |
handler | 01b |
subroutine | 10b |
leaf subroutine | 11b |
If the ind bit is set the label is indirectable. A public label has an identifier after the opcode byte, starting with an underscore. (This means that the encoding is ambiguous, as the underscore could also represent part of another instruction.)
The labels are numbered consecutively from one.
Functions are encoded
[Diagram: opcode byte, optionally followed by the name]
where the l bit is set if the function is a leaf, the v bit is set if it is variadic, the c bit is set if it returns a chunk, and the pub bit if the label is public. A public function has an identifier after the opcode byte.
11 Data
[Diagram: opcode byte, optionally followed by the name]
If the ro bit is set the following data is read-only; otherwise it is read-write. If the pub bit is set it is public, otherwise it is private. A public data label has an identifier after the opcode byte.
12 Module
[Diagram: header, followed by a list of directives]
A module consists of a header and a list of directives.
[Diagram: header fields, with widths in bits: version (8), length (24), labels (32)]
The header starts with a magic number. Next comes a byte containing the version number of the encoding. The current version number is 0. Next comes the length of the module in bytes excluding the header, and finally the number of labels.
Last updated 2006/06/02