Mite’s object format

Reuben Thomas

6th November 2000

1 Introduction

Mite’s assembler writes object modules in the format given below, which is a direct encoding of the concrete syntax.

2 Presentation

Hexadecimal numbers are followed by an “h”; for example, 100=64h. Binary numbers are similarly suffixed “b”.

The encoding is presented diagrammatically. Boxes representing bit fields are concatenated to make bytes or larger words:

[Diagram: one byte made up of three boxes of 3, 1 and 4 bits.]

These in turn are listed vertically. The contents of a bit field are given either as literal binary digits (01101001) or as a name. Fields are usually labelled with their width:

[Diagram: one byte made up of two boxes labelled with their widths, 5 and 3 bits.]

The most significant bit is at the left-hand end of the word, and the least significant at the right-hand end. Multi-byte words are stored with their bytes in little-endian order.

Boxes labelled in ordinary type (box) represent units which themselves have internal structure. Boxes labelled in italics (scatola) are numbers (see section 3). Optional units are shown as dashed boxes:

[Diagram: a dashed box.]

Lists are denoted by a stack:

[Diagram: a stack of boxes.]

3 Number

Unsigned numbers are encoded as follows:

  1. A list of 7-bit words is formed by repeatedly removing the least significant seven bits of the number until none of the remaining bits is set.
  2. The 7-bit words are turned into bytes by the addition of a bit at the most significant end, which is one for the first word generated (the least significant) and zero for all the others.
  3. The bytes are stored in the reverse order to that in which they were generated, so the byte with its top bit set comes last.
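
The three steps above can be sketched in Python as follows. This is an illustrative sketch rather than Mite's own code: the names encode_unsigned and decode_unsigned are hypothetical, and it assumes that zero is encoded as a single all-zero 7-bit word, which the steps do not state explicitly.

```python
def encode_unsigned(n: int) -> bytes:
    """Encode a non-negative integer following steps 1 to 3."""
    if n < 0:
        raise ValueError("unsigned numbers must be non-negative")
    # Step 1: peel off 7-bit words, least significant first.
    # Assumption: zero still produces one (all-zero) word.
    words = [n & 0x7F]
    n >>= 7
    while n:
        words.append(n & 0x7F)
        n >>= 7
    # Step 2: the extra top bit is set only on the first word generated.
    words[0] |= 0x80
    # Step 3: store the bytes in the reverse order of generation.
    return bytes(reversed(words))


def decode_unsigned(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next position); the flagged byte ends the number."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value, pos
```

Under these assumptions 100 (64h) is encoded as the single byte E4h, and 300 as the two bytes 02h ACh.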

Signed numbers are encoded in the same way except that the list is formed by repeatedly removing the least significant seven bits of the number until all the remaining bits are the same as the most significant bit of the previous 7-bit word. Three-component numbers are encoded as three consecutive numbers.
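
A minimal sketch of the signed rule, again in Python and again hypothetical in its names (encode_signed, encode_three). It assumes that the flag bit and byte order are exactly as for unsigned numbers, and, for three-component numbers, that each component is a signed number; the text does not say whether the components are signed.

```python
def encode_signed(n: int) -> bytes:
    """Encode a signed integer using the stopping rule for signed numbers."""
    words = []
    while True:
        word = n & 0x7F
        n >>= 7  # Python's >> on a negative int keeps the sign
        words.append(word)
        # Stop once every remaining bit equals bit 6 of the word just taken.
        remaining_if_done = -1 if word & 0x40 else 0
        if n == remaining_if_done:
            break
    words[0] |= 0x80  # flag the first word generated, as for unsigned
    return bytes(reversed(words))


def encode_three(a: int, b: int, c: int) -> bytes:
    """Three consecutive numbers (assumption: each one is signed)."""
    return encode_signed(a) + encode_signed(b) + encode_signed(c)
```

With this sketch 63 fits in one byte (BFh), whereas 64 needs two (00h C0h), because bit 6 of its only 7-bit word would otherwise be read as a sign bit.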

3.1 Width

Widths of quantities are encoded as

Width   Code
1       00b
2       01b
4       10b
a       11b

4 Identifier

All strings are ASCII-encoded, preceded by a number giving their length.
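
As an illustration, a string could be encoded as below. The helper names are hypothetical, and the sketch assumes that the length is an unsigned number (section 3) counting the bytes of the string.

```python
def encode_unsigned(n: int) -> bytes:
    """Unsigned number as in section 3 (zero assumed to take one byte)."""
    words = [n & 0x7F]
    n >>= 7
    while n:
        words.append(n & 0x7F)
        n >>= 7
    words[0] |= 0x80
    return bytes(reversed(words))


def encode_identifier(name: str) -> bytes:
    """Length as a number, then the ASCII bytes of the string."""
    raw = name.encode("ascii")  # raises UnicodeEncodeError if not ASCII
    return encode_unsigned(len(raw)) + raw
```

For example, the five-character string _main would be stored as 85h followed by its five ASCII bytes.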

5 Item

Stack items are encoded as a number (see section 3).

6 Address

Local addresses give the number of a label (see section 10), and are stored as a number. External addresses are stored as an identifier.

Address types are encoded as

Type           Code
Register       00b
Local label    01b
Global label   10b

7 Manifest

The type of a manifest quantity is given by an op type field, which is encoded as

Operand type                 op type
number                       000b
3-component number           001b
constant                     010b
local label                  100b
local label plus offset      101b
external label               110b
external label plus offset   111b

Constants are encoded as a single byte; for ashift the byte is 00h. A label expression is represented as the number or name of the label followed by the offset, which is a three-component number (see section 3).

8 Lists

The list elements are stored consecutively. The length is encoded as a number directly before the list elements.
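
A short sketch of the list layout, assuming that the length is an unsigned number (section 3) giving the count of elements rather than their size in bytes; the function names are hypothetical.

```python
def encode_unsigned(n: int) -> bytes:
    """Unsigned number as in section 3 (zero assumed to take one byte)."""
    words = [n & 0x7F]
    n >>= 7
    while n:
        words.append(n & 0x7F)
        n >>= 7
    words[0] |= 0x80
    return bytes(reversed(words))


def encode_list(elements: list[bytes]) -> bytes:
    """Element count, then the already-encoded elements back to back."""
    return encode_unsigned(len(elements)) + b"".join(elements)
```

A list of the two stack items 1 and 2, each encoded as a number, would then occupy the three bytes 82h 81h 82h.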

9 Instruction

Instructions are encoded as the opcode followed by the operands, encoded in order from left to right. The operands are encoded as in the preceding sections; lists of items and types enclosed in brackets are stored as lists.

9.1 MOV and DEF

A MOV instruction whose second operand is a register is encoded as 0000 0000b. DEF and MOV with a manifest second operand are encoded as

Bits     1   3     1      3
Field    0   011   inst   op type

where the inst bit is clear for MOV and set for DEF, and the op type field gives the type of the value, encoded as in section 7.
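
The opcode byte can be assembled by shifting each field into place, with the most significant field first (section 2). The following Python sketch uses hypothetical names and takes its op type codes from the table in section 7.

```python
# op type codes from the table in section 7
OP_TYPE = {
    "number": 0b000,
    "3-component number": 0b001,
    "constant": 0b010,
    "local label": 0b100,
    "local label plus offset": 0b101,
    "external label": 0b110,
    "external label plus offset": 0b111,
}


def mov_def_opcode(is_def: bool, op_type: str) -> int:
    """Opcode byte for DEF, or for MOV with a manifest second operand."""
    # Layout, most significant bit first: 0 | 011 | inst | op type
    return (0b011 << 4) | (int(is_def) << 3) | OP_TYPE[op_type]
```

MOV with a number operand gives 30h, and DEF with an external label gives 3Eh.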

9.2 Data processing

9.2.1 Three-operand

Bits     1   3      3     1
Field    0   inst   011   dest

The inst field indicates the instruction:

Instruction   inst
ADD           000b
SUB           001b
AND           010b
OR            100b
XOR           101b

The dest bit is set if the destination is present.
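
A sketch of the three-operand opcode byte in Python; the function name is hypothetical and the inst codes are those of the table above.

```python
THREE_OPERAND = {"ADD": 0b000, "SUB": 0b001, "AND": 0b010, "OR": 0b100, "XOR": 0b101}


def three_operand_opcode(name: str, has_dest: bool) -> int:
    """Opcode byte for ADD, SUB, AND, OR and XOR."""
    # Layout, most significant bit first: 0 | inst | 011 | dest
    return (THREE_OPERAND[name] << 4) | (0b011 << 1) | int(has_dest)
```

ADD with a destination gives 07h; SUB without one gives 16h.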

9.2.2 Four-operand

Bits     1   2      3     1      1
Field    0   inst   011   quot   rem

The inst field indicates the instruction:

Instruction   inst
DIV           00b
DIVS          01b
DIVSZ         10b

The quot bit is set if the first destination is present, and the rem bit if the second destination is present.
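
The same packing for the four-operand form, as a hypothetical Python helper:

```python
FOUR_OPERAND = {"DIV": 0b00, "DIVS": 0b01, "DIVSZ": 0b10}


def four_operand_opcode(name: str, has_quot: bool, has_rem: bool) -> int:
    """Opcode byte for DIV, DIVS and DIVSZ."""
    # Layout, most significant bit first: 0 | inst | 011 | quot | rem
    return (
        (FOUR_OPERAND[name] << 5)
        | (0b011 << 2)
        | (int(has_quot) << 1)
        | int(has_rem)
    )
```

DIV with both destinations present gives 0Fh; DIVSZ with only the first gives 4Eh.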

9.3 Memory

Bits     1   1      3     1     2
Field    0   inst   011   off   width

The inst bit is clear for LD and set for ST. The off bit is set if a third (offset) register is given. The width field gives the width of the quantities being transferred, encoded as in section 3.1.
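
A sketch combining the memory opcode layout with the width codes of section 3.1. The names are hypothetical; the width a is represented here by the string "a".

```python
# Width codes from section 3.1
WIDTH_CODE = {1: 0b00, 2: 0b01, 4: 0b10, "a": 0b11}


def memory_opcode(is_store: bool, has_offset: bool, width) -> int:
    """Opcode byte for LD (is_store false) and ST (is_store true)."""
    # Layout, most significant bit first: 0 | inst | 011 | off | width
    return (
        (int(is_store) << 6)
        | (0b011 << 3)
        | (int(has_offset) << 2)
        | WIDTH_CODE[width]
    )
```

LD of a 4-wide quantity with no offset register gives 1Ah; ST of a 1-wide quantity with an offset register gives 5Ch.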

9.4 Branch

Bits     2    2     4
Field    11   adr   condition

The adr field encodes the address type as in section 6. The condition is encoded as

Condition   condition
AL          0001b
EQ          0010b
NE          0011b
MI          0100b
PL          0101b
CS          0110b
CC          0111b
VS          1000b
VC          1001b
HI          1010b
LS          1011b
LT          1100b
GE          1101b
LE          1110b
GT          1111b
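
The branch opcode byte can be sketched as follows. The function and dictionary names are hypothetical; the address type codes come from section 6 and the condition codes from the table above.

```python
# Address type codes from section 6
ADDRESS_TYPE = {"register": 0b00, "local label": 0b01, "global label": 0b10}

# Condition codes AL to GT run consecutively from 0001b to 1111b
CONDITION = {
    name: code
    for code, name in enumerate(
        ["AL", "EQ", "NE", "MI", "PL", "CS", "CC", "VS",
         "VC", "HI", "LS", "LT", "GE", "LE", "GT"],
        start=1,
    )
}


def branch_opcode(address_type: str, condition: str) -> int:
    """Opcode byte for a branch."""
    # Layout, most significant bit first: 11 | adr | condition
    return (0b11 << 6) | (ADDRESS_TYPE[address_type] << 4) | CONDITION[condition]
```

A branch on EQ to a local label gives D2h.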

9.5 Call and return

CALL is encoded as

Bits     3     1   1   1   2
Field    011   f   c   v   adr

where the f bit is set for CALLF, the c bit is set if the C modifier is used, and the v bit if the V modifier is used. The adr field encodes the address type as in section 6. The argument types are alternately 1 and 3-component numbers.

RET is encoded as

Bits     4      3     1
Field    1000   011   f

where the f bit is set for RETF.
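
Both opcode bytes of this section can be packed in the same way. The helper names below are hypothetical, and the address type codes are those of section 6.

```python
# Address type codes from section 6
ADDRESS_TYPE = {"register": 0b00, "local label": 0b01, "global label": 0b10}


def call_opcode(far: bool, c_mod: bool, v_mod: bool, address_type: str) -> int:
    """Opcode byte for CALL and CALLF."""
    # Layout, most significant bit first: 011 | f | c | v | adr
    return (
        (0b011 << 5)
        | (int(far) << 4)
        | (int(c_mod) << 3)
        | (int(v_mod) << 2)
        | ADDRESS_TYPE[address_type]
    )


def ret_opcode(far: bool) -> int:
    """Opcode byte for RET and RETF."""
    # Layout, most significant bit first: 1000 | 011 | f
    return (0b1000 << 4) | (0b011 << 1) | int(far)
```

A plain CALL to a local label gives 61h; RET gives 86h and RETF 87h.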

9.6 SYNC

SYNC is encoded as a separate instruction immediately following the instruction to which it is attached. Its opcode is 0001 0101b. It is not counted as a separate directive in the count in the module header (see section 12).

9.7 NEW

Bits     4      3     1
Field    1001   011   c

If the c bit is set, a chunk is being declared.

9.8 Datum

9.8.1 Literal

Bits     3     2       3
Field    011   width   op type
then     manifest (list)

The width field gives the width of the literals. The op type field gives the literal type, encoded as in section 7. The list of manifests follows.

9.8.2 Space

Bits     2    3     1      2
Field    00   011   zero   width

The zero bit is set if the space is zero-initialised; the width field gives the width of the words being reserved.

9.9 Other instructions

The remaining instructions are encoded thus:

Instruction   inst
UNDEF         0000 0001b
SWAP          0000 0010b
NEG           0000 0100b
NOT           0000 0101b
MUL           0000 1000b
SL            0000 1001b
SRL           0000 1010b
SRA           0001 0000b
COPY          0001 0001b
CATCH         0001 0010b
THROW         0001 0100b
KILL          0010 0000b
RANK          0010 0001b
REBIND        0010 0010b
ESC           0010 0100b

10 Location

Labellings, handlers and subroutines are encoded thus:

Bits     2    3     2          1
Field    10   011   lab type   ind
then     name (optional)

The lab type field gives the type of label, encoded as

Label type        lab type
ordinary          00b
handler           01b
subroutine        10b
leaf subroutine   11b

If the ind bit is set the label is indirectable. A public label has an identifier after the opcode byte, starting with an underscore. (This means that the encoding is ambiguous, as the underscore could also represent part of another instruction.)

The labels are numbered consecutively from one.
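
A sketch of the opcode byte for labellings, handlers and subroutines, using the lab type codes above. The names are hypothetical, and the optional identifier of a public label is not covered.

```python
LAB_TYPE = {
    "ordinary": 0b00,
    "handler": 0b01,
    "subroutine": 0b10,
    "leaf subroutine": 0b11,
}


def label_opcode(lab_type: str, indirectable: bool) -> int:
    """Opcode byte for a labelling, handler or subroutine."""
    # Layout, most significant bit first: 10 | 011 | lab type | ind
    return (0b10 << 6) | (0b011 << 3) | (LAB_TYPE[lab_type] << 1) | int(indirectable)
```

An ordinary label that is not indirectable gives 98h; an indirectable leaf subroutine gives 9Fh.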

Functions are encoded

Bits     1   3     1   1   1   1
Field    1   011   l   v   c   pub
then     name (optional)

where the l bit is set if the function is a leaf, the v bit is set if it is variadic, the c bit is set if it returns a chunk, and the pub bit if the label is public. A public function has an identifier after the opcode byte.
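
Reading the flags back out of a function opcode byte is the reverse operation. The following Python sketch uses a hypothetical function name and dictionary keys.

```python
def decode_function_opcode(byte: int) -> dict[str, bool]:
    """Split a function opcode byte into its four flag bits."""
    # Layout, most significant bit first: 1 | 011 | l | v | c | pub
    if byte >> 4 != 0b1011:
        raise ValueError("not a function opcode byte")
    return {
        "leaf": bool(byte & 0b1000),
        "variadic": bool(byte & 0b0100),
        "chunk": bool(byte & 0b0010),
        "public": bool(byte & 0b0001),
    }
```

When the public flag comes back set, a reader would go on to read the identifier that follows the opcode byte.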

11 Data

Bits     3     3     1    1
Field    100   011   ro   pub
then     name (optional)

If the ro bit is set the following data is read-only; otherwise it is read-write. If the pub bit is set it is public, otherwise it is private. A public data label has an identifier after the opcode byte.

12 Module

header
directive (list)

A module consists of a header and a list of directives.

Bits   Field
32     AD2BC0DEh
8      version
24     length
32     labels

The header starts with a magic number. Next comes a byte containing the version number of the encoding. The current version number is 0. Next comes the length of the module in bytes excluding the header, and finally the number of labels.
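
A sketch of how the header might be written. It rests on two assumptions that the text does not settle: that the fields are stored in the order listed (magic number, version, length, labels), and that each multi-byte field is stored little-endian on its own (section 2). If the version and length are instead packed into a single 32-bit word, the byte order would differ. The names are hypothetical.

```python
import struct

MAGIC = 0xAD2BC0DE  # magic number from the header layout
VERSION = 0         # current version of the encoding


def pack_header(length: int, labels: int) -> bytes:
    """Header bytes for a module body of `length` bytes with `labels` labels."""
    if not 0 <= length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    return (
        struct.pack("<I", MAGIC)        # 32-bit magic, little-endian (assumed)
        + bytes([VERSION])              # one version byte
        + length.to_bytes(3, "little")  # 24-bit length, little-endian (assumed)
        + struct.pack("<I", labels)     # 32-bit label count
    )
```

Under these assumptions the header is always 12 bytes long.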



Last updated 2006/06/02