Mite’s assembly language

Reuben Thomas

25th October 2000

1  Introduction

Mite’s assembly language is based on the abstract syntax. Where the two correspond exactly, the semantics are the same; the semantics of departures from and extensions to the abstract syntax are given below.

2  Metagrammar

The grammar is described in a BNF-like notation. Terminal tokens are shown thus, and non-terminal tokens thus. Space or lack of it between tokens, including line breaks, is significant. Terms are formed from tokens and the following operators, given in decreasing order of precedence:

Zero or more repetitions
of a term are denoted by appending an asterisk, thus: A*.
One or more repetitions
of a term are denoted by appending a plus sign, thus: A+.
Lists
are denoted by a single terminal character before a repetition symbol: for example, ship,+ denotes a comma-separated list of one or more ships.
Concatenation
is denoted by textual concatenation, thus: AB.
Alternation
is denoted by a vertical bar, thus: A|B.
Optional terms
are enclosed in brackets: A cat’s-tail causes wounds!

Parentheses may be used to override precedence: for example, (A|B)C means “A or B, followed by C”.

A production consists of the non-terminal being defined, followed by an equals sign, followed by the defining term: insect = headthoraxabdomen.
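As an illustration of the list operator only, the following Python sketch builds a regular expression for "one or more terms separated by a single terminal character". The helper name separated_list is hypothetical, and the non-terminal ship is treated here as the literal text "ship", which is an assumption made purely for the example.

```python
import re

def separated_list(term: str, separator: str) -> str:
    """Regex source for one or more `term`s separated by `separator`.

    `term` is inserted as regex source; `separator` is a literal character.
    This is a hypothetical helper, not part of Mite.
    """
    sep = re.escape(separator)
    return f"(?:{term})(?:{sep}(?:{term}))*"

# "ship" stands in for the non-terminal of the same name (an assumption).
SHIPS = re.compile(separated_list("ship", ","))

def is_ship_list(text: str) -> bool:
    # No spaces are allowed, since spacing between tokens is significant.
    return SHIPS.fullmatch(text) is not None
```

The separator appears only between terms, so a trailing comma or an empty string is rejected.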

3  Identifier

d-digit = 0|1|2|3|4|5|6|7|8|9
h-digit = d-digit|A|B|C|D|E|F
letter = a|…|z|A|…|Z
alphanumeric = letter|d-digit|_|.
identifier = (letter|_)alphanumeric*

An identifier is a string of letters, numbers, underscores and full stops, starting with a letter or underscore.
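The rule above can be checked mechanically. The following is a minimal Python sketch, assuming that "letters" means the ASCII ranges a to z and A to Z; the function name is_identifier is hypothetical.

```python
import re

# First character: letter or underscore.
# Remaining characters: letters, decimal digits, underscores and full stops.
# Assumption: letters are ASCII only.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")

def is_identifier(text: str) -> bool:
    """Return True if `text` is a well-formed identifier under the rule above."""
    return IDENTIFIER.fullmatch(text) is not None
```

For instance, _start and a.b_1 are accepted, while 1abc and .x are rejected because of their first character.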

4  Number

h-digit:bodh -natural natural@natural@natural integer@integer@integer 124a

A natural number is a string of hex digits (see section 3) optionally followed by a colon and a base (b for binary, o for octal, d for decimal and h for hexadecimal); numbers may only contain digits allowed by the base. If there is no base the number is decimal. An integer is a natural with optional initial minus sign. Three-component numbers have the components separated by @.
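The reading of naturals and integers described above might be sketched in Python as follows. The function names parse_natural and parse_integer are hypothetical, and the sketch assumes that only the upper-case hex digits A to F are accepted. Three-component numbers are not handled.

```python
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}
HEX_DIGITS = "0123456789ABCDEF"  # assumption: upper-case hex digits only

def parse_natural(text: str) -> int:
    """Parse hex digits optionally followed by ':' and a base letter."""
    digits, colon, suffix = text.partition(":")
    if colon:
        if suffix not in BASES:
            raise ValueError(f"unknown base in {text!r}")
        base = BASES[suffix]
    else:
        base = 10  # no base given: the number is decimal
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    value = 0
    for ch in digits:
        digit = HEX_DIGITS.find(ch)
        # Reject characters that are not digits, or not allowed by the base.
        if digit < 0 or digit >= base:
            raise ValueError(f"digit {ch!r} not allowed in base {base}")
        value = value * base + digit
    return value

def parse_integer(text: str) -> int:
    """Parse a natural with an optional initial minus sign."""
    if text.startswith("-"):
        return -parse_natural(text[1:])
    return parse_natural(text)
```

Under these assumptions FF:h and 255 denote the same value, while FF on its own is rejected because F is not a decimal digit.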

Widths are given in bytes; a represents A/8.

5  Item

natural item

A reg is a register; T is not directly accessible.

6  Label

[] .identifier xl-label l-labelx-label label+size

A local label (l-label) is the address of a location (see section 9). An external label (x-label) refers to a label in another module.

When a label is used as an address the semantics of the instruction are preceded by Tl, and the label is replaced by T in the instruction’s signature.

7  Manifest

ashift #offsetconstantlabel-exp

The value of the constant ashift is log2(A/8).
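As a worked example of this definition, the following Python sketch computes ashift for a given A. It assumes that A is the address width in bits, so that a = A/8 is the address width in bytes, and that a is a power of two; the function name ashift_for is hypothetical.

```python
def ashift_for(address_bits: int) -> int:
    """Return log2(A/8) for an address width A given in bits (an assumption)."""
    if address_bits <= 0 or address_bits % 8 != 0:
        raise ValueError("A must be a positive multiple of 8")
    a = address_bits // 8  # the width a, in bytes
    if a & (a - 1) != 0:
        raise ValueError("A/8 must be a power of two")
    # For a power of two, log2(a) is one less than its bit length.
    return a.bit_length() - 1
```

Shifting an index left by ashift therefore scales it by a, the size of an address in bytes.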

8  Instruction

[instruction] assignmentdataprocmemorybranchcallretthrowcatstackescapedatum

8.1  Assignment

[assignment] MOV reg,regmanifest
DEF reg,manifest
SWAP reg,reg

The instruction MOV r,m, where m is a manifest, has the semantics


The instruction DEF r,m has the same semantics as MOV r,m, but the register is made constant. The instruction UNDEF r makes r non-constant, as does MOV when applied to a constant register. Constant registers may only be modified by DEF, UNDEF, or MOV.

8.2  Data processing

[dataproc] 2-op reg,reg
3-op reg,reg,reg
4-op reg,reg,reg,reg arithmeticlogicalshift NEGNOT

When a destination is omitted, it is T. The only 3-operand instructions whose destination may be omitted are SUB, AND and XOR. At most one destination may be omitted in a 4-operand instruction.

8.2.1  Arithmetic

[arithmetic] ADDSUBMUL DIVSZ

The instruction DIVSZ q,r,x,y has the semantics


q and r must be distinct.

8.2.2  Logical


8.3  Memory

[memory] LD_width reg,[reg,reg]
ST_width reg,[reg,reg]
COPY_size item,item

LD or ST_w r1,[r2,r3] has the semantics


LD or ST(w,r1,T)

8.4  Branch

condition = AL|EQ|NE|MI|PL|CS|CC|VS|VC|HI|LS|LT|GE|LE|GT
address = reg|label
branch = Bcondition address

A branch to the value of a register must be to an indirectable label (see section 9.2). The types of stack items active at both branch and destination must match (including the constancy of registers and values of constants).
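Because the condition is written directly after B with no intervening space, a branch mnemonic can be recognised by splitting off its two-letter condition code. The following Python sketch assumes the fifteen two-letter codes that appear in the grammar for this section; the function name branch_condition is hypothetical.

```python
CONDITIONS = (
    "AL", "EQ", "NE", "MI", "PL", "CS", "CC", "VS",
    "VC", "HI", "LS", "LT", "GE", "LE", "GT",
)

def branch_condition(mnemonic: str) -> str:
    """Return the condition code of a branch mnemonic such as 'BEQ'."""
    if len(mnemonic) == 3 and mnemonic[0] == "B" and mnemonic[1:] in CONDITIONS:
        return mnemonic[1:]
    raise ValueError(f"not a branch mnemonic: {mnemonic!r}")
```

The sketch checks only the mnemonic; it does not check that the address operand is an indirectable label or that the stack states match.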

8.5  Call and return

[natural,size,natural] [reg,] CALLFV address,natural,type-list sync
CALLFCV address,natural,item sync
RETF item,reg-list

In the instruction CALL a,p,[t1,… ,tn], a must be the value of a subroutine label (see section 9.4). If the address is a register, it must hold the address of an indirectable subroutine (see section 9.2). p is the number of parameters. The type list gives the format of the return values, from bottom-most to top-most on the stack. The list items give alternately a number of registers followed by the size of a chunk. All chunk sizes must be non-zero. Any registers returned are ranked in descending order from the top of the stack downwards, and the return values are ranked above registers already on the stack. sync is described in section 8.6.

CALLF has the same effect as CALL except that the system calling convention is used, and the number of return values must be zero or one. The first argument goes on top of the stack. CALLFV and CALLFCV are used to call a variadic function (see section 9.4); in this case the second operand is the total number of parameters being passed. CALLFC and CALLFCV are used when the function returns a chunk, and the third operand gives either a register holding the address to which the return value should be copied, or the chunk in which it should be stored.

In the instruction RET c,[r1,… ,rn], c must be the chunk placed on the stack on entry to the subroutine or function. The register list gives the return values, which must be in ascending stack order, and match the types in the corresponding CALL instruction. A RET is assumed to return from the textually most recently declared subroutine or function. RET must be used to return from subroutines, and RETF from functions.

8.6  Catch and throw

[throwcat] CATCH reg,l-label
THROW reg,reg,reg sync SYNC l-label

In the instruction CATCH s,l, l must be a handler’s label (see section 9.3) in the current subroutine. s is set to the corresponding stack pointer. In THROW l,s,c, l must be the value of a handler’s label, and s the address returned by CATCH for that label. The CATCH must have been executed in the current subroutine or function, or one of its callers.

A SYNC is performed before the semantics of the instruction to which it is attached. In SYNC l, l must be a handler’s label in the current subroutine. When a handler is reached via a THROW instruction, registers other than the top-most stack item have the same value as just before the last SYNC performed for that handler’s label in the instantiation of the subroutine or function which is thrown to, provided they have not been altered since.

8.7  Stack

RANK reg,natural

The instruction NEW creates a register with undefined value. The instruction NEW_s creates an s-byte chunk. KILL kills the top-most item on the stack. Items further down may not be killed.

Registers are ranked, the ranking giving the order in which they would ideally be assigned to physical registers. The rankings are distinct and contiguous, the highest being 1. The instruction RANK r,n changes the rank of register r to n. n must be between 1 and the number of registers. When a register is killed or ranked, the rankings of the other registers are adjusted accordingly. Newly created registers have rank 1. REBIND causes the bindings of virtual to physical registers to be updated to reflect the current ranking.
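One way to picture the ranking rules is as an ordered list in which position gives rank. The following Python sketch is a model under stated assumptions, not a description of any Mite implementation: it assumes that "adjusted accordingly" means the other registers shift so that rankings stay distinct and contiguous, and it ignores the stack-order restriction on KILL. The class name Ranking and its methods are hypothetical.

```python
class Ranking:
    """Model of register rankings: index 0 holds the register of rank 1."""

    def __init__(self):
        self._order = []

    def new(self, reg):
        # Newly created registers have rank 1; existing ones move down by one.
        self._order.insert(0, reg)

    def kill(self, reg):
        # Removing a register closes the gap in the rankings below it.
        self._order.remove(reg)

    def rank(self, reg, n):
        # n must be between 1 and the number of registers.
        if reg not in self._order:
            raise ValueError(f"unknown register {reg!r}")
        if not 1 <= n <= len(self._order):
            raise ValueError(f"rank {n} out of range")
        self._order.remove(reg)
        self._order.insert(n - 1, reg)

    def rank_of(self, reg):
        return self._order.index(reg) + 1
```

In this model, creating registers a, b and c in that order leaves c with rank 1 and a with rank 3; ranking a to 1 then pushes c to 2 and b to 3.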

Stack instructions are interpreted statically: the NEW and KILL for a register must textually enclose all other uses. Apart from NEW and KILL the only instruction that affects the state of the stack seen by the textually next directive (see section 10) is CALL, which kills the parameters and creates the return values.

8.8  Escape

ESC #natural

ESC performs arbitrary actions.

8.9  Datum

datum = LIT_width manifest,+
| SPACE[Z]_width size

The instruction LIT_w v1,… ,vn places the values v1 to vn in contiguous locations, starting at the next w-aligned address after the preceding datum, if any. Label values may only be used when w is a.

The instruction SPACE[Z]_w n reserves n w-words, starting at the next w-aligned address after the preceding datum, if any. If the Z modifier is used the space is zero-initialised.
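The alignment rule for data can be illustrated with a short Python sketch that computes where each datum starts. It assumes widths are given as byte counts (1, 2, 4, or the value of a), that each datum is described by its width and its number of words, and that the starting address is already suitably aligned; the names align_up and layout are hypothetical.

```python
def align_up(address: int, width: int) -> int:
    """Return the smallest multiple of `width` that is >= `address`."""
    return (address + width - 1) // width * width

def layout(data, start=0):
    """Return the start address of each datum.

    `data` is a sequence of (width, count) pairs: `count` words of
    `width` bytes each, as for LIT_w with `count` values or SPACE_w count.
    """
    addresses = []
    address = start
    for width, count in data:
        address = align_up(address, width)  # next w-aligned address
        addresses.append(address)
        address += width * count            # skip past this datum
    return addresses
```

For example, three bytes followed by two 4-byte words and one 2-byte word start at offsets 0, 4 and 12 under these assumptions: one byte of padding is inserted before the 4-byte words, and none before the 2-byte word.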

9  Location

location = code|handler|subroutine|function|data

A location assigns the address of the next piece of code or data to the given label.

A datum (see section 8.9) may only appear after a data location, and other instructions may only appear after a non-data location. The flow of control must never fall through to a location other than a code or handler labelling.

9.1  Labelling


If p is used the label is public, and may be visible outside the module.

9.2  Code


If i is used the label may be used as the target of an indirect (register) branch.

9.3  Handler


A handler is the same as a labelling, except that it may also be given as the label for a CATCH instruction (see section 8.6). The top-most stack item at a handler must be a non-constant register.

9.4  Subroutine and function

subroutine = s[l]labelling
function = f[l][c][v]labelling

The code in a subroutine or function extends from its labelling to the next subroutine or function labelling. The state of the stack directly before the subroutine or function specifies the number and type of its parameters (with the exception of a function’s variadic parameters). Subroutines must be reached by CALL, and functions by CALLF.

If l is used in a subroutine or function labelling, the subroutine or function is a leaf routine, and may not perform any CALL instructions.

If c is used in a function labelling, the function returns a chunk. If v is used, the function is variadic, and the V modifier must be added to CALLF. On entry to a variadic function the variadic arguments are stored in chunk 1, which should be declared with size 0. Their layout is system-dependent. The non-variadic arguments should be declared as normal.

The return chunk, which is placed on top of the stack on entry to a function or subroutine, must not be written to.

9.5  Data


If r is used the data up to the next data location are read-only; otherwise they are writable. The data in a program define the initial contents of the memory. A data labelling has the same alignment as the first datum following it (see section 8.9).

10  Directive


11  Module



A comment, starting with a semicolon, may be placed at the end of any line or on a line by itself.


Last updated 2006/06/02