### Doing it better
1999-06-10

I was playing with ML and search algorithms for life the other day, when I realised that ML doesn't do something which it could, and which would be very useful. That thing is: guarantee that when the same constructor is applied to the same arguments, the result will share the same memory location. Or, equivalently: represent data so that the equality test reduces to pointer comparison.

This can be done very easily for directed acyclic graphs (we're talking reference counts), such as those which represent ML datatypes.

[detail starts]

My best plan to date is to give every storage cell an extra field which serves as the root of a binary search tree containing all cells of which this is the left operand, sorted on the other operands.

The cons routine is then O(log(n)), as follows. We start with pointers to the operands. Beginning at the left operand, we search its tree for a match with the other operands. If we find one, we increment its reference count and return it; otherwise we construct a new cell, add it in place of a leaf, and increment the reference counts of the operands.

The garbage (or finalise) routine is also O(log(n)), as follows. When a reference count reaches zero, decrement the reference counts of the operands, and replace the cell with an indirection node which points to the result of merging the two sub-trees. There is an additional cost later, as every time somebody finds the indirection node they must short-circuit it until it becomes redundant.

[detail ends]

This has the following costs and benefits:

- Cons is slower but often uses no memory.
- Equality testing is now O(1).
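To make the idea concrete, here is a rough OCaml sketch of maximal sharing ("hash consing"). It cheats compared with the plan above: it uses a single hash table keyed on the operands' identifiers instead of the per-cell search tree, and it never reclaims anything, so there are no reference counts; the type and names are made up purely for illustration.

```ocaml
(* A rough sketch of maximal sharing for a binary cons cell.  One
   global hash table keyed on the operands replaces the per-cell
   search tree described above, and nothing is ever reclaimed. *)

type node = { id : int; data : data }
and data = Leaf of int | Cons of node * node

let counter = ref 0
let fresh data = incr counter; { id = !counter; data }

let leaves : (int, node) Hashtbl.t = Hashtbl.create 97
let conses : (int * int, node) Hashtbl.t = Hashtbl.create 97

(* Intern leaves so every base value is a single cell. *)
let leaf n =
  match Hashtbl.find_opt leaves n with
  | Some c -> c
  | None -> let c = fresh (Leaf n) in Hashtbl.add leaves n c; c

(* cons returns the existing cell when the same constructor has
   already been applied to the same operands; otherwise it builds
   a new cell and remembers it. *)
let cons l r =
  match Hashtbl.find_opt conses (l.id, r.id) with
  | Some c -> c
  | None -> let c = fresh (Cons (l, r)) in Hashtbl.add conses (l.id, r.id) c; c

(* Equality now reduces to pointer comparison. *)
let () =
  let a = cons (leaf 1) (cons (leaf 2) (leaf 3)) in
  let b = cons (leaf 1) (cons (leaf 2) (leaf 3)) in
  assert (a == b)
```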
It is also possible to extend the scheme to directed cyclic graphs, since a directed cyclic graph can be treated as a finite state automaton, and we can reduce FSAs to a canonical form. The analogy is as follows: from any node of the graph, we can obey a string of instructions, each of which selects one of the operands. We accept any string which leads to another node, and reject any that lead to non-existent operands (e.g. trying to go below a base type). Nodes which accept the same set of strings can be shared. Constructing the canonical representation of an FSA is a simple algorithm, normally O(n) I think. To do it incrementally (updating after every cons or finalise operation) is harder, but not impossible.

[details omitted]

This scheme has the same costs (constant multiplying factor) and benefits as the previous one, but additionally allows letrecs and function bindings to be shared. It is not quite up to the optimal-reduction kind of sharing, because you still can't share an expression without sharing its sub-expressions, but this doesn't matter, as it is actually better than optimal reduction, as explained below...

I propose not to distinguish cons from apply. Writing what looks like a function application therefore merely constructs a tuple (which is exactly what would happen in a lazy language anyway). To force the computation to occur, you have to use a built-in instruction EVAL on the tuple. This is great for several reasons. First, it will recognise when the same function is applied to the same argument. This is run-time common sub-expression elimination, which is much more powerful than optimal reduction because it spots things that can be shared despite having completely different origins.

[detail starts]

To evaluate an application in the acyclic scheme, I propose to construct a tuple of EVAL and the tuple of the function and its argument. This tuple will have an extra field pointing to the result of the function. If ever the same result is needed again, it will be looked up instead of being recalculated. This drops straight out of the cons routine. Caching every result (which is effectively what this achieves) could waste memory, but this is not serious, as it is only a cache and can be flushed if memory becomes tight. Often the results must be kept hanging around anyway (e.g. because they were included in another data structure).

Doing something similar in the cyclic scheme allows ludicrous things to happen. For example, if you write a Y-combinator using lambdas and apply it to something, it will magically become a letrec!

[detail ends]

So where does the Forth come in? When defining the functions. A function of type a*b*c -> d*e takes three items of types a, b and c off a data stack and replaces them with two items of types d and e. The language used to write the function looks quite imperative, since it consists of a ;-separated list of stack-permutation, reverse-Polish arithmetic, cons and eval instructions. However, there is no memory access (except via cons), so the language is side-effect free and therefore deserves to be called functional.

The type-checking performed by an ordinary ML interpreter can be applied just as easily to the stack machine code, including all the fancy parameterised types and things, because ML is just a different way of representing precisely the same programs. The advantage of the ML way of doing things is that you can name the arguments, which makes for readable code. The advantage of the Forth-like way of doing things is that it can be implemented in a day.

If you learn another lesson from Forth, and provide a primitive function COMPILE which accepts a datatype representing a program and returns a usable function, then you can write a proper (pure-functional) ML interpreter within functional Forth. I'm sure that the only reason ML itself doesn't have a compile function is that it can't be typed. We ought to talk about mixed static and dynamic type checking, when you have a leisurely moment.

I think I'm going to implement this language as an exercise in semantics, which I am learning at the moment. I'm sure SPJ would be interested. What do you think?

Alistair
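P.S. A rough OCaml sketch of the memoised EVAL idea, in case it helps. Again it cheats: applications are interned in a hash table, and the result cache is a second hash table keyed on the application cell rather than the extra field described above; the tiny expression type (Num, Add, Apply) is made up for illustration.

```ocaml
(* Writing an application only builds a (hash-consed) tuple; eval
   forces it and records the result keyed on the tuple itself, so
   forcing the same application again -- even one that arose
   independently -- is just a lookup: run-time common sub-expression
   elimination.  The cache could be flushed if memory became tight. *)

type expr =
  | Num of int
  | Add                                  (* one built-in binary function *)
  | Apply of expr * expr * expr          (* an unevaluated (f, x, y) tuple *)

(* Hash-cons applications so that equal tuples are the same cell. *)
let apps : (expr * expr * expr, expr) Hashtbl.t = Hashtbl.create 97
let apply f x y =
  match Hashtbl.find_opt apps (f, x, y) with
  | Some e -> e
  | None -> let e = Apply (f, x, y) in Hashtbl.add apps (f, x, y) e; e

(* One cached result per application cell ever forced. *)
let results : (expr, expr) Hashtbl.t = Hashtbl.create 97

let rec eval e =
  match e with
  | Num _ | Add -> e
  | Apply (f, x, y) ->
      (match Hashtbl.find_opt results e with
       | Some r -> r                     (* seen before: just look it up *)
       | None ->
           let r =
             match eval f, eval x, eval y with
             | Add, Num a, Num b -> Num (a + b)
             | _ -> failwith "ill-typed application"
           in
           Hashtbl.add results e r;
           r)

let () =
  let e1 = apply Add (Num 2) (Num 3) in
  let e2 = apply Add (Num 2) (Num 3) in  (* different origin, same cell *)
  assert (e1 == e2);
  assert (eval e1 == eval e2)            (* computed once, result shared *)
```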