### Doing it better
1999-06-10

I was playing with ML and search algorithms for life the other day, when I realised that ML doesn't do something which it could, and which would be very useful. That thing is: guarantee that when the same constructor is applied to the same arguments, the result will share the same memory location. Or, equivalently: represent data so that the equality test reduces to pointer comparison.

This can be done very easily for directed acyclic graphs (we're talking reference counts), such as those which represent ML datatypes.

[detail starts]

My best plan to date is to give every storage cell an extra field which serves as the root of a binary search tree containing all cells of which this is the left operand, sorted on the other operands.

The cons routine is then O(log(n)), as follows. We start with pointers to the operands. Beginning at the left operand, we search its tree for a match with the other operands. If we find one, we increment its reference count and return it; otherwise we construct a new cell, add it in place of a leaf, and increment the reference counts of the operands.

The garbage (or finalise) routine is also O(log(n)), as follows. When a reference count reaches zero, decrement the reference counts of the operands, and replace the cell with an indirection node which points to the result of merging the two sub-trees. There is an additional cost later, as every time somebody finds the indirection node they must short-circuit it until it becomes redundant.

[detail ends]

This has the following costs and benefits:

- Cons is slower but often uses no memory.
- Equality testing is now O(1).
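To make the idea concrete, here is a rough OCaml sketch of maximal sharing ("hash consing"). It cheats compared with the plan above: it uses a single hash table keyed on the operands' identifiers instead of the per-cell search tree, and it never reclaims anything, so there are no reference counts; the type and names are made up purely for illustration.

```ocaml
(* A rough sketch of maximal sharing for a binary cons cell.  One
   global hash table keyed on the operands replaces the per-cell
   search tree described above, and nothing is ever reclaimed. *)

type node = { id : int; data : data }
and data = Leaf of int | Cons of node * node

let counter = ref 0
let fresh data = incr counter; { id = !counter; data }

let leaves : (int, node) Hashtbl.t = Hashtbl.create 97
let conses : (int * int, node) Hashtbl.t = Hashtbl.create 97

(* Intern leaves so every base value is a single cell. *)
let leaf n =
  match Hashtbl.find_opt leaves n with
  | Some c -> c
  | None -> let c = fresh (Leaf n) in Hashtbl.add leaves n c; c

(* cons returns the existing cell when the same constructor has
   already been applied to the same operands; otherwise it builds
   a new cell and remembers it. *)
let cons l r =
  match Hashtbl.find_opt conses (l.id, r.id) with
  | Some c -> c
  | None -> let c = fresh (Cons (l, r)) in Hashtbl.add conses (l.id, r.id) c; c

(* Equality now reduces to pointer comparison. *)
let () =
  let a = cons (leaf 1) (cons (leaf 2) (leaf 3)) in
  let b = cons (leaf 1) (cons (leaf 2) (leaf 3)) in
  assert (a == b)
```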
It is also possible to extend the scheme to directed cyclic graphs, since a directed cyclic graph can be treated as a finite state automaton, and we can reduce FSAs to a canonical form. The analogy is as follows: from any node of the graph, we can obey a string of instructions, each of which selects one of the operands. We accept any string which leads to another node, and reject any that lead to non-existent operands (e.g. trying to go below a base type). Nodes which accept the same set of strings can be shared. Constructing the canonical representation of an FSA is a simple algorithm, normally O(n) I think. To do it incrementally (updating after every cons or finalise operation) is harder, but not impossible.

[details omitted]

This scheme has the same costs (constant multiplying factor) and benefits as the previous one, but additionally allows letrecs and function bindings to be shared. It is not quite up to the optimal-reduction kind of sharing, because you still can't share an expression without sharing its sub-expressions, but this doesn't matter, as it is actually better than optimal reduction, as explained below...

I propose not to distinguish cons from apply. Writing what looks like a function application therefore merely constructs a tuple (which is exactly what would happen in a lazy language anyway). To force the computation to occur, you have to use a built-in instruction EVAL on the tuple. This is great for several reasons. First, it will recognise when the same function is applied to the same argument. This is run-time common sub-expression elimination, which is much more powerful than optimal reduction because it spots things that can be shared despite having completely different origins.

[detail starts]

To evaluate an application in the acyclic scheme, I propose to construct a tuple of EVAL and the tuple of the function and its argument. This tuple will have an extra field pointing to the result of the function. If ever the same result is needed again, it will be looked up instead of being recalculated. This drops straight out of the cons routine. Caching every result (which is effectively what this achieves) could waste memory, but this is not serious, as it is only a cache and can be flushed if memory becomes tight. Often the results must be kept hanging around anyway (e.g. because they were included in another data structure).

Doing something similar in the cyclic scheme allows ludicrous things to happen. For example, if you write a Y-combinator using lambdas and apply it to something, it will magically become a letrec!

[detail ends]

So where does the Forth come in? When defining the functions. A function of type a*b*c -> d*e takes three items of types a, b and c off a data stack and replaces them with two items of types d and e. The language used to write the function looks quite imperative, since it consists of a ;-separated list of stack-permutation, reverse-Polish arithmetic, cons and eval instructions. However, there is no memory access (except via cons), so the language is side-effect free and therefore deserves to be called functional.

The type-checking performed by an ordinary ML interpreter can be applied just as easily to the stack machine code, including all the fancy parameterised types and things, because ML is just a different way of representing precisely the same programs. The advantage of the ML way of doing things is that you can name the arguments, which makes for readable code. The advantage of the Forth-like way of doing things is that it can be implemented in a day.

If you learn another lesson from Forth, and provide a primitive function COMPILE which accepts a datatype representing a program and returns a usable function, then you can write a proper (pure-functional) ML interpreter within functional Forth. I'm sure that the only reason ML itself doesn't have a compile function is that it can't be typed. We ought to talk about mixed static and dynamic type checking, when you have a leisurely moment.

I think I'm going to implement this language as an exercise in semantics, which I am learning at the moment. I'm sure SPJ would be interested. What do you think?

Alistair
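P.S. A rough OCaml sketch of the memoised EVAL idea, in case it helps. Again it cheats: applications are interned in a hash table, and the result cache is a second hash table keyed on the application cell rather than the extra field described above; the tiny expression type (Num, Add, Apply) is made up for illustration.

```ocaml
(* Writing an application only builds a (hash-consed) tuple; eval
   forces it and records the result keyed on the tuple itself, so
   forcing the same application again -- even one that arose
   independently -- is just a lookup: run-time common sub-expression
   elimination.  The cache could be flushed if memory became tight. *)

type expr =
  | Num of int
  | Add                                  (* one built-in binary function *)
  | Apply of expr * expr * expr          (* an unevaluated (f, x, y) tuple *)

(* Hash-cons applications so that equal tuples are the same cell. *)
let apps : (expr * expr * expr, expr) Hashtbl.t = Hashtbl.create 97
let apply f x y =
  match Hashtbl.find_opt apps (f, x, y) with
  | Some e -> e
  | None -> let e = Apply (f, x, y) in Hashtbl.add apps (f, x, y) e; e

(* One cached result per application cell ever forced. *)
let results : (expr, expr) Hashtbl.t = Hashtbl.create 97

let rec eval e =
  match e with
  | Num _ | Add -> e
  | Apply (f, x, y) ->
      (match Hashtbl.find_opt results e with
       | Some r -> r                     (* seen before: just look it up *)
       | None ->
           let r =
             match eval f, eval x, eval y with
             | Add, Num a, Num b -> Num (a + b)
             | _ -> failwith "ill-typed application"
           in
           Hashtbl.add results e r;
           r)

let () =
  let e1 = apply Add (Num 2) (Num 3) in
  let e2 = apply Add (Num 2) (Num 3) in  (* different origin, same cell *)
  assert (e1 == e2);
  assert (eval e1 == eval e2)            (* computed once, result shared *)
```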