### Doing it better 1998-11-11

I was just wondering what should be done in the single-platform part and what in the distributed part. So here are some ramblings for you to argue about if you want.

The window manager should be in the single-platform part. The boundary between the window interface and the underlying application (which should obviously be distributed, or at least distributable) therefore needs to be drawn. I don't think either X (minimal local interface and hefty server) or Windows (everything local except files) has got it right. The front-end simply can't be reduced to images, keystrokes and mouse-clicks: we should model the BEHAVIOUR of the interface as data. This is the only way to get the kind of response people expect, given that we're probably going to be using a noisy network. A terminal then becomes a non-deterministic automaton, driven by user input and by commands over the network, which produces images and sends messages to the host. The shape of the automaton can also change, in response to commands over the network, for example when a new button is added to a window (there's a rough sketch of what I mean a little further down).

Stuff like scrolling a window, closing a dialogue box, opening a menu and selecting an item and so on should all be done locally. Only when the contents of a window change in a non-trivial way, or when significant background processing needs to be done, should the server do any work. If this system were properly designed, the terminal could adopt a simple, safe, stable scheduling policy (e.g. cooperative) without any risk of hanging.

Terminals are also engines of computation, though, at least in their spare time. They are constantly receiving instructions from each other according to whatever distribution mechanism we choose to adopt. This mechanism must be able to cope with loss, for example if the power fails unexpectedly on some machines. We are therefore allowed to do things like aborting a computation if the terminal needs all of its memory for a window which has just opened (after all, a user must not be encouraged to unplug his terminal from the network just because somebody is simulating collisions between galaxies). Clearly the way forwards here is to employ a pre-emptive scheduler to allocate processing time to the distributed tasks, but only when the front-end is idle.

Finally, a terminal is a bundle of devices, and the distributed applications can expect a reasonable response from disks and printers and things. Much of this sort of work happens in the background, using DMA or via the printer buffer, and the only work the processor does is to handle a few interrupts. Interrupts can be handled in a nice, efficient, naive RISC OS-like way, because they will never constitute more than a percent or two of the total load on the processor. Any more difficult processing should be deferred, in favour of the front-end's requirements, and performed along with all the distributed stuff.

Of course, this all flies in the face of my 'market economy' resource allocation scheme, which is a different solution to the same problem. Perhaps we should try to manage our economy so it behaves a little like the hard-wired version...? It might help patch up the problem areas, notably real-time tasks (video, games, etc.). Anyway, that was a bit off the track.
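To make that automaton idea slightly more concrete, here is a very rough sketch in C. It is not meant to be a real design, and every name in it (widget, net_add_widget, user_click, background_slice) is invented for the purpose: the point is just that the interface behaviour is a table of data which the host can rewrite over the network, that clicks the table marks as local never touch the host, and that background 'distributed' work only gets a slice when the front-end has nothing better to do.

```c
/* A very rough sketch only.  Every name here (widget, net_add_widget,
 * user_click, background_slice) is invented for illustration.          */
#include <stdio.h>
#include <string.h>

enum action { ACT_LOCAL_SCROLL, ACT_LOCAL_CLOSE, ACT_ASK_HOST };

struct widget {                    /* one entry in the behaviour table   */
    char        name[16];
    enum action on_click;          /* what a click on this widget does   */
};

static struct widget table[16];    /* the automaton's "shape", as data   */
static int nwidgets = 0;

/* The host reshapes the automaton over the network, e.g. adds a button. */
static void net_add_widget(const char *name, enum action a)
{
    if (nwidgets < 16) {
        strncpy(table[nwidgets].name, name, 15);
        table[nwidgets].name[15] = '\0';
        table[nwidgets].on_click = a;
        nwidgets++;
    }
}

/* A click is resolved locally whenever the table says it can be;
 * only otherwise does a message go to the host.                        */
static void user_click(int i)
{
    if (i < 0 || i >= nwidgets)
        return;
    switch (table[i].on_click) {
    case ACT_LOCAL_SCROLL: printf("scrolled %s locally\n", table[i].name); break;
    case ACT_LOCAL_CLOSE:  printf("closed %s locally\n", table[i].name);   break;
    case ACT_ASK_HOST:     printf("-> host: clicked %s\n", table[i].name); break;
    }
}

/* One slice of background (distributed) work, run only when the
 * front-end is idle, and cheap to abandon part-way through.            */
static void background_slice(void)
{
    printf("background: one slice of distributed work done\n");
}

int main(void)
{
    net_add_widget("scrollbar", ACT_LOCAL_SCROLL);
    net_add_widget("ok-button", ACT_ASK_HOST);

    user_click(0);          /* handled entirely on the terminal          */
    user_click(1);          /* needs the host                            */
    background_slice();     /* front-end idle, so do some distributed work */
    return 0;
}
```

The point is that the table, rather than compiled code, says what the interface does, so the host can change the behaviour with an ordinary message, and everything the table marks as local keeps working even when the network is slow or down.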
Configuration of a terminal can conveniently be broken into two parts: the settings that follow the user from machine to machine, and those that are specific to a particular box on a particular desk. The former (which include the keyboard repeat rate, the backdrop, the language, the preferred screen modes etc.) should be distributed, so a user doesn't necessarily have a 'home' machine (perhaps only a filespace somewhere), while the latter (which include the monitor definition file, the keyboard layout, the IP address etc.) should be stationary.

Filing. This is a funny issue. It must be distributed, and yet it must involve specific devices. When using removable disks, a user can reasonably expect to be able to put certain files on certain disks. Even when using fixed disks, a user will want to be able to move files off a computer which he knows will be switched off soon.

There is also the question of non-persistent storage. This is part of the distribution mechanism, and is used for things like open files, continuations, workspace and so on. It will probably be invisible to the user (though a RAM disk is often useful too). For all these differences, it will be stored on the same devices (disks and memory) as the persistent files, and may have a similar structure. Many criteria, no solution.

It is also essential that we find a good model for sharing. _\[Essential but impossible, it turned out! [Alistair|A]\]_

This is going to be another aside. There are many types of sharing, all theoretically similar. There is the obvious code-page sharing, as for libraries, and copy-on-write. There is also linking of files. Filetypes can be seen as a way of sharing an editor between many files (cf. including the editor in every file; Impression saving its documents as applications; PostScript defining the fonts and rendering routines as part of the file; etc.). Then there is version control. Depending on the distribution mechanism, we may also have non-deterministic choice, which is related to sharing because much of the state will be unaffected by the choice. There is also redundancy, such as keeping two copies of the same file on different sites, both for security and for efficiency of access. Sub-expression elimination is another form of sharing, and it occurs at run-time at a coarse granularity (in things like font caches, and window contents pre-rendered as bitmaps). These are all related. Humans find good solutions to sharing problems easily, and often don't realise that that is what they are doing, but I think we can and should automate this burden using a nice unified mechanism. This mechanism should be distributed.

The nice thing about a single machine is that it is much more reliable than many machines. I think it would be a mistake to pretend that the machine was simply part of a larger whole. If the network went down, I would expect to be able to continue using the machine, perhaps a little more slowly and with limited functionality. At the very least, I would expect to be able to do diagnostics, such as pinging other machines, verifying disks and things, and possibly re-booting if necessary.

In fact, maybe that is the first clue to a distribution mechanism. It would be inefficient to insist that a computation started on a particular machine were centralised and controlled by that particular machine (e.g. what if it is using a slow modem?). However, if a fault were to occur, it would be nice to guarantee that the computation could be reconstructed from information held on the machine which originated it. If we were to apply this criterion at every granularity, we might get somewhere.

Thoughts? Sorry for not organising my thoughts better.

Alistair
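P.S. To make that last criterion slightly less vague, here is another very rough C sketch (again, every name in it is invented): the originating machine writes down enough to rebuild each piece of work before farming it out, so losing a remote machine costs time but never loses the computation.

```c
/* Sketch only; all names invented.  The originating machine keeps, for
 * every piece of work it farms out, enough to reconstruct it locally.  */
#include <stdio.h>
#include <string.h>

struct origin_record {
    int  id;
    char inputs[64];      /* whatever is needed to restart the work     */
    int  outstanding;     /* 1 while some remote machine holds it       */
};

static struct origin_record records[32];
static int nrec = 0;

/* Record the work locally *before* sending it anywhere.                */
static int farm_out(const char *inputs)
{
    struct origin_record *r;
    if (nrec >= 32)
        return -1;
    r = &records[nrec];
    r->id = nrec;
    strncpy(r->inputs, inputs, 63);
    r->inputs[63] = '\0';
    r->outstanding = 1;
    printf("sent job %d to some other machine: %s\n", r->id, r->inputs);
    return nrec++;
}

static void remote_finished(int id)
{
    if (id >= 0 && id < nrec)
        records[id].outstanding = 0;
}

/* A remote machine has vanished (power cut, unplugged): every job it
 * might have held can be rebuilt from the local records and re-sent.   */
static void remote_lost(void)
{
    int i;
    for (i = 0; i < nrec; i++)
        if (records[i].outstanding)
            printf("rebuilding job %d from local record: %s\n",
                   records[i].id, records[i].inputs);
}

int main(void)
{
    int a = farm_out("render page 1");
    farm_out("simulate colliding galaxies");
    remote_finished(a);
    remote_lost();        /* only the galaxy job needs rebuilding        */
    return 0;
}
```

Applied at every granularity, the local record might be a continuation rather than a string of inputs, but the principle would be the same.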