### Doing it better 1998-11-11

I was just wondering what should be done in the single-platform part and what in the distributed part. So here are some ramblings for you to argue about if you want.

The window manager should be in the single-platform part. The boundary between the window interface and the underlying application (which should obviously be distributed, or at least distributable) therefore needs to be drawn. I don't think either X (minimal local interface and hefty server) or Windows (everything local except files) has got it right. The front-end simply can't be reduced to images, keystrokes and mouse-clicks: we should model the BEHAVIOUR of the interface as data. This is the only way to get the kind of response people expect, given that we're probably going to be using a noisy network. A terminal then becomes a non-deterministic automaton, driven by user input and by commands over the network, which produces images and sends messages to the host. The shape of the automaton can also change, in response to commands over the network, for example when a new button is added to a window (there's a rough sketch of what I mean a little further down).

Stuff like scrolling a window, closing a dialogue box, opening a menu and selecting an item and so on should all be done locally. Only when the contents of a window change in a non-trivial way, or when significant background processing needs to be done, should the server do any work. If this system were properly designed, the terminal could adopt a simple, safe, stable scheduling policy (e.g. cooperative) without any risk of hanging.

Terminals are also engines of computation, though, at least in their spare time. They are constantly receiving instructions from each other according to whatever distribution mechanism we choose to adopt. This mechanism must be able to cope with loss, for example if the power fails unexpectedly on some machines. We are therefore allowed to do things like aborting a computation if the terminal needs all of its memory for a window which has just opened (after all, a user must not be encouraged to unplug his terminal from the network just because somebody is simulating collisions between galaxies). Clearly the way forwards here is to employ a pre-emptive scheduler to allocate processing time to the distributed tasks, but only when the front-end is idle.

Finally, a terminal is a bundle of devices, and the distributed applications can expect a reasonable response from disks and printers and things. Much of this sort of work happens in the background, using DMA or via the printer buffer, and the only work the processor does is to handle a few interrupts. Interrupts can be handled in a nice, efficient, naive RISC OS-like way, because they will never constitute more than a percent or two of the total load on the processor. Any more difficult processing should be deferred, in favour of the front-end's requirements, and performed along with all the distributed stuff.

Of course, this all flies in the face of my 'market economy' resource allocation scheme, which is a different solution to the same problem. Perhaps we should try to manage our economy so it behaves a little like the hard-wired version...? It might help patch up the problem areas, notably real-time tasks (video, games, etc.). Anyway, that was a bit off the track.
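To make that automaton idea slightly more concrete, here is a very rough sketch in C. It is not meant to be a real design, and every name in it (widget, net_add_widget, user_click, background_slice) is invented for the purpose: the point is just that the interface behaviour is a table of data which the host can rewrite over the network, that clicks the table marks as local never touch the host, and that background 'distributed' work only gets a slice when the front-end has nothing better to do.

```c
/* A very rough sketch only.  Every name here (widget, net_add_widget,
 * user_click, background_slice) is invented for illustration.          */
#include <stdio.h>
#include <string.h>

enum action { ACT_LOCAL_SCROLL, ACT_LOCAL_CLOSE, ACT_ASK_HOST };

struct widget {                    /* one entry in the behaviour table   */
    char        name[16];
    enum action on_click;          /* what a click on this widget does   */
};

static struct widget table[16];    /* the automaton's "shape", as data   */
static int nwidgets = 0;

/* The host reshapes the automaton over the network, e.g. adds a button. */
static void net_add_widget(const char *name, enum action a)
{
    if (nwidgets < 16) {
        strncpy(table[nwidgets].name, name, 15);
        table[nwidgets].name[15] = '\0';
        table[nwidgets].on_click = a;
        nwidgets++;
    }
}

/* A click is resolved locally whenever the table says it can be;
 * only otherwise does a message go to the host.                        */
static void user_click(int i)
{
    if (i < 0 || i >= nwidgets)
        return;
    switch (table[i].on_click) {
    case ACT_LOCAL_SCROLL: printf("scrolled %s locally\n", table[i].name); break;
    case ACT_LOCAL_CLOSE:  printf("closed %s locally\n", table[i].name);   break;
    case ACT_ASK_HOST:     printf("-> host: clicked %s\n", table[i].name); break;
    }
}

/* One slice of background (distributed) work, run only when the
 * front-end is idle, and cheap to abandon part-way through.            */
static void background_slice(void)
{
    printf("background: one slice of distributed work done\n");
}

int main(void)
{
    net_add_widget("scrollbar", ACT_LOCAL_SCROLL);
    net_add_widget("ok-button", ACT_ASK_HOST);

    user_click(0);          /* handled entirely on the terminal          */
    user_click(1);          /* needs the host                            */
    background_slice();     /* front-end idle, so do some distributed work */
    return 0;
}
```

The point is that the table, rather than compiled code, says what the interface does, so the host can change the behaviour with an ordinary message, and everything the table marks as local keeps working even when the network is slow or down.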
Configuration of a terminal can conveniently be broken into two parts: the settings that follow the user from machine to machine, and those that are specific to a particular box on a particular desk. The former (which include the keyboard repeat rate, the backdrop, the language, the preferred screen modes etc.) should be distributed, so a user doesn't necessarily have a 'home' machine (perhaps only a filespace somewhere), while the latter (which include the monitor definition file, the keyboard layout, the IP address etc.) should be stationary.

Filing. This is a funny issue. It must be distributed, and yet it must involve specific devices. When using removable disks, a user can reasonably expect to be able to put certain files on certain disks. Even when using fixed disks, a user will want to be able to move files off a computer which he knows will be switched off soon.

There is also the question of non-persistent storage. This is part of the distribution mechanism, and is used for things like open files, continuations, workspace and so on. It will probably be invisible to the user (though a RAM disk is often useful too). For all these differences, it will be stored on the same devices (disks and memory) as the persistent files, and may have a similar structure. Many criteria, no solution.

It is also essential that we find a good model for sharing. _\[Essential but impossible, it turned out! [Alistair|A]\]_

This is going to be another aside. There are many types of sharing, all theoretically similar. There is the obvious code-page sharing, as for libraries, and copy-on-write. There is also linking of files. Filetypes can be seen as a way of sharing an editor between many files (cf. including the editor in every file; Impression saving its documents as applications; PostScript defining the fonts and rendering routines as part of the file; etc.). Then there is version control. Depending on the distribution mechanism, we may also have non-deterministic choice, which is related to sharing because much of the state will be unaffected by the choice. There is also redundancy, such as keeping two copies of the same file on different sites, both for security and for efficiency of access. Sub-expression elimination is another form of sharing, and it occurs at run-time at a coarse granularity (in things like font caches, and window contents pre-rendered as bitmaps). These are all related. Humans find good solutions to sharing problems easily, and often don't realise that that is what they are doing, but I think we can and should automate this burden using a nice unified mechanism. This mechanism should be distributed.

The nice thing about a single machine is that it is much more reliable than many machines. I think it would be a mistake to pretend that the machine was simply part of a larger whole. If the network went down, I would expect to be able to continue using the machine, perhaps a little more slowly and with limited functionality. At the very least, I would expect to be able to do diagnostics, such as pinging other machines, verifying disks and things, and possibly re-booting if necessary.

In fact, maybe that is the first clue to a distribution mechanism. It would be inefficient to insist that a computation started on a particular machine were centralised and controlled by that particular machine (e.g. what if it is using a slow modem?). However, if a fault were to occur, it would be nice to guarantee that the computation could be reconstructed from information held on the machine which originated it. If we were to apply this criterion at every granularity, we might get somewhere.

Thoughts? Sorry for not organising my thoughts better.

Alistair
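P.S. To make that last criterion slightly less vague, here is another very rough C sketch (again, every name in it is invented): the originating machine writes down enough to rebuild each piece of work before farming it out, so losing a remote machine costs time but never loses the computation.

```c
/* Sketch only; all names invented.  The originating machine keeps, for
 * every piece of work it farms out, enough to reconstruct it locally.  */
#include <stdio.h>
#include <string.h>

struct origin_record {
    int  id;
    char inputs[64];      /* whatever is needed to restart the work     */
    int  outstanding;     /* 1 while some remote machine holds it       */
};

static struct origin_record records[32];
static int nrec = 0;

/* Record the work locally *before* sending it anywhere.                */
static int farm_out(const char *inputs)
{
    struct origin_record *r;
    if (nrec >= 32)
        return -1;
    r = &records[nrec];
    r->id = nrec;
    strncpy(r->inputs, inputs, 63);
    r->inputs[63] = '\0';
    r->outstanding = 1;
    printf("sent job %d to some other machine: %s\n", r->id, r->inputs);
    return nrec++;
}

static void remote_finished(int id)
{
    if (id >= 0 && id < nrec)
        records[id].outstanding = 0;
}

/* A remote machine has vanished (power cut, unplugged): every job it
 * might have held can be rebuilt from the local records and re-sent.   */
static void remote_lost(void)
{
    int i;
    for (i = 0; i < nrec; i++)
        if (records[i].outstanding)
            printf("rebuilding job %d from local record: %s\n",
                   records[i].id, records[i].inputs);
}

int main(void)
{
    int a = farm_out("render page 1");
    farm_out("simulate colliding galaxies");
    remote_finished(a);
    remote_lost();        /* only the galaxy job needs rebuilding        */
    return 0;
}
```

Applied at every granularity, the local record might be a continuation rather than a string of inputs, but the principle would be the same.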