Log
Proof sought
Christmas 2010 was when half the people I know got Kindles. I even got one myself, as a Christmas present for my girlfriend. Suddenly, there are millions more pairs of eyeballs on digital books, many thousands of which belong to acute brains adept at finding errors and misprints. And nearly all of them, I fear, are going to waste.
There are three reasons for this.
First, the Kindle does not encourage you to read free books. (There’s little point proofing commercial books, not so much because they tend to contain fewer errors as because I’ve never come across a non-academic publisher who cares; and anyway, if you’ve paid for a book, shouldn’t someone have proof-read it? I’d be interested to see publisher initiatives here, though; bug bounties, anyone?) This problem is easily fixed, though: many sites offer a huge range of books you can download to your Kindle via its built-in web browser, including the oldest, biggest, and best free online book repository, Project Gutenberg. So go to it: sample its rapidly growing treasure trove of tens of thousands of public domain works, and never pay for a downloaded book again.
Second, no ebook device or reading program I’ve yet seen has built-in functionality for noting errata. I use bookmarks in FBReaderJ; Kindle users can use notes. But even this primitive method is easy to use, and I find it rarely interrupts the flow of reading, even in books containing hundreds of errors. So, note any errors you find!
Third, online libraries often don’t make it obvious that they welcome reports of typos and errors (they do!), or make it easy to send them. (Project Gutenberg changed its email addresses last year to reduce spam. This wasted my time when they introduced the “2010” suffix in April, and again this year when I had to go and check whether they’d decided to update it automatically every year. It seems not. Maybe in April? Really, Gutenberg, just use spam filters; that’s what they’re for.) Many other sites repackage texts from Gutenberg, and some fail to update to the latest version: Feedbooks, for example, whose books work better on my phone than Gutenberg’s own, but which often contain errors already fixed at Gutenberg. (They told me they have to apply updates to their books by hand; I have offered to help them with automatic updating, which isn’t rocket science, using tools that programmers use all the time, but without success so far.) It’s clearly best to report errors to the original source of the text if at all possible, but if you can’t, don’t worry: spend a couple of minutes finding out how to report typos to wherever you got your book from, and try it; you’ll soon find out if they’re unappreciative.
We have an amazing resource here, and it will only get better. Digital editions, unlike their paper forebears, need not go out of print: errors, once corrected, stay corrected. If every reader of a free ebook reports half a dozen errors, even dodgily scanned texts will soon shine. And this is for everybody: free ebooks can be printed and bound, allowing imaginative publishers, libraries and donors to get them into the hands of those who can’t afford ordinary books, let alone a Kindle.
But aren’t lots of people doing this already? It doesn’t seem so: in 2010 Project Gutenberg started an automated errata tracker, which assigns each new erratum report a number. By the end of 2010 it was up to about 500; by contrast, in 2010 alone several different open source software projects racked up over 100,000 bug reports each. [Stop Press: Gutenberg now seems to have abandoned automated erratum numbering.] Despite its richness, Gutenberg has only a handful of full-time employees, and runs on volunteer labour and donations (by definition, they can’t ask for money for their books). And that’s just the biggest Gutenberg project, in the US. To avoid exposure to the vagaries of international copyright law across different régimes, the various Gutenberg sites in different countries are entirely independent of one another. Gutenberg Australia, the second-biggest original source of English books after Gutenberg US, is run by one person, the heroic Colin Choat, in his spare time.
So, please help!
Unchanging rhetoric on higher education
I tried to post this as a comment to the Guardian article “Universities must cut private schools intake, says Simon Hughes”, but the web site said “Your browser sent a request that this server could not understand.”
The tune of the government, sadly but unsurprisingly, never changes, and continues in its hypocritical vein, suggesting that ministers are not really interested in improving access to the élite universities most of them went to.
If they were, then we might expect ministers to tell us how things have changed over time (how does access to Oxbridge now compare with 30 years ago? Much better than it was, but still some way from reflecting society), to laud successes, and to commission and act on research to improve things further.
And they might stop implying that what Oxbridge want to do is keep the plebs out and keep educating the rich.
I was briefly acting Director of Studies in Computer Science at one of the bigger Cambridge colleges in the late 90s. My successor, a state-school-educated Northerner, had to address the problem that applications were falling off (apparently in the 21st century, computers are no longer cool), and there were barely enough applicants for the places available, let alone good applicants. So, he went on the road, mostly to state schools whose students weren't applying to Cambridge. In jeans and T-shirt he'd talk to kids, encouraging them to apply, and to their teachers. Often, it was among the teachers that he'd meet the most resistance: “Even if we did get our kids to apply to Cambridge, we wouldn't apply to you, you're from a posh college,” was one of the more bizarre comments he got.
Cambridge has had a university-wide programme to widen access which has been in place since I was an undergrad there nearly twenty years ago. My college has its own scheme too, and staff were encouraged to do the sort of thing my friend did. The University is desperate to get good students (even at the peak of the Computer Science boom, in the early 90s, the department was worried that the maths skills of applicants were weak), and it doesn't care where they come from.
The way the government is increasingly piling up-front costs on to students, the answer is going to be “from rich families and/or abroad”. The new funding system may be rational, it may even be fair, but it won't broaden access, and my friend will still be left wondering where on earth he is supposed to find the next generation of computer scientists.
Sat, 08 Jan 2011 19:29:32
Kindle 3 is a good first attempt
Giving my girlfriend a Kindle for Christmas was the carrot in a multi-pronged strategy to avoid needing more bookshelves (the stick being “I will start giving away your books”, and my contribution being to archive books I’ve read, or return the many that aren’t even mine). This therefore required that I stock it with books before she got her hands on it, which in turn was all the excuse I needed to play with the thing.
My lazy solution was simply to download all of Feedbooks; I wrote some scripts to make this actually lazy, rather than brain-numbingly dull. In the process I found that while the Kindle is nice to hold and great to read, it struggles to cope with a large collection of books (even though the nearly 3,000 volumes of Feedbooks only half-filled its 4GB of memory), and is woeful as a research tool. And, of course, Amazon’s first-mover evil surfaced early.
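For what it’s worth, here is a minimal sketch of the sort of script I mean, written in present-day Python rather than whatever I actually hacked up: it walks an OPDS (Atom) catalogue and saves every .mobi edition it links to. The catalogue URL, the MIME type it filters on, and the pagination rel value are assumptions to adjust for whichever site you are mirroring.

```python
#!/usr/bin/env python3
# Sketch: mirror the .mobi books listed in an OPDS (Atom) catalogue.
# The catalogue URL and MIME type below are assumptions, not a documented API.

import os
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'
CATALOGUE = 'https://www.feedbooks.com/publicdomain/catalog.atom'  # assumed entry point
MOBI = 'application/x-mobipocket-ebook'  # the Kindle reads .mobi files

def fetch(url):
    with urllib.request.urlopen(url) as f:
        return f.read()

def mirror(url, dest='books'):
    os.makedirs(dest, exist_ok=True)
    while url:
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + 'entry'):
            title = entry.findtext(ATOM + 'title', 'untitled')
            for link in entry.findall(ATOM + 'link'):
                if link.get('type') == MOBI:
                    path = os.path.join(dest, title.replace('/', '_') + '.mobi')
                    if not os.path.exists(path):  # laziness: skip books we already have
                        with open(path, 'wb') as f:
                            f.write(fetch(urllib.parse.urljoin(url, link.get('href'))))
        # Follow the catalogue's pagination, if it has any.
        nxt = [l for l in feed.findall(ATOM + 'link') if l.get('rel') == 'next']
        url = urllib.parse.urljoin(url, nxt[0].get('href')) if nxt else None

if __name__ == '__main__':
    mirror(CATALOGUE)
```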
Here are the problems I had:
- Amazon’s own store doesn’t seem to contain free books. I think it’s poor form not to give people a straightforward choice of free editions of out-of-copyright works. The Kindle may be a loss leader, but at £109 it’s still not cheap. Feedbooks, rather than integrating easily into the Kindle, like, say, a third-party software provider into Ubuntu’s Software Centre, provides a catalogue which is itself in the form of a book, doesn’t automatically update, and offers a list ordered only by title. In other words, it’s useless; one is better off using the built-in web browser to search the online catalogue…
- …or better, another browser, since the Kindle’s is woefully slow (and I don’t just mean the screen update). It’s just about usable, and hence useful in an emergency, but is no good as, for example, an online research tool to use in parallel with the books you have downloaded, although…
- …offline search is awful too. With just the few ebooks that come loaded on the device, it was slow; with the thousands of books I loaded, it simply locked up the device, even when searching the manual, which is presumably already indexed. The Kindle seems to index its contents in the background, but even now, over a week later, search doesn’t work. The only effective navigation is by a book’s table of contents and, to choose which books to read, the user-definable collections, though…
- …collections are a pain to set up for many books, as you have to select each book manually; there is no way I have found to select a range. (Fortunately, I was able to define collections programmatically, as sketched below, but this will be beyond most users.)
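For the curious, here is roughly how. On my Kindle, collections turn out to live in system/collections.json on the USB drive, keyed by “Name@locale”, with each sideloaded book identified by “*” plus the SHA-1 of its full on-device path; this is observed behaviour rather than a documented interface, so treat the details as assumptions. A sketch that makes one collection per subdirectory of documents/ (restart the Kindle afterwards so it re-reads the file):

```python
#!/usr/bin/env python3
# Sketch: build one Kindle 3 collection per subdirectory of documents/.
# Assumes the observed (not documented) format of system/collections.json:
#   {"Name@en-US": {"items": ["*" + sha1("/mnt/us/documents/<file>")], "lastAccess": ms}}

import hashlib
import json
import os
import sys
import time

def collect(kindle_root):
    docs = os.path.join(kindle_root, 'documents')
    collections = {}
    for subdir in sorted(os.listdir(docs)):
        full = os.path.join(docs, subdir)
        if not os.path.isdir(full):
            continue
        items = []
        for book in sorted(os.listdir(full)):
            # The Kindle identifies sideloaded books by the SHA-1 of their device path.
            device_path = '/mnt/us/documents/%s/%s' % (subdir, book)
            items.append('*' + hashlib.sha1(device_path.encode('utf-8')).hexdigest())
        if items:
            collections[subdir + '@en-US'] = {
                'items': items,
                'lastAccess': int(time.time() * 1000),
            }
    with open(os.path.join(kindle_root, 'system', 'collections.json'), 'w') as f:
        json.dump(collections, f)

if __name__ == '__main__':
    collect(sys.argv[1])  # e.g. the Kindle's mount point, such as /media/Kindle
```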
In summary, it’s a lovely device, but the software is rather toytown. Amazon could improve it (and indeed, the 3.0.3 firmware update, at the experimental stage when I checked, claims, vaguely, “performance improvements”), but given that their main interest is in selling books and Kindles, I’m not hopeful that it will happen before the next hardware iteration; whether it happens at all depends on competition, and there should be plenty of that, to go by the number of other ebook readers.
Reuben Thomas, 3rd January 2011
The cost of flattery
Reading Neal Stephenson’s mercurial “Quicksilver”, I came across a wonderful scene in which the founder of the Royal Society, John Wilkins, has just dropped everything to answer Charles II’s question as to how ants’ eggs can be bigger than ants. When his assistant, Daniel Waterhouse, returns from digging up an ants’ nest, he finds
Wilkins had begun dictating, and Charles Comstock scribbling, a letter back to the King—not the substantive part (as they didn’t have an answer yet), but the lengthy paragraphs of apologies and profuse flattery that had to open it: “With your brilliance you illuminate the places that have long, er, languished in, er,—” “Sounds more like a Sun King allusion, Reverend,” Charles warned him. “Strike it then! Sharp lad. Read the entire mess back to me.”
This struck me as ridiculous; more so than that Robert Hooke should drink mercury to cure his headaches, or that Isaac Newton should believe in a universal animating principle, because the scientific errors pointed merely to a lack of knowledge, whereas wantonly wasting the time of the country’s best minds on sucking up to a King, well, that would never happen today…And then I remembered research proposals. Buzzword-based box-ticking rather than fawning flattery, perhaps, but is there a real difference?
At least, I thought, working unpaid on free software, I don’t have to do that. I do spend quite a lot of time worrying about standards and compatibility, oh, and licensing: the world is still divided into religions. And portability: the world is still divided into nations; and the drudge-work that goes into overcoming these divisions is much less fun than eulogising monarchs; rather, I’m doing the work of machines. Where my forebears flattered their rulers, I must flatter my slaves.
At least these are all problems I can attack directly as part of my work. For those dependent on research funding, it’s the same sacrifice as ever.
Developer packages are for packagers, not developers
Increasingly over the past few years, when building programs from source, I’ve been annoyed to find that my system libraries are out of date. There are several possible solutions:
- Give up and wait for one’s distro to be updated with the required version of the libraries (and possibly the program one is trying to build).
- Install newer development libraries from an unstable version of one’s distro.
- Install newer development libraries from source.
Obviously, the sane option is the last one (assuming that you really do want to build the program from source, and aren’t just impatient to get your hands on the latest version). The middle option, in particular, is fraught with hazards. The only reason one thinks of it is because it’s the obvious, neat thing to do. After all, those library development packages are provided for a reason, right?
Maybe not. After all, development packages can also cause the converse problem: having updated our distro, we discover that a program we have installed from source no longer runs. And then—horror!—it no longer builds with the updated libraries we have.
And there are other problems. Many popular languages have well-developed packaging systems for their libraries, such as CPAN and RubyGems, which may or may not be well integrated into the distribution. For example, in Debian, Ruby libraries installed as Debian packages are not registered as gems, so if you are trying to build a program which depends on gems, you have to re-install its dependencies as gems.
While development packages are essential for building Debian itself, they should, like source packages, be something that users normally do not install. But development packages are undeniably convenient to install and use, so how do we continue to make it easy to get hold of development libraries? There are two answers to this: for languages with their own packaging systems, use them; for those without, there are already source-friendly packaging systems that can be used. In both cases, distros’ own packaging systems could be more friendly to other packaging systems, acting more as a meta-packaging system. There are many difficulties here; I certainly don’t anticipate ending the practice of natively packaging those libraries and programs needed to build a distro any time soon.
(There is another interesting, wider, discussion here: the idea that a distro should update all its packages in lock-step every six months, from the stable and mission-critical to the barely-used and bleeding-edge, is clearly nonsense. Ubuntu’s 6-monthly updates are a good compromise between stability and up-to-dateness, but breakages in both directions (kernels that no longer support a particular piece of hardware; rapidly developing programs that are out-of-date even before they are shipped) are still common.)
I suggest, therefore, that distros have separate (but probably overlapping) policies: one for packaging the software needed to build the system, and another for installing upstream sources into a development system. In the longer term, this will benefit not just users who are developers, but all users: on the one hand, increased choice, as far more apps can be installed using the distro’s standard package manager; on the other, the distro’s developers can stop wasting effort repackaging rarely-used or developer-only software, and spend more effort on the part of the distro that really makes a difference: providing a well-integrated core set of apps, including the ability safely and easily to download and install non-core apps. There’s great scope to use cross-distribution packaging tools like Zero Install, but cross-distribution packaging won’t be widely used until it’s integrated with distros’ own packaging tools. Finally, application developers will have much greater incentive to package their own applications once it can be done in a cross-platform way to which users are likely to have access.
In short, it’s time for package managers to scale up.
Reuben Thomas, 17th December 2010
Diet fads considered mostly harmless
Although there’s a lot of dodgy science and cynical profiteering in and around diet fads, and they are worth criticising on those grounds, most of them are actually an improvement on what the majority of people eat. And even where they’re not, if they get people thinking more about what they eat, that’s a good thing. Putting lifestyle and diet before pharmacological fixes is mostly a good thing, except where the diet makes unwarranted medical claims.
I suspect that the effect of public health policy is much more important, and much more deceptively deleterious to health, as it operates consistently over much longer periods of time.
2010-12-15
Why I went back to pull email
(Note to the technically precise: yes, I know what “push” and “pull” email really are, but I’m talking about the user perspective here, not what happens under the hood.)
In May 2009 I got my first Android phone, and was delighted that it signalled the start of converged messaging. Then I quickly turned off the notification ringtone for emails, because I get too many to want to hear about each one. But I kept the visible notifications, in particular the blinking of the phone’s indicator light.
At the end of March 2010, I switched off email notifications. I had been reduced to the sort of person who pulls out their phone every two minutes to check if they have a message. I was manually implementing push email by polling my phone.
Now that I have to open the email program to see new mail, I’ve stopped checking constantly. I read and write far fewer emails on my phone, but occasionally it’s very useful to have the facility.
And I still miss having integrated messaging (SMS, IM, email &c.).
But a little thought shows that it’s a hard problem. At the moment we use different media for different things, though exactly what those uses are varies from person to person: many people on contracts with unlimited SMSes text each other in the way I’d use online instant messaging, whereas I use texts for urgent messages, or for people for whom I don’t have an email address. It strikes me that GMail’s “Priority Inbox” might make a suitable feed for notifications, but I still receive few emails that are urgent enough to interrupt me for, and though importance filtering is pretty good (it’s just adding one more category to the spam/ham dichotomy), I’m not aware of any urgency filters. And the thought of applying the same idea to telephony, so that my phone decides who gets to ring me and who goes to voicemail at what time, still seems far-fetched.
Reluctant as the unifier in me is to admit it, the simplest solution to this problem is still to segment my communications by medium. It’s easy (for humans), it works, and it doesn’t involve changing existing media to cope with new ones.
Needs More Work.
E-text formats are a waste of time
And I’m not talking about Amazon’s Apple-stylee first-mover vertically-integrated land grab for which I have equal parts contumely (the software) and covetousness (the hardware).
What I want to stop is the immense amount of effort being wasted, especially in the free software and commons communities, on dreaming up and implementing e-text formats, and on providing free e-texts in those formats.
What’s the problem? First, the texts. The majority of free texts come from the various Projects Gutenberg. (“Projects” plural, indeed: a compartmentalised approach to different copyright régimes being the simplest way to take advantage of each and avoid legal problems, the sites have no formal connection with each other, and hence require separate staff and servers in every country.) They are surprisingly thinly staffed, many being one-person operations, and even the mighty US site apparently involving only a handful of people. The Gutenberg ethos is intentionally decentralised, and so far the US project, at least, has used plain text and lightly marked-up HTML as its canonical formats. This is great for accessibility and long-term archiving, but it’s lousy for providing e-readers with rich and accurate metadata, or even simple things like contents and footnotes.
So we need a proper e-text format, then? A lot of people seem to think so, and there are half a dozen supported by various e-reader programs. FBReader, my favourite, supports FB2, ePub, Plucker, Mobipocket, Open eBook, OpenReader and PalmDoc, and its web site lists several more which it doesn’t support. Project Gutenberg has experimental support for generating some of these formats automatically, but the results are rather poor. Meanwhile, sites like Feedbooks lovingly hand-craft metadata, and add nice touches, such as scans of original cover images, but must redo the work each time the original text is updated (mostly, when corrections are submitted). But none of these formats is really any better than the structured markup format we all use all the time: HTML. Rather than pour effort into defining and promulgating new formats (though at least most of the recent efforts are XML languages), and then implement them in readers, why not just agree on some conventions for semantic HTML markup, or even a microformat? No new software would be needed (though for offline reading, we could do with a good HTML 5 reader application until browser authors implement the decent offline reading support that is so obviously missing).
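To make the suggestion concrete, here is the kind of thing I mean, with the caveat that the particular tag names are hypothetical illustrations, not a convention anyone has actually agreed: carry the metadata in ordinary Dublin-Core-style <meta> tags, and any reading program can recover it with a few lines of standard-library code, no new format required.

```python
#!/usr/bin/env python3
# Illustration only: extract book metadata from hypothetical semantic-HTML
# conventions (Dublin-Core-style <meta name="dc.*"> tags) using nothing but
# the standard library. The tag names are examples, not an agreed standard.

from html.parser import HTMLParser

class MetadataParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            attrs = dict(attrs)
            name, content = attrs.get('name') or '', attrs.get('content')
            if name.startswith('dc.') and content:
                self.metadata[name[3:]] = content

SAMPLE = '''<html><head>
<meta name="dc.title" content="Pride and Prejudice">
<meta name="dc.creator" content="Jane Austen">
<meta name="dc.language" content="en">
</head><body>…</body></html>'''

parser = MetadataParser()
parser.feed(SAMPLE)
print(parser.metadata)  # {'title': 'Pride and Prejudice', 'creator': 'Jane Austen', 'language': 'en'}
```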
Corrections are another major source of waste: surprisingly few readers seem to submit them. I say this purely from personal experience: Gutenberg’s corrections email address, which until recently was quite hard to find, has since early 2010 been linked to a ticketing system, and when I last submitted a correction a couple of months ago, my ticket number was in the 400s, which suggests a tiny trickle of corrections; meanwhile I’ve found hundreds of errors in texts released years ago by other sites, and my feedback is treated with a degree of gratitude and dispatch that suggests it is rare. Given this state of affairs, you would hope that there would be measures in place to collate corrections from downstream suppliers such as Feedbooks, but no such luck. The sorts of links that are commonly forged in the free software community to share bug fixes between software distributors and authors, and to enable users to report bugs easily, seem to be virtually absent from the world of free e-texts.
So here are my gradus ad Parnassum of free e-texts:
- Forge links between the repositories and their free and commercial redistributors. Push improvements upstream. The Gutenbergs should be demanding this, as it provides an incentive to the commercial redistributors to continue to innovate. (Of course, proper licensing would help; while it may be too late for existing books, the rate at which new books are being added would soon create a useful lever to crack open any reluctant partners.)
- Help the repositories and redistributors automate their efforts. There are many things that could be automated, or automated better: extraction of metadata from non-marked-up texts, automatic application of corrections to downstream marked-up texts (a sketch of which follows this list), and automatic generation of corrections by readers, directly from reading programs and devices.
- Get readers to help. Many willing helpers will be unaware that help is sought or even required, and others put off by not knowing how to help. Making correction an obvious feature of all e-readers might well cause the number of corrections to leap; similar functionality would in any case be a boon for scholarly and recreational noters and doodlers, and encourage both old and new forms of interaction with, and perhaps most interestingly, through texts.
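Applying upstream corrections to downstream marked-up texts is exactly the job Feedbooks told me they do by hand, so here is a sketch of how it might be automated, under the simplifying assumption that a correction is confined to a single line and that the uncorrected wording appears verbatim in the downstream file. A real tool would need fuzzier matching (corrections can straddle markup or re-wrapped lines), but none of this is rocket science, and the tools involved are ones programmers use every day.

```python
#!/usr/bin/env python3
# Sketch: carry corrections made to an upstream plain-text edition into a
# downstream (e.g. marked-up) copy, by finding the uncorrected wording and
# substituting the fix. Assumes one-line corrections and exact matches.

import difflib
import sys

def corrections(old_text, new_text):
    """Yield (before, after) pairs of lines changed between two upstream editions."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == 'replace' and i2 - i1 == j2 - j1:
            for before, after in zip(old_lines[i1:i2], new_lines[j1:j2]):
                yield before.strip(), after.strip()

def apply_downstream(downstream, old_text, new_text):
    """Apply each upstream correction to the downstream text where it matches exactly."""
    for before, after in corrections(old_text, new_text):
        if before and before in downstream:
            downstream = downstream.replace(before, after)
    return downstream

if __name__ == '__main__':
    # Usage: apply-corrections.py OLD-UPSTREAM NEW-UPSTREAM DOWNSTREAM > UPDATED
    old, new, downstream = (open(f, encoding='utf-8').read() for f in sys.argv[1:4])
    sys.stdout.write(apply_downstream(downstream, old, new))
```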
Michael S. Hart’s original vision of getting the world’s great texts into everyone’s hands is well on the way to being fulfilled, and we know how to bring it to fruition. Yes, the vast majority of texts are Western literature, and the files all require electronic devices to read, but consider: Gutenberg includes images, recordings, and even “Night of the Living Dead”, so the model works not only for any written language, but even languages without a written form; and there’s nothing to stop written texts being printed, or audio texts broadcast or transferred to tape. Project Gutenberg’s methods of digitisation can be applied to virtually any human language, and dissemination is not limited to digital technologies.
It’s time to enlarge the vision.
Living in layers
Perhaps it does not, after all, diminish one’s openness to the world to fill it with meaning on many levels as so many past generations did. Done well, not with a closed reading but with an open resonance, it can rather invite yet more connections and, rather than immersing ourselves in the moment, by immersing the moment in ourselves, connect with it at more points than that of our consciousness.
7th October 2009
Unravelling the multiculturalism debate
Few contemporary debates arouse as much visceral emotion even among those they do not directly affect as that over multiculturalism. And unlike arguments in some other areas, this is unavoidable, as this one touches on our identity and fundamental beliefs, which, though they may be informed by reason, are not primarily rational.
Nevertheless, a bit of careful thinking can help untangle the issues involved, and also avoid the tendency for debate to degenerate into a fight between mutually antagonistic communities that have little understanding of or sympathy for each other, nor any desire to achieve either.
It also shows us that fundamentally we do have a debate about values on our hands, and that this is a struggle that could (and probably will) end up changing our culture, and there will definitely be losers as well as winners. What is important is the less obvious point that the losers and winners overlap, as do the communities mentioned above, which exist along different axes, according to the question under discussion. As usual, the way to make progress and generate light rather than heat is to debate ideas, not attack people.
I see three principal fronts in the debate. The first is between integration and multiculturalism: between those who think that immigrants should assimilate, and those who think that they should keep their own culture.
Unfortunately, we continue to get this discord-promoting dichotomic presentation of the question, despite the fact that it’s obviously a false dichotomy. A society which thinks of itself as liberal must encourage diversity; equally, social cohesion clearly requires both an agreement on core values and sympathy between people, which is impossible when they do not mix. (This latter point applies just as strongly to the question of class divisions, which I won’t touch on further here, but it may be fruitful to bear it in mind while reading on.)
The second front is between religions. In the UK, the two main contenders are Christianity and Islam, though one could argue that it’s Islam against the rest: although Christianity is still formally the state religion, few adherents still claim that it has the priority that that implies, in particular in civil life, whereas there are strident Islamic voices calling for its establishment and the introduction of sharia law. No other major religion in the UK is having similar demands made on its behalf. Both Christianity, despite its relative decline, and Islam, growing rapidly, are actively proselytising, and have influential public voices, and these are frequently opposed on a range of social and political issues. They are, at many points, irreconcilable in their present forms; however, the differences are not primarily religious, but historical: Islam is, simply put, rather more behind the times than Christianity. (Arguably, therefore, from a secularist point of view, a victory for Islam over Christianity would be a setback rather than a permanent disaster.)
The third front is that between religion and secularism. These two confront each other on many issues; here, the principal debate is whether religion should be central to society and, if so, whether it should be one particular religion or many.
The co-existence of the struggle between religions and that between religion and irreligion seems to be what makes this debate particularly interesting: Christians and Muslims have to decide when to disagree and when to make common cause against the infidel. In fact, this contradiction is mirrored in every believer whose religious beliefs do not subsume their social beliefs.
In any civilized debate, it’s important to find the areas of irreconcilability. Those between religions are relatively obvious, as they are the subject of frequent discussion between and within religions. Those between religion and secularism are obvious too, though I should note at this point that up to now I’ve conflated two positions under the heading of secularism: that which holds that religion is a matter for individuals, not for societies, and the more extreme position that religion is evil and should be eradicated. I’m not going to deal with this further, because this difference of views does not expand the range of desired outcomes of the multiculturalism debate (although it is interesting to observe that those who would banish religion hold a view more similar to that of those who would insist on a single religion than to that of secularists who insist on religious tolerance).
What then are the possible outcomes? First, there can be a state religion. In name, Christianity holds this position in the UK; in practice it is little privileged over any other, though its values are embedded in our culture, and hence in our social and legal systems. (It is easy to forget the extent to which Christianity invisibly frames the entire debate simply because of this massive historical influence.) Secondly, religions can have legal status. Those who decry suggestions to introduce sharia law in certain areas of life are often unaware that Jewish courts have held sway in some areas for a long time, and if one believes religions have a social, as opposed to a merely individual, function, it’s hard to argue against believers electing to acknowledge the authority of religious courts in certain areas of law. Thirdly, religions can be limited to the individual conscience; this is more or less the current state of affairs in the UK.
Those who argue for multiculturalism argue for either the second or third outcome; those who argue against either for the first or third. I believe that only the first or third outcomes are tenable positions for a coherent society; the second leads to social fragmentation and an eventually irreconcilable tension between state and religion. Unfortunately, it’s the direction we’re heading in at the moment in the UK. The problem is that multiculturalism tries to reconcile the irreconcilable, by encouraging the growth of societies within a society. For a society to function as a whole, its members must share values and experiences; by allowing religious adherents to expand the sphere of their religions from the personal to the societal, multiple states-within-states are created, and the irreconcilability of religions which, when it is restricted to the personal level, can be bridged, builds insurmountable barriers between communities. A pluralistic society cannot be multicultural; minorities must assimilate, or form separate societies. In the UK, at least, the latter seems an unnecessarily disruptive course, and one which, in any case, the proponents of multiculturalism are not pursuing.
In short, to be part of British society as it is today, it seems reasonable to expect a command of English, respect for the rule of law, and for the principles of individual conscience and liberty. To disagree with this list implies either that you would be better off elsewhere, or that you want to change the country. For those who believe such a change would be for the worse, such a position is therefore a challenge.
7th July, 2009