+August 2007: A further feature has been added to Tidytxt. The current version
+of xmlto generates HTML that contains non-ASCII Unicode characters. Fortunately,
+there are only a few of them. The heading uses "box drawing" characters in the
+range U+2500 to U+253F, and the main text occasionally contains U+00A0 (hard
+space). The Tidytxt script now turns all of the box-drawing characters into
+hyphens and the hard spaces into normal spaces. Bullets, which are set as
+U+25CF, are turned into asterisks. (It might be possible to handle all of this
+the same way I dealt with copyright, described above, but adding three lines of
+Perl to an existing script was a lot easier.)
+