-August 2007: A further feature has been added to Tidytxt. The current version
-of xmlto makes HTML that contains non-ASCII Unicode characters. Fortunately,
-they are few. The heading uses "box drawing" characters in the range U+2500 to
-U+253F, and within the main text, U+00A0 (hard space) occasionally appears. The
-Tidytxt script now turns all the former into hyphens and the latter into normal
-spaces. Bullets, which are set as U+25CF, are turned into asterisks. (It might
-be possible to do all this in the same way as I dealt with copyright - see
-above - but adding three lines of Perl to an existing script was a lot easier.)
+The output of xmlto also contains non-ASCII Unicode characters that w3m passes
+through. Fortunately, they are few, and Tidytxt cleans them up as well. Some
+headings use "box drawing" characters in the range U+2500 to U+253F, which are
+translated into -+| as appropriate; U+00A0 (hard space) and U+25CF (bullet)
+become plain spaces and asterisks respectively. (It might be possible to do all
+this in the same way as I dealt with copyright - see above - but adding a few
+lines of Perl to an existing script was a lot easier.)
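
The clean-up described in the added paragraph can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code in Tidytxt: the names `build_table` and `tidy` are hypothetical, and the split of the U+2500 to U+253F range into horizontal, vertical and junction characters is an assumption about what "as appropriate" means.

```python
# Illustrative sketch only; the real Tidytxt is a Perl script and may differ.

def build_table() -> dict:
    """Build a translation table for the characters named in the text."""
    # Assumed split of U+2500..U+253F: solid and dashed horizontal lines
    # map to "-", solid and dashed vertical lines to "|", and everything
    # else in the range (corners, tees, crosses) to "+".
    horizontal = {0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509}
    vertical = {0x2502, 0x2503, 0x2506, 0x2507, 0x250A, 0x250B}
    table = {}
    for cp in range(0x2500, 0x2540):
        if cp in horizontal:
            table[cp] = "-"
        elif cp in vertical:
            table[cp] = "|"
        else:
            table[cp] = "+"
    table[0x00A0] = " "   # hard space becomes a plain space
    table[0x25CF] = "*"   # bullet becomes an asterisk
    return table


TABLE = build_table()


def tidy(text: str) -> str:
    """Replace the known non-ASCII characters with ASCII stand-ins."""
    return text.translate(TABLE)
```

Applied line by line to the text that w3m produces, `tidy` leaves every other character untouched, so any non-ASCII character outside these ranges would still pass through.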