X-Git-Url: https://git.exim.org/exim.git/blobdiff_plain/86058a4a205e6a6b06190b8ccb827c6dbdced1bb..595028e435015508f214f06456874a8882bfd54e:/doc/doc-docbook/HowItWorks.txt

diff --git a/doc/doc-docbook/HowItWorks.txt b/doc/doc-docbook/HowItWorks.txt
index 4c51ae34d..91326d83e 100644
--- a/doc/doc-docbook/HowItWorks.txt
+++ b/doc/doc-docbook/HowItWorks.txt
@@ -1,4 +1,4 @@
-$Cambridge: exim/doc/doc-docbook/HowItWorks.txt,v 1.6 2007/04/11 15:26:09 ph10 Exp $
+$Cambridge: exim/doc/doc-docbook/HowItWorks.txt,v 1.7 2007/08/29 13:37:28 ph10 Exp $

 CREATING THE EXIM DOCUMENTATION

@@ -149,7 +149,7 @@ at the time of writing):

 . w3m 0.5.1

-  This is a text-oriented web brower. It is used to produce the Ascii form of
+  This is a text-oriented web brower. It is used to produce the ASCII form of
   the Exim documentation (spec.txt) from a specially-created HTML format. It
   seems to do a better job than lynx.

@@ -218,8 +218,8 @@ DOCBOOK PROCESSING
 Processing a .xml file into the five different output formats is not entirely
 straightforward. For a start, the same XML is not suitable for all the
 different output styles. When the final output is in a text format (.txt,
-.texinfo) for instance, all non-Ascii characters in the input must be converted
-to Ascii transliterations because the current processing tools do not do this
+.texinfo) for instance, all non-ASCII characters in the input must be converted
+to ASCII transliterations because the current processing tools do not do this
 correctly automatically.

 In order to cope with these issues in a flexible way, a Perl script called
@@ -241,7 +241,7 @@ options it is given. The currently available options are as follows:

 -ascii

-  This option is used for Ascii output formats. It makes the following
+  This option is used for ASCII output formats. It makes the following
   character replacements:

     ’ => ' apostrophe
@@ -252,14 +252,14 @@ options it is given. The currently available options are as follows:
     – => - en dash

   The apostrophe is specified numerically because that is what xfpt generates
-  from an Ascii single quote character. Non-Ascii characters that are not in
+  from an ASCII single quote character. Non-ASCII characters that are not in
   this list should not be used without thinking about how they might be
-  converted for the Ascii formats.
+  converted for the ASCII formats.

   In addition to the character replacements, this option causes quotes to be
   put round text items, and and to be replaced by
-  Ascii quote marks. You would think the stylesheet would cope with the latter,
-  but it seems to generate non-Ascii characters that w3m then turns into
+  ASCII quote marks. You would think the stylesheet would cope with the latter,
+  but it seems to generate non-ASCII characters that w3m then turns into
   question marks.

 -bookinfo
@@ -479,7 +479,7 @@ so the logic is somewhat different.
 CREATING TEXT FILES

 This happens in four stages. The Pre-xml script is called with the -ascii,
--optbreak, and -noindex options to convert the input to Ascii characters,
+-optbreak, and -noindex options to convert the input to ASCII characters,
 insert line break points, and disable the production of an index. Then the
 xmlto command converts the XML to a single HTML document, using these
 stylesheets:
@@ -494,7 +494,7 @@ symbol is output as "(c)" rather than the Unicode character. This is necessary
 because the stylesheet itself generates a copyright symbol as part of the
 document title; the character is not in the original input.

-The w3m command is used with the -dump option to turn the HTML file into Ascii
+The w3m command is used with the -dump option to turn the HTML file into ASCII
 text, but this contains multiple sequences of blank lines that make it look
 awkward. Furthermore, chapter and section titles do not stand out very well.
 A local Perl script called Tidytxt is used to post-process the output. First, it
@@ -504,6 +504,15 @@ preceded by an extra two blank lines and a line of equals characters. An extra
 newline is inserted before each section heading, and they are underlined
 with hyphens.

+August 2007: A further feature has been added to Tidytxt. The current version
+of xmlto makes HTML that contains non-ASCII Unicode characters. Fortunately,
+they are few. The heading uses "box drawing" characters in the range U+2500 to
+U+253F, and within the main text, U+00A0 (hard space) occasionally appears. The
+Tidytxt script now turns all the former into hyphens and the latter into normal
+spaces. Bullets, which are set as U+25CF, are turned into asterisks. (It might
+be possible to do all this in the same way as I dealt with copyright - see
+above - but adding three lines of Perl to an existing script was a lot easier.)
+


 CREATING INFO FILES
@@ -663,4 +672,4 @@ x2man Script to make the Exim man page from the XML

 Philip Hazel

-Last updated: 27 March 2007
+Last updated: 23 August 2007
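The sketch below illustrates the character clean-up that this change adds to Tidytxt. It is written in Python rather than the Perl that Tidytxt actually uses, so it is not the Tidytxt code itself. The function name tidy_unicode is hypothetical. The sketch covers only the three mappings described in the new paragraph:

- Box-drawing characters in U+2500 to U+253F become hyphens.
- U+00A0 becomes a normal space.
- U+25CF becomes an asterisk.

```python
import re

# Hypothetical illustration of the August 2007 Tidytxt clean-up; the real
# Tidytxt is a Perl script and its exact code is not shown in this diff.

# Box-drawing characters U+2500..U+253F, used by xmlto in the heading.
BOX_DRAWING = re.compile("[\u2500-\u253f]")


def tidy_unicode(text: str) -> str:
    """Replace the few non-ASCII characters left in the w3m text output."""
    text = BOX_DRAWING.sub("-", text)    # box drawing -> hyphens
    text = text.replace("\u00a0", " ")   # U+00A0 hard space -> normal space
    text = text.replace("\u25cf", "*")   # U+25CF bullet -> asterisk
    return text


if __name__ == "__main__":
    sample = "\u250c\u2500\u2500\u2510 Exim\u00a0spec\n\u25cf item"
    print(tidy_unicode(sample))
```

The three groups of characters do not overlap, so the order of the substitutions does not matter. Characters just outside the range, such as U+2540, are left alone, as the description implies.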