4 . We are going to type the name xfpt rather a lot. Make it easier:
5 .set X "<emphasis>xfpt</emphasis>"
8 . ----------------------------------------------------------------------------
14 <title>The xfpt plain text to XML processor</title>
15 <titleabbrev>xfpt</titleabbrev>
16 <date>18 May 2012</date>
18 <firstname>Philip</firstname>
19 <surname>Hazel</surname>
21 <authorinitials>PH</authorinitials>
22 <revhistory><revision><revnumber>0.09</revnumber><date>18 May 2012</date><authorinitials>PH</authorinitials></revision></revhistory>
23 <copyright><year>2012</year><holder>University of Cambridge</holder></copyright>
27 . /////////////////////////////////////////////////////////////////////////////
28 . These lines are processing instructions for the Simple DocBook Processor that
29 . Philip Hazel has developed as a less cumbersome way of making PostScript and
30 . PDFs than using xmlto and fop. They will be ignored by all other XML
32 . /////////////////////////////////////////////////////////////////////////////
36 foot_right_recto="&chaptertitle;"
37 foot_right_verso="&chaptertitle;"
43 . ----------------------------------------------------------------------------
44 .chapter "Introduction" ID00
45 &X; is a program that reads a marked-up ASCII source file, and converts it into
46 XML. It was written with DocBook XML in mind, but can also be used for other
47 forms of XML. Unlike &'AsciiDoc'& (&url(http://www.methods.co.nz/asciidoc/)),
48 &X; does not try to produce XML from a document that is also usable as a
49 freestanding ASCII document. The input for &X; is very definitely &"marked
50 up"&. This makes it less ambiguous for large and/or complicated documents. &X;
51 is also much faster than &'AsciiDoc'& because it is written in C and does not
52 rely on pattern matching.
54 &X; is aimed at users who understand the XML that they are generating. It makes
55 it easy to include literal XML, either in blocks, or within paragraphs. &X;
56 restricts itself to two special characters that trigger all its processing.
58 &X; treats any input line that starts with a dot as a &'directive'& line.
59 Directives control the way the input is processed. A small number of directives
60 are implemented in the program itself. A macro facility makes it possible to
61 combine these in various ways to define directives for higher-level concepts
62 such as chapters and sections. A standard macro library that generates a simple
63 subset of DocBook XML is provided. The only XML element that the program itself
64 generates is &`<para>`&; all the others must be included as literal XML, either
65 directly in the input text, or, more commonly, as part of the text that is
66 generated by a macro call.
68 The ampersand character is special within non-literal text that is processed by
69 &X;. An ampersand introduces a &'flag sequence'& that modifies the output.
70 Ampersand was chosen because it is also special in XML. As well as recognizing
71 flag sequences that begin with an ampersand, &X; converts grave accents and
72 apostrophes that appear in non-literal text into typographic opening and
73 closing quotes, as follows:
76 &` ` `& becomes `
77 &` ' `& becomes '
80 Within normal input text, ampersand, grave accent, and apostrophe are the only
81 characters that cause &X; to change the input text, but this applies only to
82 non-literal text. In literal text, there are no markup characters, and only a
83 dot at the start of a line is recognized as special. Within the body of a
84 macro, there is one more special character: the dollar character is used to
85 introduce an argument substitution.
Notwithstanding the previous paragraph, &X; knows that it is generating XML,
and in all cases when a literal ampersand or angle bracket is required in the
output, the appropriate XML entity reference (&`&&amp;`&, &`&&lt;`&, or
&`&&gt;`&, respectively) is generated.
93 .section "The &X; command line" ID01
94 The format of the &X; command line is:
96 &`xfpt [`&&'options'&&`] [`&&'input source'&&`]`&
98 If no input is specified, the standard input is read. There are four options:
102 This option causes &X; to output its &"usage"& message, and exit.
104 .vitem "&%-o%&&~&'<output destination>'&"
105 This option overrides the default destination. If the standard input is being
106 read, the default destination is the standard output. Otherwise, the default
107 destination is the name of the input file with the extension &_.xml_&,
108 replacing its existing extension if there is one. A single hyphen character can
109 be given as an output destination to refer to the standard output.
111 .vitem "&%-S%&&~&'<directory path>'&"
112 This option overrides the path to &X;'s library directory that is built into
113 the program. This makes it possible to use or test alternate libraries.
116 This option causes &X; to output its version number and exit.
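
The rules for choosing the output destination can be summarized in a few lines
of Python. This is only a sketch of the behaviour described above, not &X;'s
own code. The function name &`default_destination`& is invented for
illustration, and the treatment of unusual file names (for example, names that
begin with a dot) is an assumption.

```python
import os


def default_destination(input_path=None, output_option=None):
    """Return the output destination; "-" stands for the standard output."""
    if output_option is not None:
        # An explicit -o value overrides the default ("-" means stdout).
        return output_option
    if input_path is None:
        # Reading the standard input: write to the standard output.
        return "-"
    # Replace an existing extension, or append .xml if there is none.
    root, _extension = os.path.splitext(input_path)
    return root + ".xml"
```

Under these assumptions, an input file called &_spec.xfpt_& and one called
&_spec_& both yield &_spec.xml_&.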
120 .section "A short &X; example" ID02
121 Here is a very short example of a complete &X; input file that uses some of the
122 standard macros and flags:
129 .chapter "The first chapter"
130 This is the text of the first chapter. Here is an &'italic'&
131 word, and here is a &*bold*& one.
133 .section "This is a section heading"
134 We can use the &*ilist*& macro to generate an itemized list:
136 The first item in the list.
138 The last item in the list.
141 There are also standard macros for ordered lists, literal
142 layout blocks, code blocks, URL references, index entries,
143 tables, footnotes, figures, etc.
148 .section "Literal and non-literal processing" "SECTliteralprocessing" ID03
149 &X; processes non-directive input lines in one of four ways (known as
153 In the default mode, text is processed paragraph by paragraph.
155 There is, however, a special case when a paragraph contains one or more
156 footnotes. In that situation, each part of the outer paragraph is processed
159 The end of a paragraph is indicated by the end of the input, a blank line, or
160 by an occurrence of the &*.literal*& directive. Other directives (for example,
161 &*.include*&) do not of themselves terminate a paragraph. Most of the standard
162 macros (such as &*.chapter*& and &*.section*&) force a paragraph end by
163 starting their contents with a &*.literal*& directive.
165 Because &X; reads a whole paragraph before processing it, error messages
166 contain the phrase &"detected near line &'nnn'&"&, where the line number is
167 typically that of the last line of the paragraph.
171 In the &"literal layout"& mode, text is processed line by line, but is
172 otherwise handled as in the default mode. The only real difference this makes
173 to the markup from the user's point of view is that both parts of a set of
174 paired flags must be on the same line. In this mode, error messages are more
175 likely to contain the exact line number where the fault lies. Literal layout
176 mode is used by the standard &*.display*& macro to generate &`<literallayout>`&
180 In the &"literal text"& mode, text is also processed line by line, but no flags
181 are recognized. The only modification &X; makes to the text is to turn
182 ampersand and angle bracket characters into XML entity references. This mode is
183 used by the standard &*.code*& macro to generate &`<literallayout>`& elements
184 that include &`class=monospaced`&.
187 In the &"literal XML"& mode, text lines are copied to the output without
188 modification. This is the easiest way to include a chunk of literal XML in the
189 output. An example might be the &`<bookinfo>`& element, which occurs only once
190 in a document. It is not worth setting up a macro for a one-off item like this.
The &*.literal*& directive switches between the modes. It is not normally used
directly, but instead is incorporated into appropriate macro definitions. The
&*.inliteral*& directive can be used to test the current mode.
197 Directive lines are recognized and acted upon in all four modes. However, an
198 unrecognized line that starts with a dot in the literal text or literal XML
199 mode is treated as data. In the other modes, such a line provokes an error.
201 If you need to have a data line that begins with a dot in literal layout mode,
202 you can either specify it by character number, or precede it with some
203 non-acting markup. These two examples are both valid:
205 .start with a dot
206 &''&.start with a dot
208 The second example assumes the standard flags are defined: it precedes the dot
209 with an empty italic string. However, this is untidy because the empty string
210 will be carried over into the XML.
212 In literal text or literal XML mode, it is not possible to have a data line
213 that starts with a dot followed by the name of a directive or macro. You have
214 to use literal layout mode if you require such output. Another solution, which
215 is used in the source for this document (where many examples show directive
216 lines), is to indent every displayed line by one space, and thereby avoid the
220 .section "Format of directive lines" ID04
221 If an input line starts with a dot followed by a space, it is ignored by &X;.
222 This provides a facility for including comments in the input. Otherwise, the
223 dot must be followed by a directive or macro name, and possibly one or more
224 arguments. Arguments that are strings are delimited by white space unless they
225 are enclosed in single or double quotes. The delimiting quote character can be
226 included within a quoted string by doubling it. Here are some examples:
230 .row "Jack's house" 'Jill''s house'
232 An unrecognized directive line normally causes an error; however, in the
233 literal text and literal XML modes, an unrecognized line that starts with a
234 dot is treated as a data line.
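
The quoting rules for directive arguments can be illustrated with a small
Python function. It is a sketch of the rules as stated above rather than the
parser that &X; uses. The name &`split_arguments`& is hypothetical, and the
handling of an unterminated quote (the rest of the line is taken as the
argument) is an assumption.

```python
def split_arguments(text):
    """Split the argument part of a directive line into a list of strings."""
    args = []
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1                      # skip white space between arguments
            continue
        if text[i] in "'\"":
            quote = text[i]
            i += 1
            chars = []
            while i < n:
                if text[i] == quote:
                    if i + 1 < n and text[i + 1] == quote:
                        chars.append(quote)   # doubled quote: one literal quote
                        i += 2
                        continue
                    i += 1                    # closing quote
                    break
                chars.append(text[i])
                i += 1
            args.append("".join(chars))
        else:
            start = i
            while i < n and not text[i].isspace():
                i += 1
            args.append(text[start:i])        # unquoted: ends at white space
    return args
```

Given the arguments of the &`.row`& example above, this function returns the
two strings &`Jack's house`& and &`Jill's house`&.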
238 .section "Calling macros" "SECTcallingmacro" ID05
239 Macros are defined by the &*.macro*& directive, which is described in section
240 &<<SECTmacro>>&. There are two ways of calling a macro. It can be called in the
241 same way as a directive, or it can be called from within text that is being
242 processed. The second case is called an &"inline macro call"&.
244 When a macro is called as a directive, its name is given after a dot at the
245 start of a line, and the name may be followed by any number of optional
246 arguments, in the same way as a built-in directive (see the previous section).
249 .chapter "Chapter title" chapter-reference
251 The contents of the macro, after argument substitution, are processed in
252 exactly the same way as normal input lines. A macro that is called as a
253 directive may contain nested macro calls.
255 When a macro is called from within a text string, its name is given after an
256 ampersand, and is followed by an opening parenthesis. Arguments, delimited by
257 commas, can then follow, up to a closing parenthesis. If an argument contains a
258 comma or a closing parenthesis, it must be quoted. White space after a
259 separating comma is ignored. The most common example of this type of macro
260 call is the standard macro for generating a URL reference:
262 Refer to a URL via &url(http://x.example,this text).
265 There are differences in the behaviour of macros, depending on which way they
266 are called. A macro that is called inline may not contain references to other
267 macros; it must contain only text lines and calls to built-in directives.
268 Also, newlines that terminate text lines within the macro are not included in
A macro that can be called inline can always be called as a directive, but the
opposite is not always true. Macros are usually designed to be called either
one way or the other. However, the &*.new*& and &*.index*& macros in the
standard library are examples of macros that are designed to be called either
way.
280 . ----------------------------------------------------------------------------
281 .chapter "Flag sequences" ID06
Only one flag sequence is built into the code itself. If an input line ends
with three ampersands (ignoring trailing white space), the ampersands are
removed, and the next input line, with any leading white space removed, is
joined to the original line. This happens before any other processing, and may
involve any number of lines. Thus:
289 &`The quick &&&`&
290 &` brown &&&`&
294 produces exactly the same output as:
301 .section "Flag sequences for XML entities and &X; variables" ID07
302 If an ampersand is followed by a # character, a number, and a semicolon, it is
303 understood as a numerical reference to an XML entity, and is passed through
304 unmodified. The number can be decimal, or hexadecimal preceded by &`x`&. For
This is an Ohm sign: &#x2126;.
This is a degree sign: &#176;.
310 If an ampersand is followed by a letter, a sequence of letters, digits, and
311 dots is read. If this is terminated by a semicolon, the characters between the
312 ampersand and the semicolon are interpreted as an entity name. This can be:
314 The name of an inbuilt &X; variable. At present, there is only one of these,
315 called &`xfpt.rev`&. Its use is described with the &*.revision*& directive
318 The name of a user variable that has been set by the &*.set*& directive, also
321 The name of an XML entity. This is assumed if the name is not recognized as one
322 of the previous types. In this case, the input text is passed to the output
323 without modification. For example:
325 This is an Ohm sign: &Ohm;.
330 .section "Flag sequences for calling macros" ID08
331 If an ampersand is followed by a sequence of alphanumeric characters starting
332 with a letter, terminated by an opening parenthesis, the characters between the
333 ampersand and the parenthesis are interpreted as the name of a macro. See
334 section &<<SECTcallingmacro>>& for more details.
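
Taken together with the previous section, the three forms that need no
&*.flag*& definition can be told apart by what follows the ampersand. The
Python sketch below restates those rules as regular expressions. The names
&`classify`&, &`NUMERIC`&, &`NAMED`&, and &`MACRO`& are invented for
illustration, and the sketch does not attempt to reproduce how &X; itself scans
its input.

```python
import re

# "&#" + decimal digits, or "x" + hexadecimal digits, then ";"
NUMERIC = re.compile(r"&#(?:[0-9]+|x[0-9A-Fa-f]+);")
# "&" + letter + letters, digits, or dots, terminated by ";"
NAMED = re.compile(r"&[A-Za-z][A-Za-z0-9.]*;")
# "&" + letter + alphanumerics, terminated by "("
MACRO = re.compile(r"&[A-Za-z][A-Za-z0-9]*\(")


def classify(text, pos=0):
    """Say which kind of flag sequence starts at text[pos]."""
    if NUMERIC.match(text, pos):
        return "numeric reference"
    if NAMED.match(text, pos):
        return "variable or entity name"
    if MACRO.match(text, pos):
        return "inline macro call"
    return "other flag"   # must have been defined by a flag directive
```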
338 .section "Other flag sequences" ID09
339 Any other flag sequences that are needed must be defined by means of the
340 &*.flag*& directive. These are of two types, standalone and paired. Both cases
341 define replacement text. This is always literal; it is not itself scanned for
344 Lines are scanned from left to right when flags are being interpreted. If
345 there is any ambiguity when a text string is being scanned, the longest flag
346 sequence wins. Thus, it is possible (as in the standard flag sequences) to
347 define both &`&&<`& and &`&&<<`& as flags, provided that you never want to
348 follow the first of them with a &`<`& character.
350 You can define flags that start with &`&&#`&, but these must be used with care,
351 lest they be misinterpreted as numerical references to XML entities.
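
The &"longest flag wins"& rule can be sketched in Python as follows. The
function name &`longest_flag`& is hypothetical, and the list of flags passed to
it stands in for whatever table of definitions &X; keeps internally.

```python
def longest_flag(text, pos, flags):
    """Return the longest defined flag that starts at text[pos], or None."""
    best = None
    for flag in flags:
        if text.startswith(flag, pos):
            if best is None or len(flag) > len(best):
                best = flag
    return best
```

With both &`&&<`& and &`&&<<`& in the list, a cross-reference such as
&`&&<<SECTdemo>>&&`& is matched by the longer flag, whereas
&`&&<element>&&`& is matched by the shorter one.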
A standalone flag consists of an ampersand followed by any number of
non-alphanumeric characters. When it is encountered, it is replaced by its
replacement text. For example, in the standard flag definitions, &`&&&&`&
is defined as a standalone flag with the replacement text &`&&amp;`&.
358 A paired flag is defined as two sequences. The first takes the same form as a
359 standalone flag. The second also consists of non-alphanumeric characters, but
360 need not start with an ampersand. It is often defined as the reverse of the
361 first sequence. For example, in the standard definitions, &`&&'`& and
362 &`'&&`& are defined as a flag pair for enclosing text in an &`<emphasis>`&
When the first sequence of a paired flag is encountered, its partner is
expected to be found within the same text unit. In the default mode, the units
are paragraphs, or part-paragraphs if footnotes intervene. In literal layout
mode, the text is processed line by line. Each member of the pair is replaced
by its replacement text.
371 Multiple occurrences of paired flags must be correctly nested. Note that,
372 though &X; diagnoses an error for badly nested flag pairs, it does not prevent
373 you from generating invalid XML. For example, DocBook does not allow
374 &`<emphasis>`& within &`<literal>`&, though it does allow &`<literal>`& within
378 .section "Unrecognized flag sequences" ID10
379 If an ampersand is not followed by a character sequence in one of the forms
380 described in the preceding sections, an error occurs.
383 .section "Standard flag sequences" ID11
384 These are the standalone flag sequences that are defined in the &_stdflags_&
385 file in the &X; library:
&`&&&& `& becomes &` &&amp;`& (ampersand)
&`&&-- `& becomes &` &&ndash;`& (en-dash)
&`&&~ `& becomes &` &&nbsp;`& (`hard' space)
391 These are the flag pairs that are defined in the &_stdflags_& file in the &X;
394 &`&&"..."&& `& becomes &`<quote>...</quote>`&
395 &`&&'...'&& `& becomes &`<emphasis>...</emphasis>`&
396 &`&&*...*&& `& becomes &`<emphasis role="bold">...</emphasis>`&
397 &`&&`...`&& `& becomes &`<literal>...</literal>`&
398 &`&&_..._&& `& becomes &`<filename>...</filename>`&
399 &`&&(...)&& `& becomes &`<command>...</command>`&
400 &`&&[...]&& `& becomes &`<function>...</function>`&
401 &`&&%...%&& `& becomes &`<option>...</option>`&
402 &`&&$...$&& `& becomes &`<varname>...</varname>`&
403 &`&&<...>&& `& becomes &`<...>`&
404 &`&&<<...>>&& `& becomes &`<xref linkend="..."/>`&
406 For example, if you want to include a literal XML element in your output, you
407 can do it like this: &`&&<element>&&`&. If you want to include a longer
408 sequence of literal XML, changing to the literal XML mode may be more
414 . ----------------------------------------------------------------------------
415 .chapter "Built-in directive processing" ID12
416 The directives that are built into the code of &X; are now described in
417 alphabetical order. You can see more examples of their use in the descriptions
418 of the standard macros in chapter &<<CHAPstdmac>>&.
421 .section "The &*.arg*& directive" ID13
This directive may appear only within the body of a macro. It must be followed
by a single number, optionally preceded by a minus sign. If the number is
positive (no minus sign), subsequent lines, up to a &*.endarg*& directive, are
skipped unless the macro has been called with at least that number of
arguments and the given argument is not an empty string. If the number is
negative (minus sign present), the test is inverted: subsequent lines are
skipped unless the macro has been called with fewer than that number of
arguments, or with an empty string for the given argument. For example:
433 Use these lines if there are at least 2 arguments
434 and the second one is not empty. Normally there would
435 be a reference to the 2nd argument.
438 Use this line unless there are at least 2 arguments
439 and the second one is not empty.
443 Note that if a macro is defined with default values for its arguments, these
444 are not counted by the &*.arg*& directive, which looks only at the actual
445 arguments in a particular macro call.
447 The &*.arg*& directive may be nested.
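
The test that &*.arg*& applies can be written as a short Python function. This
is a sketch that follows the example above, in which a negative number selects
the lines to use when the argument is missing or empty. The function name
&`arg_block_is_processed`& is hypothetical, and &`args`& holds only the actual
arguments of the macro call, without any defaults.

```python
def arg_block_is_processed(number, args):
    """Decide whether the lines up to the matching endarg are processed."""
    n = abs(number)
    # True when argument n was actually supplied and is not empty.
    supplied = len(args) >= n and args[n - 1] != ""
    # A positive number requires the argument; a negative one requires
    # that it is missing or empty.
    return supplied if number > 0 else not supplied
```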
450 .section "The &*.eacharg*& directive" ID14
451 This directive may appear only within the body of a macro. It may optionally be
452 followed by a single number; if omitted the value is taken to be 1. Subsequent
453 lines, up to a &*.endeach*& directive, are processed multiple times, once for
454 each remaining argument. Unlike &*.arg*&, an argument that is an empty string
455 is not treated specially. However, like &*.arg*&, only the actual arguments of
456 a macro call are considered. Default argument values do not count.
458 The number given with &*.eacharg*& defines which argument to start with. If the
459 macro is called with fewer arguments, the lines up to &*.endeach*& are skipped,
460 and are not processed at all. When these lines are being processed, the
461 remaining macro arguments can be referenced relative to the current argument.
462 &`$+1`& refers to the current argument, &`$+2`& to the next argument, and so
465 The &*.endeach*& directive may also be followed by a number, again defaulting
466 to 1. When &*.endeach*& is reached, the current argument number is incremented
467 by that number. If there are still unused arguments available, the lines
468 between &*.eacharg*& and &*.endeach*& are processed again.
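
The looping behaviour can be modelled in Python as a generator that yields the
remaining arguments on each pass. This is a sketch under the rules just
described. The name &`eacharg_passes`& is invented, &`start`& and &`step`&
stand for the numbers given to &*.eacharg*& and &*.endeach*&, and &`step`& is
assumed to be at least 1.

```python
def eacharg_passes(args, start=1, step=1):
    """Yield the remaining arguments for each pass through the loop body.

    Element 0 of each yielded list is what $+1 refers to on that pass,
    element 1 is what $+2 refers to, and so on.
    """
    current = start
    # If there are fewer than `start` arguments, the body is never processed.
    while current <= len(args):
        yield args[current - 1:]
        current += step


# One pass per argument, as for a macro that handles arguments singly.
cells = [remaining[0] for remaining in eacharg_passes(["a", "b", "c"])]

# Pairs of arguments: a hypothetical call whose arguments from the second
# onwards are (width, alignment) pairs.
call = ["title", "1in", "left", "2in", "right"]
columns = [(r[0], r[1]) for r in eacharg_passes(call, start=2, step=2)]
```

Here &`cells`& holds each argument in turn, and &`columns`& holds the two
(width, alignment) pairs.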
470 This example is taken from the coding for the standard &*.row*& macro, which
471 generates an &`<entry>`& element for each of its arguments:
474 &<entry>&$+1&</entry>&
477 This example is taken from the coding for the standard &*.itable*& macro, which
478 processes arguments in pairs to define the table's columns, starting from the
482 &<colspec colwidth="$+1" align="$+2"/>&
485 The &*.eacharg*& directive may in principle be nested, though this does not
486 seem useful in practice.
489 .section "The &*.echo*& directive" ID15
490 This directive takes a single string argument. It writes it to the standard
491 error stream. Within a macro, argument substitution takes place, but no other
492 processing is done on the string. This directive can be useful for debugging
493 macros or writing comments to the user.
496 .section "The &*.endarg*& directive" ID16
497 See the description of &*.arg*& above.
500 .section "The &*.endeach*& directive" ID17
501 See the description of &*.eacharg*& above.
504 .section "The &*.endinliteral*& directive" ID18
505 See the description of &*.inliteral*& below.
508 .section "The &*.flag*& directive" ID19
509 This directive is used to define flag sequences. The directive must be followed
510 either by a standalone flag sequence and one string in quotes, or by a flag
511 pair and two strings in quotes. White space separates these items. For example:
514 .flag &" "& "<quote>" "</quote>"
516 There are more examples in the definitions of the standard flags. If you
517 redefine an existing flag, the new definition overrides the old. There is no
518 way to revert to the previous definition.
521 .section "The &*.include*& directive" ID20
522 This directive must be followed by a single string argument that is the path to
523 a file. The contents of the file are read and incorporated into the input at
524 this point. If the string does not contain any slashes, the path to the &X;
525 library is prepended. Otherwise, the path is used unaltered. If
526 &*.include*& is used inside a macro, it is evaluated each time the macro is
527 called, and thus can be used to include a different file on each occasion.
530 .section "The &*.inliteral*& directive" ID21
531 This directive may appear only within the body of a macro. It must be followed
532 by one of the words &"layout"&, &"text"&, &"off"&, or &"xml"&. If the current
533 literal mode does not correspond to the word, subsequent lines, up to a
534 &*.endinliteral*& directive, are skipped. The &*.inliteral*& directive may be
538 .section "The &*.literal*& directive" ID22
539 This must be followed by one of the words &"layout"&, &"text"&, &"off"&, or
540 &"xml"&. It forces an end to a previous paragraph, if there is one, and then
541 switches between processing modes. The default mode is the &"off"& mode, in
542 which text is processed paragraph by paragraph, and flags are recognized.
543 Section &<<SECTliteralprocessing>>& describes how input lines are processed in
547 .section "The &*.macro*& directive" "SECTmacro" ID23
548 This directive is used to define macros. It must be followed by a macro name,
549 and then, optionally, by any number of arguments. The macro name can be any
550 sequence of non-whitespace characters. The arguments in the definition provide
551 default values. The following lines, up to &*.endmacro*&, form the body of the
552 macro. They are not processed in any way when the macro is defined; they are
553 processed only when the macro is called (see section &<<SECTcallingmacro>>&).
555 Within the body of a macro, argument substitutions can be specified by means of
556 a dollar character and an argument number, for example, &`$3`& for the third
557 argument. See also &*.eacharg*& above for the use of &`$+`& to refer to
558 relative arguments when looping through them. A reference to an argument that
559 is not supplied, and is not given a default, results in an empty substitution.
561 There is also a facility for a conditional substitution. A reference to an
562 argument of the form:
564 &`$=`&&'<digits><delimiter><text><delimiter>'&
566 inserts the text if the argument is defined and is not an empty string, and
567 nothing otherwise. The text is itself scanned for flags and argument
568 substitutions. The delimiter must be a single character that does not appear in
569 the text. For example:
571 &<chapter$=2+ id="$2"+>&
573 If this appears in a macro that is called with only one argument, the result
578 but if the second argument is, say &`abcd`&, the result is:
582 This conditional feature can be used with both absolute and relative argument
585 If a dollar character is required as data within the body of a macro, it must
586 be doubled. For example:
593 If you redefine an existing macro, the new definition overrides the old. There
594 is no way to revert to the previous definition. If you define a macro whose
595 name is the same as the name of a built-in directive you will not be able to
596 call it, because &X; looks for built-in directives before it looks for macros.
598 It is possible to define a macro within a macro, though clearly care must be
599 taken with argument references to ensure that substitutions happen at the right
603 .section "The &*.nest*& directive" ID24
604 This directive must be followed by one of the words &"begin"& or &"end"&. It is
605 used to delimit a nested sequence of independent text items that occurs inside
606 another, such as the contents of a footnote inside a paragraph. This directive
607 is usually used inside a macro. For example, a &*footnote*& macro could be
615 At the start of a nested sequence, the current mode and paragraph state are
616 remembered and &X; then reverts to the default mode and &"not in a paragraph"&.
617 At the end of a nested sequence, if a paragraph has been started, it is
618 terminated, and then &X; reverts to the previous state.
621 .section "The &*.nonl*& directive" ID25
622 This directive must be followed by a single string argument. It is processed
623 as an input line without a newline at the end. This facility is useful
624 in macros when constructing a single data line from several text fragments. See
625 for example the &*.new*& macro in the standard macros.
628 .section "The &*.pop*& directive" ID26
629 &X; keeps a stack of text strings that are manipulated by the &*.push*& and
630 &*.pop*& directives. When the end of the input is reached, any strings that
631 remain on the stack are popped off, processed for flags, and written to the
632 output. In some cases (see the &*.push*& directive below) a warning message is
635 Each string on the stack may, optionally, be associated with an upper case
636 letter. If &*.pop*& is followed by an upper case letter, it searches down the
637 stack for a string with the same letter. If it cannot find one, it does
638 nothing. Otherwise, it pops off, processes, and writes out all the strings down
639 to and including the one that matches.
641 If &*.pop*& is given without a following letter, it pops one string off the
642 stack and writes it out. If there is nothing on the stack, an error occurs.
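
The behaviour of the stack can be modelled in a few lines of Python. This is a
sketch of the rules described above, not &X;'s implementation. The class name
&`TextStack`& and its method names are hypothetical, the processing of popped
strings for flags is omitted, and the error for an empty stack is represented
by a Python exception.

```python
class TextStack:
    """A stack of strings, each optionally tagged with an upper case letter."""

    def __init__(self):
        self.items = []    # (letter or None, string); the top is the last item
        self.output = []   # strings written out so far, in order

    def push(self, string, letter=None):
        self.items.append((letter, string))

    def pop(self, letter=None):
        if letter is None:
            if not self.items:
                raise IndexError("pop with nothing on the stack")
            self.output.append(self.items.pop()[1])
            return
        if all(tag != letter for tag, _ in self.items):
            return         # no string has this letter: do nothing
        while True:
            tag, string = self.items.pop()
            self.output.append(string)
            if tag == letter:
                break      # stop after writing the matching string

    def finish(self):
        """At the end of the input, pop and write whatever remains."""
        while self.items:
            self.output.append(self.items.pop()[1])
```

Popping by letter is what allows a single call to close several nested items
at once: every string above the matching one is written out first.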
645 .section "The &*.push*& directive" ID27
646 This directive pushes a string onto the stack. If the rest of the command line
647 starts with an upper case letter followed by white space or the end of the
648 line, that letter is associated with the string that is pushed, which consists
649 either of a quoted string, or the rest of the line. After a quoted string, the
650 word `check' may appear. In this case, if the string has not been popped off
651 the stack by the end of processing, a warning message is output. This facility
652 is used by the standard macros to give warnings for unclosed items such as
655 For example, the &*.chapter*& macro contains this line:
659 Earlier in the macro there is the line:
663 This arrangement ensures that any previous chapter is terminated before
664 starting a new one, and also when the end of the input is reached. The
665 &*.ilist*& macro contains this line:
667 .push L "&</itemizedlist>&" check
Item lists are terminated by &*.endlist*&, which contains:
673 However, if &*.endlist*& is accidentally omitted (or &*.ilist*& is accidentally
674 included), the appearance of `check' means that a warning is issued to alert
675 the user to a possible problem.
677 .section "The &*.revision*& directive" "SECTrevision" ID28
678 This directive is provided to make it easy to set the &`revisionflag`&
679 attribute on XML elements in a given portion of the document. The DocBook
680 specification states that the &`revisionflag`& attribute is common to all
683 The &*.revision*& directive must be followed by one of the words &"changed"&,
684 &"added"&, &"deleted"&, or &"off"&. For any value other than &"off"&, it causes
685 the internal variable &'xfpt.rev'& to be set to &`revisionflag=`& followed by
686 the given argument. If the argument is &"off"&, the internal variable is
689 The contents of &'xfpt.rev'& are included in every &`<para>`& element that &X;
690 generates. In addition, a number of the standard macros contain references to
691 &'xfpt.rev'& in appropriate places. Thus, setting:
695 should cause all subsequent text to be marked up with &`revisionflag`&
700 is encountered. Unfortunately, at the time of writing, not all DocBook
701 processing software pays attention to the &`revisionflag`& attribute.
702 Furthermore, some software grumbles that it is &"unexpected"& on some elements,
703 though it does still seem to process it correctly.
705 For handling the most common case (setting and unsetting &"changed"&), the
706 standard macros &*.new*& and &*.wen*& are provided (see section
710 .section "The &*.set*& directive" ID29
711 This directive must be followed by a name and a text string. It defines a user
712 variable and gives it a name. A reference to the name in the style of an XML
713 entity causes the string to be substituted, without further processing. For
718 This could be referenced as &`&&version;`&. If a variable is given the name of
719 an XML entity, you will not be able to refer to the XML entity, because local
720 variables take precedence. There is no way to delete a local variable after it
725 . ----------------------------------------------------------------------------
726 .chapter "The standard macros for DocBook" "CHAPstdmac" "Standard macros" ID30
727 A set of simple macros for commonly needed DocBook features is provided in
728 &X;'s library. This may be extended as experience with &X; accumulates. The
729 standard macros assume that the standard flags are defined, so a document that
730 is going to use these features should start with:
735 All the standard macros except &*new*&, &*index*&, and &*url*& are intended to
736 be called as directive lines. Their names are therefore shown with a leading
737 dot in the discussion below.
739 .section "Overall setup" ID31
740 There are two macros that should be used only once, at the start of the
741 document. The &*.docbook*& macro has no arguments. It inserts into the output
742 file the standard header material for a DocBook XML file, which is:
744 <?xml version="1.0" encoding="UTF-8"?>
745 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
746 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
748 The &*.book*& macro has no arguments. It generates &`<book>`& and pushes
749 &`</book>`& onto the stack so that it will be output at the end.
752 .section "Processing instructions"
XML processing instructions such as &`<?sdop`& &`toc_sections="no"?>`& can, of
course, be written literally between &`.literal`& &`xml`& and
&`.literal`& &`off`&. If there are a lot of them, this is perhaps the most
convenient approach. A macro called &*.pi*& is provided as an easy way of
setting up a short processing instruction. Its first argument is the name of
the processor for which the instruction is intended, and its second argument is
the contents of the instruction, for example:
761 .pi sdop 'toc_sections="yes,yes,no"'
763 This generates &`<?sdop`& &`toc_sections="yes,yes,no"?>`&.
766 .section "Chapters, sections, and subsections" ID32
767 Chapters, sections, and subsections are supported by three macros that all
768 operate in the same way. They are &*.chapter*&, &*.section*&, and
769 &*.subsection*&. They take either one, two, or three arguments. The first
770 argument is the title. If a second argument is present, and is not an empty
771 string, it is set as an ID, and can be used in cross-references. For example:
773 .chapter "Introduction"
777 .section "A section title" "SECTdemo"
779 can be referenced from elsewhere in the document by a phrase such as:
781 see section &<<SECTdemo>>&
783 When the title of a chapter or section is being used as a running head or foot
784 (for example), it may be too long to fit comfortably into the available space.
785 DocBook provides the facility for a title abbreviation to be specified to deal
786 with this problem. If a third argument is given to one of these macros, it
787 causes a &`<titleabbrev>`& element to be generated. In this case, a second
788 argument must also be provided, but if you do not need an ID, the second
789 argument can be an empty string. For example:
791 .chapter "This chapter has quite a long title" "" "Long title"
793 Where and when the abbreviation is used in place of the full title is
794 controlled by the stylesheet when the XML is processed.
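
The three argument forms can be combined in one document. The following sketch
uses made-up titles and IDs; only the macro names and the order of the
arguments are taken from the description above.
.code
  .chapter "Installing the software" "CHAPinstall" "Installation"
  .section "Prerequisites" "SECTprereq"
  .subsection "Compiler versions"
  Text of the subsection.
  .section "Building" "" "Build"
  See section &<<SECTprereq>>& for what is needed first.
.endd
The second &*.section*& call has an empty second argument because it needs an
abbreviated title but no ID.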
797 .section "Prefaces, appendixes, and colophons" ID33
798 The macros &*.preface*&, &*.appendix*&, and &*.colophon*& operate in the same
799 way as &*.chapter*&, except that the first and the last have the default title
800 strings &"Preface"& and &"Colophon"&.
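
For instance, if the default titles are acceptable, a preface and a colophon
need no arguments, whereas an appendix is given a title in the same way as a
chapter. The titles, the ID, and the text in this sketch are invented.
.code
  .preface
  This book describes ...
  .chapter "Introduction"
  ...
  .appendix "Summary of options" "APPopts"
  ...
  .colophon
  This document was produced using ...
.endd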
803 .section "Terminating chapters, etc."
804 The macros for chapters, sections, appendixes, etc. use the stack to ensure
805 that each one is terminated at the correct point, without the need for an
806 explicit terminator. For example, starting a new section automatically
807 terminates an open subsection and a previous section.
809 Occasionally, however, there is a need to force an explicit termination. The
810 &*.endchapter*&, &*.endsection*&, &*.endsubsection*&, &*.endpreface*&,
811 &*.endappendix*&, and &*.endcolophon*& macros provide this facility. For
812 example, if you want to include an XML processing instruction after a preface,
813 but before the start of the following chapter, you must terminate the preface
814 with &*.endpreface*&. Otherwise a processing instruction that precedes the next
815 &*.chapter*& will end up inside the &`<preface>`& element. You should not
816 include any actual text items at these points.
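
A sketch of the preface case just described, reusing the &*.pi*& macro from
earlier in this chapter (the text and the chapter title are invented):
.code
  .preface
  Some introductory remarks.
  .endpreface
  .pi sdop 'toc_sections="no"'
  .chapter "Introduction"
.endd
The processing instruction is then output after &`</preface>`& rather than
inside the &`<preface>`& element.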
820 .section "URL references" ID34
821 The &*url*& macro generates URL references, and is intended to be called inline
822 within the text that is being processed. It generates a &`<ulink>`& element,
823 and has either one or two arguments. The first argument is the URL, and the
824 second is the text that describes it. For example:
826 More details are &url(http://x.example, here).
828 This generates the following XML:
830 More details are <ulink url="http://x.example">here</ulink>.
832 If the second argument is absent, the contents of the first argument are used
833 instead. If &*url*& is called as a directive, there will be a newline in the
834 output after &`</ulink>`&, which in most cases (such as the example above), you
839 .section "Itemized lists" ID35
840 The &*.ilist*& macro marks the start of an itemized list, the items of which
841 are normally rendered with bullets or similar markings. The macro can
842 optionally be called with one argument, for which there is no default. If the
843 argument is present, it is used to add a &`mark=`& attribute to the
844 &`<itemizedlist>`& element that is generated. The mark names that can be used
845 depend on the software that processes the resulting XML. For HTML output,
846 &"square"& and &"opencircle"& work in some browsers.
848 The text for the first item follows the macro call. The start of the next item
849 is indicated by the &*.next*& macro, and the end of the list by &*.endlist*&.
853 This is the first item.
855 This is the next item.
858 There may be more than one paragraph in an item.
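
Putting &*.ilist*&, &*.next*&, and &*.endlist*& together, a complete list with
an explicit mark might be written as follows. This is only a sketch: whether
&"square"& has any effect depends on the software that processes the XML.
.code
  .ilist square
  This is the first item.
  .next
  This is the next item.

  This is a second paragraph in the same item.
  .endlist
.endd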
861 .section "Ordered lists" ID36
862 The &*.olist*& macro marks the start of an ordered list, the items of which are
863 numbered. If no argument is given, arabic numerals are used. One of the
864 following words can be given as the macro's argument to specify the numeration:
866 &`arabic `& arabic numerals
867 &`loweralpha `& lower case letters
868 &`lowerroman `& lower case roman numerals
869 &`upperalpha `& upper case letters
870 &`upperroman `& upper case roman numerals
872 The text for the first item follows the macro call. The start of the next item
873 is indicated by the &*.next*& macro, and the end of the list by &*.endlist*&.
877 This is the first item.
879 This is the next item.
882 There may be more than one paragraph in an item.
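
As an illustration of the numeration argument, the following sketch (with
invented text) asks for the items to be numbered with lower case roman
numerals:
.code
  .olist lowerroman
  Unpack the distribution.
  .next
  Build the program.
  .next
  Install it.
  .endlist
.endd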
885 .section "Variable lists" ID37
886 A variable list is one in which each entry is composed of a set of one or more
887 terms and an associated description. Typically, the terms are printed in a
888 style that makes them stand out, and the description is indented underneath.
889 The start of a variable list is indicated by the &*.vlist*& macro, which has
890 one optional argument. If present, it defines a title for the list.
892 Each entry is defined by a &*.vitem*& macro, whose arguments are the terms.
893 This is followed by the body of the entry. The list is terminated by the
894 &*.endlist*& macro. For example:
896 .vlist "Font filename extensions"
903 As for the other lists, there may be more than one paragraph in an item.
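
A complete variable list might be laid out as in the following sketch, in which
the title, the terms, and the descriptions are all made up. The second entry
has two terms that share one description.
.code
  .vlist "Kinds of fruit"
  .vitem "apple"
  A fruit that grows in temperate climates.
  .vitem "orange" "tangerine"
  Citrus fruits that need a warmer climate.

  A second paragraph about the same two terms.
  .endlist
.endd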
906 .section "Nested lists" ID38
907 Lists may be nested as required. Some DocBook processors automatically choose
908 different bullets for nested itemized lists, but others do not. The
909 &*.endlist*& macro has no useful arguments. Any text that follows it is
910 treated as a comment. This can provide an annotation facility that may make the
911 input easier to understand when lists are nested.
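
The following sketch, with invented text, nests an itemized list inside an
ordered list and uses the text after each &*.endlist*& as a comment that shows
which list is being closed:
.code
  .olist
  The first numbered item contains a bulleted list:
  .ilist
  An inner item.
  .next
  Another inner item.
  .endlist inner bulleted list
  .next
  The second numbered item.
  .endlist outer numbered list
.endd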
914 .section "Displayed text" ID39
915 In displayed text each non-directive input line generates one output line. The
916 &`<literallayout>`& DocBook element is used to achieve this. Two kinds of
917 displayed text are supported by the standard macros. They differ in their
918 handling of the text itself.
920 The macro &*.display*& is followed by lines that are processed in the same way
921 as normal paragraphs: flags are interpreted, and so there may be font changes
922 and so on. The lines are processed in literal layout mode. For example:
925 &`-o`& set output destination
926 &`-S`& set library path
929 The output is as follows:
931 &`-o`& set output destination
932 &`-S`& set library path
935 The macro &*.code*& is followed by lines that are not processed in any way, except
936 to turn ampersands and angle brackets into XML entities. The lines are
937 processed in literal text mode. In addition, &`class="monospaced"`& is added to
938 the &`<literallayout>`& element, so that the lines are displayed in a
939 monospaced font. For example:
946 As the examples illustrate, both kinds of display are terminated by the
951 .section "Block quotes" ID40
952 The macro pair &*.blockquote*& and &*.endblockquote*& are used to wrap the
953 lines between them in a &`<blockquote>`& element.
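
A minimal sketch, with invented text:
.code
  This is an ordinary paragraph.
  .blockquote
  This paragraph is output inside the block quote.
  .endblockquote
  This is another ordinary paragraph.
.endd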
956 .section "Revision markings" "SECTrevmacs" ID41
957 Two macros are provided to simplify setting and unsetting the &"changed"&
958 revision marking (see section &<<SECTrevision>>&). When the revised text is
959 substantial (for example, a complete paragraph, table, display, or section), it
960 can be placed between &*.new*& and &*.wen*&, as in this example:
962 This paragraph is not flagged as changed.
964 This is a changed paragraph that contains a display:
968 This is the next paragraph.
970 Here is the next, unmarked, paragraph.
972 When called like this, without an argument, in ordinary text, &*.new*&
973 terminates the current paragraph, and &*.wen*& always does so. Therefore, even
974 though there are no blank lines before &*.new*& or &*.wen*& above, the revised
975 text will end up in a paragraph of its own. (You can, of course, put in blank
978 If you want to indicate that just a few words inside a paragraph are revised, you
979 can call the &*new*& macro with an argument. The macro can be called either as
980 a directive or inline:
982 This is a paragraph that has
983 .new "a few marked words"
984 within it. Here are &new(some more) marked words.
986 The effect of this is to generate a &`<phrase>`& XML element with the
987 &`revisionflag`& attribute set. The &*.wen*& macro is not used in this case.
989 You can use the &*.new*&/&*.wen*& macro pair to generate a &`<phrase>`& element
990 inside a section of displayed text. For example:
993 This line is not flagged as changed.
995 This line is flagged as changed.
997 This line is not flagged as changed.
1000 This usage works with both &*.display*& and &*.code*&. Within a &*.display*&
1001 section you can also call &*.new*& with an argument, either as a directive or
1002 inline. This does not work for &*.code*& because its lines are processed in
1005 If you want to add revision indications to part of a table, you must use an
1006 inline call of &*new*& within an argument of the &*.row*& macro (see below).
1007 This is the only usage that works in this case.
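
As a sketch, a row in which only part of one cell is marked as revised might
be written like this. The table macros themselves are described in the
following sections; the cell contents are invented.
.code
  .itable none 0 0 2 1in left 2in left
  .row "unchanged cell" "this cell has &new(two revised) words"
  .endtable
.endd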
1010 .section "Informal tables" ID42
1011 The &*.itable*& macro starts an informal (untitled) table with some basic
1012 parameterization. If you are working on a large document that has many tables
1013 with the same parameters, the best approach is to define your own table macros,
1014 possibly calling the standard one with specific arguments.
1016 The &*.itable*& macro has four basic arguments:
1018 The frame requirement for the table, which may be one of the words &"all"&,
1019 &"bottom"&, &"none"& (the default), &"sides"&, &"top"&, or &"topbot"&.
1021 The &"colsep"& value for the table. The default is &"0"&, meaning no vertical
1022 separator lines between columns. The value &"1"& requests vertical separator
1025 The &"rowsep"& value for the table. The default is &"0"&, meaning no horizontal
1026 lines between rows. The value &"1"& requests horizontal separator lines.
1028 The number of columns.
1030 These arguments must be followed by two arguments for each column. The first
1031 specifies the column width, and the second its alignment. A column width can be
1032 specified as an absolute dimension such as 36pt or 2in, or as a proportional
1033 measure, which has the form of a number followed by an asterisk. The two forms
1034 can be mixed &-- see the DocBook specification for details.
1036 Straightforward column alignments can be specified as &"center"&, &"left"&, or
1037 &"right"&. DocBook also has some other possibilities, but sadly they do not
1038 seem to include &"centre"&.
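
As an illustration of mixed widths, the following call (a sketch with
arbitrary values) requests an unframed table of three columns with no
separator lines. The first column has an absolute width of 36pt; the other two
have proportional widths, the third asking for twice the share of the second.
.code
  .itable none 0 0 3 36pt left 1* center 2* right
.endd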
1040 Each row of the table is specified using a &*.row*& macro; the entries in
1041 the row are the macro's arguments. The table is terminated by &*.endtable*&,
1042 which has no arguments. For example:
1045 .itable all 1 1 2 1in left 2in center
1046 .row "cell 11" "cell 12"
1047 .row "cell 21" "cell 22"
1051 This specifies a framed table, with both column and row separator lines. There
1052 are two columns: the first is one inch wide and left aligned, and the second is
1053 two inches wide and centred. There are two rows. The resulting table looks like
1056 .itable all 1 1 2 1in left 2in center
1057 .row "cell 11" "cell 12"
1058 .row "cell 21" "cell 22"
1061 The &*.row*& macro does not set the &`revisionflag`& attribute in the
1062 &`<entry>`& elements that it generates because this appears to be ignored by
1063 all current XML processors. However, you can use an inline call of the &*new*&
1064 macro within an entry to generate a &`<phrase>`& element with &`revisionflag`&
1068 .section "Formal tables" ID43
1069 The &*.table*& macro starts a formal table, that is, a table that has a title,
1070 and which can be cross-referenced. The first argument of this macro is the
1071 table's title; the second is an identifier for cross-referencing. If you are
1072 not going to reference the table, an empty string must be supplied. From the
1073 third argument onwards, the arguments are identical to those of the &*.itable*& macro.
1077 .table "A title for the table" "" all 1 1 2 1in left 2in center
1078 .row "cell 11" "cell 12"
1079 .row "cell 21" "cell 22"
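
When the table is going to be referenced, the second argument carries the ID.
The following sketch uses an invented title, ID, and cell contents. It assumes
that a formal table is closed by &*.endtable*& in the same way as an informal
one, and that the ID can be used in the same kind of cross-reference as a
section ID.
.code
  .table "Supported formats" "TABformats" none 0 1 2 1in left 2in left
  .row "text" "plain ASCII"
  .row "xml" "literal XML"
  .endtable
  The formats are listed in table &<<TABformats>>&.
.endd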
1084 .section "Figures and images" ID44
1085 A figure is enclosed between &*.figure*& and &*.endfigure*& macros. The first
1086 argument of &*.figure*& provides a title for the figure. The second is
1087 optional; if present, it is a tag for references to the figure.
1089 A figure normally contains an image. The &*.image*& macro can be used in simple
1090 cases. It generates a &`<mediaobject>`& element containing an
1091 &`<imageobject>`&. The first argument is the name of the file containing the
1092 image. The remaining arguments are optional; an empty string must be
1093 supplied as a placeholder when one that is not required is followed by one that
1097 The second argument specifies a scaling factor for the image, as a percentage.
1098 Thus, a value of 50 reduces the image to half size.
1100 The third argument specifies an alignment for the image. It must be one of
1101 &`left`& (default), &`right`& or &`center`& (or even &`centre`& if the
1102 DocBook processor you are using can handle it).
1104 The fourth and fifth arguments specify the depth and width, respectively. How
1105 these values are handled depends on the processing software.
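
The placeholder rule can be seen in the following sketch, where the file name
is hypothetical. No scaling is wanted, but an alignment is, so the second
argument of &*.image*& is given as an empty string:
.code
  .figure "A right-aligned figure"
  .image figure03.eps "" right
  .endfigure
.endd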
1108 Here is an example of the input for a figure, with all the image options
1111 .figure "My figure's title" "FIGfirst"
1116 Here is another example, where the figure is reduced to 80% and centred:
1118 .figure "A reduced figure"
1119 .image figure02.eps 80 center
1124 .section "Footnotes" ID45
1125 Footnotes can be specified between &*.footnote*& and &*.endnote*& macros.
1126 Within a footnote there can be any kind of text item, including displays and
1127 tables. When a footnote occurs in the middle of a paragraph, paired flags
1128 must not straddle the footnote. This example is wrong:
1136 The correct markup for this example is:
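
To make the rule concrete, here is a sketch with invented text that uses the
flag pair for italic text. In the first version the pair straddles the
footnote, which is the kind of markup that must be avoided:
.code
  The &'quick
  .footnote
  That is, speedy.
  .endnote
  brown'& fox jumps over the lazy dog.
.endd
In the second version the italic text is closed before the footnote and
reopened after it, so that no pair of flags straddles the footnote:
.code
  The &'quick'&
  .footnote
  That is, speedy.
  .endnote
  &'brown'& fox jumps over the lazy dog.
.endd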
1146 .section "Indexes" ID46
1147 The &*.index*& macro generates &`<indexterm>`& elements (index entries) in the
1148 output. It takes one or two arguments. The first is the text for the primary
1149 index term, and the second, if present, specifies a secondary index term. This
1150 macro can be called either from a directive line, or inline. However, it is
1151 mostly called as a directive, at the start of a relevant paragraph. For
1154 .index goose "wild chase"
1155 The chasing of wild geese...
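
The same entry could also be requested by an inline call in the middle of a
paragraph. This sketch assumes that the two arguments are separated by a comma,
as in the inline &*url*& example earlier in this chapter:
.code
  The chasing of wild geese &index(goose, wild chase) is a popular pastime.
.endd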
1157 You can generate &"see"& and &"see also"& index entries by using &*.index-see*&
1158 and &*.index-seealso*& instead of &*.index*&. The first argument of these
1159 macros is the text for the &"see"&. For example:
1161 .index-see "chase" "wild goose"
1166 <primary>wild goose</primary>
1171 If you want to generate an index entry for a range of pages, you can use the
1172 &*.index-from*& and &*.index-to*& macros. The first argument of each of them is
1173 an ID that ties them together. The second and third arguments of
1174 &*.index-from*& are the primary and secondary index items. For example:
1176 .index-from "ID5" "indexes" "handling ranges"
1177 ... <lines of text> ...
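
A complete range might be marked up as in the following sketch, in which the
ID, the index terms, and the text are invented. The same ID is given to both
macros so that the start and the end of the range are tied together.
.code
  .index-from "IDtables" "tables" "formal"
  The first paragraph about formal tables.

  Further paragraphs about formal tables.
  .index-to "IDtables"
.endd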
1181 The &*.makeindex*& macro should be called at the end of the document, at the
1182 point where you want an index to be generated. It can have up to two
1183 arguments. The first is the title for the index, for which the default is
1184 &"Index"&. The second, if present, causes a &`role=`& attribute to be added to
1185 the &`<index>`& element that is generated. For this to be useful, you need to
1186 generate &`<indexterm>`& elements that have similar &`role=`& attributes. The
1187 standard &*index*& macro cannot do this. If you want to generate multiple
1188 indexes using this mechanism, it is best to define your own macros for each
1189 index type. For example:
1192 &<indexterm role="concept">&
1193 &<primary>&$1&</primary>&
1195 &<secondary>&$2&</secondary>&
1200 This defines a &*.cindex*& macro for the &"concept"& index. At the end of the
1201 document you might have:
1203 .makeindex "Concept index" "concept"
1206 As long as the processing software can handle multiple indexes, this causes two
1207 indexes to be generated. The first is entitled &"Concept index"&, and contains
1208 only those index entries that were generated by the &*.cindex*& macro. The
1209 second contains all index entries.
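
As a sketch of how the two kinds of entry might be used together, assume that a
&*.cindex*& macro has been defined as described above, and that the second,
general index is requested by calling &*.makeindex*& with no arguments so that
it gets the default title. The index terms are invented.
.code
  .cindex "macros" "for lists"
  .index "lists" "nested"
  Lists may be nested as required.

  .makeindex "Concept index" "concept"
  .makeindex
.endd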