Sources of information

The most useful sources of information I have found are:

DocBook XSL: The Complete Guide - online book describing the modern (i.e. XSL-based) way of transforming DocBook to HTML.
DocBook: The Definitive Guide - online reference manual for the DocBook language (elements, attributes, etc.).
The DocBook Project - home page of the DocBook project.
The man pages for xsltproc and xmllint.

Installation

All the required files are provided as part of Slackware 11. However, the DocBook V4.1.2 provided with Slackware 11 does not support HTML tables and may have problems handling XIncludes, so I've upgraded the linuxdoc-tools-0.9.21-i486-2 package to linuxdoc-tools-0.9.21-i486-6 (also from Slackware), which includes DocBook V4.5.

Generating Java help HTML

We use the xsltproc program to generate HTML from DocBook XML files (instead of the docbook2html command that we originally used). The command line is something like:

xsltproc \
--nonet \
--stringparam base.dir docdir/ \
--stringparam root.filename manual \
--stringparam use.id.as.filename 1 \
--stringparam chunker.output.indent yes \
--stringparam chunk.first.sections 1 \
--stringparam html.stylesheet javahelp.css \
http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl \
manual.xml

(See Using a customized stylesheet below for a way to simplify this command.)

manual.xml is the DocBook XML file that the HTML files are to be generated from.

http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl is the XSL stylesheet used to convert from DocBook to HTML. Although it is specified using a URL, it should be found on the local machine (the catalog file /etc/xml/catalog provides a mapping from the URL to the local filename). This stylesheet produces "chunked" output, which means that the output is split into several HTML files (see Chunking into multiple HTML files).
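To illustrate roughly how that mapping works, here is a Python sketch for two OASIS XML Catalog entry types: uri (exact match) and rewriteURI (prefix rewrite, longest matching prefix wins). It is a simplified model, not libxml2's actual lookup. It ignores nextCatalog, delegateURI, group and the rest of the catalog specification. The local path in the sample catalog is hypothetical, so check your own /etc/xml/catalog to see the real mapping.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by OASIS XML Catalogs (the format of /etc/xml/catalog).
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the rewritePrefix path is an assumption.
SAMPLE_CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="http://docbook.sourceforge.net/release/xsl/current/"
              rewritePrefix="file:///usr/share/xml/docbook/xsl-stylesheets/"/>
</catalog>"""


def resolve_uri(catalog_xml: str, uri: str) -> Optional[str]:
    """Map a URI to a local location using uri/rewriteURI entries only."""
    root = ET.fromstring(catalog_xml)
    # An exact 'uri' entry takes precedence over any rewrite.
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        if entry.get("name") == uri:
            return entry.get("uri")
    # Otherwise use the rewriteURI entry with the longest matching prefix.
    best = None
    for entry in root.iter(f"{{{CATALOG_NS}}}rewriteURI"):
        start = entry.get("uriStartString", "")
        if uri.startswith(start) and (best is None or len(start) > len(best[0])):
            best = (start, entry.get("rewritePrefix", ""))
    if best is None:
        return None  # no mapping: a --nonet run would fail here
    start, prefix = best
    return prefix + uri[len(start):]


if __name__ == "__main__":
    url = "http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl"
    # file:///usr/share/xml/docbook/xsl-stylesheets/html/chunk.xsl
    print(resolve_uri(SAMPLE_CATALOG, url))
```

The rest of the URL after the matched prefix (here html/chunk.xsl) is carried over unchanged. That is why one rewriteURI entry is enough to cover every stylesheet under .../xsl/current/.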

The --nonet option tells xsltproc not to fetch files from the internet. It is optional and should have no effect, since all the required files should be installed locally. However, it serves as a check that they are: xsltproc will report an error rather than download a file that is missing locally.

The various stringparam options are described in the following table:

Parameter | Value | Description
base.dir | e.g. "docdir/" | Directory where the generated HTML files are to be put. It can be a relative or absolute pathname. A trailing '/' is essential! The directory must already exist.
root.filename | e.g. "manual" | Name to be used for the first HTML file (i.e. in this example the first HTML file will be called manual.html). By default, the first HTML file is called index.html.
use.id.as.filename | 1 | Use the id attribute of each "chunk" element as the name (with ".html" appended) for each HTML file except the first.
chunker.output.indent | yes | Indent the HTML output to make it more readable by humans.
html.stylesheet | e.g. "javahelp.css" | CSS stylesheet to be used to control the appearance of the HTML. Only the name of this file is used by the DocBook processing, not the contents (the browser interprets the contents, as usual).
chunk.first.sections | 1 | Don't put the first section on the same page as the table of contents.
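A missing trailing '/' on base.dir is easy to overlook, so a small wrapper script could enforce it. This Python sketch assembles the xsltproc command shown above from a dictionary of parameters, appends the '/' if it is missing, and creates the output directory before running xsltproc. The helper names (build_chunk_command, run_chunk, JAVAHELP_PARAMS) are hypothetical; only the xsltproc options come from the command above.

```python
import os
import subprocess
from typing import Dict, List

CHUNK_XSL = "http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl"

# The same --stringparam settings as the command above (hypothetical name).
JAVAHELP_PARAMS = {
    "root.filename": "manual",
    "use.id.as.filename": "1",
    "chunker.output.indent": "yes",
    "chunk.first.sections": "1",
    "html.stylesheet": "javahelp.css",
}


def build_chunk_command(input_xml: str, base_dir: str,
                        params: Dict[str, str]) -> List[str]:
    """Return the xsltproc argument list for chunked HTML output."""
    if not base_dir.endswith("/"):
        base_dir += "/"  # the trailing '/' on base.dir is essential
    cmd = ["xsltproc", "--nonet", "--stringparam", "base.dir", base_dir]
    for name, value in params.items():
        cmd += ["--stringparam", name, value]
    cmd += [CHUNK_XSL, input_xml]
    return cmd


def run_chunk(input_xml: str, base_dir: str, params: Dict[str, str]) -> None:
    """Create the output directory, then run xsltproc (fails on error)."""
    os.makedirs(base_dir, exist_ok=True)  # the directory must already exist
    subprocess.run(build_chunk_command(input_xml, base_dir, params), check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_chunk_command("manual.xml", "docdir", JAVAHELP_PARAMS)))
```

run_chunk("manual.xml", "docdir", JAVAHELP_PARAMS) would then run the equivalent of the command above, even though "docdir" was given without its trailing '/'.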


xsltproc does not check the XML input file very thoroughly for errors, so it is a good idea to validate the XML file with xmllint before passing it to xsltproc, for example:

xmllint --valid --noout --noent --nonet modsak.xml

Changes needed to files to use xsltproc

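A few minor changes were needed to the XML files to make them work with xsltproc and the XSL stylesheet. The DOCTYPE header refers to DocBook V4.5:

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!-- entity definitions, if any (see Entities below) -->
]>

Some images also needed to be made smaller. This can be done with ImageMagick's convert command (this example scales the image to 75%):

convert input.png -resize 75% -depth 8 -colors 256 output.png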

Even if the image doesn't need to be smaller, it's a good idea to use a 256-colour palette to reduce the image file size:

convert input.png -depth 8 -colors 256 output.png

The javahelp.css HTML stylesheet also needed modifying, mainly because different heading levels are generated when using xsltproc instead of docbook2html.

Entities

"Entities" are a convenient way of naming a string that you want to use repeatedly in the XML file. They are a bit like C's #define statement. Entity definitions should be placed within the DOCTYPE header at the start of the XML file, for example:

<!ENTITY company "Wingpath Limited">

gives the name "company" to the string "Wingpath Limited". Each occurrence of "&company;" in the document text will be automatically replaced by "Wingpath Limited".
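As a quick illustration of the substitution, Python's standard xml.etree.ElementTree parser (which uses expat) expands internal entities declared in the DOCTYPE in the same way. This sketch uses Python rather than xsltproc purely for demonstration. The tiny article below is a made-up example that uses the company entity.

```python
import xml.etree.ElementTree as ET

# A minimal document: the entity is declared inside the DOCTYPE's [ ... ].
DOC = """<!DOCTYPE article [
<!ENTITY company "Wingpath Limited">
]>
<article><para>Copyright &company;.</para></article>"""

root = ET.fromstring(DOC)
# By the time the tree is built, &company; has been replaced by its value.
print(root.find("para").text)  # Copyright Wingpath Limited.
```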

If you want the string to come from a file, you can use a "system entity". For example:

<!ENTITY company SYSTEM "company.txt">

will get the string from the file company.txt, i.e. each occurrence of "&company;" in the document text will be replaced by the contents of the file company.txt. Note that files created or edited using vim will normally have a newline at the end. This newline will be rendered as a space when the HTML is displayed, which may not be what is wanted. To stop vim writing the newline at the end of the file, use the vim commands:

:set binary
:set noendofline
:w

to save the file.
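If the entity file is generated by a script rather than edited by hand, you can avoid the trailing newline at the source. Here is a minimal Python sketch; the helper names write_entity_text and ends_with_newline are hypothetical. It writes the text without a final newline and checks whether an existing file ends with one.

```python
from pathlib import Path


def write_entity_text(path: str, text: str) -> None:
    """Write entity replacement text with no trailing newline."""
    # rstrip drops trailing newline characters from the text itself;
    # newline="" stops Python translating "\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text.rstrip("\r\n"))


def ends_with_newline(path: str) -> bool:
    """True if the file's last byte is a newline (vim's default)."""
    return Path(path).read_bytes().endswith(b"\n")
```

For example, write_entity_text("company.txt", "Wingpath Limited") produces a 16-byte file with no newline.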

Including files

If you give xsltproc the extra option "--xinclude", it will perform XInclude processing of the input document. This allows you to break up the input document into multiple files, share document fragments between documents, include generated text (plain or DocBook), etc. See DocBook XSL: The Complete Guide: Modular DocBook files for more details.

If you use xmllint, you will have to give it the --xinclude option too, and also use the option --postvalid instead of --valid (so that validity checks are done after the XInclude processing - see Validating with XIncludes).

If you have a document that is split into multiple XML files, and you are using entities as described above, then each XML file will have to include the entity definitions. The best way to handle this is to move the entity definitions into a separate file, which is then included in each XML file - see Shared text entities. For example, you might have a file definitions.ent containing:

<?xml version="1.0" encoding="utf-8"?>
<!ENTITY company "Wingpath Limited">
<!ENTITY appname SYSTEM "appname">
<!ENTITY appversion SYSTEM "../src/VERSION">

which is included in each XML file using:

<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % entities SYSTEM "definitions.ent" >
%entities;
]>

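Note that xmllint and xsltproc require the "encoding" in the <?xml ...?> line at the top of the entities file, but allow it to be omitted in XML files. This is what the XML standard specifies. In the text declaration that may start an external entity such as definitions.ent, the encoding declaration is mandatory and the version is optional. In the XML declaration of a document, the version is mandatory and the encoding is optional.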

You would normally use a relative pathname in an xi:include element to specify the file to be included. The pathname is interpreted as relative to the directory that the including file came from. If the included file itself includes other files, then the names of these files will be relative to the directory that the included file came from. If you want the names of the indirectly-included files to be interpreted as relative to the directory of the original including file, you can use a symbolic link from the original directory to the first included file, and use the name of the symbolic link in the xi:include element.
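You can try out the relative-path rule without xsltproc. Python's standard xml.etree.ElementInclude module implements a basic subset of XInclude. When given a base_url (Python 3.9 or later), it resolves each href relative to the file that contains the xi:include element, including for nested includes. This sketch uses it as a stand-in for xsltproc's XInclude processing. The file layout (manual.xml, chapters/ch1.xml, chapters/sec1.xml) is hypothetical, and POSIX-style paths are assumed.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude  # base_url parameter needs Python 3.9+

XI = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical layout: manual.xml includes chapters/ch1.xml, which
# includes sec1.xml using a name relative to its own directory.
FILES = {
    "manual.xml": f'<article {XI}><xi:include href="chapters/ch1.xml"/></article>',
    "chapters/ch1.xml": f'<chapter {XI}><xi:include href="sec1.xml"/></chapter>',
    "chapters/sec1.xml": "<section><title>Section 1</title></section>",
}


def build_tree(top: str) -> None:
    """Write the example files under directory `top`."""
    for name, text in FILES.items():
        path = os.path.join(top, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)


def expand(path: str) -> ET.Element:
    """Parse `path` and replace xi:include elements with the included trees."""
    root = ET.parse(path).getroot()
    # Each href is joined with the location of the file containing it.
    ElementInclude.include(root, base_url=path)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        build_tree(top)
        print(ET.tostring(expand(os.path.join(top, "manual.xml")), encoding="unicode"))
```

Here sec1.xml is found in chapters/ because that is where ch1.xml, the file containing its xi:include, lives, not where manual.xml lives.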

Conditional text

DocBook has a rudimentary mechanism (called "profiling") for conditionally including DocBook elements. Profiling uses a few special attributes (e.g. the "condition" attribute) to mark elements that are to be conditionally included. An element that is so marked will only be included if its attribute value matches the value of the corresponding profiling parameter (e.g. profile.condition) specified on the xsltproc command line. Elements without a profiling attribute are always included.

For example, if an XML file contains:

<phrase condition="modsak">master or slave</phrase><phrase condition="modmaster">master only</phrase>

and the xsltproc command-line specifies:

--stringparam profile.condition modsak

then the output will include the phrase "master or slave", but not the phrase "master only".

You can make an element conditional on more than one condition. For example, if an XML file contains:

<phrase condition="modsak;modmaster">master</phrase>

then the phrase will be included if the command line specifies "profile.condition modsak" or "profile.condition modmaster". You cannot easily AND two conditions - it requires two passes of xsltproc, one for each condition.
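The selection rule described above can be modelled in a few lines. This Python sketch is a simplified re-implementation of the rule, not the DocBook profile.xsl stylesheet. An element with a condition attribute is kept only if one of its semicolon-separated values equals the selected condition, and elements without the attribute are always kept. Only a single selected condition is handled, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent, keeping the text that followed it."""
    tail = child.tail or ""
    idx = list(parent).index(child)
    if idx == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def profile(elem: ET.Element, condition: str) -> None:
    """Drop descendants whose condition attribute does not list `condition`.

    condition="modsak;modmaster" matches either value (an OR);
    elements with no condition attribute are always kept.
    """
    for child in list(elem):  # iterate over a copy because elem is modified
        values = child.get("condition")
        if values is not None and condition not in values.split(";"):
            _drop(elem, child)
        else:
            profile(child, condition)


DOC = ('<para><phrase condition="modsak">master or slave</phrase>'
       '<phrase condition="modmaster">master only</phrase> / '
       '<phrase condition="modsak;modmaster">master</phrase></para>')

if __name__ == "__main__":
    para = ET.fromstring(DOC)
    profile(para, "modsak")
    print("".join(para.itertext()))  # master or slave / master
```

With "modmaster" selected instead, the same input yields "master only / master". This matches the OR behaviour described above.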

When using profiling, the DocBook processing is best done in two passes: the first pass processes the conditionals (and any XIncludes) and produces a temporary XML file, and the second pass translates the temporary XML file into HTML. It's a good idea to use xmllint to validate both the original XML file and the temporary XML file (see Validation and profiling). The commands to do the two-pass processing with validation will be something like:

xmllint --xinclude --postvalid --noent --noout --nonet manual.xml
xsltproc \
--nonet \
--xinclude \
--output manual.tmp.xml  \
--stringparam profile.condition modsak \
http://docbook.sourceforge.net/release/xsl/current/profiling/profile.xsl  \
manual.xml
xmllint --dtdvalid http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd --noout --nonet manual.tmp.xml
xsltproc \
--nonet \
--stringparam base.dir docdir/ \
--stringparam root.filename manual \
--stringparam use.id.as.filename 1 \
--stringparam chunker.output.indent yes \
--stringparam chunk.first.sections 1 \
--stringparam html.stylesheet javahelp.css \
http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl \
manual.tmp.xml
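If you drive these steps from a script, a failed validation should stop the pipeline before the later steps run. This Python sketch runs the four commands above in order with subprocess and stops at the first non-zero exit status. The function names are hypothetical. With dry_run=True it only returns the commands, so they can be inspected.

```python
import subprocess
from typing import List

XSL = "http://docbook.sourceforge.net/release/xsl/current"
DTD = "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"


def two_pass_commands(src: str, condition: str, tmp: str,
                      base_dir: str) -> List[List[str]]:
    """The validate / profile / validate / chunk sequence from the text."""
    return [
        # 1. Validate the original file after XInclude processing.
        ["xmllint", "--xinclude", "--postvalid", "--noent", "--noout",
         "--nonet", src],
        # 2. Resolve XIncludes and profiling conditionals into a temporary file.
        ["xsltproc", "--nonet", "--xinclude", "--output", tmp,
         "--stringparam", "profile.condition", condition,
         f"{XSL}/profiling/profile.xsl", src],
        # 3. Validate the temporary file against the DocBook 4.5 DTD.
        ["xmllint", "--dtdvalid", DTD, "--noout", "--nonet", tmp],
        # 4. Generate chunked HTML from the temporary file.
        ["xsltproc", "--nonet",
         "--stringparam", "base.dir", base_dir,
         "--stringparam", "root.filename", "manual",
         "--stringparam", "use.id.as.filename", "1",
         "--stringparam", "chunker.output.indent", "yes",
         "--stringparam", "chunk.first.sections", "1",
         "--stringparam", "html.stylesheet", "javahelp.css",
         f"{XSL}/html/chunk.xsl", tmp],
    ]


def run_two_pass(src="manual.xml", condition="modsak", tmp="manual.tmp.xml",
                 base_dir="docdir/", dry_run=False) -> List[List[str]]:
    cmds = two_pass_commands(src, condition, tmp, base_dir)
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps never run on a file that failed validation.
            subprocess.run(cmd, check=True)
    return cmds
```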

Tables

DocBook supports two sets of elements for tables: CALS and HTML. If you use xsltproc, setting CALS column widths is not supported, so you have to use the HTML elements if you want to specify column widths. The widths are best specified in the <td> elements of the first table row, because the <col> element is not supported by the Java JEditorPane component.

Using a customized stylesheet

The section Customization Methods in the Complete Guide describes how to write a wrapper for the chunk.xsl stylesheet.

One use for this is to simplify the xsltproc command line by moving commonly used parameter settings into the stylesheet wrapper. For example, by using the stylesheet wrapper:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl"/>

<xsl:param name="root.filename">manual</xsl:param>
<xsl:param name="use.id.as.filename">1</xsl:param>
<xsl:param name="chunker.output.indent">yes</xsl:param>
<xsl:param name="chunk.first.sections">1</xsl:param>
<xsl:param name="html.stylesheet">javahelp.css</xsl:param>
<xsl:param name="generate.section.toc.level">1</xsl:param>
<xsl:param name="toc.section.depth">3</xsl:param>
<xsl:param name="bridgehead.in.toc">1</xsl:param>

</xsl:stylesheet>

we can simplify the command line:

xsltproc \
--nonet \
--stringparam base.dir docdir/ \
--stringparam root.filename manual \
--stringparam use.id.as.filename 1 \
--stringparam chunker.output.indent yes \
--stringparam chunk.first.sections 1 \
--stringparam html.stylesheet javahelp.css \
--stringparam generate.section.toc.level 1 \
--stringparam toc.section.depth 3 \
--stringparam bridgehead.in.toc 1 \
http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl \
manual.tmp.xml

to:

xsltproc \
--nonet \
--stringparam base.dir docdir/ \
javahelp.xsl \
manual.tmp.xml

where javahelp.xsl is the stylesheet wrapper.
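Each --stringparam on the command line corresponds to an xsl:param element in the wrapper, so the wrapper could also be generated from a list of settings kept in a script. This is a minimal Python sketch that assumes you keep the settings in a dictionary. make_wrapper is a hypothetical helper. The output omits the XML declaration and any template overrides.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
CHUNK_XSL = "http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl"

ET.register_namespace("xsl", XSL_NS)  # serialize with the usual xsl: prefix


def make_wrapper(params: dict) -> str:
    """Turn --stringparam settings into an xsl:import wrapper stylesheet."""
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", version="1.0")
    # xsl:import must come before any other top-level element.
    ET.SubElement(sheet, f"{{{XSL_NS}}}import", href=CHUNK_XSL)
    for name, value in params.items():
        # "--stringparam NAME VALUE" -> <xsl:param name="NAME">VALUE</xsl:param>
        param = ET.SubElement(sheet, f"{{{XSL_NS}}}param", name=name)
        param.text = value
    return ET.tostring(sheet, encoding="unicode")


if __name__ == "__main__":
    print(make_wrapper({"root.filename": "manual", "toc.section.depth": "3"}))
```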

The wrapper can also be used to customize the rendering by overriding templates that are defined in (and called from) chunk.xsl. For example, the following template definition puts a copyright notice at the bottom of every page (with the notice also being a link to our website):

<xsl:template name="user.footer.navigation">
  <div class="centre">
  <a href="http://wingpath.co.uk">
  Copyright © 2009 Wingpath Limited
  </a> 
  </div>
</xsl:template>

The template user.footer.navigation is one of many templates that have empty definitions in chunk.xsl and are there specifically to allow customization.

You can also change "generated" text. For example, the following elements, placed at the top level of the wrapper stylesheet (inside xsl:stylesheet), change "Home" in the footer to "Contents":

<xsl:param name="local.l10n.xml" select="document('')" />
<l:i18n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0">
 <l:l10n language="en">
  <l:gentext key="nav-home" text="Contents"/>
 </l:l10n>
</l:i18n>