Setting up a DocBook Toolchain for documenting PHP code The Right Way (TM)

A simple question: why is the word “docbook” always followed by “toolchain” instead of “editor”? Why can’t I just write my documentation in xml as easily as I do with MS Word and be happy with the results?

The answer is unfortunately not so simple. The core of the problem lies in the flexibility provided by the docbook format. After all, it is an xml dialect, which can be used to write (almost) any kind of technical documentation and produce (almost) any kind of output. Existing graphical editing and conversion tools either cater only to a specific category of documents or suffer from a generic interface that does not introduce significant productivity gains.

What I needed to document my php project was:

  • A free (at least as in beer) docbook editor with a decent wysiwyg interface that would not force me to learn the intricacies of every single docbook tag
  • some way to automatically convert the docbook file to a nicely formatted XHTML version
  • some way to automatically convert the docbook file to a similarly formatted PDF version
  • nice-to-have but not required: php syntax highlighting in the final output, generation of (parts of) the docbook manual from javadoc embedded in the php source code, conversion of docbook to OpenOffice format, etc…

After struggling with a couple of buggy/incomplete editing and conversion tools, being somewhat of a coder myself, I decided to roll my own solution.

Here’s how I set up my toolchain:

  1. A docbook editor.
    Any text editor will do, but the fancy ones come with syntax highlighting, xml validation and a slew of nice features. The best free (as in beer) one I found is XMLmind XML Editor. It is nice because it does not present your xml as a tree in the default view, opting for a much friendlier wysiwyg view. Many of the more interesting features are available only in the pro ($$$) version, but we will roll our own conversion mechanism anyway.
    Once your doc is ok, save it as manual.xml in the directory where the final documentation will be generated.
  2. An xsl processor and appropriate stylesheets to convert docbook to the desired formats.
    The standard docbook xsl stylesheets are available from Sourceforge. Current release is 1.7.2. Download and unzip them somewhere close to the xml source.
    These are the most complex stylesheets I have ever seen, and can be customized to suit almost every need. There is a very complete and highly recommended manual on their usage here.
    The xsl transform is applied via php itself, using the very simple convert.php script (php 5 with the xsl extension, which provides the XSLTProcessor class, is needed).
  3. A customization layer.
    The default stylesheets produce output that is slightly different from the standard php documentation (i.e. the format used on the php website), so we are going to fix that. All the customizations added to the stock xslt files are kept in a single separate file. The version that will produce XHTML documentation is named custom.xsl. In short, it formats function definitions (a huge part of my documentation) as in the online php manual.
    PLEASE NOTE that the path to the main docbook xsl is hardcoded in the customization file, and you will have to change it to suit your setup.
    A very simple css file, xmlrpc.css, is also used for further customization. Please note that it is not used during the conversion phase, but it will be referenced by the html pages produced.
    The command line used to do the trick is:

    php convert.php manual.xml custom.xsl out/

    It will generate an html version of the manual, split into many html files, in the out directory.

  4. Now, wouldn’t it be great if the php code examples given in the documentation had some source highlighting? The php interpreter comes to our help again, with its native capability of colorizing php source. The script that does the magic, post-processing the html files generated in the previous step, is highlight.php.
    The command line to run it is:

    php highlight.php out

    You can see the differences between the original and the final versions:

    [Screenshots: standard html output from docbook source; improved html output from docbook source]
  5. Finally, the generation of the pdf version of the manual is tackled. This is a two step process: the docbook source is first transformed into XSL-FO, which Apache FOP then renders as pdf. A Java virtual machine is needed to run FOP, since there is no fop engine available for php (yet).
    A different xsl file is used in this case to add the needed customizations to the stock transform: custom.fo.xsl.
    The fop command is part of the fop distribution for both unix and windows (fop.bat). Make sure it is in your path, or just copy it over to the documentation directory and make sure it works, adjusting the executable paths as needed.
    Then run the two commands:

    php convert.php manual.xml custom.fo.xsl manual.fo.xml
    fop manual.fo.xml manual.pdf

Et voilà: this toolchain, built only with free tools and operating system agnostic, produces exactly the output I wanted.
For the lazy, all the scripts used can be downloaded in a single zip package.
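
If you would rather see the idea before opening the zip, here is a minimal sketch of what a convert.php-style script could look like. It is not the script from the package: the function name convert_docbook and the "trailing slash means output directory" convention are assumptions made for illustration, inferred from the command lines above. It relies on the XSLTProcessor class from php's xsl extension and on the base.dir parameter of the docbook chunking stylesheets.

```php
<?php
// Hypothetical stand-in for convert.php, NOT the script from the zip package.
// Requires php's xsl extension (XSLTProcessor); setSecurityPrefs() needs php 5.4+.

function convert_docbook($xmlFile, $xslFile, $target)
{
    // Load the docbook source and the (customization) stylesheet.
    $xml = new DOMDocument();
    $xsl = new DOMDocument();
    if (!$xml->load($xmlFile) || !$xsl->load($xslFile)) {
        return false;
    }

    $proc = new XSLTProcessor();
    if (!$proc->importStylesheet($xsl)) {
        return false;
    }

    // Assumption: a target ending in "/" means "chunked output in a directory".
    if (substr($target, -1) === '/') {
        // The docbook chunking stylesheets write the html files themselves;
        // base.dir tells them where to put those files.
        $proc->setParameter('', 'base.dir', $target);
        // By default php forbids stylesheets from writing files,
        // so lift that restriction for this transformation only.
        $proc->setSecurityPrefs(
            XSL_SECPREF_DEFAULT & ~(XSL_SECPREF_WRITE_FILE | XSL_SECPREF_CREATE_DIRECTORY)
        );
        return $proc->transformToXml($xml) !== false;
    }

    // Otherwise write the single result document (e.g. the XSL-FO file).
    return $proc->transformToUri($xml, $target) !== false;
}

// Run only when called as: php <script> <xml> <xsl> <target>
if (PHP_SAPI === 'cli' && isset($argv) && count($argv) === 4) {
    exit(convert_docbook($argv[1], $argv[2], $argv[3]) ? 0 : 1);
}
```

Saved as convert.php, it would be called exactly like the commands shown in steps 3 and 5. Keeping the transformation in one small function also makes it easy to call from a larger build script.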

A live example of the html documentation obtained is available on the phpxmlrpc website. The pdf version is there, too.
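
The colored listings in that html output come from the post-processing of step 4. As a rough idea of how such a script could work, here is a minimal sketch built on php's highlight_string() function. It is not the highlight.php from the package: the function names are hypothetical, and it assumes that the stylesheets emit code listings as <pre class="programlisting"> elements and that only listings containing an opening <?php tag should be colorized.

```php
<?php
// Hypothetical stand-in for highlight.php, NOT the script from the zip package.

// Replace every php listing in a chunk of html with its colorized version.
function highlight_listings($html)
{
    return preg_replace_callback(
        '#<pre class="programlisting">(.*?)</pre>#s',
        function ($m) {
            // Drop inline markup and undo the html escaping to get raw source.
            $code = html_entity_decode(strip_tags($m[1]), ENT_QUOTES, 'UTF-8');
            // highlight_string() only colorizes code inside php tags,
            // so listings without them (shell commands, xml...) are left alone.
            if (strpos($code, '<?php') === false) {
                return $m[0];
            }
            return '<div class="programlisting">' . highlight_string($code, true) . '</div>';
        },
        $html
    );
}

// Rewrite all html files in a directory; returns the number of files changed.
function highlight_directory($dir)
{
    $changed = 0;
    $files = glob(rtrim($dir, '/') . '/*.html');
    foreach ($files ? $files : array() as $file) {
        $html = file_get_contents($file);
        $new = highlight_listings($html);
        if ($new !== null && $new !== $html) {
            file_put_contents($file, $new);
            $changed++;
        }
    }
    return $changed;
}

// Run only when called as: php <script> <directory>
if (PHP_SAPI === 'cli' && isset($argv) && count($argv) === 2) {
    echo highlight_directory($argv[1]) . " file(s) updated\n";
}
```

Doing the work on the generated html, rather than inside the xsl transform, keeps the stylesheets untouched and lets the php interpreter do what it already does well.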

Other enhancements that have not yet been integrated into this work are possible:

  • The pdf version does not have source code highlighting for php examples. It is possible to use a different, java-based xsl processor that adds support for source code colorizing (see the xslt manual for more info), but the result is below par, using only a bold font for php keywords and no coloring.
  • Javadoc is sort of a standard for documenting php code. While keeping together code and docs is a worthy goal in itself, javadoc-based manuals generally lack examples, appendix chapters, reference material and many other things that make up a real manual. To get the best of both worlds it would be possible to split the manual in two parts: the API docs would then be taken from the javadoc in the code and transformed into docbook via the phpDocumentor tool, then merged with the rest of the documentation, hand-written as docbook, and finally transformed into the desired output.

Comments and suggestions for improvement are, as always, heartily welcome.

2 thoughts on “Setting up a DocBook Toolchain for documenting PHP code The Right Way (TM)”

  1. Thanks for the tips! I’m just doing the same myself since I want to find a new way to document a fairly large PHP library using XMLMind among other things.

    One thing I haven’t yet found a good way to do is to include small external PHP examples (highlighted) in the source XML document (in programlisting tags) without copying the source text into the document. I have tried using tags (automatically generated) and then using &link; within the programlisting tag, but for some reason it doesn’t quite work.

    The reason is that I like to have the source fully separate from the manual to make sure it can be run as part of a test suite.

    Does anyone know the best way of doing this?

    /Johan

  2. Sorry, I never tried this, but it sounds like a good idea.
    You could probably use an xslt (or even plain php) script to inject the source code into the xml at the phase of producing the final output – not optimal, because you lose visibility of the code while editing the docs, though…
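
One possible way to do that injection in plain php is sketched below. It swaps in a standard technique, XInclude with parse="text", resolved by DOMDocument::xinclude() just before the transformation. This is an untested idea rather than part of the toolchain described above: the function name expand_examples and the file names in the comments are made up for illustration.

```php
<?php
// Hypothetical pre-processing step: pull external example files into the manual.
//
// Assumes the docbook source references each example like this
// (with xmlns:xi="http://www.w3.org/2001/XInclude" declared on an ancestor):
//
//   <programlisting><xi:include href="examples/hello.php" parse="text"/></programlisting>

function expand_examples($xmlFile)
{
    $doc = new DOMDocument();
    if (!$doc->load($xmlFile)) {
        return false;
    }
    // Replaces every xi:include with the referenced content; with parse="text"
    // the file is inserted as plain text, so "<" and "&" in the code are safe.
    // xinclude() returns -1 if some include could not be processed.
    if ($doc->xinclude() === -1) {
        return false;
    }
    return $doc;
}
```

The returned document can be handed straight to XSLTProcessor, or written out with $doc->save() and converted as usual. The example files stay on disk as ordinary php scripts, so they can still be run by a test suite.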
