Setting up a DocBook Toolchain for documenting PHP code The Right Way (TM)

A simple question: why is the word “docbook” always followed by “toolchain” instead of “editor”? Why can’t I just write my documentation in XML as easily as I do in MS Word and be happy with the results?

The answer is unfortunately not so simple. The core of the problem lies in the flexibility provided by the docbook format. After all, it is an xml dialect, which can be used to write (almost) any kind of technical documentation and produce (almost) any kind of output. Existing graphical editing and conversion tools either cater only to a specific category of documents or suffer from a generic interface that does not introduce significant productivity gains.

What I needed to document my php project was:

  • A free (at least as in beer) docbook editor with a decent wysiwyg interface that would not force me to learn the intricacies of every single docbook tag
  • some way to automatically convert the docbook file to a nicely formatted XHTML version
  • some way to automatically convert the docbook file to a similarly formatted PDF version
  • nice-to-have but not required: php syntax highlighting in the final output, generation of (parts of) the docbook manual from javadoc embedded in the php source code, conversion of docbook to OpenOffice format, etc…

After struggling with a couple of buggy/incomplete editing and conversion tools, being somewhat of a coder myself, I decided to roll my own solution.

Here’s how I set up my toolchain:


Migrating to PHP5: a sensible solution at last?

As everybody in the php world is bound to know, PHP 4 is still the dominant platform on the net, despite its age and many shortcomings. Notwithstanding its many improvements in speed, security and functionality, PHP 5 has so far seen a less-than-stellar adoption rate.

PHP adoption stats by nexen.net

There have been a lot of discussions in the blogosphere about this “problem”, with the most basic explanation being:

  • web hosters do not upgrade because some very successful web apps (blogs, cms, bulletins, etc…) still need php 4 to run
  • php coders have to code for php 4 because otherwise they lose the opportunity of getting deployed on shared hosting
  • since the apps run on php 4 anyway, why upgrade? (a classic case of dog-eats-tail)

Other people refer to PHP as victim-of-its-own-success: PHP 4 was good enough for everybody, so a lot of people do not feel compelled to upgrade.

While I am generally in favour of extreme ABI and API stability (I have servers running php 4.0.4, pl1 mind you…), I have to admit that the current situation imposes a heavy burden on the developers of php libraries and applications: the coder has to cater to the quirks and bugs of every possible php version, and either avoid any enhancement that has seen the light roughly in the last 5 years (since the release of php 4.2) or resort to providing alternative code paths for the less fortunate (the php-compat package on Pear is a great help if you’re into defensive coding).
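
To give an idea of what such an alternative code path looks like, here is a minimal sketch of the pattern. It is not code taken from the php-compat package, and the function chosen (str_contains, native only since PHP 8.0) is just an illustration: the userland version is defined only when the running php does not provide one.

```php
<?php
// Illustrative fallback: define the function only if this php version lacks it.
// str_contains() is native from PHP 8.0 onwards; older versions get this copy.
if (!function_exists('str_contains')) {
    function str_contains($haystack, $needle)
    {
        // An empty needle is always considered found, as in the native function.
        return $needle === '' || strpos($haystack, $needle) !== false;
    }
}

// The calling code stays the same whichever implementation is in use.
echo str_contains('php 4.4.9', '4.4') ? "found\n" : "not found\n";
```

The price is that every such shim has to be written, tested and kept in sync with the native behaviour, which is exactly the burden described above.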

The best proposal I have seen so far to this situation comes from the Drupal mailing list:

PICK A DATE FOR PHP 4 DESUPPORT AND ANNOUNCE IT TO THE WORLD WELL IN ADVANCE.

The chosen date has to be agreed upon by most major php application teams (this is the hard part), but most of them have been planning the upgrade to php 5 anyway.

  • The hosters and sysadmins at private companies will be given enough time to test the deployment of the new php version
  • The developers will feel the need to make sure that their app runs fine on php 5 before the cutover date
  • Everybody will be happy (except Stefan Esser, he never is…)

One last question remains open: WHAT SHALL WE CALL THE DAY THAT PHP4 IS PUT TO REST?

Suggestions are welcome…

A comparison chart of PHP Environment variables

One of the little quirks the php framework developer has to face when confronted with the daunting task of writing real portable code is figuring out which global variables will be available in every single conceivable user setup (and in most of the unthinkable ones), and what kind of values they will assume.
Although the online manual does a pretty good job in describing where the environment variables are supposed to come from, and their supposed usage, it sports no single, comprehensive list of all the junk that might – or not – be filling up the “Environment” section of the global namespace.
Having wrestled with php deployments ranging from SCO Openserver (brr…) to windows to solaris 32 and 64 bits, I set out to publish my own findings.
The list can be found here: http://gggeek.altervista.org/sw/env_vars_comparison_chart.xhtml. People on slow links please note that it weighs in at 300k.
In its present incarnation it is based exclusively on windows installs. I plan to add some more unix goodiness later on, but any contribution is welcome (a printout of your phpinfo will do, or, in case you value privacy and security, a plain list of the values in the ‘env’ and ‘server’ sections).
The colors, more or less, indicate:

  • red: value or variable name changes from server to server (e.g. some names change casing, such as PATH vs. Path, COMSPEC)
  • yellow: variable is in the cgi spec, but the server omits it…
  • gray: variable is present in all setups tested: it can be used reliably
  • blue: variable should not be present in CLI versions, it can be thus safely used to test if app is called via web or not

Also note that “script name” in some settings will point to php executable, not php script.
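
As a rough sketch of how one of the “blue” variables can be used for the web-or-not test: the function below assumes REQUEST_METHOD is such a variable, which is my assumption for the sake of the example and should be checked against the chart for your own setups. The function name is made up as well.

```php
<?php
/**
 * Guess whether the script was invoked through a web server.
 * ASSUMPTION: REQUEST_METHOD is set by web SAPIs and absent in CLI runs.
 * @param array $server normally $_SERVER, passed in so the check can be tested
 * @return bool
 */
function is_web_request(array $server)
{
    return isset($server['REQUEST_METHOD']) && $server['REQUEST_METHOD'] !== '';
}

echo is_web_request($_SERVER) ? "called via web\n" : "called from command line\n";
```

Passing the array in as a parameter, instead of reading $_SERVER inside the function, makes it easy to feed it the values collected from other people's phpinfo printouts.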
In no particular order, things that should be done before the table is considered reliable include: adding a column with example values; testing all php setups without a php.ini file active; testing with IIS in isapi mode, apache 1 in mod_apache mode, apache in ssl mode, cgi mode when run from command line, version 4 cli, and linux / solaris installs; clearly separating the variables coming from the windows environment from the more “general” ones; adding some insight on the usage of REGISTER_GLOBALS, variables_order, and other assorted ini settings that might influence the php environment.

Edit: (2007/02/14)
A similar table, with better formatting, can be found here, while a discussion on this topic a little while ago was here

Shrink the size of your javascript with js min (the php way)

Getting more and more into javascript coding, I found two tools I could not do without: JSMin and JSLint, both from Douglas Crockford.
On the JSMin webpage, a php version is available for download, but it did not fit my needs really well (in fact, the js version did, but, being a php-head, I got that one first), so I patched it a bit:

  1. made all the code work with PHP 4 (removed usage of splfileiterator, exceptions are replaced by triggering errors, class constants are turned into global constants, etc…)
  2. constant VERSION changed to JSMIN_VERSION to avoid name clashes
  3. added a new function to class JSMin to cleanly separate writing to output stream from parsing
  4. made sure that newline chars passed inside the comments parameter do not break output
  5. the file can now be included in a php app (whether cli or web hosted) and used as library (just define JSMIN_AS_LIB before including jsmin.php)
  6. when used as library, the class can operate on php strings instead of files
  7. I could not resist the urge to remove excess whitespace here and there in the php source, too (phpmin-syndrome???)

The new code is available here. Take a look at the comments inside for more info.

There is some other (php) work done on jsmin by a guy named Ed Eliot there.

Got my javascript yellow belt!

A very well kept secret: I had done little to no javascript development until now. My personal philosophy can be summed up as “if it can be done on the server, do it on the server”.

Alas, I have at last succumbed to the ajax craze, and waded knee-deep into the murky javascript pools, in search of diamonds and pearls and api-compatible library reimplementations.
Not only have I discovered that translating php into javascript is almost as simple as translating “->” into “.”, “.” into “+” and “gettype” into “typeof”, but my mind got bent and warped forever while trying to rationalize the prototype-based inheritance model.

Whether the result of such perilous journeys is something of interest, I will let the readers judge.
For now, the visible part is a graphical editor of xmlrpc / json values, added to the online xmlrpc debugger available here: http://gggeek.damacom.it/debugger/ (to see it, click on ‘Execute method’ then on the ‘Edit’ link).

A comparison of JSON libraries for PHP

The recent release of php 5.2.0, which includes by default an extension for converting php values from and to the JSON format, is a good occasion for comparing the different existing php libraries which aim to provide the same capabilities.
Here you can find a detailed analysis of the features, performances and encoding/decoding capabilities of 4 of the main implementations: 3 “pure php” libraries and the native extension.
I have to warn readers: I personally maintain one of the four libs under examination.
Any correction, criticism or feedback is welcome.
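
For readers who have not tried the native extension yet, this is the kind of round trip all four libraries are meant to provide, shown here with the json_encode and json_decode functions of the extension. The sample data is made up for the example.

```php
<?php
// Sample value, invented for this example.
$value = array('lang' => 'php', 'versions' => array(4, 5), 'stable' => true);

// php value to JSON text...
$json = json_encode($value);

// ...and back again; passing true as second argument asks for
// associative arrays instead of stdClass objects.
$decoded = json_decode($json, true);

echo $json, "\n";
var_dump($decoded === $value);
```

The second parameter of json_decode matters for any comparison: without it, JSON objects come back as php objects, and a naive equality check against the original array fails.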

Edit 2007/04/25 – an updated version of the comparison has been posted here

The daily PHP WTF

A couple of days ago I succeeded in the daunting task of bringing a production server to a grinding halt. In fact the server was a sturdy linux box, and despite all my wrongdoings it kept chugging along, but all php services running on top of it were rendered useless (it was the company intranet server, and people were not happy).
Since this is quite an extraordinary event, I thought I might capitalize on the experience gained through this accident and share it for the benefit of future generations.
To make a long story short, here is the sequence of the events that led to the catastrophe:

  1. I installed on the web server a shrink-wrapped php package, which I shall not name here, without taking the time to properly sift through the code line by line, but following the install instructions as carefully as I could
  2. After toying around for a while, I decided to implement some new must-have feature, and within a couple of hours had on my hands code that looked good enough on my development box.
  3. So I promptly dumped the code to the production server, and fired up my browser to see the final result, but…
  4. …a nasty php error message greeted me. The code was trying to read from a file descriptor that was not open (in fact the php variable meant to hold the file descriptor was not even declared at the point it was used)
  5. The fix was really quick: a close look revealed an error in nesting code blocks, so that code was being executed where it was not supposed to. Hey presto, edit, save, redeploy and test!
  6. And this time, BINGO! the new functionality was fine and dandy. I dashed down the corridor to announce the new amazing functionality to fellow coders, and after discussing it and future evolutions for a while, I packed up and went home, happy and satisfied with my rapid-development skills and the incredibly apt platform that php had once again proved to be

But things are seldom as simple as they seem…
…and I was greeted by angry sysadmins and furious customers the next morning when returning to my cubicle.

The problem had been promptly identified: there was an error log file that kept growing at incredible speed, and during the night it had filled up the /home partition. Since the main website, based on a template engine and php opcode cache, was using the same partition to hold both the compiled templates and cached php scripts, it was having a hard time writing its necessary files to disk.
The sysadmins could not understand what was causing the log to grow indefinitely, but the mere fact that it was the php error log flashed a big nasty disaster sign in my mind. Sure enough, the culprit was to be found in my modifications of the previous day, but how on earth could the application, fixed and tested the day before, have kept running erroneously for the whole night, and still be causing damage in the early morning?
The worst part of it is the error log was reporting accesses to the web server from my very own development machine, which had been switched off during all that time.
Being well versed in php, I know for a fact that any php script is bound to end within a specified time limit (specified in php.ini that is), or be terminated by the php engine.
It is also very unusual for a script to keep running after the client that requested the web page terminates the http connection, such as when the user hits the STOP button or quits the application.
After a quick and dirty Apache restart, the error log finally went back to normal behaviour, everybody calmed down, and I was left alone with my dark train of thoughts: was it possible that I had hit a bug in php, or, heaven forbid, in Apache itself? Impossible! The j2ee goblins would have laughed at me for months to come.

As that old Jedi saying goes, use the source, Luke.


Call by name and PHP

A little while ago, the question of allowing named parameters in function calls was raised on the JSON-RPC mailing list.

As you might or might not know, named parameters (as opposed to positional parameters) are used by many database programming languages, and in some languages not really database related. The main advantage of using named parameters is that you can choose which parameters to pass to a function, omitting all the ones you do not need. PHP, with default parameter values, does something similar, with two small catches:

  • the parameters have to be passed to the function in the same order that they were declared (this is, imho, not a big problem in most situations)
  • given a function with 3 parameters, of which only the first is mandatory, the caller is not allowed to omit the parameter that occupies position 2 if he wants to specify the parameter at position 3. He can of course attain the same effect by specifying the default value for parameter 2, but that means that he must know that value.
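
A small sketch of the second catch, using a made-up function (make_tag and its parameters are invented for this example):

```php
<?php
// Hypothetical function: only $text is mandatory.
function make_tag($text, $tag = 'p', $class = '')
{
    $attr = $class === '' ? '' : ' class="' . $class . '"';
    return '<' . $tag . $attr . '>' . $text . '</' . $tag . '>';
}

// Omitting the trailing parameters is fine...
echo make_tag('hello'), "\n";

// ...but to set $class the caller has to spell out the default of $tag,
// which means knowing that it is 'p'.
echo make_tag('hello', 'p', 'intro'), "\n";
```

If the default of $tag is ever changed in the function declaration, every call written in the second style silently keeps using the old value.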

Named parameters are useful e.g. when a function takes a very long list of options, most of which are not compulsory.

The “php way” of doing this usually involves declaring a function as accepting a single parameter of type array, and passing all the options as key=>value pairs. The downside is that this forces the coder implementing the function to write quite a bit of code to validate all the options received.
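
A minimal sketch of that approach, with a made-up function and option names, shows where the extra validation code comes from:

```php
<?php
// Hypothetical function taking all its options as a single key => value array.
function make_tag_from_options($options)
{
    $defaults = array('text' => null, 'tag' => 'p', 'class' => '');

    // Validation the implementer has to write by hand: reject unknown keys...
    $unknown = array_diff_key($options, $defaults);
    if (count($unknown) > 0) {
        trigger_error('unknown option(s): ' . implode(', ', array_keys($unknown)), E_USER_WARNING);
        return null;
    }
    // ...and check that the mandatory option is present.
    if (!isset($options['text'])) {
        trigger_error('missing mandatory option: text', E_USER_WARNING);
        return null;
    }

    // Options given by the caller override the defaults.
    $o = array_merge($defaults, $options);
    $attr = $o['class'] === '' ? '' : ' class="' . $o['class'] . '"';
    return '<' . $o['tag'] . $attr . '>' . $o['text'] . '</' . $o['tag'] . '>';
}

// The caller can now skip 'tag' and still set 'class'.
echo make_tag_from_options(array('text' => 'hello', 'class' => 'intro')), "\n";
```

Half of the function body is bookkeeping about its own parameters, and that bookkeeping has to be repeated in every function written this way.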

By taking advantage of the introspection capabilities offered by PHP 5, it is possible to ease this burden. I have whipped up this code snippet that might come in handy in situations where a lot of call-by-name is used.

There are some caveats to take into account:

  • Despite the function working on php 5 only, it does not make use of exceptions, and relies on the php 4 error mechanism instead
  • hell might freeze over when you try it on functions that accept parameters by reference (or return by ref)

Enjoy

/**
* Call a user function using named instead of positional parameters.
* If some of the named parameters are not present in the original function, they
* will be silently discarded.
* Does no special processing for call-by-ref functions...
* Cannot be used on internal PHP functions.
* @param string $function name of function to be called
* @param array $params named array with parameters used for function invocation
* @return mixed the return value of the called function, or NULL on error
*/
function call_user_func_named($function, $params)
{
  // make sure we do not throw exception if function not found: raise error instead...
  // (oh boy, we do like php 4 better than 5, don't we...)
  if (!function_exists($function))
  {
    trigger_error('call to unexisting function '.$function, E_USER_ERROR);
    return NULL;
  }
  $reflect = new ReflectionFunction($function);
  if ($reflect->isInternal())
  {
    trigger_error('cannot call by name internal function '.$function, E_USER_ERROR);
    return NULL;
  }
  $real_params = array();
  foreach ($reflect->getParameters() as $i => $param)
  {
    $pname = $param->getName();
    if ($param->isPassedByReference())
    {
      /// @todo shall we raise some warning?
    }
    if (array_key_exists($pname, $params))
    {
      $real_params[] = $params[$pname];
    }
    else if ($param->isDefaultValueAvailable()) {
      $real_params[] = $param->getDefaultValue();
    }
    else
    {
      // missing required parameter: mark an error and exit
      //return new Exception('call to '.$function.' missing parameter nr. '.$i+1);
      trigger_error(sprintf('call to %s missing parameter nr. %d', $function, $i+1), E_USER_ERROR);
      return NULL;
    }
  }
  return call_user_func_array($function, $real_params);
}

PHP Day 2006

I took part in the Italian PHP Day 2006, giving two talks. I met a lot of nice people, the weather was excellent and the food fantastic. Lots of interesting talks, both at the conference and at the restaurant.
Here are a couple of pictures taken at the event.
These are the slides (in Italian, of course – PDF format) of my talk about php for webservices.
Those are the slides of the talk about php usage in the enterprise context (at SEA airports) I gave together with Elena Brambilla.