A comparison of JSON libraries for PHP

The recent release of PHP 5.2.0, which bundles by default an extension for converting PHP values to and from the JSON format, is a good occasion to compare the existing PHP libraries that aim to provide the same capabilities.
Here you can find a detailed analysis of the features, performance and encoding/decoding capabilities of 4 of the main implementations: 3 “pure PHP” libraries and the native extension.
I have to warn readers: I personally maintain one of the four libraries under examination.
Any correction, criticism or feedback is welcome.

Edit 2007/04/25 – an updated version of the comparison has been posted here

10 thoughts on “A comparison of JSON libraries for PHP”

  1. I may have an explanation for your slow PHP5 results. I’ve been benchmarking against PHP4, not PHP5, but all my numbers changed, some dramatically, when I exercised the entire code path prior to running the timed test. Perhaps PHP5 takes a big up-front optimization hit, and you’re factoring that into the results.

    Generally speaking, some libraries will run faster on smaller data sets, some on larger data sets, and some only on enormous data sets. There’s also the issue of which runs fastest on one parse, and which on multiple parses (due to possibly recurring object/memory allocations). You might also cycle among test subjects rather than repeatedly testing the same subject sequentially, to help smooth out server background processes.
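    The warm-up and round-robin ideas above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `benchmark_interleaved`, the sample data and the round counts are all made up here, and the closures require PHP 5.3 or later, which is newer than the release discussed in the post. Only the native extension is timed, in two modes; callables wrapping the pure-PHP libraries could be added to the same array.

```php
<?php
// Hypothetical harness: every name here is illustrative and is not taken
// from the original comparison.

/**
 * Time several callables on the same input, cycling among them
 * instead of running each one many times in a row.
 * Returns an array of total elapsed seconds keyed by candidate name.
 */
function benchmark_interleaved(array $candidates, $input, $rounds = 50, $warmup = 1)
{
    // Warm-up: exercise every code path before timing starts,
    // so that one-time costs are not counted.
    for ($i = 0; $i < $warmup; $i++) {
        foreach ($candidates as $fn) {
            call_user_func($fn, $input);
        }
    }

    $totals = array_fill_keys(array_keys($candidates), 0.0);

    // Timed rounds: each round runs every candidate once, which helps
    // spread background server activity evenly across candidates.
    for ($round = 0; $round < $rounds; $round++) {
        foreach ($candidates as $name => $fn) {
            $start = microtime(true);
            call_user_func($fn, $input);
            $totals[$name] += microtime(true) - $start;
        }
    }

    return $totals;
}

// Build an arbitrary sample payload (assumed shape, for illustration only).
$data = array();
for ($i = 0; $i < 1000; $i++) {
    $data[] = array('id' => $i, 'name' => "item $i", 'tags' => array('a', 'b', 'c'));
}
$json = json_encode($data);

$candidates = array(
    'json_decode (array)'  => function ($s) { return json_decode($s, true); },
    'json_decode (object)' => function ($s) { return json_decode($s); },
);

$totals = benchmark_interleaved($candidates, $json, 20);
foreach ($totals as $name => $seconds) {
    printf("%-22s %.4f s\n", $name, $seconds);
}
```

    Setting `$warmup` to 0 gives the cold-start behaviour instead, so both usage patterns can be measured with the same harness.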

  2. “Perhaps PHP5 takes a big up-front optimization hit, and you’re factoring that into the results”

    In fact I am positive it does.

    There was a really nice and interesting article a couple of months ago in the php|architect magazine, where somebody from Zend (or the core PHP team, I cannot seem to recall) explained all the optimizations that went into PHP 5, 5.1 and especially 5.2.
    The list was impressive, and synthetic benchmarks showed steady improvements.
    Then came benchmarks of 5 very common real-world apps, and the improvement in response time ranged from very small to none, with one case of severe regression (hopefully fixed since then).

    There were two lessons I learned from that article:

    – raw response time is not the end-all of web serving. Getting better throughput, in terms of concurrent responses served on given hardware, is also a worthy goal, and the two optimizations can sometimes collide (I think PHP 5.2 was optimized especially for high concurrency)

    – a lot of speed optimization tricks are completely wasted on php, where the app. server model means that the script is parsed on every single web page request. Unlike in C or Java, all the time spent in optimization is time taken away from execution. Eventually, the point is reached where optimizing more becomes counter-productive.

    The reason why I did not pre-exercise code paths is precisely because I think it is unlikely that a real web app will see that usage pattern.

    I did see a slight change when shifting the order of tests, but the changes seemed to be more random than consistent.

    About the data set: I used a big one to put more stress on the encoding/decoding part of the library. A real-world app where the usage pattern is sending/receiving a lot of small messages would probably be more interested in the optimization of the HTTP send/receive code.

    All in all, as with every test measure, I think it is a mere indication, and it should not be taken as absolute truth.

    Note: as you might have guessed, I am not too fond of PHP being optimized for the huge-application-server case, where a code cache + optimizer is installed and thousands of requests have to be served per second, if that has to be detrimental to the more general ‘slap together a few PHP scripts and dump them on whatever webserver you happen to find’ case…

  3. At my company I write a lot of applications that need to share data structures between PHP and Python. This is because we obviously use PHP for our webapps, but a lot of the background processes (“cronjobs”) that the apps sometimes have are written in Python. I regularly use JSON as the intermediate encoding because it supports exactly what we want and has stable libraries for a variety of languages.

    However, this is where my two cents comes in. One of our apps sends a very large (200+MB) uncontrollable dataset over the Internet in a Python -> PHP fashion. At some point the PEAR library ceased to be able to decode the data – I don’t know why. I fixed it by switching to the PECL library, which handled the data with no problems. I have no idea what the PEAR library didn’t like about the data – I was much more interested in fixing the app than triaging bugs.

    So to sum up: The PEAR library probably has a really tiny bug in it which can prevent it from reading some rare bits of JSON.

  4. “At some point the PEAR library ceased to be able to decode the data”

    This might simply be caused by the php application running out of memory, rather than a bug with some obscure json corner-case.

    The same thing will most likely happen when using the phpxmlrpc-based library to parse huge files, too (a user once mailed me asking why he had problems using the phpxmlrpc lib to parse an 800 MB XML file, which btw was used to serialize a matrix of integers…).

    The php engine has no built-in protection against deep recursion (I think Python does have it), and the stack simply got smashed at some point.

    The pecl library, being written in highly optimized C code, is a lot less prone to resource exhaustion.
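    As an aside on the recursion point: one way a script could protect itself before handing input to a recursive pure-PHP decoder is to measure how deeply the JSON text is nested, using a plain loop rather than recursion. The helper below is a hypothetical sketch, not part of any of the libraries compared, and the limit of 512 is an arbitrary assumption. It only counts brackets outside of string literals and does not validate the JSON.

```php
<?php
// Hypothetical helper: not part of any of the libraries discussed above.

/**
 * Return the maximum array/object nesting depth of a JSON text.
 * Iterative scan: brackets inside string literals are ignored.
 * The input is assumed to be JSON-like; no validation is performed.
 */
function json_nesting_depth($json)
{
    $depth = 0;
    $max = 0;
    $inString = false;
    $len = strlen($json);

    for ($i = 0; $i < $len; $i++) {
        $c = $json[$i];

        if ($inString) {
            if ($c === '\\') {
                $i++;               // skip the character following a backslash
            } elseif ($c === '"') {
                $inString = false;  // closing quote
            }
            continue;
        }

        if ($c === '"') {
            $inString = true;
        } elseif ($c === '[' || $c === '{') {
            $depth++;
            if ($depth > $max) {
                $max = $depth;
            }
        } elseif ($c === ']' || $c === '}') {
            $depth--;
        }
    }

    return $max;
}

// Usage: refuse input that is nested more deeply than an assumed limit.
$limit = 512; // arbitrary value chosen for illustration
$json = '{"matrix": [[1, 2], [3, 4]], "label": "[not a bracket]"}';
if (json_nesting_depth($json) > $limit) {
    echo "input nested too deeply, refusing to decode\n";
} else {
    echo "nesting depth: " . json_nesting_depth($json) . "\n";
}
```

    A check like this does nothing about plain memory exhaustion on very large inputs, which is the other failure mode mentioned above.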

  5. Would you mind fixing the “Here you can find” link? It is not working 🙁

  6. Thanks a lot for this comparison. I have been using AJAX for a while, but have just recently got into JSON. This helped point me to the library I should go for 🙂

  7. Just a little thought. Why not have an option not to escape the forward slash “/”? This would give some improvement in real world applications.

    The forward slash only needs to be escaped (or rather “
