Why I am not using Zend_Markup

Sunday, February 28, 2010

About three months ago I took notice of the new Zend_Markup component and I was very pleased to see such a component in Zend Framework. To my own fault I forgot about it follow the development process in the incubator and only remembered it again two weeks before the Zend Framework 1.10 release. By that time it had many flaws and inconsistency in the code, to name just a few of them:
  • Converter codes where called tags, which only was correct for the BBCode to HTML conversion
  • It didn't follow the PHP coding standard completly
  • The documentation was more than incomplete
  • There were majors bugs and inconsistency against other BBCode parsers
Those were just the points I noticed when looking over the code. I surely informed the author Pieter Kokx about those issues, and he started to work them out. As there wasn't much time left, he didn't get everything done in time. His parser implementation was also really slow, as he was doing a character-by-character parsing like one would do in C, but which was not suitable in PHP. I showed him my implementation I also held a talk on about tokens and lexemes, after which he implemented it at least in the BBCode-parser.

When Matthew was ready to create the 1.10.0 tag, I mentioned to him that Zend_Markup would still require some more time before it could make it into the Zend Framework core, tho it was already merged into trunk by him. I was really wondering at that point, why Matthew didn't look deeper into the code while reviewing it, but on the other hand he cannot always understand the complete complexity of a component, so I don't really blame him for that. Tho he should have spotted that the documentation was incomplete and that the coding standard was violated at some points.

As he was going to create the 1.10.0 tag and I told him about some flaws which were still left, he moved the release two days further, so Pieter Kokx had still some time left to fix the missing parts. At that point I already forgot about the documentation and Zend_Markup made it into the 1.10 release.

A few days later, I wanted to implement Zend_Markup in my blog, replacing the old StringParser_BBCode. After I converted all callbacks and implemented all BBCodes I started unit-testing Zend_Markup against my old implementation. At that point I started spotting other major bugs which should not be unique to my own code:
  • BBCodes withing a code-tag would have been rendered after the closing code-tag
  • Newline characters would be rendered around block-level elements
  • List-items cannot contain newlines within, as they are treated as stoppers for those
  • Converter codes were partly still called tags
I informed Pieter about those issues for sure, and the first bug was closed quickly. When I asked how he had solved it, he told me that specific tags were hard-coded within the parser now, instead of the renderer. This is surely a complete wrong approach, as the responsibility of treating contents of a tag should be within the parser, but withing the renderer. This meant to me that I couldn't add a php-tag easily, except if I would extend the renderer class. This was one of the points for me where I started thinking about using my own BBCode parser.

When I noticed the problem with newline characters, I told Pieter that I'd create an issue ticket together with a patch, which I planned to work like in my own implementation to handle newline-characters around and within tags. To illustrate the problem, think about the common usual BBCode:
This is a nice list:

[list]
[*]First item
[*]Second item
[/list]

That's the list
This code would (somewhat) result in the following HTML with Zend_Markup:
This is a nice list:<br />
<br />
<ul>
<li>First item</li>
<li>Second item</li>
</ul><br />
<br />
That's the list
As you can see, there will be unwanted newlines rendered around the <ul> block-level element. After some tinkering around I had to find out that it was impossible to implement without major refactoring, which would affect the backward compatibility and thus had no chance to be fixed before Zend Framework 2.0.

In the end there is a component left which has neither a complete documentation (many methods are not documented in the manual) nor it can be used to full extend as a usual BBCode-Parser. I ended up refactoring the BBCode-parser I partly wrote withing my company and using it as final replacement until I can see that Zend_Markup does not suffer by those teething problems.

Again I'm really disapointed that this component was rushed into the 1.10 release, which again was partly my fault by not noticing given problems earlier. The probably best thing would have been to postpone it to 1.11 or 2.0, so that the all problems requiring a BC-break could be fixed and the documentation be fixed, but now it's too late for that.

Comments to this article

  • Avatar of Matthew Weier O'Phinney Reply Matthew Weier O'Phinney Sunday, February 28, 2010 8:40 PM

    When I'm reviewing code, I'm typically looking at coding standards, tests, and documentation. I'm curious what CS issues you saw in trunk, as I actually went through and revised/refactored code to conform to CS prior to pushing into trunk. As for tests, these were doing an adequate job of testing the behaviors outlined in the proposal, as well as some additional ones. Documentation could be better, but showed at least minimal use cases; advanced use cases are often fleshed out during the RC phase or following the release once we get more people using the component.

    I will be the first to admit that I do not always evaluate the algorithms or approach used in the code to accomplish their tasks. If I am unfamiliar with the problem domain, it makes sense to leave that task up to community review -- which, as you note, may have happened late in the release process. We have had very few occasions, if any, to remove code from trunk after it has already been promoted; it feels unfair to the contributor to do so, since they have already put in much work, and typically the code has been sitting in the incubator for weeks if not months, where it could be reviewed prior to promotion. Incubation is the appropriate time to make architectural critiques.

    When you raised your concerns, we were at relatively late stages of the release process. The code as written was basically doing what it should be doing, and while I understood you had architectural concerns, I did not feel they should stand in the way of release, particularly when the contributor was not only willing to work to fix things prior to release, but was actively doing so.

    Considering how soon 2.0 will be upon us, it also feels reasonable to get functionality out that developers can use *now*, and refactor as we approach 2.0. I hope that you will be willing to assist, as you certainly have some strong ideas on how to do so

    • Avatar of Ben Scholzen 'DASPRiD' Reply Ben Scholzen 'DASPRiD' Sunday, February 28, 2010 9:04 PM

      In fact I just looked through fisheye and it seems that I confused it a bit. Most coding standard violations were actually fixed prior to the promotion, and I'm not perfectly sure about the remaining ones. At that point it depends on how strict you consider the coding standard, but as far as I remember, there should be no kind of inline-comment outside of methods, like for example at the end of:

      http://framework.zend.com/code/browse/~raw,r=19720/Zend_Framework/standard/trunk/library/Zend/Markup/Renderer/RendererAbstract.php

      But as I said, it depends on how strict you see such stuff. PHPCS would probably see it as issue, some humans may not. As for the tests, I didn't mention them as we already spoke about that and I accpeted your valid points there.

      I'm sure we can do the appropriate refactoring for 2.0, and as I mentioned, I'm interested in using Zend_Markup when it fits my expectations.

      • Avatar of Matthew Weier O'Phinney Reply Matthew Weier O'Phinney Sunday, February 28, 2010 9:12 PM

        There are no restrictions against those; typically, we'll use a block comment for that sort of thing, but the CS doesn't address it one way or the other. Such comments are useful for giving a reviewer a visual clue as to how methods are grouped within the class.

        Glad to hear you're on board for helping improve the component!

  • Avatar of Matthew Weier O'Phinney Reply Matthew Weier O'Phinney Sunday, February 28, 2010 9:12 PM

    BTW, 2 things about your blog: 1) it'd be nice to get emails on follow-up comments. 2) the first time I enter a captcha always fails; it only takes on the second try.

    • Avatar of Ben Scholzen 'DASPRiD' Reply Ben Scholzen 'DASPRiD' Sunday, February 28, 2010 9:19 PM

      The email-notify is planned for the next version (9.2), as for the captcha, that has to be a bug in Zend_Captcha then, maybe the lifetime is not long enough, you can report that in Jira ;)

      • Avatar of Matthew Weier O'Phinney Reply Matthew Weier O'Phinney Sunday, February 28, 2010 9:39 PM

        Awesome -- got a comment notification just now. :)

        As for Zend_Captcha -- odd. Most likely it's a expiration-hops issue. I'll see if I can debug it sometime soon.

Leave a comment

Please note that your email address will not be shown, it is only used to fetch your avatar image from gravatar.com and for notifications.

                        _       
 ___ _   _  __ _ _   _ (_)_   _ 
/ __| | | |/ _` | | | || | | | |
\__ \ |_| | (_| | |_| || | |_| |
|___/\__,_|\__, |\__,_|/ |\__,_|
           |___/     |__/