Full JavaScript parser for PHP

Despite the glorious sunshine during my week off, I managed to put some time into my pet project: a full JavaScript parser written in 100% native PHP. Actually, I've been developing a generic parser suite for some time, and building a full JavaScript parser with it was my ultimate test that it all works and is powerful enough to be useful. I've written a number of blog posts about developing a parser generator in PHP (click "parsing" to do a tag search).

Before I start wittering on, click here to play with the online example of JParser.

Here are the main difficulties I encountered while building the JavaScript parser:

1. Performance
Generating the parse table was taking about 30 minutes and using several hundred megabytes of memory. Going back to the drawing board with certain parts of the parse table generator, I've managed to get this down to about 7 minutes on my humble Mac Mini.

2. Special rules
The ECMAScript standard specifies certain special cases alongside the grammar rules. One of particular note (clause 12.4) says that an ExpressionStatement may not begin with "{" or with the keyword "function". Without it, a leading "{" could be read as either a Block or an object literal, and a leading "function" as either a FunctionDeclaration or a function expression. This special rule avoids that ambiguity and therefore avoids parse table conflicts, but it sits effectively outside the grammar. I've finally found the right part of the parser architecture to implement such rules. They are currently hard-coded into a generation script, but could be built into an extended BNF notation if need be.
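As a rough illustration of this kind of rule, here is a minimal sketch in Python (not the PHP of the actual project, and not its real generation script). The names FORBIDDEN_FIRST and prune_candidates are hypothetical; the idea is only that a production carrying a lookahead restriction is dropped from the candidate set for that token, so the conflict never reaches the parse table.

```python
# Hypothetical table of lookahead restrictions: production name -> tokens it
# may not begin with. Clause 12.4 supplies the single entry used here.
FORBIDDEN_FIRST = {
    "ExpressionStatement": {"{", "function"},
}


def prune_candidates(candidates, lookahead):
    """Return the candidate productions still allowed for this lookahead token.

    A production is removed when the lookahead token appears in its
    forbidden set; productions without a restriction are kept as they are.
    """
    return [
        production
        for production in candidates
        if lookahead not in FORBIDDEN_FIRST.get(production, set())
    ]


# "{" at the start of a statement can then only open a Block.
print(prune_candidates(["Block", "ExpressionStatement"], "{"))
# Any other token leaves ExpressionStatement available.
print(prune_candidates(["ExpressionStatement"], "("))
```

Keeping the restrictions in a separate table, rather than in the grammar itself, mirrors the point above that the rule lives outside the BNF.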

3. Automatic semicolon insertion
As you probably know just from writing JavaScript, the ECMAScript standard lets you omit the semicolon at the end of some statements, as long as you terminate them with a line break instead. This is more complex than it sounds, but more to the point, it is another special rule that is not directly part of the grammar and will probably have to be handled at parse time. I have not yet tackled this issue. It is my next challenge.
[UPDATE: This is now implemented. See!]
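To give a feel for why this has to happen at parse time, here is a minimal Python sketch of the core decision, assuming the parser has just met a token the grammar does not allow. It is not the JParser implementation: the function names are hypothetical, and it leaves out the exceptions in the standard (no insertion inside a for-loop header, and none that would produce an empty statement).

```python
def should_insert_semicolon(offending, newline_before):
    """Decide whether a ';' may be inserted before a token the grammar rejected.

    offending: the rejected token, or None at the end of the input.
    newline_before: True if a line break separates it from the previous token.
    """
    if offending is None:   # end of the token stream
        return True
    if offending == "}":    # a closing brace always permits insertion
        return True
    return newline_before   # otherwise a line break is required


# A few keywords are "restricted": a line break directly after them ends
# the statement even though the next token would otherwise be legal.
RESTRICTED = {"return", "break", "continue"}


def newline_terminates(previous_token, newline_before):
    """True if a line break after previous_token forces a semicolon."""
    return newline_before and previous_token in RESTRICTED


# "a = 1 <newline> b = 2": the parser rejects "b", a line break precedes it.
print(should_insert_semicolon("b", True))
# "return <newline> x": the statement ends right after "return".
print(newline_terminates("return", True))
```

Both functions need to know whether a line break was seen, which is information a grammar normally throws away along with the rest of the whitespace.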


3 Responses to “Full JavaScript parser for PHP”

  1. Tim Says:

    [UPDATE]
    Automatic Semicolon Insertion now largely implemented.

  2. mw Says:

    Alright; ya got me. I’m interested in what you’re doing, as it’s part of what I’m thinking about.

    My goal:
    1) Use XML similar to your output: http://dasganze.com/tmp/sample1.xml
    2) Create a visual code-explorer, based on an outline like: http://www.flickr.com/photos/aaronp/2134057660/sizes/l/
    3) Ideally, use this new code-explorer to build new JS code like Lego?

    Perhaps you have similar goals?

  3. Tim Says:

    Actually, the output wasn’t supposed to be XML; it’s just convenient to format it that way. Perhaps I should modify the dump routine to print valid XML.

    A visual representation of the parse tree would be a fun Flash project, but it’s not on my list. If you’re interested in this kind of thing outside the realm of PHP, Google “ANTLR”.

    Incidentally, my real goal for this parser framework is a bigger deal than this on its own. Check it out: http://web.2point1.com/2008/09/11/jaspa-sneak-preview/
