Posts Tagged ‘parsing’

jParser and jTokenizer released

Saturday, November 14th, 2009

After nearly two years, I've finally gotten around to releasing my JavaScript parser for PHP, although documentation is still thin on the ground.

The library has been split in two:

  1. jTokenizer – a JavaScript tokenizer designed to mimic the PHP tokenizer.
  2. jParser – the full-blown JavaScript syntactical parser, which generates a parse tree.
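To illustrate what mimicking the PHP tokenizer could look like: PHP's `token_get_all()` returns an array in which most tokens are arrays of token id, text and line number, while single-character tokens are bare strings. The Python sketch below loosely copies that shape for a small subset of JavaScript. It is not jTokenizer's code; the `J_*` token names are hypothetical, keywords are not distinguished from identifiers, multi-character punctuators are returned as plain strings (PHP would give them named tokens), and regular-expression literals, which need parser context to tell apart from division, are left out.

```python
import re

# Hypothetical token names (PHP itself uses integer constants such as T_STRING).
# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("J_WHITESPACE", r"[ \t\r\n]+"),
    ("J_COMMENT", r"//[^\n]*|/\*[\s\S]*?\*/"),
    ("J_NUMBER", r"\d+(?:\.\d+)?"),
    ("J_STRING", r'"(?:[^"\\\n]|\\.)*"' + "|" + r"'(?:[^'\\\n]|\\.)*'"),
    ("J_IDENTIFIER", r"[A-Za-z_$][A-Za-z0-9_$]*"),
]

# Longest punctuators first, so "===" is tried before "==" and "=".
PUNCTUATORS = ["===", "!==", "==", "!=", "<=", ">=", "&&", "||", "++", "--",
               "+=", "-=", "{", "}", "(", ")", "[", "]", ";", ",", ".",
               "<", ">", "+", "-", "*", "/", "%", "=", "!", "?", ":"]

MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src):
    """Return a list mixing (name, text, line) tuples and bare punctuator
    strings, loosely mirroring the shape of PHP's token_get_all() output."""
    tokens, pos, line = [], 0, 1
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m:
            text = m.group()
            tokens.append((m.lastgroup, text, line))
        else:
            text = next((p for p in PUNCTUATORS if src.startswith(p, pos)), None)
            if text is None:
                raise SyntaxError(f"Unexpected character {src[pos]!r} on line {line}")
            tokens.append(text)
        line += text.count("\n")  # tuples record the line the token starts on
        pos += len(text)
    return tokens
```

For example, `tokenize("var x = 1;")` yields identifier, whitespace and number tuples, with `"="` and `";"` as plain strings.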

jParser grammar

Thursday, February 26th, 2009

I've been asked how I generate the JavaScript parse table for jParser, so I'm posting the grammar file here for anyone else who's interested.

↓ JavaScript grammar file for jParser

JavaScript Obfuscator and Minifier

Saturday, June 14th, 2008

This tool is based on a full JavaScript parser that is part of a much bigger plan. I won't go into that just yet, but along the way I'm going to release useful tools like this as they come about. It's useful to have some short-term goals to keep up morale and to make sure the framework is working well.

> Try it here: Obfuscate and minify your JavaScript code

JParser now with Unicode support

Sunday, June 8th, 2008

I've updated my JavaScript parser to include full Unicode support.
Check out the test interfaces for:
» Full parser;
» Code highlighting.

JavaScript Syntax Nuances

Saturday, June 7th, 2008

If you learn a programming language, it is unlikely that you will read the formal language specification that defines all the laws of its syntax. You may never read it at all. It is more useful to learn by example, or at least topic by topic. However, a mere ten years after writing my first few lines of JavaScript, I read the ECMAScript standard, and it threw up some things I did not know.

There are many things that you can write in JavaScript that are perfectly valid syntax, but that you probably never will write. Here are a few that raised an eyebrow or two.

JParser now with Automatic Semicolon Insertion

Sunday, June 1st, 2008

I finally found a spare few hours to implement Automatic Semicolon Insertion in my JavaScript parser.
Check out the test interface here.
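Full Automatic Semicolon Insertion depends on the parser knowing when the next token is not allowed by the grammar, so it cannot be reproduced in a few lines. One self-contained piece can, though: the ECMAScript specification marks some productions with "[no LineTerminator here]", so `return` followed by a line break gets a semicolon inserted straight after it. The Python sketch below applies only that restricted-production rule, for `return`, `break` and `continue`, to a hypothetical token list of `(text, newline_before)` pairs. It is not jParser's implementation, and it ignores the other ASI rules (offending token, end of input) as well as postfix `++`/`--` and `throw`.

```python
# Keywords that ECMAScript follows with "[no LineTerminator here]" before an
# optional expression or label. Postfix ++/-- and throw are deliberately omitted.
RESTRICTED = {"return", "break", "continue"}


def insert_restricted_semicolons(tokens):
    """Apply only the restricted-production rule of Automatic Semicolon Insertion.

    tokens: list of (text, newline_before) pairs, where newline_before is True
    when a line terminator separates the token from the one before it.
    Returns a new list with (";", False) inserted after a restricted keyword
    whenever the next token starts on a new line and is not already ";".
    """
    out = []
    for i, (text, newline_before) in enumerate(tokens):
        out.append((text, newline_before))
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if text in RESTRICTED and nxt is not None and nxt[1] and nxt[0] != ";":
            out.append((";", False))
    return out


# Source:
#   return
#   a + b
source_tokens = [("return", False), ("a", True), ("+", False), ("b", False)]
print([text for text, _ in insert_restricted_semicolons(source_tokens)])
# prints ['return', ';', 'a', '+', 'b']
```

This rule is why a `return` on its own line, with `a + b` on the next, returns `undefined` in JavaScript rather than the sum.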

Full JavaScript parser for PHP

Friday, May 9th, 2008

[ Update 18 Nov 2009 ]

This article is rather old now – the jParser code has since been released.

Despite the glorious sunshine during my week off, I managed to put some time into my pet project: a full JavaScript parser written in 100% native PHP. In fact, I've been developing a generic parser suite for some time, and using it to build a full JavaScript parser was my ultimate goal, to satisfy myself that it all works and is powerful enough to be useful. I've written a number of blog posts about developing a parser generator in PHP (click "parsing" to do a tag search).

Before I start wittering on,
Click here to play with the online example of JParser

Parsing for PHP developers – Part III

Sunday, April 6th, 2008

JSON Parser

If you haven't read Part 1 or Part 2, they are there for the reading.

I'm going to demo a JSON parser in this post. It's 100% native PHP code, and is based on the work I've done toward my ultimate goal of a full JavaScript parser.

Click here to play with the interactive JSONParser demo

I thought I'd get this example online now, as my ultimate goal is taking longer than I had hoped. I shan't go into the details; suffice it to say that the JSON grammar is a very tiny subset of the full JavaScript grammar and doesn't really have any complex rules.
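As a rough illustration of how small the JSON grammar is, here is a hand-written recursive-descent parser for it, in Python rather than PHP. This is a swapped-in technique, not the author's JSONParser (which is based on the work toward the full JavaScript parser): each method handles one grammar rule, written as a comment above it. Simplifications, assumed for brevity: the token patterns follow RFC 8259 loosely, and `\u` surrogate pairs are not joined into a single character.

```python
import re

# Lexical grammar: one named pattern per JSON token type (loosely after RFC 8259).
TOKEN_RE = re.compile(r"""
    (?P<ws>[ \t\r\n]+)
  | (?P<string>"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*")
  | (?P<number>-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?)
  | (?P<literal>true|false|null)
  | (?P<punct>[{}\[\]:,])
""", re.VERBOSE)

ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def tokenize(text):
    """Split JSON text into (kind, text) pairs, dropping whitespace."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"Unexpected character at offset {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("eof", ""))
    return tokens


def decode_string(tok):
    """Strip the quotes and resolve backslash escapes (no surrogate-pair joining)."""
    return re.sub(r"\\(u[0-9a-fA-F]{4}|.)",
                  lambda m: chr(int(m.group(1)[1:], 16)) if len(m.group(1)) == 5
                  else ESCAPES[m.group(1)],
                  tok[1:-1])


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def expect(self, value):
        tok = self.tokens[self.i][1]
        if tok != value:
            raise ValueError(f"Expected {value!r}, got {tok!r}")
        self.i += 1

    def parse(self):
        result = self.value()
        if self.peek()[0] != "eof":
            raise ValueError("Trailing input after JSON value")
        return result

    # value ::= object | array | string | number | "true" | "false" | "null"
    def value(self):
        kind, tok = self.peek()
        if tok == "{":
            return self.object()
        if tok == "[":
            return self.array()
        self.i += 1
        if kind == "string":
            return decode_string(tok)
        if kind == "number":
            return float(tok) if any(c in tok for c in ".eE") else int(tok)
        if kind == "literal":
            return {"true": True, "false": False, "null": None}[tok]
        raise ValueError(f"Unexpected token {tok!r}")

    # object ::= "{" [ string ":" value { "," string ":" value } ] "}"
    def object(self):
        self.expect("{")
        result = {}
        if self.peek()[1] != "}":
            while True:
                kind, tok = self.peek()
                if kind != "string":
                    raise ValueError(f"Expected string key, got {tok!r}")
                self.i += 1
                self.expect(":")
                result[decode_string(tok)] = self.value()
                if self.peek()[1] != ",":
                    break
                self.i += 1
        self.expect("}")
        return result

    # array ::= "[" [ value { "," value } ] "]"
    def array(self):
        self.expect("[")
        result = []
        if self.peek()[1] != "]":
            while True:
                result.append(self.value())
                if self.peek()[1] != ",":
                    break
                self.i += 1
        self.expect("]")
        return result
```

For example, `Parser('{"a": [1, 2.5, true]}').parse()` returns `{'a': [1, 2.5, True]}`. Recursive descent suits JSON because every rule can be chosen by looking at a single token.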

Parsing for PHP developers – Part II

Sunday, March 30th, 2008

In Part 1, I introduced and demonstrated the parsing concept using a very simple date parser. In this part, I am going to talk about the important role of tokenizing. If you haven't read Part 1, this may not make much sense, so read it first.

Syntactical vs Lexical

Looking again at the simple grammar of Part 1, you may notice that the rule <D_DIGIT> ::= "0" | "1" ... "9" is a bit different from all the others. It does not really contribute to the syntax of our language; it merely describes the legal characters that make up a single digit. It is convenient to view this aspect of the language as a subset of the grammar, one that is concerned only with what input ‘looks like’ rather than where it appears. This can be called the lexical grammar. The rest of the language, which is concerned with syntax, can be called the syntactical grammar.
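A small Python sketch of that split, assuming a hypothetical date format such as `24/03/2008` (Part 1's actual grammar is not reproduced here, and the token names are invented). The lexical layer is a regular expression that turns runs of <D_DIGIT> characters into `NUMBER` tokens; the syntactical layer then checks only the sequence of token types and never looks at individual characters.

```python
import re

# Lexical grammar: what input looks like. A run of digit characters becomes
# one NUMBER token, and "/" becomes a SLASH token.
LEXICAL = re.compile(r"(?P<NUMBER>[0-9]+)|(?P<SLASH>/)")


def tokenize(text):
    """Turn characters into (type, text) tokens, rejecting illegal characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = LEXICAL.match(text, pos)
        if not m:
            raise ValueError(f"Illegal character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Syntactical grammar: where tokens may appear, stated in token types only:
#   <date> ::= NUMBER SLASH NUMBER SLASH NUMBER
def parse_date(text):
    tokens = tokenize(text)
    if [kind for kind, _ in tokens] != ["NUMBER", "SLASH", "NUMBER", "SLASH", "NUMBER"]:
        raise ValueError("Input does not match <date>")
    day, month, year = (int(value) for kind, value in tokens if kind == "NUMBER")
    return day, month, year
```

Because the syntactical rule never sees characters, changing what a number looks like would touch only the lexical pattern.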

Parsing for PHP developers – Part I

Monday, March 24th, 2008

Parsing is a fairly common word in the web developer's vocabulary. We do it all the time. One immediately thinks of XML as something we parse regularly without batting an eyelid. As a PHP developer, you might also parse an ini file with parse_ini_file, or parse a date string with strtotime. Whatever language you write in, these tasks are easily achieved, either with built-in functions or by installing other code libraries or extensions. Sometimes you may find yourself needing to parse something more bespoke, like, say, a postcode; you'll either write a routine yourself, or do some googling for a neat algorithm someone out there has decided to share. No problem.

A rod for my back

But what if you want to parse something really complex, like, say, an entire JavaScript program? What if you can't find a third-party library that works for you? Well, I tried to find one. I found some very promising projects, but they ranged from abandoned projects, to dodgy alpha releases, to ones that just plain didn't work and had no documentation to help. The most serious-looking projects were so sophisticated that I didn't even have the knowledge to start using them. I decided, as I often do, that I needed empowering with the knowledge to write my own parser should I need one for – well, whatever.