Simple Parser in JavaCC
I've pretty much decided to write an improved NNTP server for USENET that adds a number of features USENET really needs. I don't know whether I'll succeed in improving USENET, but I'm going to write the code anyway.
I'm just getting started, and the first step is to write an NNTP server that has no extensions and follows the specification closely. I've started on this, and the time came to refactor the NNTP command parser into something a little more generic.
I asked on USENET for suggestions on which parser generator to use, and JavaCC seemed to be the most popular. To get started I wrote a grammar that accepts only four commands: HELP, POST, GROUP, and ARTICLE. Only the message-ID form of ARTICLE is accepted. This grammar doesn't do anything useful yet, but I thought it might help people working on simple grammars in JavaCC.
The biggest problem was recovering from an error. There are really two parts to this. The first is to make sure that your token definitions match any input that comes in. My grammar does this with the CATCH_ALL token, which prevents any TokenMgrError from being thrown. The second is to catch the ParseException when it is thrown and do something useful with it. In my grammar, we report the error once, throw out everything up to the end of the line, and then continue parsing. I think this is probably how many people would like their parsers to work.
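The skip-to-end-of-line idea can be tried out without running JavaCC at all. What follows is only a rough sketch in plain Java, not the generated parser: the class name LineRecoverySketch, its crude whitespace tokenizer, and the use of IllegalStateException in place of ParseException are all assumptions made for illustration, and it only handles HELP, POST, and GROUP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated parser; hand-written for illustration only.
public class LineRecoverySketch {
    private static final String EOL = "\n";

    private final List<String> tokens = new ArrayList<>();
    private final List<String> output = new ArrayList<>();
    private int pos = 0;

    public LineRecoverySketch(String input) {
        // Crude tokenizer: words separated by blanks, plus one EOL token per '\n'.
        // '\r' is simply treated as whitespace, so "\r\n" also yields one EOL.
        StringBuilder word = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (c == '\n') {
                    tokens.add(EOL);
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
    }

    public static List<String> parse(String input) {
        return new LineRecoverySketch(input).run();
    }

    public List<String> run() {
        while (pos < tokens.size()) {
            try {
                command();
            } catch (IllegalStateException ex) {
                output.add("Syntax Error");
                // Recovery: discard tokens up to and including the next EOL.
                while (pos < tokens.size()) {
                    if (tokens.get(pos++).equals(EOL)) {
                        break;
                    }
                }
            }
        }
        return output;
    }

    private void command() {
        String cmd = word().toUpperCase();
        switch (cmd) {
            case "HELP":
            case "POST": {
                eol();
                output.add(cmd);
                break;
            }
            case "GROUP": {
                String name = word();
                eol();
                output.add("GROUP " + name);
                break;
            }
            default:
                throw new IllegalStateException("unknown command: " + cmd);
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consumes a word; an offending token is left in place for the recovery loop.
    private String word() {
        String t = peek();
        if (t == null || t.equals(EOL)) {
            throw new IllegalStateException("expected a word");
        }
        pos++;
        return t;
    }

    // Consumes an EOL; an offending token is left in place for the recovery loop.
    private void eol() {
        if (!EOL.equals(peek())) {
            throw new IllegalStateException("expected end of line");
        }
        pos++;
    }
}
```

Calling LineRecoverySketch.parse on a few lines of input returns one entry per line, with a single "Syntax Error" for each malformed line. The detail that matters is that a failed match does not consume the offending token, so a line that is wrong at its line end does not swallow the line after it.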
You might notice the JAVACODE non-terminal and wonder why I didn't just use a regular empty non-terminal. As it turns out, JavaCC doesn't like it when you do this. Using a JAVACODE production turns off certain sanity checks in JavaCC, and then everything works correctly. Thanks go to Jonathan Revusky for this.
I hope this code is useful to you. If you have any questions, I recommend the mailing list, which can also be accessed through the Gmane USENET news server at news.gmane.org.
options {
    STATIC = false;
    IGNORE_CASE = true;
}
PARSER_BEGIN(NNTPParser)
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

public class NNTPParser {
    // Reads NNTP commands from standard input and echoes what was recognized.
    public static void main(String[] args) throws ParseException, IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        NNTPParser parser = new NNTPParser(reader);
        parser.start();
    }
}
PARSER_END(NNTPParser)
SKIP : { " " | "\t" }

TOKEN : {
    <HELP_CMD: "HELP" >
  | <GROUP_CMD: "GROUP" >
  | <ARTICLE_CMD: "ARTICLE" >
  | <POST_CMD: "POST" >
    // Simplified patterns: letters, "_", "@" and "." only.
  | <MSG_ID: "<"((["a"-"z","_", "@"])*".")*(["a"-"z"])*">" >
  | <GROUP: ((["a"-"z"])*".")*(["a"-"z"])+ >
  | <LINE_END: "\n" | "\r\n" >
    // Matches any other single character, so no TokenMgrError is thrown.
  | <CATCH_ALL: ~[] >
}
void start() throws IOException :
{
    Token t;
}
{
    (
        try {
            <HELP_CMD> <LINE_END>
            { System.out.println("HELP"); }
        |   <POST_CMD> <LINE_END>
            { System.out.println("POST"); }
        |   <GROUP_CMD> t = <GROUP> <LINE_END>
            { System.out.println("GROUP " + t.image); }
        |   <ARTICLE_CMD> t = <MSG_ID> <LINE_END>
            // Print the message ID without its enclosing angle brackets.
            { System.out.println("ARTICLE " + t.image.substring(1, t.image.length() - 1)); }
        |   oops()
        }
        catch (ParseException ex) {
            // Report the error once, then discard tokens through the end of the line.
            boolean errorSent = false;
            do {
                t = getNextToken();
                if (!errorSent && t.kind != EOF) {
                    System.out.println("Syntax Error");
                    errorSent = true;
                }
            } while (t.kind != LINE_END && t.kind != EOF);
            if (t.kind == EOF) {
                return;
            }
        }
    )*
}
// Always fails, so an unrecognized line ends up in the catch block of start().
JAVACODE
void oops() {
    throw new ParseException("Oops");
}