Home   Cover Cover Cover Cover
 

How to use CoCo/R and the attributed C# grammar

CoCo/R is a compiler generator which takes a compiler description in the form of an LL(1) attributed grammar (ATG) and generates the scanner and the parser of the described parser.

The C# grammar, as presented in ECMA standard 334, is apparently not LL(1) and not written in EBNF. So we translated it to EBNF, did some factorizations, eliminated some productions, and inserted resolver clauses (IF (boolean expression)) to get an top-down-parsable EBNF grammar for C#. A resolver clause appears only at the beginning of an alternative that would otherwise produce an LL(1) conflict. The value of the boolean expression determines if the alternative shall be chosen by the parser or not.

Regardless of these changes to Coco/R, you can use the CoCo/R toolkit in the familiar way with the advantage of "Peek" functionality in the scanner and availabiliy of named handles for the tokens.

TOKENNAMES

Because one often needs to refer to the tokens when trying to resolve an LL(1) conflict, we introduced a new keyword (TOKENNAMES). After the keyword the user can define arbitrary identifiers for the tokens in the grammar. All (and only) the defined tokens will be made available as integer constants of a generated class Tokens, e.g.:

TOKENNAMES
  /* names for terminal classes */
  ident = ident
  literal = literal

  /* names for simple tokens */
  "(" = lpar
  ")" = rpar
  "{" = lbrace
  "}" = rbrace
  "++" = inc

  /* names for C# keyword tokens */
  "new" = newKW
  "int" = intKW

From the above definition Coco/R generates the following class Tokens
(Note: the actual values for the constants depend on the order of appearance of the tokens in the grammar)

public class Tokens {
  public const int id = 1;
  public const int num = 4;
  public const int lpar = 9;
  public const int rpar = 11;
  public const int lbrace = 6;
  public const int rbrace = 23;
  public const int inc = 39;
  public const int newKW = 27;
  public const int intKW = 15;
}

Conflict Resolution and "Peek" functionality

To be able to distinguish between two alternatives which are not distinguishable by only a single lookahead token, we have added "peeking" functionality to the scanner which is exposed by the two static functions of the generated Scanner class:

  • void StartPeek ()
    ensures that peeking starts from the current parsing position


  • Token Peek ()
    returns the next peek token, moves the current peek position ahead, while not effecting the current parsing position.
    NOTE: the scanner's method Token Scan () returns the next parse token and moves the current parse and peek positions ahead.

Let's do a simple example:

S1 = a b
   | a c .

This LL(1) conflict (marked red) could be resolved by simply transforming the grammar to:

S1 = a ( b | c ) .

But the grammar may loose some semantic information and/or clarity. So you can now solve the problem by taking advantage of the new conflict resolution capability of Coco/R (marked red):

S1 = IF (IsAandB()) a b
   | a c .

When the parser reaches the point where it has to choose between ab or ac, it does not check the lookahead token, but calls IsAandB() instead and selects the first alternative, if the call returns true. Otherwise it continues with checking the next alternative.

The resolution function IsAandB() can be implemented as follows (using the peek functionality of the scanner (marked red)):

static bool IsAandB () {
  Scanner.StartPeek(); // peek = lookahead
  return la.kind == Tokens.a &&
         Scanner.Peek().kind == Tokens.b;
}

Let's do a more complex example:

S2 = { a } b
   | { a } c .

This LL(1) conflict (marked red) could be resolved by transforming the grammar to:

S2 = { a } ( b | c ).

But again the grammar may loose semantic information and/or clarity and thus want to solve the problem with conflict resolvers in the same way as above.

S2 = IF (FollowsB()) { a } b
   | { a } c .

NOTE: In the previous example a fixed lookahead of k > 1 would do the trick, i.e. the grammar is LL(2). Here this is not possible any more, i.e. the grammar is not LL(k) for any fixed k.

This time the resolution function FollowsB() looks like this:

static bool FollowsB () {
  Scanner.StartPeek();
  int peek = la.kind;
  // look behind { a }
  while (peek == Tokens.a)
    peek = Scanner.Peek().kind;
  return peek == Tokens.b;
}

This example shows how to easily achieve an arbitrary lookahead where necessary.

And another one:

IdentList = ident { "," ident } [ "," ]
NOTE: Contructs like this one actually appear in the C# grammar.

In this case the LL(1) conflict (marked red, above) cannot be resolved by rewriting the grammar, so we have to use our conflict resolvers (marked red, below):

IdentList = ident { IF (NotFinalComma()) "," ident } [ "," ] .

... and the resolution function looks like this:

static bool NotFinalComma () {
  Scanner.StartPeek(); // peek = lookahead
  return Scanner.Peek().kind == Tokens.ident &&
         Scanner.Peek().kind == Tokens.comma;
}

Play with the examples

In order to give you a better feel for the whole thing, we put the above three examples in one attributed grammar (Examples.atg) and added a Main class (in Comp.cs). We also provide the generated files Scanner.cs and Parser.cs as well as the .NET executable (Comp.exe). You can download all this in a single zip file.

To test the examples, create a text file and write ONE valid (or invalid, if you want to check out error handling) sentence into it, e.g.
TestS1.txt:
1 a b
  or   TestS2.txt:
2 a a a a c
  or   TestIdentList.txt:
3 ABCDEF, ZYXWV ,
Now execute
      > Comp.exe TestS1.txt
or   > Comp.exe TestS2.txt
or   > Comp.exe TestIdentList.txt

If you want to experiment with the grammar and regenerate the parser, you need to get Coco executable and the frame files (Scanner.frame, Parser.frame), put them all in the same directory as the attributed grammar (.atg-file) and then execute
   > coco.exe Examples.atg
This will generate new versions of Scanner.cs and Parser.cs that reflect your changes to the grammar. Now execute
   > csc.exe Comp.cs Scanner.cs Parser.cs
in order to get the new .NET executable.

Using the C# grammar

In order to use CoCo/R with these extensions to parse C# source code, we had to instrument the C# grammar in the way demonstrated above.

You can now instrument the grammar as you wish, i.e. add attributes to non-terminals and semantic actions to the productions. But you should not change existing parts of the grammar unless you know exactly what are you doing, because otherwise you might break the parser!

Here is an example:

... C O M I N G   S O O N ...

Download

Please go to our Rotor Community Project Site for up-to-date downloads (File Sharing).