
How to use Coco/R and the attributed C# grammar

NOTE: This documentation refers to the early (alpha) version of Coco/R with LL(1) conflict resolution capabilities. It is fundamentally different from the current version, which we describe in our presentation at the 2nd Rotor Workshop and which can be downloaded from the Rotor Community Project Site.

Only the current version will be updated from now on!

Coco/R is a compiler generator that takes a compiler description in the form of an LL(1) attributed grammar (ATG) and generates the scanner and the parser of the described compiler.

The C# grammar, as presented in ECMA standard 334, is apparently not LL(1) and not written in EBNF. So we translated it to EBNF, did some factorizations, eliminated some productions, and inserted artificial tokens to get an LL(1)-EBNF-C# grammar. The artificial tokens are generated on the fly by peek functions which look ahead in the token stream until they can decide which alternative matches the actual tokens.

Regardless of these changes to Coco/R, you can use the Coco/R toolkit in the familiar way, with the added advantages of the "Peek" functionality and the availability of handles for the tokens.

Token Handles

Because we need to refer to the tokens in the peek functions, we altered the ParserGen class so that it generates a class Tokens into Parser.cs, which lists every token as an integer constant (named like the token, with a leading underscore), e.g.:

public class Tokens {
  public const int _EOF=0;
  public const int _literal=1;
  public const int _ident=2;
  ...
  public const int _Plus=99;
  public const int _Minus=100;
  public const int _Not=101;
  public const int _PlusPlus=102;
  ...
}

Tokens that do not have a name in the ATG, such as "+", "-", "++", etc., get a suggestive name like "_Plus", "_Minus", "_PlusPlus", etc.
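
One possible use of these handles outside the peek functions is turning a token kind back into a readable name, e.g. for diagnostics. The following is only a sketch: the Tokens class is an abbreviated copy of the listing above, and the helper class TokenNames with its method NameOf is hypothetical and not part of Coco/R.

```csharp
using System;
using System.Reflection;

// Abbreviated copy of the generated class; the values are taken from the listing above.
public class Tokens {
  public const int _EOF = 0;
  public const int _literal = 1;
  public const int _ident = 2;
  public const int _Plus = 99;
  public const int _Minus = 100;
}

// Hypothetical helper (an assumption for illustration, not generated by Coco/R).
public static class TokenNames {
  // Returns the name of the constant with the given value, without the
  // leading underscore, or null if no constant has that value.
  public static string NameOf(int kind) {
    FieldInfo[] fields = typeof(Tokens).GetFields(BindingFlags.Public | BindingFlags.Static);
    foreach (FieldInfo f in fields) {
      // const fields are reported as literals by reflection
      if (f.IsLiteral && f.FieldType == typeof(int) && (int)f.GetRawConstantValue() == kind) {
        return f.Name.Substring(1);
      }
    }
    return null;
  }
}
```

Reflection keeps the helper independent of the concrete token list, so it would not need to change when the grammar (and with it the generated constants) changes.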

"Peek" functionality

To distinguish between two alternatives that cannot be told apart by the single look-ahead token alone, we have added a "peek mode", exposed by two functions:

  • StartPeek(): activates the peek mode. From now on Parser.Get() no longer recognizes tokens, but only provides them for determining the appropriate alternative.
  • ClosePeek(Token): ends the peek mode. It restores the state from before the peek mode was activated and inserts the provided token after the last recognized token, i.e. the token becomes the new look-ahead token and will therefore be used by Coco/R to decide where to go from here. If the provided token reference is null, no token is inserted.
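
The behaviour of the two functions can be modelled roughly as follows. This is a sketch under stated assumptions, not the Coco/R implementation: the class PeekModel, its fields, and the list-based token buffer are invented for illustration, the Token class is reduced to the two fields kind and val, and t is assumed to hold the current look-ahead token.

```csharp
using System;
using System.Collections.Generic;

// Reduced stand-in for the token type (assumption: only kind and val matter here).
public class Token {
  public int kind;
  public string val;
}

// Hypothetical model of the peek mode; NOT taken from the Coco/R sources.
public class PeekModel {
  private readonly List<Token> input; // whole token stream, ending with an EOF token
  private int pos;                    // index of the current look-ahead token
  private int savedPos = -1;          // position saved by StartPeek, -1 outside peek mode
  public Token t;                     // current look-ahead token

  public PeekModel(List<Token> tokens) {
    if (tokens.Count == 0) throw new ArgumentException("token stream must not be empty");
    input = tokens;
    pos = 0;
    t = input[0];
  }

  // Moves to the next token; stays on the final (EOF) token once it is reached.
  public void Get() {
    if (pos < input.Count - 1) pos++;
    t = input[pos];
  }

  // Remembers where peeking started so that ClosePeek can return there.
  public void StartPeek() {
    savedPos = pos;
  }

  // Restores the saved position; a non-null token is inserted in front of
  // the old look-ahead token and thereby becomes the new look-ahead token.
  public void ClosePeek(Token artificial) {
    if (savedPos < 0) throw new InvalidOperationException("ClosePeek without StartPeek");
    pos = savedPos;
    savedPos = -1;
    if (artificial != null) input.Insert(pos, artificial);
    t = input[pos];
  }
}
```

In this model, everything that Get() skips between StartPeek() and ClosePeek() is seen again afterwards, which is what allows a peek function to scan arbitrarily far ahead without affecting the parse.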

Let's do a simple example:

S = A | B .
A = { a b } c .
B = { a b } d [e].

This LL(1) conflict could simply be solved by writing:

S = { a b } ( c | d [e] ).

But the grammar may then lose some semantic information and clarity. So you can now also solve the problem by taking advantage of the peek facility in a semantic action:

NOTE: We aligned the semantic actions farther right to separate them from the syntax and thus (hopefully) increase the readability of the grammars.

S =                                                 (. SetArtificialToken(); .)
    (ArtificialAToken A | ArtificialBToken B) .
A = { a b } c .
B = { a b } d [e] .

The peek function SetArtificialToken() could be implemented like:

static void SetArtificialToken() {
   Token ll1Token = new Token();

   StartPeek();
   while(t.kind==Tokens._a) {
     Get();
     Get(); // jump over b
   }
   if (t.kind==Tokens._c) {
     ll1Token.kind=Tokens._ArtificialAToken;
     ll1Token.val="ArtificialAToken";
   } else {
     ll1Token.kind=Tokens._ArtificialBToken;
     ll1Token.val="ArtificialBToken";
   }
   ClosePeek(ll1Token);
}

With the two artificial tokens:

ArtificialAToken="²" "ArtificialAToken" .
ArtificialBToken="²" "ArtificialBToken" .

The artificial tokens start with "²" (because this character does not appear in the given grammar) followed by the token name in order to guarantee the uniqueness of every token.

Let's do a more interesting example:

S = a { b a } [b] .

The LL(1) conflict in this case cannot be solved by rewriting the grammar, so we have to use our peek functionality:

S = a                        (. SetNotFinalBToken(); .)
    { NotFinalBToken b a     (. SetNotFinalBToken(); .)
    } [b] .

In this case the peek function could be implemented like:

static void SetNotFinalBToken() {
   Token ll1Token = null;

   StartPeek();
   if (t.kind==Tokens._b) {
     Get();
     if (t.kind==Tokens._a) {
       ll1Token = new Token();
       ll1Token.kind=Tokens._NotFinalBToken;
       ll1Token.val="NotFinalBToken";
     }
   }
   ClosePeek(ll1Token);
}

With an artificial token defined as above:

NotFinalBToken ="²" "NotFinalBToken" .
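
To see why one extra token of look-ahead is enough for S = a { b a } [b], the decision made by SetNotFinalBToken() can be written directly into a small hand-written recognizer. This is only an illustration: the class NotFinalBDemo, its token kind values, and the array-based input are assumptions and have nothing to do with the code Coco/R generates.

```csharp
using System;

// Hypothetical hand-written recognizer; the token kinds are illustrative values.
public static class NotFinalBDemo {
  public const int EOF = 0, A = 1, B = 2;

  // Returns true if the token kinds match S = a { b a } [b] .
  // Precondition: kinds is not empty and its last element is EOF.
  public static bool Accepts(int[] kinds) {
    if (kinds.Length == 0 || kinds[kinds.Length - 1] != EOF) {
      throw new ArgumentException("token kinds must end with EOF");
    }
    int pos = 0;
    if (kinds[pos] != A) return false;
    pos++;
    // Enter the iteration only while the b is not the final one, i.e. while
    // it is followed by an a. This two-token check plays the role of the
    // artificial NotFinalBToken.
    while (kinds[pos] == B && kinds[pos + 1] == A) {
      pos += 2;
    }
    if (kinds[pos] == B) pos++; // the optional trailing b
    return kinds[pos] == EOF;
  }
}
```

For the input a b a b, the loop consumes the first "b a" pair, leaves the second b to the optional part, and the input is accepted. With a single look-ahead token the loop could not tell these two b tokens apart.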

Using the C# grammar

In order to use Coco/R with these extensions to parse C# source code, we had to instrument the C# grammar in the way demonstrated above. We inserted a /* Do not alter ! */ comment before every semantic action that is necessary for correctly parsing the grammar, e.g.:

CompilationUnit =
  { UsingDirective }                           (. /* Do not alter ! */
                                                  SetSquareBraceOpenIdentIsAssemblyToken(); .)
  { llSquareBraceOpenIdentIsAssemblyToken
    GlobalAttributeSection                     (. /* Do not alter ! */
                                                  SetSquareBraceOpenIdentIsAssemblyToken(); .)
  }
  { NamespaceMemberDeclaration } .

You can now instrument the grammar as you wish, but you should not touch the semantic actions that we inserted unless you know exactly what you are doing.

Download:

What you need to use Coco/R on C# source files

CocoOnCs.zip contains

Coco.exe        The executable.
Scanner.frame   The frame file from which the C# scanner is generated.
Parser.frame    The frame file from which the C# parser is generated.
cs.ATG          The attributed C# grammar.

In order to create an application, add your own semantic actions to this grammar and then have Coco/R generate the scanner and parser for you. Put the main application code, which uses the Scanner and Parser, into another C# file (e.g. MyApp.cs). Then use csc.exe to compile these three files into your application executable:
> coco MyCs.ATG
> csc MyApp.cs Scanner.cs Parser.cs

What you may be interested in

Note: These files of the Coco/R compiler generator include the changes for this project and therefore differ from the official version.

CocoNew.zip contains

Coco.exe        The executable.
Coco.ATG        The attributed grammar. Describes the processing of a compiler description.
Scanner.frame   The frame file from which the Coco/R scanner is generated.
Scanner.cs      Scanner generated from Coco.ATG.
Parser.frame    The frame file from which the Coco/R parser is generated.
Parser.cs       Parser generated from Coco.ATG.
ParserGen.cs    This class builds a syntax graph of the grammar rules and generates the parser source file from it.
Tab.cs          Symbol table of Coco. Stores information about terminals and nonterminals.
DFA.cs          This class builds the scanner automaton and generates the scanner source file.
Coco.cs         Main class. Initializes the scanner and calls the parser. This file also contains the custom error message class Errors.