How can I change a token type in an ANTLR3 lexer rule?

I have a lexical rule (Integer) which uses some fragments. In a parser rule (parse) I want to rewrite my tree differently depending on which fragment generated the token in question. I have made a small grammar to demonstrate what I'm attempting:

grammar subrange;

options {
    output=AST;
}

tokens {
    NumberNode;
    DecimalNode;
    BinaryNode;
    HexNode;
    OctalNode;
}

parse
    : Integer+ -> ^(NumberNode Integer)+
    ;

Integer
    : DECIMAL_LITERAL
    | BINARY_LITERAL
    | HEX_LITERAL
    | OCTAL_LITERAL
    ;

fragment BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

fragment HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

fragment HEX_DIGIT
    : (DIGIT|'a'..'f'|'A'..'F')
    ;

fragment DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

fragment OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

fragment DIGIT
    : '0'..'9'
    ;

SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};

I want the parse rule to rewrite a DECIMAL_LITERAL under an imaginary DecimalNode but a BINARY_LITERAL under a BinaryNode (rather than everything under a NumberNode).

I'm attempting to do this by changing the token type inside the lexical rule so that I can then rewrite accordingly inside the parse rule.

I think I should be able to do this with an action but I have been unable to figure out how to find the returned token in order to change its type. http://www.antlr.org/wiki/display/ANTLR3/Special+symbols+in+actions seems to indicate that $tokenref should work but it doesn't get translated at all.

Or is there another way to accomplish this?

Thanks in advance.

Answers


It seems a bit odd to me to group all these literals under a single Integer token and then want to separate them again in a parser rule.

Why not just remove Integer and do:

integer
    : BINARY_LITERAL // when output=AST, this creates a CommonTree with type 'BINARY_LITERAL'
    | HEX_LITERAL    // ...
    | DECIMAL_LITERAL
    | OCTAL_LITERAL 
    ;

BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

?

Or you could keep the Int(eger) rule and have it replace the token text with the numerical (decimal) value of the various int-literals:

Int
@init{int skip = 0, base = 10;}
    : ( DECIMAL_LITERAL
      | BINARY_LITERAL  {base = 2;  skip = 2;} 
      | OCTAL_LITERAL   {base = 8;  skip = 2;} 
      | HEX_LITERAL     {base = 16; skip = $text.contains("#") ? 3 : 2;} 
      )
      {
        setText(String.valueOf(Integer.parseInt($text.substring(skip), base)));
      }
    ;

fragment BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

fragment HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

fragment DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

fragment OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

Be careful not to give a rule a name that is also an object, class, or reserved word in the target language (Integer in the case of Java).


EDIT

Okay. I'll leave my other answer there in case passers-by are wondering why on earth I'm proposing this... :)

Here's what (I think) you're after:

grammar T;

options {
  output=AST;
}

tokens {
  BinaryNode;
  OctalNode;
  HexNode;
  DecimalNode;
}

parse
 : integer+
 ;

integer
 : i=Integer -> {$i.text.startsWith("2#")}?         ^(BinaryNode Integer)
             -> {$i.text.startsWith("8#")}?         ^(OctalNode Integer)
             -> {$i.text.matches("(16#|0[xX]).*")}? ^(HexNode Integer)
             ->                                     ^(DecimalNode Integer)
 ;

Integer
 : DECIMAL_LITERAL
 | BINARY_LITERAL
 | HEX_LITERAL
 | OCTAL_LITERAL
 ;

fragment BINARY_LITERAL
 : '2#' ('0' | '1')+
 ;

fragment HEX_LITERAL 
 : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
 ;

fragment HEX_DIGIT
 : (DIGIT|'a'..'f'|'A'..'F')
 ;

fragment DECIMAL_LITERAL 
 : ('0' | '1'..'9' DIGIT*)
 ;

fragment OCTAL_LITERAL 
 : '8#' ('0'..'7')+
 ;

fragment DIGIT
 : '0'..'9'
 ;

SPACE 
 : (' ' | '\t' | '\r' | '\n')+ {skip();}
 ;

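Parsing the input "2#1111 8#77 0xff 16#ff 123" will result in an AST with five subtrees, each rooted at the imaginary node that matches the notation of its literal:

(BinaryNode 2#1111) (OctalNode 8#77) (HexNode 0xff) (HexNode 16#ff) (DecimalNode 123)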

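Since the lexer has discarded the information about which kind of literal each Integer token came from, you have to repeat that check in the integer rule: the {boolean-expression}? predicates placed after each -> decide which rewrite is applied, and the last rewrite, which has no predicate, is the default.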
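
The selection made by those predicates can be checked in isolation. This is only a sketch in plain Java, assuming the same prefix tests as the integer rule, evaluated in the same order; the class name IntegerNodeKind and the method nodeFor are hypothetical, and the returned strings merely name the imaginary tokens from the tokens section.

```java
// Hypothetical stand-alone check (not generated by ANTLR) that applies the same
// tests as the rewrite predicates in the integer rule, in the same order.
final class IntegerNodeKind {

    // Returns the name of the imaginary root node the integer rule would pick.
    static String nodeFor(String text) {
        if (text.startsWith("2#")) return "BinaryNode";
        if (text.startsWith("8#")) return "OctalNode";
        if (text.matches("(16#|0[xX]).*")) return "HexNode";
        return "DecimalNode";                        // default: rewrite without a predicate
    }

    public static void main(String[] args) {
        for (String s : "2#1111 8#77 0xff 16#ff 123".split(" ")) {
            System.out.println(s + " -> " + nodeFor(s));
        }
    }
}
```

A plain "0" falls through to DecimalNode because the hex test requires an x or X directly after the leading zero.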

