Wednesday, December 7, 2016

ANTLR: how to validate that brackets and quotes are balanced?

Sometimes we need to check that opening and closing brackets, single quotes and double quotes are balanced. How do we do that? A good way is to write an ANTLR grammar, parse the text with it, and then check the result for errors.

Please see the grammar below.

 grammar GrammarValidator;
 // Entry rule: requiring EOF turns a stray closing bracket into a syntax error.
 validate
 :
      r EOF
 ;
 r
 :
      (
           string
           | expression
      )*
 ;
 string  
 :  
      QUOTED  
 ;  
 expression  
 :  
      not_string_term  
      | '{'  
      (  
           r?  
      ) '}'  
      | '['  
      (  
           r?  
      ) ']'  
      | '('  
      (  
           r?  
      ) ')'  
 ;  
 not_string_term  
 :  
      NOT_STRING_CHAR+  
 ;  
 NOT_STRING_CHAR
 :
      // any character except a single or a double quote
      ~['"]
 ;
 QUOTED  
 :  
      '"' StringCharacters? '"'| CharacterLiteral  
 ;  
 CharacterLiteral  
   :  '\'' SingleCharacter '\''  
   |  '\'' EscapeSequence '\''  
   ;  
   fragment  
 SingleCharacter  
   :  ~['\\]  
   ;  
 fragment  
 StringCharacters  
   :  StringCharacter+  
   ;  
 fragment  
 StringCharacter  
   :  ~["\\]  
   |  EscapeSequence  
   ;  
 fragment  
 EscapeSequence  
   :  '\\' [btnfr"'\\];  
 WS  
 :  
      [ \t\r\n\u000C]+ -> skip  
 ;  

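As you can see, our main rule consists of two subrules: string and expression. A string is either a double-quoted sequence in which every element is an EscapeSequence or any character other than a double quote or a backslash '\', or a single-quoted character literal.
An expression is either a not_string_term (a run of characters other than single or double quotes) or the main rule nested inside a pair of brackets: {}, [] or ().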

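Then we can detect errors through the standard ANTLR mechanism: any imbalance is reported as an error while the text is being lexed and parsed, for example to an error listener. That's all, very simple.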
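To cross-check the grammar on test inputs, the same balance rule can also be sketched by hand with a stack. This is not the ANTLR approach from this post, only a plain Java illustration. The class name BalanceCheck and its methods are made up for this example. It is also simpler than the grammar: a single-quoted span may have any length here, while the grammar allows exactly one character or one escape sequence.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BalanceCheck {
    // Returns true if brackets and quotes in the text are balanced.
    public static boolean isBalanced(String text) {
        Deque<Character> open = new ArrayDeque<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                // Jump past the closing quote; brackets inside quotes are ignored.
                i = skipQuoted(text, i);
                if (i < 0) {
                    return false; // unterminated quote
                }
                continue;
            }
            if (c == '(' || c == '[' || c == '{') {
                open.push(c);
            } else if (c == ')' || c == ']' || c == '}') {
                // A closing bracket must match the most recent opening one.
                if (open.isEmpty() || open.pop() != opener(c)) {
                    return false;
                }
            }
            i++;
        }
        return open.isEmpty(); // leftover opening brackets mean imbalance
    }

    // Returns the index just after the closing quote, or -1 if there is none.
    private static int skipQuoted(String text, int start) {
        char quote = text.charAt(start);
        for (int i = start + 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == quote) {
                return i + 1;
            }
        }
        return -1;
    }

    private static char opener(char close) {
        switch (close) {
            case ')': return '(';
            case ']': return '[';
            default:  return '{';
        }
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("f(a[1], \"}\") { 'x' }")); // true
        System.out.println(isBalanced("f(a[1]))"));                // false
    }
}
```

The grammar stays the better choice when the balance check is only one part of a larger parse; the hand-written version is mainly handy as a reference when testing it.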


How to work with comments in ANTLR 4

Hi all.
What are comments? A comment is text that describes the code of your program. It is part of the program text, but it has to be ignored when we process the algorithm implemented in the code. Sometimes, though, we do need to work with comments, for example when we want to highlight them in a text editor.

What are channels?

Let's imagine that we need to tell the parser: please skip something, don't include it in the syntax tree. How do we do that? The simplest way is to define a skipped lexeme. Each time the lexer recognizes it in the input, it throws it away, so the parser never sees it.

We can do it in the following way:

WS : [ \r\t\u000C\n]+ -> skip;
This means that tabs, spaces and line breaks are skipped: they are not included in the final syntax tree.
But where did they disappear to? Good question! With skip the token is simply discarded. If we want to keep it without showing it to the parser, we send it to a channel instead. How many channels can there be? As many as you need! Besides the built-in HIDDEN channel, you can declare your own channels and process them.

For example, this is how you can send multiline comments to the hidden channel:

MULTILINE_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);

and one-line comments:

ONE_LINE_COMMENT: '#' ~[\n\r]* -> channel(HIDDEN);

But what if ignoring the comments isn't enough and you need to process them in some way? In this case we can read the tokens from the hidden channel and do whatever we want with them.
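To picture what a channel is, here is a toy model in plain Java. It is not the ANTLR API: the class ChannelSketch, its Token type and the two channel constants are invented for this illustration. The idea it shows is the one described above: skipped input produces no token at all, while a comment becomes a normal token that only carries a different channel number, so it can be filtered out or picked up later. In ANTLR's own Java runtime, a token reports its channel through Token.getChannel().

```java
import java.util.ArrayList;
import java.util.List;

class ChannelSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    static final class Token {
        final String text;
        final int channel;
        Token(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Splits the input into words (default channel) and comments (hidden channel).
    // Simplification: an unterminated block comment runs to the end of the input.
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // like "-> skip": no token is created
            } else if (input.startsWith("/*", i)) {
                int end = input.indexOf("*/", i + 2);
                int stop = end < 0 ? input.length() : end + 2;
                tokens.add(new Token(input.substring(i, stop), HIDDEN_CHANNEL));
                i = stop;
            } else if (c == '#') {
                int stop = i;
                while (stop < input.length()
                        && input.charAt(stop) != '\n' && input.charAt(stop) != '\r') {
                    stop++;
                }
                tokens.add(new Token(input.substring(i, stop), HIDDEN_CHANNEL));
                i = stop;
            } else {
                int stop = i;
                while (stop < input.length()
                        && !Character.isWhitespace(input.charAt(stop))
                        && !input.startsWith("/*", stop)
                        && input.charAt(stop) != '#') {
                    stop++;
                }
                tokens.add(new Token(input.substring(i, stop), DEFAULT_CHANNEL));
                i = stop;
            }
        }
        return tokens;
    }

    // Returns the text of every token that travels on the given channel.
    public static List<String> onChannel(List<Token> tokens, int channel) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (t.channel == channel) {
                out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("x = 1 /* set x */ y = 2 # done\nz");
        System.out.println(onChannel(tokens, DEFAULT_CHANNEL)); // what a parser would see
        System.out.println(onChannel(tokens, HIDDEN_CHANNEL));  // the comments, still available
    }
}
```

A highlighter would read the second list, while the parser works only with the first.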