1 year ago

#366895

test-img

Wave and Matter

Antlr: Re-use array access on righ/left-hand side without introducing forbidden left-recursion

Most programming languages allow array or struct access both on the left-hand sight -as target for storage- and the right-hand side -as source for a load- of an assignment. I would like to handle that with an efficient antlr grammar.

"Efficient" would include in my view, that the "access" operator which generates the pointer to the accessed element is represented by a single antlr rule so that it can be re-used on the left-hand and the right-hand side. The calling right-hand side would then load from, the left-hand side calling code would store to the calculated address.

Naively it would look like that:

expr:
    lhs=access '=' rhs=rvalue               #AssignExpr
;
rvalue: 
    left=rvalue op=OPM right=rvalue         #opExpr
    | left=rvalue op=OPA right=rvalue       #opExpr
    | atom=NUM                              #atomNumExpr
    | access                                #atomAccExpr   
    | '(' rvalue ')'                        #parenExpr
;
access:
    atom=ID                                 #VarAccExpr
    | base=rvalue '[' index=rvalue ']'      #arrAccExpr
;

Trying this however gives a forbidden left-recursion because "rvalue" and "access" include each other.

I have several ideas how to fix this, but I don't like any of them and/or cannot make them work out:

a) Abandon the extra rule for access, paste the sub-rules into both "expr" and "rvalue". This will lead to copying of both rules and parser/visitor code, but the recursion will be "fixed" since a rule (in this case rvalue) recursing into itself is allowed. This will technically work, but is ugly in my eyes. At least a minimum of code-reuse would be possible if the antlr-visitors at least call the same code-generation helper function for the array access in the end...

b) Try some way to fix the recursion. Both "base" and "index" really can be any rvalue, so I don't see how I could ever fix the recursion to the full "rvalue" rule without sacrificing functionality. I also thought about a fix by moving the mutual recursion into a self-recursion of "access" by allowing "rvalue_noaccess | access" there explicitly. But this in turn would require a splitting of "rvalue" into "rvalue_noaccess | access" as well, which would then in turn be mutually recursive again....

c) Abandon rule for "access" and paste "access" subrules into "rvalue" like in solution (a). Then resolve the broken reference to "access" in "lhs=access" in "expr" by "lhs=rvalue". This would both re-use the access code (though not as an extra rule) and fix left-recursion. However, my visitor code would then need to check by hand that the "rvalue" on the left-hand side is actually something writable and not, for instance, "3+5". With (a) or (b), this would have been checked by the antlr parser itself.

Right now, (c) seems like the least bad solution. But I would be happy if anybody can tell me what is the usual way to implement this and whether there is any good way not listed in my ideas above.

EDIT After re-examination i am not sure whether i understand the problem correctly at all. Another older set of rules, which is also mutually recursive does work without a problem!

block:
    NL* expr                                    #oneLiner
    | NL* '{' NL* expr ( NL+ expr)* NL* '}'     #trueBlock;
expr:
    //...
    | 'if' cond=rvalue ':' ifcase=block NL* ('else' ':' elsecase=block)? #condExpr
    //...;

Apparently, this case is recursive, but not "left" recursive....

antlr4

0 Answers

Your Answer

Accepted video resources