1 year ago

#208980

Spart

Lark simple SQL grammar

I'm trying to parse simple SQL with this Lark grammar:

grammar = ```
        program          : stmnt*
        stmnt            : select_stmnt | drop_stmnt
        select_stmnt     : select_clause from_clause? group_by_clause? having_clause? order_by_clause? limit_clause? SEMICOLON

        select_clause    : "select"i selectables
        selectables      : column_name ("," column_name)*
        from_clause      : "from"i source where_clause?
        where_clause     : "where"i condition
        group_by_clause  : "group"i "by"i column_name ("," column_name)*
        having_clause    : "having"i condition
        order_by_clause  : "order"i "by"i (column_name ("asc"i|"desc"i)?)*
        limit_clause     : "limit"i INTEGER_NUMBER ("offset"i INTEGER_NUMBER)?

        // NOTE: a cross join should have no on-clause; this will have to be enforced after parsing
        source           : joining? table_name table_alias?
        joining          : source join_modifier? JOIN source ON condition
        
        //source           : table_name table_alias? joined_source?
        //joined_source    : join_modifier? JOIN table_name table_alias? ON condition
        join_modifier    : "inner" | ("left" "outer"?) | ("right" "outer"?) | ("full" "outer"?) | "cross"
        
        condition        : or_clause+
        or_clause        : and_clause ("or" and_clause)*
        and_clause       : predicate ("and" predicate)*

        // NOTE: order of operator should be longest tokens first
   
        predicate        : comparison ( ( EQUAL | NOT_EQUAL ) comparison )* 
        comparison       : term ( ( LESS_EQUAL | GREATER_EQUAL | LESS | GREATER ) term )* 
        term             : factor ( ( "-" | "+" ) factor )*
        factor           : unary ( ( "/" | "*" ) unary )*
        unary            : ( "!" | "-" ) unary
                         | primary
        primary          : INTEGER_NUMBER | FLOAT_NUMBER | STRING | "true" | "false" | "null"
                         | IDENTIFIER

        drop_stmnt       : "drop" "table" table_name

        FLOAT_NUMBER     : INTEGER_NUMBER "." ("0".."9")*

        column_name      : IDENTIFIER
        table_name       : IDENTIFIER
        table_alias      : IDENTIFIER

        // keywords
        // define keywords as they have higher priority
        SELECT.5           : "select"i
        FROM.5             : "from"i
        WHERE.5            : "where"i
        JOIN.5             : "join"i
        ON.5               : "on"i

        // operators
        STAR              : "*"
        LEFT_PAREN        : "("
        RIGHT_PAREN       : ")"
        LEFT_BRACKET      : "["
        RIGHT_BRACKET     : "]"
        DOT               : "."
        EQUAL             : "="
        LESS              : "<"
        GREATER           : ">"
        COMMA             : ","

        // 2-char ops
        LESS_EQUAL        : "<="
        GREATER_EQUAL     : ">="
        NOT_EQUAL         : ("<>" | "!=")

        SEMICOLON         : ";"

        IDENTIFIER.9       : ("_" | ("a".."z") | ("A".."Z"))* ("_" | ("a".."z") | ("A".."Z") | ("0".."9"))+

        %import common.ESCAPED_STRING   -> STRING
        %import common.SIGNED_NUMBER    -> INTEGER_NUMBER
        %import common.WS
        %ignore WS
```

However, when I run the parser on the text """select cola, colb from foo left outer join bar b on x = 1 join jar j on jb > xw where cola <> colb and colx > coly""", the second join is parsed as a term, i.e. as part of the first join's condition. Any thoughts on how to parse this correctly?
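
One thing that may contribute: `IDENTIFIER` has priority 9, above the keyword terminals at 5, and its pattern accepts the words `join` and `on` themselves. Combined with `condition : or_clause+`, which accepts several expressions in a row, that could let `join jar j on jb > xw` be read as further operands of the first `on` condition. The sketch below is not Lark code. It transcribes the `IDENTIFIER` terminal into Python's `re` by hand (an approximation) to check which words it accepts, and compares it with a stricter pattern. The names `KEYWORDS`, `STRICT_IDENTIFIER` and `matches` are hypothetical and only for illustration.

```python
import re

# IDENTIFIER from the grammar above, hand-transcribed into a Python regex.
# This is an approximation for checking which words the pattern accepts.
IDENTIFIER = re.compile(r"[_a-zA-Z]*[_a-zA-Z0-9]+")

# Hypothetical stricter variant (an assumption, not part of the original grammar):
# it must start with a letter or underscore and must not be a reserved word.
KEYWORDS = ("select", "from", "where", "join", "on", "inner", "left",
            "right", "full", "outer", "cross", "and", "or")
STRICT_IDENTIFIER = re.compile(
    r"(?!(?:" + "|".join(KEYWORDS) + r")\b)[_a-zA-Z][_a-zA-Z0-9]*",
    re.IGNORECASE,
)


def matches(pattern, text):
    """Return True if the whole text is accepted by the pattern."""
    return pattern.fullmatch(text) is not None


words = ["join", "on", "jar", "123", "colb"]
loose = {w: matches(IDENTIFIER, w) for w in words}
strict = {w: matches(STRICT_IDENTIFIER, w) for w in words}

print("original pattern:", loose)
print("stricter pattern:", strict)
```

If the original pattern accepts `join` and `on`, two changes seem worth trying in the grammar itself: keep keywords out of `IDENTIFIER` (or give the keyword terminals the higher priority), and change `condition` to a single `or_clause` so that expressions cannot simply follow one another. Whether a lookahead-based terminal behaves the same way in Lark depends on the lexer in use, so treat this as a starting point for experiments.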

lark-parser

0 Answers
