How to get pyparsing to work in a particular form

March 2019

Sorry for the vague title; I could not think of anything better.

I am trying to implement a DSL with pyparsing that has the following requirements:

  1. Variables: all of them begin with v_
  2. Unary operators: +, -
  3. Binary operators: +,-,*,/,%
  4. Constant numbers
  5. Functions, which behave like normal functions when they take a single variable
  6. Functions must distribute over binary operations: foo(v_1+v_2) = foo(v_1) + foo(v_2), and foo(bar(10*v_6)) = foo(bar(10))*foo(bar(v_6)). This should hold for any binary operation

I am able to get 1-5 working.

This is the code I have so far:

from pyparsing import *

exprstack = []

#~ @traceParseAction
def pushFirst(tokens):
    exprstack.insert(0,tokens[0])

# define grammar
point = Literal( '.' )
plusorminus = Literal( '+' ) | Literal( '-' )
number = Word( nums )
integer = Combine( Optional( plusorminus ) + number )
floatnumber = Combine( integer +
                       Optional( point + Optional( number ) ) +
                       Optional( integer )
                     )

ident = Combine("v_" + Word(nums))

plus  = Literal( "+" )
minus = Literal( "-" )
mult  = Literal( "*" )
div   = Literal( "/" )
cent   = Literal( "%" )
lpar  = Literal( "(" ).suppress()
rpar  = Literal( ")" ).suppress()
addop  = plus | minus
multop = mult | div | cent
expop = Literal( "^" )
band = Literal( "@" )

# define expr as Forward, since we will reference it in atom
expr = Forward()
fn = Word( alphas )
atom = ( ( floatnumber | integer | ident | ( fn + lpar + expr + rpar ) ).setParseAction(pushFirst) |
         ( lpar + expr.suppress() + rpar ))

factor = Forward()
factor << atom + ( ( band + factor ).setParseAction( pushFirst ) | ZeroOrMore( ( expop + factor ).setParseAction( pushFirst ) ) )

term = factor + ZeroOrMore( ( multop + factor ).setParseAction( pushFirst ) )
expr << term + ZeroOrMore( ( addop + term ).setParseAction( pushFirst ) )
print(expr)
bnf = expr

pattern =  bnf + StringEnd()


def test(s):
    del exprstack[:]
    bnf.parseString(s,parseAll=True)
    print(exprstack)

test("avg(+10)")
test("v_1+8")
test("avg(v_1+10)+10")

Here is what I want.

My functions are of this type:

foo(v_1+v_2) = foo(v_1) + foo(v_2)

The same behaviour is expected for any other binary operation as well. I have no idea how to make the parser do this automatically.

1 answer

Break out the function call as a separate sub expression:

function_call = fn + lpar + expr + rpar

Then add a parse action to function_call that pops the operators and operands from exprstack, then pushes them back onto the stack:

  • if an operand, push operand then function
  • if an operator, push the operator
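The two rules above can be sketched on a plain list, without pyparsing. This is only an illustration under stated assumptions: the function name distribute_postfix and the OPERATORS set are hypothetical, the argument tokens are assumed to already be in postfix order (which is how the question's exprstack reads from its last element to its first), and the argument is assumed to be flat, with no nested function calls inside it.

```python
# Binary operator tokens, taken from the question's requirements.
OPERATORS = {"+", "-", "*", "/", "%"}


def distribute_postfix(fname, arg_postfix):
    """Rewrite the postfix tokens of fname's argument so fname wraps each operand.

    Hypothetical helper: assumes a flat argument with no nested function calls.
    """
    out = []
    for tok in arg_postfix:
        if tok in OPERATORS:
            # operator: push it unchanged
            out.append(tok)
        else:
            # operand: push the operand, then the function applied to it
            out.append(tok)
            out.append(fname)
    return out


# avg(v_1+10) in postfix is: v_1 10 +
print(distribute_postfix("avg", ["v_1", "10", "+"]))
# prints ['v_1', 'avg', '10', 'avg', '+'], i.e. avg(v_1) + avg(10) in postfix
```

A single-operand argument such as ["10"] comes back as ["10", "avg"], so fn(a) is left as a plain call.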

Since you are only doing binary operations, you might be better off trying a simpler approach first:

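from pyparsing import *

# suppressed parentheses, and the same number definition as in the question
LPAR, RPAR = map(Suppress, "()")
number = Word(nums)

identifier = Word(alphas+'_', alphanums+'_')
# expr is a Forward because function_call refers to it before it is defined
expr = Forward()
function_call = Group(identifier + LPAR + Group(expr) + RPAR)

unop = oneOf("+ -")
binop = oneOf("+ - * / %")
operand = Group(Optional(unop) + (function_call | number | identifier))
binexpr = operand + binop + operand

# try the binary form first, then fall back to a single operand
expr << (binexpr | operand)

bnf = expr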

This gives you a simpler structure to work with, without having to mess with exprstack.

def test(s):
    exprtokens = bnf.parseString(s,parseAll=True)
    print(exprtokens)

test("10")
test("10+20")
test("avg(10)")
test("avg(+10)")
test("column_1+8")
test("avg(column_1+10)+10")

Gives:

[['10']]
[['10'], '+', ['20']]
[[['avg', [['10']]]]]
[[['avg', [['+', '10']]]]]
[['column_1'], '+', ['8']]
[[['avg', [['column_1'], '+', ['10']]]], '+', ['10']]

You want to expand fn(a op b) to fn(a) op fn(b), but fn(a) should be left alone, so you need to test on the length of the parsed expression argument:

def distribute_function(tokens):
    # unpack function name and arguments
    fname, args = tokens[0]

    # if args contains an expression, expand it
    if len(args) > 1:
        ret = ParseResults([])
        for i,a in enumerate(args):
            if i % 2 == 0:
                # even args are operands to be wrapped in the function
                ret += ParseResults([ParseResults([fname,ParseResults([a])])])
            else:
                # odd args are operators, just add them to the results
                ret += ParseResults([a])
        return ParseResults([ret])
function_call.setParseAction(distribute_function)        

Now the output of your calls to test will look like:

[['10']]
[['10'], '+', ['20']]
[[['avg', [['10']]]]]
[[['avg', [['+', '10']]]]]
[['column_1'], '+', ['8']]
[[[['avg', [['column_1']]], '+', ['avg', [['10']]]]], '+', ['10']]

This should even work recursively with a call like fna(fnb(3+2)+fnc(4+9)).
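To see what full recursive distribution looks like independently of the parser, here is a minimal sketch on a hand-built tree. It is not the parse action above: the tuple shapes ("call", name, arg) and ("bin", op, left, right) and the helpers distribute and render are hypothetical names chosen for illustration, and leaves are plain strings.

```python
def distribute(node):
    """Push every function call down through the binary operations in its argument."""
    if isinstance(node, str):            # leaf: a number or a variable
        return node
    if node[0] == "bin":
        _, op, left, right = node
        return ("bin", op, distribute(left), distribute(right))
    _, name, arg = node                  # node[0] == "call"
    arg = distribute(arg)                # expand inner calls first
    if isinstance(arg, tuple) and arg[0] == "bin":
        _, op, left, right = arg
        return ("bin", op,
                distribute(("call", name, left)),
                distribute(("call", name, right)))
    return ("call", name, arg)


def render(node):
    """Turn a node back into text, parenthesising nested binary operations."""
    if isinstance(node, str):
        return node
    if node[0] == "call":
        return "%s(%s)" % (node[1], render(node[2]))
    _, op, left, right = node

    def wrap(child):
        text = render(child)
        if isinstance(child, tuple) and child[0] == "bin":
            return "(" + text + ")"
        return text

    return wrap(left) + op + wrap(right)


# foo(bar(10*v_6)) from the question
nested = ("call", "foo", ("call", "bar", ("bin", "*", "10", "v_6")))
print(render(distribute(nested)))   # prints foo(bar(10))*foo(bar(v_6))
```

Expanding the inner call before the outer one is what lets foo reach the operands of bar's argument, which matches the second example in the question's requirement 6.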