07.363 Logic Programming: Lecture 12

Writing Parsers in Prolog

  Prolog syntax is flexible, but there are times when something more
  sophisticated is needed.

  Easy to write recursive descent parser in Prolog.  Even easier when
  using grammar rules (DCGs).

  Anyone not taking 07.330?
  Here is a crash course in lexical analysis and parsing.


A DCG for Evaluating Arithmetic Expressions

	expr(Z) --> term(X), "+", expr(Y), { Z is X + Y }.
	expr(Z) --> term(X), "-", expr(Y), { Z is X - Y }.
	expr(X) --> term(X).

	term(Z) --> number(X), "*", term(Y), { Z is X * Y }.
	term(Z) --> number(X), "/", term(Y), { Z is X / Y }.
	term(Z) --> number(Z).

	number(C) --> "+", number(C).
	number(C) --> "-", number(X), { C is -X }.
	number(X) --> [C], { "0"=<C, C=<"9", X is C - "0" }.
  

DCG Expansion
  
    ?- expr(Z, "-2+3*5+1", [])
    Z = 14

  DCG translation does this:

    p(X) --> q(X).

  becomes

    p(X, S0, S) :- q(X, S0, S).
  


More than one goal

    p(X, Y) -->
              q(X),
              r(X, Y),
              s(Y).
  
  becomes
  
    p(X, Y, S0, S) -->
              q(X, S0, S1),
              r(X, Y, S1, S2),
              s(Y, S2, S).
  


Terminals

  Inside a grammar rule body, a list indicates a terminal (``thing that
  must literally appear'')

  Recall: "thing" is a list of ASCII codes.

    p(X) --> [go,to], q(X), [stop].

  Is translated into 'C'/3

    p(X, S0, S) :-
             'C'(S0, go, S1),
             'C'(S1, to, S2),
             q(X, S2, S3),
             'C'(S3, stop, S).
  
  'C'/3 is predefined by the unit clause "'C'([XS],X,S)".



An Example

  
	parse( program(Decls,Stmt) ) -->
	    decls(Decls),
	    stmt_seq(Stmt).

	decls( [D|Decls] ) --> decl(D), decls(Decls).
	decls( [] )        --> [].

	decl( const(Ide,Val) ) --> [const, ide(Ide), =], expr(Val).
	decl( var(Ide,Type) )  --> [var,   ide(Ide), :], type(Type).

	type( integer ) --> [integer].
	type( boolean ) --> [boolean].

	stmt_seq( [S|Stmts] ) --> stmt(S), opt_stmt_seq(Stmts).

	opt_stmt_seq( [S|Stmts] ) --> [;], stmt(S), opt_stmt_seq(Stmts).
	opt_stmt_seq( [] )        --> [].

	stmt( assign(Ide,Expr) ) --> [ide(Ide), :=], expr(Expr).
	stmt( write(Expr) )      --> [write], expr(Expr).
	stmt( read(Ide) )        --> [read, '(', ide(Ide), ')'].
	stmt( while(Cond,S) )    --> [while], expr(Cond),
		                     [do], stmt(S).
	stmt( if(Cond,S1,S2) )   --> [if], expr(Cond),
			             [then], stmt(S1),
				     [else], stmt(S2).
	stmt( block(Stmt) )      --> [begin], stmt_seq(Stmt), [end].


	expr( E ) --> lexpr( L ), expr_(L, E).

	expr_( L, and(L,E) ) --> [and], expr(E).
	expr_( L, or(L,E) )  --> [or], expr(E).
	expr_( L, L )        --> [].

	lexpr( not(L) ) --> [not], lexpr(L).
	lexpr( B )      --> bexpr(B).

	bexpr( B ) --> aexpr(A), bexpr_(A, B).

	bexpr_( A, A>B ) --> [>], aexpr( B ).
	bexpr_( A, A<B ) --> [<], aexpr( B ).
	bexpr_( A, A=B ) --> [=], aexpr( B ).
	bexpr_( A, A )   --> [].

	aexpr( E ) --> term(T), aexpr_(T, E).

	aexpr_( T, T+E ) --> [+], aexpr(E).
	aexpr_( T1, E )  --> [-], term(T2), aexpr_( T1-T2, E ).
	aexpr_( T, T )   --> [].

	term( T ) --> factor(F), term_(F, T).

	term_( F, F*T ) --> [*], term(T).
	term_( F, F/T ) --> [/], term(T).
	term_( F, F )   --> [].

	factor( ide(Ide) ) --> [ide(Ide)].
	factor( int(N) )   --> [int(N)].
	factor( true )     --> [true].
	factor( false )    --> [false].
	factor( -F )       --> [-], factor(F).
	factor( E )        --> ['('], expr(E), [')'].