About Grammars

Start with an unstructured file (list) of characters.

  "program var a : integer; const max = 10; ..."

Describe lexical structure with finite state machine

	state 0:
	  if char is digit, goto state 1
	  if char is alphabetic, goto state 2
	  if char is symbol, goto state 3
	  if char is white space, goto state 0
	  else error

	state 1:
	  if char is digit, goto state 1
	  else goto state 0

	...

Or, we can use "linear" patterns (regular expressions)

	ident = alpha ( alpha | digit ) *
	number = digit digit*
	...

Either way, we can end up with list of tokens.

   [program, var, ide(a), :, integer, ';', const, ide(max), =, int(10),
    ';', ... ]


-- advantages of separate lexical pass
	: less irrelevant detail (white space, comments)
	: less complexity (reserved words embedded in strings)
	: easier to change (add reserved words)


Remaining program structure is non-linear (recursive).  Use context-free
syntax (e.g. forget about subtle type errors)

Expression:

	E ::= E + E
	    | E - E
	    | E * E
	    | E / E
	    | Ident
	    | Number
	    | ( E )

Structure needs to be unambiguous; e.g.

	7 - 5 - 1

should always be grouped as

	(7 - 5) - 1

not
	7 - (5 - 1)


Try again:

	E ::= E + T
	    | E - T
	    | T

	T ::= T * F
	    | T / F
	    | F

	F ::= Ident
	    | Number
	    | ( E )

Now, the problem (for Prolog) is that a naive translation will be
left-recursive (i.e. Prolog's search strategy tries for the
infinitely-long expression alternative first!)

Hack:
	E ::= T E'

	E' ::= + E
	     | - E
	     | <nothing>

	T ::= F T'

	T' ::= * T
	     | / T
	     | <nothing>

	F ::= <as before>

Actually, not quite right for minus.  Really need

	E' ::= ...
	     | - T E'
	     | ...