yacc(1)



NAME

       yacc - Generates an LR(1) parsing program from input
       consisting of a context-free grammar specification


SYNOPSIS

       yacc [-vltds] [-b prefix]  [-N number]  [-p symbol_prefix]
       [-P pathname] grammar

       The  yacc command converts a context-free grammar specifi-
       cation into a set of tables for a  simple  automaton  that
       executes an LR(1) parsing algorithm.


STANDARDS

       Interfaces  documented  on  this reference page conform to
       industry standards as follows:

       yacc:  XPG4, XPG4-UNIX

       Refer to the standards(5) reference page for more informa-
       tion about industry standards and associated tags.


FLAGS

       -b prefix
              Uses prefix instead of y as the prefix for all output
              filenames (prefix.tab.c, prefix.tab.h, and
              prefix.output).

       -d     Produces the y.tab.h file, which contains the #define
              statements that associate the yacc-assigned token codes
              with your token names.  This allows source files other
              than y.tab.c to access the token codes by including
              this header file.

       -l     Includes no #line constructs in y.tab.c.  Use this only
              after the grammar and associated actions are fully
              debugged.

       -N number
              [Digital]  Provides yacc with extra storage for
              building its LALR tables, which may be necessary when
              compiling very large grammars.  The number should be
              larger than 40,000 when you use this flag.

       -p symbol_prefix
              Allows multiple yacc parsers to be linked together.
              Uses symbol_prefix instead of yy to prefix global
              symbols.

       -P pathname
              [Digital]  Specifies an alternative parser (instead of
              /usr/ccs/lib/yaccpar).  The pathname specifies the
              filename of the skeleton to be used in place of
              yaccpar.

       -s     [Digital]  Breaks the yyparse() function into several
              smaller functions.  Because its size is somewhat
              proportional to that of the grammar, it is possible for
              yyparse() to become too large to compile, optimize, or
              execute efficiently.

       -t     Compiles run-time debugging code.  By default, this
              code is not included when y.tab.c is compiled.  If
              YYDEBUG has a nonzero value, the C compiler (cc)
              includes the debugging code, whether or not the -t flag
              was used.  Without compiling this code, yyparse() will
              run more quickly.

       -v     Produces the y.output file, which contains a readable
              description of the parsing tables and a report on
              conflicts generated by grammar ambiguities.


PARAMETERS

       grammar
              The pathname of the file that contains the grammar
              specification.  See Syntax for yacc Input under the
              DESCRIPTION section.


DESCRIPTION

       The  yacc  grammar  can be ambiguous; specified precedence
       rules are used to break ambiguities.

       You must compile the y.tab.c output file with a C language
       compiler to produce the yyparse() function.  This function
       must be loaded with a yylex lexical analyzer function,  as
       well  as  main()  and yyerror(), an error-handling routine
       (you must provide these routines).   The  lex  command  is
       useful for creating lexical analyzers usable by yacc.

       The yacc program reads its skeleton parser from the file
       /usr/ccs/lib/yaccpar.  Use the environment variable YACCPAR
       to specify another location for the yacc program to read
       from.  If you use this environment variable, the -P option is
       ignored, if specified.

   Syntax for yacc Input
       This  section  contains  a  formal description of the yacc
       input file (or grammar file), which is normally named with
       a  .y  suffix.  The section provides a listing of the spe-
       cial values, macros, and functions recognized by yacc.

       The general format of the yacc input file is:

              [ definitions ]
              %%
              [ rules ]
              [ %%
              [ user functions ] ]

       where:

       definitions
              Is the section where you define the variables to be
              used later in the grammar, such as in the rules
              section.  It is also where files are included
              (#include) and processing conditions are defined.  This
              section is optional.

       rules  Is the section that contains grammar rules for the
              parser.  A yacc input file must have a rules section.

       user functions
              Is the section that contains user-supplied functions
              that can be used by the actions in the rules section.
              This section is optional.

       The NULL character must not be used in grammar rules or
       literals.  Each line in the definitions can be:

       %{
       %}     When placed on lines by themselves, these enclose C
              code to be passed into the global definitions of the
              output file.  Such lines commonly include preprocessor
              directives and declarations of external variables and
              functions.

       %token [<type>] token [number] ...
              Lists tokens or terminal symbols to be used in the rest
              of the input file.  This line is needed for tokens that
              do not appear in other % definitions.  If type is
              present, the C type for all tokens on this line is
              declared to be the type referenced by type.  If a
              positive integer number follows a token, that value is
              assigned to the token.

       %left [<type>] token ...
              Indicates that each token is an operator, that all
              tokens in this definition have equal precedence, and
              that a succession of the operators listed in this
              definition are evaluated left to right.

       %right [<type>] token ...
              Indicates that each token is an operator, that all
              tokens in this definition have equal precedence, and
              that a succession of the operators listed in this
              definition are evaluated right to left.

       %nonassoc [<type>] token ...
              Indicates that each token is an operator, and that the
              operators listed in this definition cannot appear in
              succession; that is, the tokens cannot be used
              associatively.

       %start symbol
              Indicates the highest-level production rule to be
              reduced; in other words, the rule where the parser can
              consider its work done and terminate.  If this
              definition is not included, the parser uses the first
              production rule.  The symbol must be non-terminal (not
              a token).

       %type <type> symbol ...
              Defines each symbol as data type type, to resolve
              ambiguities.  If this construct is present, yacc
              performs type checking; otherwise, it assumes all
              symbols to be of type integer.

       %union union-def
              Defines the yylval global variable as a union, where
              union-def is a standard C definition in the format:

                     { type member ; [ type member ; ... ] }

              At  least one member should be an int.  Any valid C
              data type can  be  defined,  including  structures.
              When  you  run yacc with the -d option, the defini-
              tion of yylval is placed in the  y.tab.h  file  and
              can be referred to in a lex input file.

       Every token (terminal symbol) must be listed in one of the
       preceding % definitions.  Multiple tokens can be separated by
       white space or commas.  All the tokens in %left, %right, and
       %nonassoc definitions are assigned a precedence, with tokens
       in later definitions having precedence over those in earlier
       definitions.
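
       As a rough illustration of what associativity changes, the
       following C sketch evaluates a chain of subtractions such as
       8 - 3 - 2 both ways.  The helper names are hypothetical and are
       not part of yacc; the sketch only mimics the grouping that a
       %left or a %right definition would select for the '-' operator.

```c
#include <stddef.h>

/* Hypothetical helper: evaluate v[0] - v[1] - ... the way a %left
 * definition groups it: ((a - b) - c) - ...  Requires n >= 1. */
long eval_left_assoc_sub(const long *v, size_t n)
{
    long acc = v[0];
    for (size_t i = 1; i < n; i++)
        acc = acc - v[i];
    return acc;
}

/* Hypothetical helper: evaluate the same chain the way a %right
 * definition would group it: a - (b - (c - ...)).  Requires n >= 1. */
long eval_right_assoc_sub(const long *v, size_t n)
{
    long acc = v[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        acc = v[i] - acc;
    return acc;
}
```

       For the values 8, 3, and 2, the left-associative grouping is
       (8 - 3) - 2 and the right-associative grouping is 8 - (3 - 2),
       which is why arithmetic operators are normally declared with
       %left.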

       In addition to symbols, a token can be a literal character
       enclosed in single quotes.  (Multibyte characters are
       recognized by the lexical analyzer and returned as tokens.)
       The following special characters can be used, just as in C
       programs:

       \a     Alert
       \n     Newline
       \t     Tab
       \v     Vertical tab
       \r     Carriage Return
       \b     Backspace
       \f     Form Feed
       \\     Backslash
       \'     Single Quote
       \?     Question mark
       \ooo   One or more octal digits specifying the integer value
              of the character

       The rules section consists of a series of production rules
       that the parser tries to reduce.  The format of each  pro-
       duction rule is:

       symbol : symbol-sequence [ action ]
                [ | symbol-sequence [ action ] ... ] ;


       where symbol-sequence consists of  zero  or  more  symbols
       separated  by  white  space.  The first symbol must be the
       first character of the line, but newlines and other  white
       space  can appear anywhere else in the rule.  All terminal
       symbols must be declared in %token definitions.

       A symbol can appear in the symbol-sequence of its own rule
       (recursion).  Always use left recursion (where the recursive
       symbol appears before the terminating case in
       symbol-sequence).

       The specific sequence: %prec token indicates that the cur-
       rent  sequence  of symbols is to be preferred over others,
       at the level of precedence assigned to token in the  defi-
       nitions section.

       The specially defined token error matches any unrecognized
       sequence of input.  This token causes the parser to invoke
       the  yyerror  function.   By  default, the parser tries to
       synchronize with the input and continue processing  it  by
       reading  and discarding all input up to the symbol follow-
       ing error.  (You can override this  behavior  through  the
       yyerrok  action.)   If  no error token appears in the yacc
       input file, the parser exits with an  error  message  upon
       encountering unrecognized input.

       The  parser  always executes action after encountering the
       symbol that precedes it.  Thus, an action  can  appear  in
       the  middle  of  a  symbol-sequence,  after  each  symbol-
       sequence, or after multiple instances of  symbol-sequence.
       In  the  last  case,  action  is  executed when the parser
       matches any of the sequences.

       The action consists of standard C code within braces and can
       also take the following values, variables, and keywords:

       yylval If the token returned by the yylex function is
              associated with a significant value, yylex should place
              the value in this global variable.  By default, yylval
              is of type long.  The definitions section can include a
              %union definition to associate yylval with other data
              types, including structures.  If you run yacc with the
              -d option, the full yylval definition is passed into
              the y.tab.h file for access by lex.

       yyerrok
              Causes the parser to start parsing tokens immediately
              after an erroneous sequence, instead of performing the
              default action of reading and discarding tokens up to a
              synchronization token.  The yyerrok action should
              appear immediately after the error token.

       $[<type>]n
              Refers to symbol n, a token index in the production,
              counting from the beginning of the production rule,
              where the first symbol after the colon is $1.  The type
              variable is the name of one of the union members listed
              in the %union directive in the declaration section.
              The <type> syntax (nonstandard) allows the value to be
              cast to a specific data type.  Note that you will
              rarely need to use the type syntax.

       $[<type>]$
              Refers to the value returned by the matched
              symbol-sequence and used for the matched symbol when
              reducing other rules.  The action for the
              symbol-sequence generally assigns a value to $$.  The
              type variable is the name of one of the union members
              listed in the %union directive in the declaration
              section.  The <type> syntax (nonstandard) allows the
              value to be cast to a specific data type.  Note that
              you will rarely need to use the type syntax.
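
       One way to picture $n and $$ is as slots on a stack of semantic
       values.  The C sketch below is only a conceptual model under
       that assumption, not the code that yacc generates; the names
       value_stack, push_value, and reduce_add are hypothetical.  It
       shows what happens to the values when the parser reduces by a
       rule such as expr : expr '+' expr with the action
       { $$ = $1 + $3; }.

```c
#include <stddef.h>

#define STACK_MAX 16              /* arbitrary size for this sketch */

static long value_stack[STACK_MAX];
static size_t value_top = 0;      /* number of values on the stack */

/* Push the semantic value of a symbol that has just been recognized. */
void push_value(long v)
{
    value_stack[value_top++] = v;
}

/* Model of reducing by:  expr : expr '+' expr  { $$ = $1 + $3; }
 * The right-hand side has three symbols, so $1..$3 are the top three
 * stack slots; the reduction replaces them with the single value $$.
 * The caller must have pushed at least three values. */
long reduce_add(void)
{
    long *rhs = &value_stack[value_top - 3];   /* rhs[0] is $1 */
    long result = rhs[0] + rhs[2];             /* $$ = $1 + $3 */

    value_top -= 3;
    push_value(result);
    return result;
}
```

       In this model, pushing 4, '+', and 5 and then calling
       reduce_add() leaves the single value 9 on the stack, which a
       later reduction can in turn see as one of its own $n values.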

       The user functions  section  contains  user-supplied  pro-
       grams.   If  you  supply a lexical analyzer (yylex) to the
       parser, it must be contained in the  user  functions  sec-
       tion.

       The following functions, which are contained in the user
       functions section, are invoked within the yyparse function
       generated by yacc:

       yylex()
              The lexical analyzer called by yyparse to recognize
              each token of input.  Usually this function is created
              by lex.  yylex reads input, recognizes expressions
              within the input, and returns a token number
              representing the kind of token read.  The function
              returns an int value.  A return value of 0 (zero) means
              the end of input.

              If the parser and yylex do not agree on these token
              numbers, reliable communication between them cannot
              occur.  For  (one character) literals, the token is
              simply the numeric value of the  character  in  the
              current character set. The numbers for other tokens
              can either be chosen by yacc, or by  the  user.  In
              either  case, the #define construct of C is used to
              allow yylex () to  return  these  numbers  symboli-
              cally. The #define statements are put into the code
              file,  and  the  header  file  if  that   file   is
              requested.  The set of characters permitted by yacc
              in an identifier is larger than that  permitted  by
              C.  Token  names  found  to contain such characters
              will not be included in the #define declarations.

              If the token numbers are chosen by yacc, the tokens
              other than literals are assigned numbers greater than
              256, although no order is implied. A token can
              be  explicitly  assigned  a number by following its
              first appearance in the declaration section with  a
              number.  Names  and  literals  not defined this way
              retain their default definition. All assigned token
              numbers are unique and distinct from the token num-
              bers used for literals.  If duplicate token numbers
              cause  conflicts in parser generation, yacc reports
              an error; otherwise, it is unspecified whether  the
              token   assignment  is  accepted  or  an  error  is
              reported.

              The end of the input is marked by a special token
              called the endmarker that has a token number that is
              zero or negative.  All lexical analyzers return zero or
              negative as a token number upon reaching the end of
              their input.  If the tokens up to, but excluding, the
              endmarker form a structure that matches the start
              symbol, the parser accepts the input.  If the endmarker
              is seen in any other context, it is considered an
              error.

       yyerror(string)
              The function that the parser calls upon encountering an
              input error.  The default function, defined in liby.a,
              simply prints string to the standard error.  The user
              can redefine the function.  The function's type is
              void.
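
       The yylex() contract described above can be sketched as a small
       hand-written lexical analyzer in C.  This is an illustration
       under stated assumptions, not the analyzer that lex generates:
       the token codes 257 and 258 are made up (yacc normally assigns
       them and writes them to y.tab.h), and set_input() is a
       hypothetical helper that makes the function scan a string
       instead of standard input.  It also assumes the ASCII character
       set.

```c
#include <stddef.h>

/* Hypothetical token codes; yacc -d would normally write these
 * #define statements to y.tab.h. */
#define DIGIT  257
#define LETTER 258

long yylval;                    /* value of the token just returned */
static const char *input = "";  /* text still to be scanned (assumption) */

/* Hypothetical helper: choose the string that yylex() scans. */
void set_input(const char *s)
{
    input = s;
}

int yylex(void)
{
    unsigned char c;

    while (*input == ' ')       /* skip blanks */
        input++;
    if (*input == '\0')
        return 0;               /* endmarker: end of input */

    c = (unsigned char)*input++;
    if (c >= 'a' && c <= 'z') { /* single lowercase ASCII letter */
        yylval = c - 'a';
        return LETTER;
    }
    if (c >= '0' && c <= '9') {
        yylval = c - '0';
        return DIGIT;
    }
    return c;                   /* literal: its own character value */
}
```

       Named tokens use codes above 256 so that they cannot collide
       with single-character literals, which are returned as their
       character values, and 0 is reserved for the end of input.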

       The liby.a library contains default main() and yyerror()
       functions.  These look like the following, respectively:

       #include <locale.h>
       #include <stdio.h>

       int yyparse(void);

       int main(void)
       {
            setlocale(LC_ALL, "");
            (void) yyparse();
            return (0);
       }

       int yyerror(char *s)
       {
            fprintf(stderr, "%s\n", s);
            return (0);
       }

       Comments,  in  C  syntax,  can appear anywhere in the user
       functions or definitions sections.  In the rules  section,
       comments  can  appear wherever a symbol is allowed.  Blank
       lines or lines consisting of white space can  be  inserted
       anywhere in the file, and are ignored.


ENVIRONMENT VARIABLES

       The following environment variables affect the execution of
       yacc:

       LANG   Provides a default value for the internationalization
              variables that are unset or null.  If LANG is unset or
              null, the corresponding value from the default locale
              is used.  If any of the internationalization variables
              contain an invalid setting, the utility behaves as if
              none of the variables had been defined.

       LC_ALL If set to a non-empty string value, overrides the
              values of all the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of
              sequences of bytes of text data as characters (for
              example, single-byte as opposed to multi-byte
              characters in arguments and input files).

       LC_MESSAGES
              Determines the locale for the format and contents of
              diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogues for the
              processing of LC_MESSAGES.


NOTES

       The LANG and LC_* variables affect the execution of the yacc
       command as stated.  The main() function defined by yacc calls
       setlocale(LC_ALL, ""); thus, the program generated by yacc
       will also be affected by the contents of these variables at
       runtime.


EXAMPLES

       This section describes the example programs for the lex and
       yacc commands, which together create a simple desk calculator
       program.  The program also allows you to assign values to
       variables (each designated by a single lowercase ASCII
       letter), and then use the variables in calculations.  The
       files that contain the program are as follows:

       calc.l The lex specification file that defines the lexical
              analysis rules.

       calc.y The yacc grammar file that defines the parsing rules
              and calls the yylex() function created by lex to
              provide input.

       The  remaining  text expects that the current directory is
       the directory that contains the lex and yacc example  pro-
       gram files.

   Compiling the Example Program
       Perform the following steps to create the example program
       using lex and yacc:

       1.     Process the yacc grammar file using the -d flag.  The
              -d flag tells yacc to create a file that defines the
              tokens it uses in addition to the C language source
              code.

                     yacc -d calc.y

              The following files are created:

              y.tab.c   The C language source file that yacc created
                        for the parser.
              y.tab.h   A header file containing #define statements
                        for the tokens used by the parser.

       2.     Process the lex specification file:

                     lex calc.l

              The following file is created:

              lex.yy.c  The C language source file that lex created
                        for the lexical analyzer.

       3.     Compile and link the two C language source files:

                     cc -o calc y.tab.c lex.yy.c

              The following files are created (the *.o files are
              created temporarily and then removed):

              y.tab.o   The object file for y.tab.c.
              lex.yy.o  The object file for lex.yy.c.
              calc      The executable program file.

              You can then run the program directly by entering:

              calc

              Then enter numbers and operators in calculator fashion.
              After you press <Return>, the program displays the
              result of the operation.  If you assign a value to a
              variable as follows, the cursor moves to the next line:

                     m=4 <Return>
                     _

              You can then use the variable in calculations and it
              will have the value assigned to it:

                     m+5 <Return>
                     9

   The Parser Source Code
       The text that follows shows the contents of the file calc.y.
       This file has entries in all three of the sections of a yacc
       grammar file: declarations, rules, and programs.

       %{
       #include <stdio.h>

       int regs[26];
       int base;

       int yylex(void);
       int yyerror(char *s);
       %}

       %start list

       %token DIGIT LETTER

       %left '|'
       %left '&'
       %left '+' '-'
       %left '*' '/' '%'
       %left UMINUS   /* supplies precedence for unary minus */

       %%   /* beginning of rules section */

       list :    /* empty */
            |    list stat '\n'
            |    list error '\n'         { yyerrok; }
            ;

       stat :    expr                    { printf("%d\n", $1); }
            |    LETTER '=' expr         { regs[$1] = $3; }
            ;

       expr :    '(' expr ')'            { $$ = $2; }
            |    expr '*' expr           { $$ = $1 * $3; }
            |    expr '/' expr           { $$ = $1 / $3; }
            |    expr '%' expr           { $$ = $1 % $3; }
            |    expr '+' expr           { $$ = $1 + $3; }
            |    expr '-' expr           { $$ = $1 - $3; }
            |    expr '&' expr           { $$ = $1 & $3; }
            |    expr '|' expr           { $$ = $1 | $3; }
            |    '-' expr %prec UMINUS   { $$ = -$2; }
            |    LETTER                  { $$ = regs[$1]; }
            |    number
            ;

       number :  DIGIT          { $$ = $1; base = ($1 == 0) ? 8 : 10; }
            |    number DIGIT   { $$ = base * $1 + $2; }
            ;

       %%

       int main(void)
       {
            return (yyparse());
       }

       int yyerror(char *s)
       {
            fprintf(stderr, "%s\n", s);
            return (0);
       }

       int yywrap(void)
       {
            return (1);
       }

   Declarations Section
       This  section  contains entries that perform the following
       functions: Includes standard  I/O  header  file.   Defines
       global  variables.   Defines the list rule as the place to
       start processing.  Defines the tokens used by the  parser.
       Defines the operators and their precedence.

   Rules Section
       The  rules  section defines the rules that parse the input
       stream.
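
       The two rules for number in calc.y build a multi-digit value
       one DIGIT at a time.  The C sketch below mirrors that
       arithmetic outside the parser; number_value() is a hypothetical
       helper written for this illustration, and it assumes a
       non-empty string of decimal digit characters.

```c
/* Hypothetical helper mirroring the "number" rules: the first digit
 * selects the base (a leading 0 means octal), and each later digit
 * is folded in with  value = base * value + digit. */
long number_value(const char *digits)
{
    long value = digits[0] - '0';             /* number : DIGIT */
    int base = (value == 0) ? 8 : 10;
    int i;

    for (i = 1; digits[i] != '\0'; i++)       /* number : number DIGIT */
        value = base * value + (digits[i] - '0');
    return value;
}
```

       With this arithmetic the input 17 yields 17, while 017 is read
       as octal and yields 15.  Like the grammar, the sketch does not
       reject the digits 8 and 9 in an octal number.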

   Programs Section
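       The programs section contains the following routines.
       Because these routines are included in this file, you do not
       need to use the yacc library when processing this file.

       main() The required main program that calls yyparse() to
              start the program.

       yyerror(s)
              This error handling routine only prints a syntax error
              message.

       yywrap()
              The wrap-up routine that returns a value of 1 when the
              end of input occurs.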

   The Lexical Analyzer Source Code
       The text that follows shows the contents of the file calc.l.
       This file contains include statements for standard input and
       output, as well as for the y.tab.h file.  The yacc program
       generates that file from the yacc grammar file information,
       if you use the -d flag with the yacc command.  The file
       y.tab.h contains definitions for the tokens that the parser
       program uses.  In addition, calc.l contains the rules used to
       generate the tokens from the input stream.

       %{

       #include <stdio.h>
       #include "y.tab.h"
       int c;
       #if !defined (YYSTYPE)
       #define YYSTYPE long
       #endif
       extern YYSTYPE yylval;
       %}
       %%
       " "         ;
       [a-z]       { c = yytext[0]; yylval = c - 'a'; return(LETTER); }
       [0-9]       { c = yytext[0]; yylval = c - '0'; return(DIGIT); }
       [^a-z0-9]   { c = yytext[0]; return(c); }


FILES

       y.output
              A readable description of parsing tables and a report
              on conflicts generated by grammar ambiguities.

       y.tab.c
              Output file.

       y.tab.h
              Definitions for token names.

       /usr/ccs/lib/yaccpar
              Default skeleton parser for C programs.

       liby.a yacc library.

       The yacc command also creates three temporary files while it
       runs.


EXIT VALUES

       The following exit values are returned:

       0      Successful completion
       >0     An error occurred


RELATED INFORMATION

       Commands:  lex(1)

       Standards:  standards(5)

       Documents: Programming Support Tools
