
Lemon Parser Generator Tutorial

By Mike Chirico

Lemon is a compact, thread-safe, well-tested parser generator written by Dr. Richard Hipp. Using a parser generator along with a scanner like flex can be advantageous because there is much less code to write: you write only the grammar for the parser.

Example 1: Getting started

Compare this with writing all of the raw code from scratch: for instance, compare the basic C++ desktop calculator (Reference 1) with the file below, a syntax file containing the grammar that the parser uses. "example1.y" is an example syntax file for a very basic calculator.

example1.y


    1  %token_type {int}
    2
    3  %left PLUS MINUS.
    4  %left DIVIDE TIMES.
    5
    6  %include {
    7  #include <iostream>
    8  #include "example1.h"
    9  }
    10
    11  %syntax_error {
    12    std::cout << "Syntax error!" << std::endl;
    13  }
    14
    15  program ::= expr(A).   { std::cout << "Result=" << A << std::endl; }
    16
    17  expr(A) ::= expr(B) MINUS  expr(C).   { A = B - C; }
    18  expr(A) ::= expr(B) PLUS  expr(C).   { A = B + C; }
    19  expr(A) ::= expr(B) TIMES  expr(C).   { A = B * C; }
    20  expr(A) ::= expr(B) DIVIDE expr(C).  {
    
    
    21           if(C != 0){
    22             A = B / C;
    23            }else{
    24             std::cout << "divide by zero" << std::endl;
    25             }
    26  }  /* end of DIVIDE */
    
    
    27  expr(A) ::= INTEGER(B). { A = B; }

As you can see, this file is only 27 lines of code, not counting blank lines. It is much easier to modify the grammar than it is to rewrite larger sections of raw code.

The parser generator (lemon.c and lempar.c) takes the syntax file "example1.y" as input and creates the parser file "example1.c", along with two other files: "example1.h", which contains definitions for the tokens, and "example1.out", which gives a detailed listing of the shift/reduce states for the grammar in "example1.y".

Let's run through the steps, starting with the source code of lemon (available here). The first step is to compile lemon.c:


    $ gcc -o lemon lemon.c

Now we have our parser generator, lemon, so run the syntax file "example1.y" through it.


    $ ./lemon example1.y

This will create example1.c, example1.h, and example1.out. What about lempar.c? Compare "example1.c" with "lempar.c", and you will see that it contains a lot of the same code. "lempar.c" is a template file. You can modify the code if you want, and all modifications will be passed to "example1.c" (including any comments).

But "example1.c" is not complete. We'll append the contents of the file "main_part", which contains a main function and tokens. "main_part" is called a driver.

main_part


    1  int main()
    2  {
    3    void* pParser = ParseAlloc (malloc);
    
    4    /* First input:
    5        15 / 5
    6                                  */
    7    Parse (pParser, INTEGER, 15);
    8    Parse (pParser, DIVIDE, 0);
    9    Parse (pParser, INTEGER, 5);
    10    Parse (pParser, 0, 0);
    
    11    /*  Second input:
    12          50 + 125
    13                                 */
    
    14    Parse (pParser, INTEGER, 50);
    15    Parse (pParser, PLUS, 0);
    16    Parse (pParser, INTEGER, 125);
    17    Parse (pParser, 0, 0);
    
    18    /*  Third input:
    19          50 * 125 + 125
    20                                 */
    
    
    
    21    Parse (pParser, INTEGER, 50);
    22    Parse (pParser, TIMES, 0);
    23    Parse (pParser, INTEGER, 125);
    24    Parse (pParser, PLUS, 0);
    25    Parse (pParser, INTEGER, 125);
    26    Parse (pParser, 0, 0);
    
    27    ParseFree(pParser, free );
    
    28  }

So, what is main_part doing? Line 3 initializes the parser; note that pParser is passed to each call of the "Parse" function, starting at line 7. The expression 15/5, or 15 DIVIDE 5, is handled in lines 7 through 10: first the INTEGER 15 is sent, then the token DIVIDE, which needs no value, so 0 is passed as the third parameter on line 8. Finally, line 10, with 0 as the second parameter in "Parse(pParser, 0, 0);", signals the end of the input for this expression. (Please note that in "example4.y", the grammar will handle this with a NEWLINE, and "Parse(pParser, 0, ...);" will only be called at the very end of the input.)

"main_part" is appended to "example1.c". You may want to reference the Makefile with the downloadable example, which has this step:


    $ cat main_part >> example1.c

Next, just compile example1.c, and it's good to go.


    $ g++ -o ex1 example1.c

Now execute "ex1": the result of 15/5 is, of course, 3; 50+125 is 175; and 50*125+125 is indeed (50*125)+125 = 6375. This last result verifies that TIMES has higher precedence than PLUS.


    $ ./ex1
    Result=3
    Result=175
    Result=6375

Now for a closer look at the syntax file (example1.y). Why does TIMES have higher precedence than PLUS? Lines 3 and 4 determine the precedence of the operators PLUS, MINUS, DIVIDE, and TIMES.


    3  %left PLUS MINUS.
    4  %left DIVIDE TIMES.

Operators declared on earlier lines have lower precedence. This is very important to note. PLUS and MINUS have lower precedence than DIVIDE and TIMES because PLUS and MINUS are on line 3, whereas DIVIDE and TIMES are on line 4. If, for example, exponentiation (EXP) were added, it would go below line 4, since EXP has even higher precedence than TIMES and DIVIDE, as sketched next.
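
The EXP token, its %right declaration, and the rule body below are illustrative additions only, not part of example1.y; exponentiation is usually declared right-associative:


    %left PLUS MINUS.
    %left DIVIDE TIMES.
    %right EXP.       /* declared on a later line, so it binds more tightly */

    expr(A) ::= expr(B) EXP expr(C). {
        A = 1;
        for (int i = 0; i < C; i++) A = A * B;   /* simple integer power, sketch only */
    }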

What if you wanted real numbers in the input, say 15.5 or 5.2, instead of integers? How would you do that? It's easy. These tokens are currently integers because of the following line in "example1.y":


    1  %token_type {int}

So to accommodate a double, line 1 would be changed to:


    %token_type {double}

Moving further down "example1.y", there is an %include directive on line 6. Any C statements it contains are passed through verbatim and inserted at the beginning of the generated parser file "example1.c", which is where declarations and headers need to go.


    ...
    6  %include {
    7  #include <iostream>
    8  #include "example1.h"
    9  }
    ...

Note that "example1.h" is generated from "$ ./lemon example1.y". It defines the tokens, or, put another way, assigns integer values to the token names starting at 1. Why start at 1, and not 0? Because 0 is reserved for the Parse function. Remember, "Parse (pParser, 0, 0);", with the second parameter set to zero, signals an end to the input.

example1.h (note, this is a generated file; do not add code to it):


    #define PLUS                            1
    #define MINUS                           2
    #define DIVIDE                          3
    #define TIMES                           4
    #define INTEGER                         5

Example 2: Creating a custom token type, or structure

"example2.y" differs from "example1.y" by defining the token type as a structure. Specifically, this token type is defined in the file "ex2def.h". Defining our own structure can give us flexibility in the semantic action, or the piece of code on the right of the production rule. Here is an example:


    expr(A) ::= expr(B) TIMES  expr(C).   { A.value = B.value * C.value;
                                            A.n = B.n+1 + C.n+1;
                                          }

The token_type in "example2.y" is defined as Token in line 6.


    6  %token_type {Token}

This structure Token is defined in "ex2def.h" as follows:

ex2def.h


    struct Token {
      const char *z;
      int value;
      unsigned n;
    };
    

Special note: "const char *z" is not used in these examples, but I've kept it in this structure, since the next logical step for a calculator is assigning an expression to a variable, for instance variable1=2+5, where variable1 would be an entry in a symbol table. See this reference.

Again, note the change in the %include directive: the addition of "ex2def.h", which defines struct Token.

example2.y


    1  %include {
    2  #include <iostream>
    3  #include "ex2def.h"
    4  #include "example2.h"
    5  }
    
    6  %token_type {Token}
    7  %default_type {Token}
    
    8  %type expr {Token}
    9  %type NUM {Token}
    10
    11  %left PLUS MINUS.
    12  %left DIVIDE TIMES.
    13
    14
    15  %syntax_error {
    16    std::cout << "Syntax error!" << std::endl;
    17  }
    18
    19  program ::= expr(A).   {
    20                          std::cout << "Result.value=" << A.value << std::endl;
    21                          std::cout << "Result.n=" << A.n << std::endl;
    
    22                           }
    
    23  expr(A) ::= expr(B) MINUS  expr(C).   { A.value = B.value - C.value;
    24                                         A.n = B.n+1  + C.n+1;
    25                                        }
    
    
    26  expr(A) ::= expr(B) PLUS  expr(C).   { A.value = B.value + C.value;
    27                                         A.n = B.n+1  + C.n+1;
    28                                        }
    
    29  expr(A) ::= expr(B) TIMES  expr(C).   { A.value = B.value * C.value;
    30                                          A.n = B.n+1  + C.n+1;
    
    31                                           }
    32  expr(A) ::= expr(B) DIVIDE expr(C).  {
    
    
    33           if(C.value != 0){
    34             A.value = B.value / C.value;
    35             A.n = B.n+1 + C.n+1;
    36            }else{
    37             std::cout << "divide by zero" << std::endl;
    38             }
    39  }  /* end of DIVIDE */
    40  expr(A) ::= NUM(B). { A.value = B.value; A.n = B.n+1; }

Taking a close look at lines 23 through 25 (repeated below), the nonterminal A is now a Token structure with members "A.value" and "A.n": ".value" carries the value of the expression, and ".n" counts the assignments made while building it. For 4 PLUS 13, for example, each NUM reduction sets n to 1, and the PLUS rule then computes n = (1+1) + (1+1) = 4, which matches the "Result.n=4" in the output further down.


    23  expr(A) ::= expr(B) MINUS  expr(C).   { A.value = B.value - C.value;
    24                                         A.n = B.n+1  + C.n+1;
    25                                        }

This is a quick way to watch the "shift" and "reduce" steps dynamically: a "shift" occurs each time a token is pushed onto the stack, and a "reduce" occurs each time a rule is matched and replaced by its left-hand side. As you will recall, when lemon is run, three files are normally created: *.c, *.h, and *.out. The ".out" file lists each state of the grammar, along with its shift and reduce actions. If you want just a simple summary, run lemon with the "-s" option:


    $ ./lemon -s example2.y
    Parser statistics: 6 terminals, 3 nonterminals, 6 rules
    11 states, 0 parser table entries, 0 conflicts

Again, as in the previous example, "main_part2", the driver, is appended to "example2.c":


    $ cat main_part2 >> example2.c

Now "example2.c" can be compiled and executed:


    $ g++ -o ex2  example2.c
    
    $ ./ex2
    Result.value=17
    Result.n=4
    Result.value=-9
    Result.n=4
    Result.value=78
    Result.n=10

Example 3: Working with the token destructor

One advantage of lemon over bison is the ability to attach a destructor to a grammar symbol, so that memory held by that symbol can be released by a function of your choice. The %token_destructor directive used in this example applies to every terminal (token); the related %destructor directive does the same for a non-terminal such as "expr". When the parser is done with (discards) a symbol, the corresponding destructor function is called.
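
For instance, a non-terminal destructor might look like the following sketch; this directive is an illustration only and is not part of example3.y:


    /* Hypothetical: a destructor attached to the non-terminal expr.
       $$ refers to the value of the expr that the parser is discarding. */
    %destructor expr { std::cout << "expr destructor, value=" << $$.value << std::endl; }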

example3.y


    1  %include {
    2  #include <iostream>
    3  #include "ex3def.h"
    4  #include "example3.h"
    
    
    5    void token_destructor(Token t)
    6      {
    7        std::cout << "In token_destructor t.value= " << t.value << std::endl;
    8        std::cout << "In token_destructor t.n= " << t.n << std::endl;
    9      }
    
    
    10  }
    
    
    11  %token_type {Token}
    12  %default_type {Token}
    13  %token_destructor { token_destructor($$); }
    ...

In line 13, the %token_destructor directive is set to call "token_destructor($$)"; the function itself is defined in lines 5 through 9. For this simple example, no memory is allocated, so there is no need to call free. Instead, to show what is happening, output is written to std::cout.
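
If the tokens did own heap memory, say z pointed to a copy of the lexeme allocated with strdup (an assumption; none of these examples do this, and it would require <cstdlib> to be included), the destructor would be the natural place to release it:


    /* Sketch only: assumes the scanner set t.z to malloc'ed text. */
    void token_destructor(Token t)
    {
      free((void *) t.z);    /* release the copied lexeme */
    }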

After the program is compiled, it can be executed as follows. Note that I have added line numbers to the output of "ex3" for easy reference.


    $ ./ex3
    1  t0.value=4  PLUS t1.value=13
    2  In token_destructor t.value= 4
    3  In token_destructor t.n= 0
    4  Result.value=17
    5  Result.n=4
    6  parsing complete!
    ...

After the expression has been reduced, the destructor is called, but it is only called for the token with value 4. Why? For an answer we will have to take a look at "main_part3".

main_part3


    1  int main()
    2  {
    3    void* pParser = ParseAlloc (malloc);
    
    4    struct Token t0,t1;
    5    struct Token mToken;
    
    6    t0.value=4;
    7    t0.n=0;
    
    8    t1.value=13;
    9    t1.n=0;
    
    10    std::cout << " t0.value=4  PLUS t1.value=13 " << std::endl;
    
    11    Parse (pParser, NUM, t0);
    12    Parse (pParser, PLUS, t0);
    13    Parse (pParser, NUM, t1);
    14    Parse (pParser, 0, t0);
    
    15    std::cout << " t0.value=4  DIVIDE t1.value=13 " << std::endl;
    
    16    Parse (pParser, NUM, t0);
    17    Parse (pParser, DIVIDE, t0);
    18    Parse (pParser, NUM, t1);
    19    Parse (pParser, 0, t1);
    ...

Line 14 terminates the input for this expression, with t0 as the third parameter. That third parameter is passed as "$$" to the defined destructor function, "token_destructor(...)". Calling "Parse" with 0 a second time immediately afterwards is undefined, so terminate the input only once, after you are done passing the tokens of an expression. In other words, you would never call "Parse (pParser, 0, t0);" immediately followed by another "Parse (pParser, 0, t0);".

In line 19, token_destructor is called for t1.value=13. If you look at "main_part3", line 19, you'll see that Parse is called with t1 as the third parameter and 0 as the second parameter.

Continuation of the output from the program:


    7
    8
    9   t1.value=13  PLUS  t0.value=4
    10   In token_destructor t.value= 13
    11   In token_destructor t.n= 0
    12   Result.value=17
    13   Result.n=4
    14   parsing complete!

So t0 is passed in the third-parameter position on line 14, and t1 on line 19. This shouldn't be a problem: a single variable could hold the value of all the tokens. For instance, main_part3 could have used Token t0 for both the values 4 and 13, as follows:


    ...
    struct Token t0;
    
    t0.value=4;
    t0.n=0;
    
    Parse (pParser, NUM, t0);
    Parse (pParser, PLUS, t0);
    
    t0.value=13;
    t0.n=0;
    
    Parse (pParser, NUM, t0);
    Parse (pParser, 0, t0);
    ...
    

Example 4: Ending the grammar with a NEWLINE

Notice that in the last three examples, "Parse(pParser, 0, ...)" had to be called to signal the end of the input for an expression. This is awkward. Instead, the grammar should dictate when the expression can no longer be reduced.

"example4.y" contains the following lines:

example4.y


    1  %include {
    2  #include <iostream>
    3  #include "ex4def.h"
    4  #include "example4.h"
    
    ...
    
    23
    24  %syntax_error {
    25    std::cout << "Syntax error!" << std::endl;
    26  }
    27
    28  /*  This is to terminate with a new line */
    29  main ::= in.
    30  in ::= .
    31  in ::= in state NEWLINE.
    
    
    32  state ::= expr(A).   {
    33                          std::cout << "Result.value=" << A.value << std::endl;
    34                          std::cout << "Result.n=" << A.n << std::endl;
    
    
    35                           }
    
    
    
    36  expr(A) ::= expr(B) MINUS  expr(C).   { A.value = B.value - C.value;
    37                                         A.n = B.n+1  + C.n+1;
    38                                        }
    
    ...

Note lines 29 through 35. "main" and "in" must be defined (lines 29-31): "in ::= ." allows empty input, and the left-recursive rule "in ::= in state NEWLINE." accepts any number of NEWLINE-terminated expressions. If you're a Bison user, you could get away without having to define the non-terminal main, but lemon currently requires it.

With this change made to the grammar in "example4.y", "main_part4" can now terminate each expression by passing the token NEWLINE.

Here is a section of main_part4:

main_part4


    1  int main()
    2  {
    3    void* pParser = ParseAlloc (malloc);
    
    4    struct Token t0,t1;
    5    struct Token mToken;
    
    6    t0.value=4;
    7    t0.n=0;
    
    8    t1.value=13;
    9    t1.n=0;
    
    10    std::cout << std::endl <<" t0.value=4  PLUS t1.value=13 " << std::endl << std::endl;
    
    11    Parse (pParser, NUM, t0);
    12    Parse (pParser, PLUS, t0);
    13    Parse (pParser, NUM, t1);
    14    Parse (pParser, NEWLINE, t1);
    
    
    15    std::cout << std::endl <<" t0.value=4  TIMES t1.value=13 " << std::endl << std::endl;

Note that line 14 passes the token NEWLINE; checking "example4.h" shows that NEWLINE in this case is defined as the integer 6.

So, looking at the output of "ex4", with line numbers added for clarification, we get the following:


    $ ./ex4
    
    1  t0.value=4  PLUS t1.value=13
    2
    3  In token_destructor t.value= 4
    4  In token_destructor t.n= 0
    5  Result.value=17
    6  Result.n=4
    7
    8   t0.value=4  TIMES t1.value=13
    9
    10  In token_destructor t.value= 4
    11  In token_destructor t.n= 0
    12  Result.value=52
    13  Result.n=4
    14  parsing complete!

We get the result on line 5, and there was no need to call "Parse (pParser, 0, t0);". Instead, "Parse (pParser, NEWLINE, t1);" did the job.

Example 5: Using flex for the tokenizer

The next example takes input directly from the terminal, and flex will create a scanner for finding the appropriate tokens.

First, a quick look at the flex program "lexer.l", again with line numbers added for clarification:

lexer.l


    1  %{
    2  #include "lexglobal.h"
    3  #include "example5.h"
    4  #include <string.h>
    5  #include <math.h>
    
    6  int line = 1, col = 1;
    
    7  %}
    8  %%
    
    9  [0-9]+|[0-9]*\.[0-9]+    {                      col += (int) strlen(yytext);
    10                                                  yylval.dval = atof(yytext);
    11                                                  return NUM; }
    12  [ \t]   { col += (int) strlen(yytext); }               /* ignore but count white space */
    13  [A-Za-z][A-Za-z0-9]*                           { /* ignore but needed for variables */
    
    14                                                  return 0;
    15                                                 }
    
    16  "+"           {  return PLUS; }
    17  "-"           {  return MINUS; }
    18  "*"           {  return TIMES; }
    19  "/"           {  return DIVIDE; }
    
    20  \n      { col = 0; ++line; return NEWLINE; }
    
    21  .       { col += (int) strlen(yytext); return yytext[0]; }
    22  %%
    23  /**
    24   * reset the line and column count
    25   *
    26   *
    27   */
    28  void reset_lexer(void)
    29  {
    
    30    line = 1;
    31    col  = 1;
    
    32  }
    
    33  /**
    34   * yyerror() is invoked when the lexer or the parser encounter
    35   * an error. The error message is passed via *s
    36   *
    37   *
    38   */
    39  void yyerror(char *s)
    40  {
    41    printf("error: %s at line: %d col: %d\n",s,line,col);
    
    42  }
    
    43  int yywrap(void)
    44  {
    45    return 1;
    46  }
    

The format for flex is basically a rule on the left side, followed by C code to execute on the right side. Take line 9, "[0-9]+|[0-9]*\.[0-9]+", which will match any of 3, .3, 0.3, and 23.4, and will return NUM. What is the value of NUM? It comes from line 3, which includes the file "example5.h", generated by the lemon parser. On line 10, yylval.dval is assigned the value of "yytext" after atof() converts it to a double. The type of yylval is defined in "lexglobal.h", included on line 2.

"lexglobal.h" with line numbers added:

lexglobal.h


    1  #ifndef YYSTYPE
    2  typedef union {
    3    double    dval;
    4    struct symtab *symp;
    5  } yystype;
    6  # define YYSTYPE yystype
    7  # define YYSTYPE_IS_TRIVIAL 1
    8  #endif
    
    9  /* extern YYSTYPE yylval; */
    10  YYSTYPE yylval;
    

yystype is a union holding either dval or symp, a pointer to a symtab entry. Again, the symbol table is not used in these examples, but should you move to a calculator with variables that can be assigned, you'd use it. See Reference 2 for a full calculator example with flex and bison.
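
As a hedged sketch of what such a symbol-table entry and lookup might look like (the field names, NSYMS, and symlook() are illustrative assumptions, not part of the downloadable examples):


    #include <string.h>
    #include <stdlib.h>

    #define NSYMS 20                 /* maximum number of symbols */

    struct symtab {
      char   *name;                  /* variable name, e.g. "variable1" */
      double  value;                 /* value currently assigned to it  */
    } symbols[NSYMS];

    /* Return the entry for s, creating it in the first free slot if needed. */
    struct symtab *symlook(const char *s)
    {
      for (int i = 0; i < NSYMS; i++) {
        if (symbols[i].name && strcmp(symbols[i].name, s) == 0)
          return &symbols[i];        /* already in the table */
        if (!symbols[i].name) {      /* free slot: claim it  */
          symbols[i].name  = strdup(s);
          symbols[i].value = 0.0;
          return &symbols[i];
        }
      }
      return NULL;                   /* table is full */
    }

A grammar rule for assignment could then store its result in symlook(name)->value.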

Looking again at lines 9 through 11 in lexer.l:


    ...
    9  [0-9]+|[0-9]*\.[0-9]+    {                      col += (int) strlen(yytext);
    10                                                  yylval.dval = atof(yytext);
    11                                                  return NUM; }
    ...

Both the type of token, NUM, and its value must be passed along. We need to know it's a number, but we also need to know the value of the number.

Unlike NUM, the tokens PLUS, MINUS, TIMES, and DIVIDE carry no value; we only need to know which one was found. Therefore, in lexer.l, lines 16 through 19 only return the token identifier.


    16  "+"           {  return PLUS; }
    17  "-"           {  return MINUS; }
    18  "*"           {  return TIMES; }
    19  "/"           {  return DIVIDE; }
    
    20  \n      { col = 0; ++line; return NEWLINE; }
    
    21  .       { col += (int) strlen(yytext); return yytext[0]; }
    22  %%

Line 20 matches a newline. Although they are not used here, the variable "line" keeps count of the lines and "col" tracks the column position. This is a good idea; it is helpful when debugging.

The driver, main_part5, contains a lot more code. The low-level read() call is used on stdin. This could easily be changed to accept input arriving on a socket descriptor: if you had a Web-scraping program that scans input from a TCP socket, the socket descriptor would replace "fileno(stdin)" on line 33. (A sketch of that variant appears at the end of this example.)

main_part5


    
    1  #include <stdio.h>
    2  #include <unistd.h>
    3  #include <sys/types.h>
    4  #include <sys/stat.h>
    5  #include <fcntl.h>
    6  #include <stdlib.h>
    
    7  #define BUFS 1024
    
    8  /**
    9   * We have to declare these here - they're not  in any header files
    10   * we can include.  yyparse() is declared with an empty argument list
    11   * so that it is compatible with the generated C code from bison.
    12   *
    13   */
    
    14  extern FILE *yyin;
    15  typedef struct yy_buffer_state *YY_BUFFER_STATE;
    
    16  extern "C" {
    17    int             yylex( void );
    18    YY_BUFFER_STATE yy_scan_string( const char * );
    19    void            yy_delete_buffer( YY_BUFFER_STATE );
    20  }
    
    21  int main(int argc,char** argv)
    22  {
    23    int n;
    24    int yv;
    25    char buf[BUFS+1];
    26    void* pParser = ParseAlloc (malloc);
    
    27    struct Token t0,t1;
    28    struct Token mToken;
    
    29    t0.n=0;
    30    t0.value=0;
    
    31    std::cout << "Enter an expression like 3+5 <return>" << std::endl;
    32    std::cout << "  Terminate with ^D" << std::endl;
    
    33    while ( ( n=read(fileno(stdin), buf, BUFS )) >  0)
    34      {
    35        buf[n]='\0';
    36        yy_scan_string(buf);
    37        // on EOF yylex will return 0
    38        while( (yv=yylex()) != 0)
    39          {
    40            std::cout << " yylex() " << yv << " yylval.dval " << yylval.dval << std::endl;
    41            t0.value=yylval.dval;
    42            Parse (pParser, yv, t0);
    43          }
    
    44      }
    
    45    Parse (pParser, 0, t0);
    46    ParseFree(pParser, free );
    
    47  }

Line 16, 'extern "C"', is necessary because "lexer.l" was run through flex to create C code, as opposed to C++ code:


    $ flex lexer.l

See the flex manual, Reference 7. Yes, "flex++" will output C++ code; however, for complex scanning, C code may be faster. "main_part5", which is compiled as a C++ program, handles the C/C++ boundary smoothly.
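
Putting the build steps for this example together, here is a sketch of the commands involved; the Makefile included with the downloadable examples is authoritative and may differ in details:


    $ ./lemon example5.y              # generates example5.c, example5.h, example5.out
    $ flex lexer.l                    # generates lex.yy.c (C code)
    $ cat main_part5 >> example5.c    # append the driver
    $ gcc -c lex.yy.c                 # compile the scanner as C
    $ g++ -o ex5 example5.c lex.yy.o  # compile the parser and driver as C++, then link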

The parser's input should always be terminated with 0 as the second parameter to "Parse(pParser, 0, ...)". When there is no more input for flex to scan, yylex() returns zero, so the inner while loop on line 38 exits; the read statement on line 33 then looks for more input. This is exactly what you would want when reading from a socket, since data may arrive with a delay.

But the zero returned by yylex() only ends the inner loop; it is never passed to the parser. So when the read on line 33 finally returns no more data, line 45 supplies the terminating zero as the second parameter of Parse.


    ...
    33    while ( ( n=read(fileno(stdin), buf, BUFS )) >  0)
    
    ...
    38        while( (yv=yylex()) != 0)
    39          {
    40            std::cout << " yylex() " << yv << " yylval.dval " << yylval.dval << std::endl;
    41            t0.value=yylval.dval;
    42            Parse (pParser, yv, t0);
    43          }
    ...
    45    Parse (pParser, 0, t0);
    46    ParseFree(pParser, free );
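
As a sketch of the socket variant mentioned at the start of this example, the read loop could be factored into a function that takes any descriptor. The function name parse_from_fd is an assumption for illustration; everything else mirrors main_part5 and assumes its declarations (BUFS, struct Token, yylval, yylex(), yy_scan_string(), Parse()) are in scope:


    /* Sketch: feed the parser from an arbitrary descriptor, for example an
       accepted TCP socket, instead of stdin. */
    void parse_from_fd(int fd, void *pParser)
    {
      char buf[BUFS + 1];
      int n, yv;
      struct Token t0;
      t0.n = 0;
      t0.value = 0;

      while ((n = read(fd, buf, BUFS)) > 0)   /* fd replaces fileno(stdin) */
        {
          buf[n] = '\0';
          yy_scan_string(buf);                /* hand this chunk to flex */
          while ((yv = yylex()) != 0)         /* 0 means flex is out of input */
            {
              t0.value = yylval.dval;
              Parse(pParser, yv, t0);
            }
        }

      Parse(pParser, 0, t0);                  /* tell the parser the input is done */
    }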

Summary

lemon is fast, completely in the public domain, well-tested in SQLite, and thread-safe. Parser generators can help developers write reusable code for complex tasks in a fraction of the time needed to write the complete program from scratch. The syntax file, which holds the grammar, can be modified to suit multiple needs.

Although I have had no problems with lemon.c, there are a few compiler warnings regarding signed and unsigned integers when compiling it with the -Wall -W flags:


    [chirico@third-fl-71 lemon_examples]$ gcc -Wall -W -O2 -s -pipe lemon.c
    lemon.c: In function `resolve_conflict':
    lemon.c:973: warning: unused parameter `errsym'
    lemon.c: In function `main':
    lemon.c:1342: warning: unused parameter `argc'
    lemon.c: At top level:
    lemon.c:2308: warning: return type defaults to `int'
    lemon.c: In function `preprocess_input':
    lemon.c:2334: warning: comparison between signed and unsigned
    lemon.c:2352: warning: control reaches end of non-void function
    lemon.c:2311: warning: `start' might be used uninitialized in this function
    lemon.c:2313: warning: `start_lineno' might be used uninitialized in this function
    lemon.c: In function `Parse':
    lemon.c:2393: warning: comparison between signed and unsigned
    lemon.c: In function `tplt_open':
    lemon.c:2904: warning: implicit declaration of function `access'
    lemon.c: In function `append_str':
    lemon.c:3019: warning: comparison between signed and unsigned
    lemon.c:3011: warning: unused variable `i'
    lemon.c: In function `translate_code':
    lemon.c:3109: warning: control reaches end of non-void function

This can be an inconvenience when adding the generated parser file to existing code. A fix is on the way; since I expect the changes to be cleaned up soon, the version of lemon.c used here is the same version you'd get from the author's site, which will make it easier to apply the patch.

There are times when a parser generator like lemon or bison may be more than you need. These are powerful tools. An interesting alternative, if you're a C++ programmer and you only need to do inline parsing, is the Spirit library. See Reference 9.

Examples for this article

The complete source for these examples, including the parser itself, can be downloaded here.

References

  1. An example desktop calculator from scratch
  2. An example of a flex and bison parser
  3. The home of the lemon parser generator
  4. The home of SQLite
  5. A glossary of parser terms
  6. A good introduction to parsers
  7. The GNU flex manual
  8. The GNU bison manual
  9. The spirit parser
  10. Getting a C++ Bison parser to use a C Flex lexer
  11. The Lex-YACC-HOWTO

 


Mike Chirico, a father of triplets (all girls) lives outside of Philadelphia, PA, USA. He has worked with Linux since 1996, has a Masters in Computer Science and Mathematics from Villanova University, and has worked in computer-related jobs from Wall Street to the University of Pennsylvania. His hero is Paul Erdos, a brilliant number theorist who was known for his open collaboration with others.
Mike's notes page is souptonuts.

Copyright © 2004, Mike Chirico. Released under the Open Publication license

Published in Issue 106 of Linux Gazette, September 2004
