Functional parsers

Functional parsers

Purely functional parsers are an alternative of stream parsers where the used stream type is a lazy non-destructive type. These streams are lazy values, as in classical stream parsers, but the values are not removed as long as the parsing advances.

To make them work, the parsers of purely functional streams return, not the simple values, but a value of type option : “None” meaning “no match” (the equivalent of the exception “Parse.Failure” of normal streams) and “Some (r, s)” meaning “the result is r and the remaining stream is s”.

Syntax

The syntax of purely functional parsers, when loading “pa_fstream.cmo”, is the following:

        expression ::= fparser
                     | match-with-fparser
           fparser ::= "fparser" pos-opt "[" parser-cases "]"
                     | "fparser" pos-opt parser-case
match-with-fparser ::= "match" expression "with" fparser
      parser-cases ::= parser-cases parser-case
                     | <nothing>
       parser-case ::= "[:" stream-pattern ":]" pos-opt "->" expression
                     | "[:" ":]" pos-opt "->" expression
    stream-pattern ::= stream-patt-comp
                     | stream-patt-comp ";" stream-pattern
  stream-patt-comp ::= "`" pattern
                     | "`" pattern "when" expression
                     | pattern "=" expression
                     | pattern
                     | "when" expression
           pos-opt ::= pattern
                     | <nothing>

Notice that, unlike classical parsers, there is no difference, in a stream pattern, between the first stream pattern component and the other ones. In particular, there is no “question mark” syntax and expression optionnally ending those components. Moreover, the “lookahead” case is not necessary, we see further why. The syntaxes “pattern when” and “let..in” inside stream patterns we see in classical parsers are not implemented.

Streams

The functional parsers are functions taking as parameters functional streams, which are values of type “Fstream.t   a” for some type “a”. It is possible to build functional streams using the functions defined in the module “Fstream”:

Fstream.from

Fstream.from f” returns a stream built from the function “f”. To create a new stream element, the function “f” is called with the current stream count, starting with zero. The user function “f” must return either “Some <value>” for a value or “None” to specify the end of the stream.

Fstream.of_list

Return a stream built from the list in the same order.

Fstream.of_string

Return a stream of the characters of the string parameter.

Fstream.of_channel

Return a stream of the characters read from the input channel parameter.

Semantics of parsers

Fparser

The purely functional parsers act like classical parsers, with a recursive descent algorithm, except that:

  • If the first stream pattern component matches the beginning of the stream, there is no error if the following stream patterns components do not match: the control simply passes to the next parser case with the initial stream.
  • If the semantic actions are of type “t”, the result of the parser is of type “option (t * Fstream.t)”, not just “t” like in classical parsers. If a stream pattern matches, the semantic action is evaluated, giving some result “e” and the result of the parser is “Some (e, strm)” where “strm” is the remaining stream.
  • If no parser case matches, the result of the parser is “None”.

Error position

A difficulty, with purely functional parsers, is how to find the position of the syntax error, when the input is wrong. Since the system tries all parsers cases before returning “None”, and that the initial stream is not affected, it is not possible to directly find where the error happened. This is a problem for parsing using backtracking (here, it is limited backtracking, but the problem is the same).

The solution is to use the function “Fstream.count_unfrozen” applied to the initial stream. Like its name says, it returns the number of unfrozen elements of the stream, which is exactly the longest match found. If the input is a stream of characters, the return of this function is exactly the position in number of characters from the beginning of the stream.

However, it is not possible to know directly which rule failed and therefore it is not possible, as in classical parsers, to specify and get clear error messages. Future versions of purely functional parsers may propose solutions to resolve this problem.

Notice that, if using the “count_unfrozen” method, it is not possible to reuse that same stream to call another parser, and hope to get the right position of the error, if another error happens, since it may test less terminals than the first parser. Use a fresh stream in this case, if possible.