Extensions of printing¶
Extensions of printing
Camlp5 provides extensions kits to pretty print programs in revised syntax and normal syntax. Some other extensions kits also allow to rebuild the parsers, or the EXTEND statements in their initial syntax. The pretty print system is itself extensible, by adding new rules. We present here how it works in the Camlp5 sources.
The pretty print system of Camlp5 uses the library modules Pretty, an original system to format output) and Extfun, another original system of extensible functions.
This chapter is designed for programmers that want to understand how the pretty printing of OCaml programs work in Camlp5, want to adapt, modify or debug it, or want to add their own pretty printing extensions.
Introduction
The files doing the pretty printings are located in Camlp5 sources in the directory “etc”. Peruse them if you are interested in creating new ones. The main ones are:
- “etc/pr_r.ml”: pretty print in revised syntax.
- “etc/pr_o.ml”: pretty print in normal syntax.
- “etc/pr_rp.ml”: rebuilding parsers in their original revised syntax.
- “etc/pr_op.ml”: rebuilding parsers in their original normal syntax.
- “etc/pr_extend.ml”: rebuilding EXTEND in its original syntax.
We present here how this system works inside these files. First, the general principles. Second, more details of the implementation.
Principles
Using module Pretty
All functions in OCaml pretty printing take a parameter named “the
printing context” (variable pc
). It is a record holding :
- The current indendation :
pc.ind
- What should be printed before, in the same line :
pc.bef
- What should be printed after, in the same line :
pc.aft
- The dangling token, useful in normal syntax to know whether
parentheses are necessary :
pc.dang
A typical pretty printing function calls the function
horiz_vertic
of the library module Pretty. This
function takes two functions as parameter:
- The way to print the data in one only line (horizontal printing)
- The way to print the data in two or more lines (vertical printing)
Both functions catenate the strings by using the function sprintf
of the library module Pretty
which controls whether the printed
data holds in the line or not. They generally call, recursively,
other pretty printing functions with the same behaviour.
Let us see an example (fictitious) of printing an OCaml application.
Let us suppose we have an application expression “e1 e2
” to
pretty print where e1
and e2
are sub-expressions. If both
expressions and their application holds on one only line, we want to
see:
e1 e2
On the other hand, if they do not hold on one only line, we want to
see e2
in another line with, say, an indendation of 2 spaces:
e1
e2
Here is a possible implementation. The function has been named
expr_app
and can call the function expr
to print the
sub-expressions e1
and e2
:
value expr_app pc e1 e2 =
horiz_vertic
(fun () ->
let s1 = expr {(pc) with aft = ""} e1 in
let s2 = expr {(pc) with bef = ""} e2 in
sprintf "%s %s" s1 s2)
(fun () ->
let s1 = expr {(pc) with aft = ""} e1 in
let s2 =
expr
{(pc) with
ind = pc.ind + 2;
bef = tab (pc.ind + 2)}
e2
in
sprintf "%s\n%s" s1 s2)
;
The first function is the horizontal printing. It ends with a
sprintf
separating the printing of e1
and e2
by a space.
The possible “before part” (pc.bef
) and “after part” (pc.aft
)
are transmitted in the calls of the sub-functions.
The second function is the vertical printing. It ends with a
sprintf
separating the printing of e1
and e2
by a
newline. The second line starts with an indendation, using the
“before part” (pc.bef
) of the second call to expr
.
The pretty printing library function Pretty.horiz_vertic
calls
the first (horizontal) function, and if it fails (either because
s1
or s2
are too long or hold newlines, or because the final
string produced by sprintf
is too long), calls the second
(vertical) function.
Notice that the parameter pc
contains a field pc.bef
(what
should be printed before in the same line), which in both cases is
transmitted to the printing of e1
(since the syntax
{(pc) with aft = ""}
is a record with pc.bef
kept). Same
for the field pc.aft
transmitted to the printing of e2
.
Using EXTEND_PRINTER statement
This system is combined to the extensible printers to allow the extensibility of the pretty printing.
The code above actually looks like:
EXTEND_PRINTER
pr_expr:
[ [ <:expr< $e1$ $e2$ >> ->
horiz_vertic
(fun () ->
let s1 = curr {(pc) with aft = ""} e1 in
let s2 = next {(pc) with bef = ""} e2 in
sprintf "%s %s" s1 s2)
(fun () ->
let s1 = curr {(pc) with aft = ""} e1 in
let s2 =
next
{(pc) with
ind = pc.ind + 2;
bef = tab (pc.ind + 2)}
e2
in
sprintf "%s\n%s" s1 s2) ] ]
;
END;
The variable “pc
” is implicit in the semantic actions of the
syntax “EXTEND_PRINTER
”, as well as two other variables:
“curr
” and “next
”.
These parameters, “curr
” and “next
”, correspond to the pretty
printing of, respectively, the current level and the next level.
Since the application in OCaml is left associative, the first
sub-expression is printed at the same (current) level and the second
one is printed at the next level. We also see a call to next
in
the last (2nd) case of the function to treat the other cases in the
next level.
Dangling else, bar, semicolon
In normal syntax, there are cases where it is necessary to enclose
expressions between parentheses (or between begin and end, which is
equivalent in that syntax). Three tokens may cause problems: the
“else
”, the vertical bar “|
” and the semicolon “;
”. Here
are examples where the presence of these tokens constrains the
previous expression to be parenthesized. In these three examples,
removing the begin..end enclosers would change the meaning of the
expression because the dangling token would be included in that
expression:
Dangling else:
if a then begin if b then c end else d
Dangling bar:
function
A ->
begin match a with
B -> c
| D -> e
end
| F -> g
Dangling semicolon:
if a then b
else begin
let c = d in
e
end;
f
The information is transmitted by the value pc.dang
. In the first
example above, while displaying the “then
” part of the outer
“if
”, the sub-expression is called with the value pc.dang
set
to "else"
to inform the last sub-sub-expression that it is going
to be followed by that token. When a “if
” expression should be
displayed without “else
” part, and that its “pc.dang
” is
“else”, it should be enclosed with spaces.
This problem does not exist in revised syntax. While pretty printing
in revised syntax, the parameter pc.dang
is not necessary and
remains the empty string.
By level
As explained in the chapter about the extensible
printers (with the EXTEND_PRINTER statement),
printers contain levels. The global printer variable of expressions
is named “pr_expr
” and contain all definitions for pretty
printing expressions, organized by levels, just like the parsing of
expressions. The definition of “pr_expr
” actually looks like
this:
EXTEND_PRINTER
pr_expr:
[ "top"
[ (* code for level "top" *) ]
| "add"
[ (* code for level "add" *) ]
| "mul"
[ (* code for level "mul" *) ]
| "apply"
[ (* code for level "apply" *) ]
| "simple"
[ (* code for level "add" *) ] ]
;
END;
The Prtools module
The Prtools module is defined inside Camlp5 for pretty printing kits. It provides variables and functions to process comments, and meta-functions to process lists (horizontally, vertically, paragraphly).
Comments
value comm_bef : int -> MLast.loc -> string;
- “
comm_bef ind loc
” get the comment from the source just before the given location “loc
”. This comment may be reindented using “ind
”. Returns the empty string if no comment found. value source : ref string;
- The initial source string, which must be set by the pretty printing kit. Used by [comm_bef] above.
value set_comm_min_pos : int -> unit;
- Set the minimum position of the source where comments can be found, (to prevent possible duplication of comments).
Meta functions for lists
type pr_fun 'a = pr_context -> 'a -> string;
- Type of printer functions.
value hlist : pr_fun 'a -> pr_fun (list 'a);
- [hlist elem pc el] returns the horizontally pretty printed string of a list of elements; elements are separated with spaces. The list is displayed in one only line. If this function is called in the context of the [horiz] function of the function [horiz_vertic] of the module Printing, and if the line overflows or contains newlines, the function internally fails (the exception is catched by [horiz_vertic] for a vertical pretty print).
value hlist2 : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
- horizontal list with a different function from 2nd element on.
value hlistl : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
- horizontal list with a different function for the last element.
value vlist : pr_fun 'a -> pr_fun (list 'a);
- [vlist elem pc el] returns the vertically pretty printed string of a list of elements; elements are separated with newlines and indentations.
value vlist2 : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
- vertical list with different function from 2nd element on.
value vlist3 : pr_fun ('a * bool) -> pr_fun ('a * bool) -> pr_fun (list 'a);
- vertical list with different function from 2nd element on, the boolean value being True for the last element of the list.
value vlistl : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
- vertical list with different function for the last element.
value plist : pr_fun 'a -> int -> pr_fun (list ('a * string));
- [plist elem sh pc el] returns the pretty printed string of a list of elements with separators. The elements are printed horizontally as far as possible. When an element does not fit on the line, a newline is added and the element is displayed in the next line with an indentation of [sh]. [elem] is the function to print elements, [el] a list of pairs (element * separator) (the last separator being ignored).
value plistb : pr_fun 'a -> int -> pr_fun (list ('a * string));
- [plist elem sh pc el] returns the pretty printed string of the list of elements, like with [plist] but the value of [pc.bef] corresponds to an element already printed, as it were on the list. Therefore, if the first element of [el] does not fit in the line, a newline and a tabulation is added after [pc.bef].
value plistl : pr_fun 'a -> pr_fun 'a -> int -> pr_fun (list ('a * string));
- paragraph list with a different function for the last element.
value hvlistl : pr_fun 'a -> pr_fun 'a -> pr_fun (list 'a);
- applies “
hlistl
” if the context is horizontal; else applies “vlistl
”.
Miscellaneous
value tab : int -> string;
- [tab ind] is equivalent to [String.make ind ‘ ‘]
value flatten_sequence : MLast.expr -> option (list MLast.expr);
- [flatten_sequence e]. If [e] is an expression representing a sequence, return the list of expressions of the sequence. If some of these expressions are already sequences, they are flattened in the list. If that list contains expressions of the form let..in sequence, this sub-sequence is also flattened with the let..in applying only to the first expression of the sequence. If [e] is a let..in sequence, it works the same way. If [e] is not a sequence nor a let..in sequence, return None.
Example : repeat..until
This pretty prints the example
repeat..until
statement programmed in the chapter Syntax
extensions (first version generating a “while
”
statement).
The code
The pattern generated by the “repeat” statement is recognized
(sequence ending with a “while
” whose contents is the same than
the beginning of the sequence) by the function “is_repeat” and the
repeat statement is pretty printed in its initial form using the
function “horiz_vertic” of the Pretty module. File
“pr_repeat.ml
”:
#load "pa_extprint.cmo";
#load "q_MLast.cmo";
open Pcaml;
open Pretty;
open Prtools;
value eq_expr_list el1 el2 =
if List.length el1 <> List.length el2 then False
else List.for_all2 eq_expr el1 el2
;
value is_repeat el =
match List.rev el with
[ [<:expr< while not $e$ do { $list:el2$ } >> :: rel1] ->
eq_expr_list (List.rev rel1) el2
| _ -> False ]
;
value semi_after pr pc = pr {(pc) with aft = sprintf "%s;" pc.aft};
EXTEND_PRINTER
pr_expr:
[ [ <:expr< do { $list:el$ } >> when is_repeat el ->
match List.rev el with
[ [<:expr< while not $e$ do { $list:el$ } >> :: _] ->
horiz_vertic
(fun () ->
sprintf "%srepeat %s until %s%s" pc.bef
(hlistl (semi_after curr) curr
{(pc) with bef = ""; aft = ""} el)
(curr {(pc) with bef = ""; aft = ""} e)
pc.aft)
(fun () ->
let s1 = sprintf "%srepeat" (tab pc.ind) in
let s2 =
vlistl (semi_after curr) curr
{(pc) with
ind = pc.ind + 2;
bef = tab (pc.ind + 2);
aft = ""}
el
in
let s3 =
sprintf "%suntil %s" (tab pc.ind)
(curr {(pc) with bef = ""} e)
in
sprintf "%s\n%s\n%s" s1 s2 s3)
| _ -> assert False ] ] ]
;
END;
Compilation
ocamlc -pp camlp5r -I +camlp5 -c pr_repeat.ml
Testing
Getting the same files “foo.ml” and “bar.ml” of the repeat syntax example:
$ cat bar.ml
#load "./foo.cmo";
value x = ref 42;
repeat
print_int x.val;
print_endline "";
x.val := x.val + 3
until x.val > 70;
$ camlp
Without the pretty printing kit:
$ camlp5r pr_r.cmo bar.ml
#load "./foo.cmo";
value x = ref 42;
do {
print_int x.val;
print_endline "";
x.val := x.val + 3;
while not (x.val > 70) do {
print_int x.val;
print_endline "";
x.val := x.val + 3
}
};
With the pretty printing kit:
$ camlp5r pr_r.cmo ./pr_repeat.cmo bar.ml -l 75
#load "./foo.cmo";
value x = ref 42;
repeat
print_int x.val;
print_endline "";
x.val := x.val + 3
until x.val > 70;