Module Earley_core__Earley
Earley is a parser combinator library implemented using the Earley algorithm. It focuses mainly on efficiency and is indended to be used in conjunction with the pa_ocaml parser and syntax extention mechanism.
Types and exceptions
type blank
= Earley_core.Input.buffer -> int -> Earley_core.Input.buffer * int
As
Earley
does scannerless parsing, a notion ofblank
function is used to discard meaningless parts of the input (e.g. comments or spaces). Ablank
function takes as input abuffer
and a position (represented as anint
) and returns a couple of abuffer
and a position corresponding to the next meaningful character.WARNING: a blank function must return a normalized pair (b,p), which means 0 <= p < Input.line_num b. You can use Input.normalize to ensure this.
exception
Parse_error of Earley_core.Input.buffer * int
The exception
Parse_error(buf,pos,msgs)
is raised whenever parsing fails. It contains the positionpos
(and the corresponding bufferbuf
) of the furthest reached position in the input.
val give_up : unit -> 'a
give_up ()
can be called by the user to force the parser to reject a possible parsing rule.
val handle_exception : ?error:(unit -> 'b) -> ('a -> 'b) -> 'a -> 'b
handle_exception fn v
applies the functionfn
tov
and handles theParse_error
exception. In particular, a parse error message is presented to the user in case of a failure, thenerror ()
is called. The defaulterror
isfun () -> exit 1
.
Atomic parsers
val char : ?name:string -> char -> 'a -> 'a grammar
char ~name c v
is a grammar that accepts only the characterc
, and returnsv
as a semantic value. An optionalname
can be given to the grammar for reference in error messages.
val string : ?name:string -> string -> 'a -> 'a grammar
string s v
is a grammar that accepts only the stringstr
, and returnsv
as a semantic value. An optionalname
can be given to the grammar for reference in error messages.
val keyword : ?name:string -> string -> (char -> bool) -> 'a -> 'a grammar
keyword s forbidden v
is simalar to string, but the parsing fails ifforbidden c
returnstrue
whenc
is the next available character.
val eof : 'a -> 'a grammar
eof v
is a grammar that only accepts the end of file and returnsv
as a semantic value. Note that the end of file can be parsed one or more times (i.e. the input ends with infinitely many end of file symbols.
val any : char grammar
any
is a grammar that accepts a single character (but fails on the end of file) and returns its value.
val in_charset : ?name:string -> Earley_core.Charset.charset -> char grammar
in_charset cs
is a grammar that parses any character of thecs
charset, and returns its value. An optionalname
can be given to the grammar for reference in error messages.
val not_in_charset : ?name:string -> Earley_core.Charset.charset -> unit grammar
not_in_charset cs
is similar toin_charset cs
but it accepts the characters that are not incs
.
val blank_not_in_charset : ?name:string -> Earley_core.Charset.charset -> unit grammar
blank_not_in_charset cs
is the same asnot_in_charset
but testing with blank_test.
val empty : 'a -> 'a grammar
empty v
is a grammar that does not parse anything and returnsv
as a semantic value. Note that this grammar never fails.
type 'a fpos
= Earley_core.Input.buffer -> int -> Earley_core.Input.buffer -> int -> 'a
type for a function waiting for the start and end positions (i.e. buffer and index) of an item, in general resulting from parsing
val empty_pos : 'a fpos -> 'a grammar
empty_pos v
is similar to the above except that the action wait for the position of a complete sequence build usingfsequence
ofsequence
.For instance,
sequence_position g1 g2 f
below can be defined asfsequence g1 (fsequence g2 (empty_pos f'))
. wheref' = fun b p b' p' a2 a1 = f b p b' p' a1 a2
to give the result of g1 and g2 in the expected order.
val fail : unit -> 'a grammar
fail ()
is a grammar that always fail, whatever the input.
val black_box : (Earley_core.Input.buffer -> int -> 'a * Earley_core.Input.buffer * int) -> Earley_core.Charset.charset -> bool -> string -> 'a grammar
black_box fn cs accept_empty name
is a grammar that uses the functionfn
to parses the input buffer.fn buf pos
should start parsingbuf
at positionpos
, and return a couple containing the new buffer and position of the first unread character. The character setcs
must contain at least the characters that are accepted as first character byfn
, and no less. The booleanaccept_empty
must be true if the function accept the empty string. Thename
argument is used for reference in error messages. Note that the functonfn
should usegive_up ()
in case of a parse error.WARNING: fn must return a triple (x,b,p) when (b,p) is normalized, which means 0 <= p < Input.line_num b. You can use Input.normalize to ensure this.
val debug : string -> unit grammar
debug msg
is a dummy grammar that always succeeds and printsmsg
onstderr
when used. It is useful for debugging.
val regexp : ?name:string -> string -> string array grammar
regexp ?name re
is a grammar that uses the regexpre
to parse the input buffer. The value returnes is the array of the contents of the groups.
Blanks management
val no_blank : blank
no_blank
is ablank
function that does not discard any character of the input buffer.
val blank_regexp : string -> blank
blank_regexp re
builds a blank from the regexpre
.
val blank_grammar : unit grammar -> blank -> blank
blank_grammar gr bl
produces ablank
function using the grammargr
and theblank
functionbl
. It parses as much of the input as possible using the grammargr
with theblank
functionbl
, and returns the reached position.
val change_layout : ?old_blank_before:bool -> ?new_blank_after:bool -> 'a grammar -> blank -> 'a grammar
change_layout ~old_blank_before ~new_blank_after gr bl
replaces the currentblank
function withbl
, while parsing using the grammargr
. The optional parameterold_blank_before
(true
by default) forces the application of the old blank function, before starting to parse withgr
. Note that the new blank function is always called before the first terminal ofgr
. Similarly, the opt- -ional parameternew_blank_after
(true
by default) forces a call to the new blank function after the end of the parsing ofgr
. Note that the old blank function is always called after the last terminal.
Support for recursive grammars
val declare_grammar : string -> 'a grammar
declare_grammar name
returns a new grammar that can be used in the definition of other grammars, but that cannot be run on input before it has been initialized withset_grammar
. Thename
argument is used for reference to the grammar in error messages.
val set_grammar : 'a grammar -> 'a grammar -> unit
set_grammar gr grdef
set the definiton of grammargr
(previously declared withdeclare_grammar
) to begrdef
.Invalid_argument
is raised ifset_grammar
is used on a grammar that was not created withdeclare_grammar
. The behavious is undefined if a grammar is set twice withset_grammar
.
Parsing functions
val parse_buffer : 'a grammar -> blank -> Earley_core.Input.buffer -> 'a
parse_buffer gr bl buf
parses the bufferbuf
using the grammargr
and the blank functionbl
. The exceptionParse_error
may be raised in case of error.
val parse_string : ?filename:string -> 'a grammar -> blank -> string -> 'a
parse_string ~filename gr bl str
parses the stringstr
using the grammargr
and the blank functionbl
. An optionalfilename
can be provided for reference to the input in error messages. The exceptionParse_error
may be raised in case of error.
val parse_channel : ?filename:string -> 'a grammar -> blank -> Stdlib.in_channel -> 'a
parse_channel ~filename gr bl ch
parses the contenst of the input channelch
using the grammargr
and the blank functionbl
. Afilename
can be provided for reference to the input in case of an error.parse_channel
may raise theParse_error
exception.
val parse_file : 'a grammar -> blank -> string -> 'a
parse_file gr bl fn
parses the filefn
using the grammargr
and the blank functionbl
. The exceptionParse_error
may be raised in case of error.
val partial_parse_buffer : 'a grammar -> blank -> ?blank_after:bool -> Earley_core.Input.buffer -> int -> 'a * Earley_core.Input.buffer * int
partial_parse_buffer gr bl buf pos
parses input from the bufferbuf
starting a positionpos
, using the grammargr
and the blank functionbl
. A triple is returned containing the new buffer, the position that was reached during parsing, and the semantic result of the parsing. The optional argumentblank_after
,true
by default, indicates if the returned position if after the final blank or not. Note that this function should not be used in the defi- nition of a grammar using theblack_box
function.
module WithPP : functor (PP : Earley_core.Input.Preprocessor) -> sig ... end
A functor providing support for using and
Input
preprocessor.
Debuging and flags
val debug_lvl : int Stdlib.ref
debug_lvl
is a flag that can be set forEarley
to display debug data onstderr
. The default value is0
, and bigger numbers acti- vate more and more debuging informations.
Greedy combinator
Sequencing combinators
val sequence : 'a grammar -> 'b grammar -> ('a -> 'b -> 'c) -> 'c grammar
sequence g1 g2 f
is a grammar that first parses usingg1
, and then parses usingg2
. The results of the sequence is then obtained by applyingf
to the results ofg1
andg2
.
val sequence_position : 'a grammar -> 'b grammar -> ('a -> 'b -> 'c) fpos -> 'c grammar
sequence_position g1 g2 f
is a grammar that first parses usingg1
, and then parses usingg2
. The results of the sequence is then obtained by applyingf
to the results ofg1
andg2
, and to the positions (i.e. buffer and index) of the corresponding parsed input.Remark:
sequence g1 g2 f
is equivalent tosequence_position g1 g2 (fun _ _ _ _ -> f)
.
val fsequence : 'a grammar -> ('a -> 'b) grammar -> 'b grammar
fsequence g1 g2
is a grammar that first parses usingg1
, and then parses usingg2
. The results of the sequence is then obtained by applying the result ofg1
to the result ofg2
.Remark:
fsequence g1 g2
is equivalent tosequence g1 g2 (fun x f -> f x)
.
val fsequence_position : 'a grammar -> ('a -> 'b) fpos grammar -> 'b grammar
same as fsequence, but the result of
g2
also receive the position of the result ofg1
.
val fsequence_ignore : 'a grammar -> 'b grammar -> 'b grammar
same as fsequence, but the result of
g2
receives nothing, meaning we forget the result ofg1
.
val sequence3 : 'a grammar -> 'b grammar -> 'c grammar -> ('a -> 'b -> 'c -> 'd) -> 'd grammar
sequence3
is similar tosequence
, but it composes three grammars into a sequence.Remark:
sequence3 g1 g2 g3 f
is equivalent tosequence (sequence g1 g2 f) g3 (fun f x -> f x)
.
val simple_dependent_sequence : 'a grammar -> ('a -> 'b grammar) -> 'b grammar
simple_dependent_sequence g1 g2
is a grammar that first parses usingg1
, which returns a valuea
, and then continues to parse withg2 a
and return its result.
val dependent_sequence : ('a * 'b) grammar -> ('a -> ('b -> 'c) grammar) -> 'c grammar
dependent_sequence g1 g2
is a grammar that first parses usingg1
, which returns a value(a,b)
, and then continues to parse withg2 a
and return its result applied tob
. compared to the above function, allow memoizing the second grammar
val option : 'a -> 'a grammar -> 'a grammar
option v g
tries to parse the input asg
, and returnsv
in case of failure.
val fixpoint : 'a -> ('a -> 'a) grammar -> 'a grammar
fixpoint v g
parses a repetition of one or more times the input parsed byg
. The valuev
is used as the initial value (i.e. to finish the sequence).if parsing X with g returns a function gX, parsing X Y Z with fixpoint a g will return gX (gY (gZ a)).
This consumes stack proportinal to the input length ! use revfixpoint ...
val fixpoint' : 'a -> 'b grammar -> ('b -> 'a -> 'a) -> 'a grammar
val fixpoint1 : 'a -> ('a -> 'a) grammar -> 'a grammar
as
fixpoint
but parses at leat once with the given grammar
val fixpoint1' : 'a -> 'b grammar -> ('b -> 'a -> 'a) -> 'a grammar
val list0 : 'a grammar -> unit grammar -> 'a list grammar
listN g sep
parses sequences ofg
separated bysep
of length at leastN
, forN=0,1
or2
.
val list1 : 'a grammar -> unit grammar -> 'a list grammar
val list2 : 'a grammar -> unit grammar -> 'a list grammar
val alternatives : 'a grammar list -> 'a grammar
alternatives [g1;...;gn]
tries to parse using all the grammars[g1;...;gn]
and keeps only the first success.
val apply : ('a -> 'b) -> 'a grammar -> 'b grammar
apply f g
applies functionf
to the value returned by the grammarg
.
val apply_position : ('a -> 'b) fpos -> 'a grammar -> 'b grammar
apply_position f g
applies functionf
to the value returned by the grammarg
and the positions at the beginning and at the end of the input parsed input.
val position : 'a grammar -> (string * int * int * int * int * 'a) grammar
position g
tranforms the grammarg
to add information about the position of the parsed text.
val test : ?name:string -> Earley_core.Charset.t -> (Earley_core.Input.buffer -> int -> 'a * bool) -> 'a grammar
test c f
perform a testf
on the input buffer. Do not parse anything (position are unchanged). The charsetc
should contains all character accepted as at the position given to f
val blank_test : ?name:string -> Earley_core.Charset.t -> ('a * bool) fpos -> 'a grammar
blank_test c f
same as above except thatf
is applied tobuf' pos' buf pos
where(buf', pos')
is the position before the blank. The charset c should contains all character accepted as at the position (buf,pos). This allow to test the presence of blank or even to read the blank and return some information
val with_blank_test : 'a -> 'a grammar
a test that fails if there is no blank
val no_blank_test : 'a -> 'a grammar
a test that fails if there are some blank
val grammar_family : ?param_to_string:('a -> string) -> string -> ('a -> 'b grammar) * (('a -> 'b grammar) -> unit)
grammar_family to_str name
returns a pair(gs, set_gs)
, wheregs
is a finite family of grammars parametrized by a value of type'a
. A namename
is to be provided for the family, and an optional functionto_str
can be provided to print the parameter and display better error messages.
val grammar_prio : ?param_to_string:('b -> string) -> string -> ('b -> 'c grammar) * (((('b -> bool) * 'c grammar) list * ('b -> 'c grammar list)) -> unit)
Similar to the previous one, with an optimization.
grammar_prio to_str name
returns a pair(gs, set_gs)
, wheregs
is a finite family of grammars parametrized by a value of type'a
.set_gs
requires two lists of grammars to set the value of the grammar:- the first list are grammar that can only be activated by the parameter (if the given function return true)
- the second list is used as for grammar family
val grammar_prio_family : ?param_to_string:(('a * 'b) -> string) -> string -> ('a -> 'b -> 'c grammar) * (('a -> (('b -> bool) * 'c grammar) list * ('b -> 'c grammar list)) -> unit)
A mixture of the two above
val accept_empty : 'a grammar -> bool
accept_empty g
returnstrue
if the grammarg
accepts the empty input andfalse
otherwise.
val grammar_info : 'a grammar -> bool * Earley_core.Charset.t
val give_name : string -> 'a grammar -> 'a grammar
give a name to the grammar. Usefull for debugging.