Module Earley_core__Earley
Earley is a parser combinator library implemented using the Earley algorithm. It focuses mainly on efficiency and is indended to be used in conjunction with the pa_ocaml parser and syntax extention mechanism.
Types and exceptions
type blank= Earley_core.Input.buffer -> int -> Earley_core.Input.buffer * intAs
Earleydoes scannerless parsing, a notion ofblankfunction is used to discard meaningless parts of the input (e.g. comments or spaces). Ablankfunction takes as input abufferand a position (represented as anint) and returns a couple of abufferand a position corresponding to the next meaningful character.WARNING: a blank function must return a normalized pair (b,p), which means 0 <= p < Input.line_num b. You can use Input.normalize to ensure this.
exceptionParse_error of Earley_core.Input.buffer * intThe exception
Parse_error(buf,pos,msgs)is raised whenever parsing fails. It contains the positionpos(and the corresponding bufferbuf) of the furthest reached position in the input.
val give_up : unit -> 'agive_up ()can be called by the user to force the parser to reject a possible parsing rule.
val handle_exception : ?error:(unit -> 'b) -> ('a -> 'b) -> 'a -> 'bhandle_exception fn vapplies the functionfntovand handles theParse_errorexception. In particular, a parse error message is presented to the user in case of a failure, thenerror ()is called. The defaulterrorisfun () -> exit 1.
Atomic parsers
val char : ?name:string -> char -> 'a -> 'a grammarchar ~name c vis a grammar that accepts only the characterc, and returnsvas a semantic value. An optionalnamecan be given to the grammar for reference in error messages.
val string : ?name:string -> string -> 'a -> 'a grammarstring s vis a grammar that accepts only the stringstr, and returnsvas a semantic value. An optionalnamecan be given to the grammar for reference in error messages.
val keyword : ?name:string -> string -> (char -> bool) -> 'a -> 'a grammarkeyword s forbidden vis simalar to string, but the parsing fails ifforbidden creturnstruewhencis the next available character.
val eof : 'a -> 'a grammareof vis a grammar that only accepts the end of file and returnsvas a semantic value. Note that the end of file can be parsed one or more times (i.e. the input ends with infinitely many end of file symbols.
val any : char grammaranyis a grammar that accepts a single character (but fails on the end of file) and returns its value.
val in_charset : ?name:string -> Earley_core.Charset.charset -> char grammarin_charset csis a grammar that parses any character of thecscharset, and returns its value. An optionalnamecan be given to the grammar for reference in error messages.
val not_in_charset : ?name:string -> Earley_core.Charset.charset -> unit grammarnot_in_charset csis similar toin_charset csbut it accepts the characters that are not incs.
val blank_not_in_charset : ?name:string -> Earley_core.Charset.charset -> unit grammarblank_not_in_charset csis the same asnot_in_charsetbut testing with blank_test.
val empty : 'a -> 'a grammarempty vis a grammar that does not parse anything and returnsvas a semantic value. Note that this grammar never fails.
type 'a fpos= Earley_core.Input.buffer -> int -> Earley_core.Input.buffer -> int -> 'atype for a function waiting for the start and end positions (i.e. buffer and index) of an item, in general resulting from parsing
val empty_pos : 'a fpos -> 'a grammarempty_pos vis similar to the above except that the action wait for the position of a complete sequence build usingfsequenceofsequence.For instance,
sequence_position g1 g2 fbelow can be defined asfsequence g1 (fsequence g2 (empty_pos f')). wheref' = fun b p b' p' a2 a1 = f b p b' p' a1 a2to give the result of g1 and g2 in the expected order.
val fail : unit -> 'a grammarfail ()is a grammar that always fail, whatever the input.
val black_box : (Earley_core.Input.buffer -> int -> 'a * Earley_core.Input.buffer * int) -> Earley_core.Charset.charset -> bool -> string -> 'a grammarblack_box fn cs accept_empty nameis a grammar that uses the functionfnto parses the input buffer.fn buf posshould start parsingbufat positionpos, and return a couple containing the new buffer and position of the first unread character. The character setcsmust contain at least the characters that are accepted as first character byfn, and no less. The booleanaccept_emptymust be true if the function accept the empty string. Thenameargument is used for reference in error messages. Note that the functonfnshould usegive_up ()in case of a parse error.WARNING: fn must return a triple (x,b,p) when (b,p) is normalized, which means 0 <= p < Input.line_num b. You can use Input.normalize to ensure this.
val debug : string -> unit grammardebug msgis a dummy grammar that always succeeds and printsmsgonstderrwhen used. It is useful for debugging.
val regexp : ?name:string -> string -> string array grammarregexp ?name reis a grammar that uses the regexpreto parse the input buffer. The value returnes is the array of the contents of the groups.
Blanks management
val no_blank : blankno_blankis ablankfunction that does not discard any character of the input buffer.
val blank_regexp : string -> blankblank_regexp rebuilds a blank from the regexpre.
val blank_grammar : unit grammar -> blank -> blankblank_grammar gr blproduces ablankfunction using the grammargrand theblankfunctionbl. It parses as much of the input as possible using the grammargrwith theblankfunctionbl, and returns the reached position.
val change_layout : ?old_blank_before:bool -> ?new_blank_after:bool -> 'a grammar -> blank -> 'a grammarchange_layout ~old_blank_before ~new_blank_after gr blreplaces the currentblankfunction withbl, while parsing using the grammargr. The optional parameterold_blank_before(trueby default) forces the application of the old blank function, before starting to parse withgr. Note that the new blank function is always called before the first terminal ofgr. Similarly, the opt- -ional parameternew_blank_after(trueby default) forces a call to the new blank function after the end of the parsing ofgr. Note that the old blank function is always called after the last terminal.
Support for recursive grammars
val declare_grammar : string -> 'a grammardeclare_grammar namereturns a new grammar that can be used in the definition of other grammars, but that cannot be run on input before it has been initialized withset_grammar. Thenameargument is used for reference to the grammar in error messages.
val set_grammar : 'a grammar -> 'a grammar -> unitset_grammar gr grdefset the definiton of grammargr(previously declared withdeclare_grammar) to begrdef.Invalid_argumentis raised ifset_grammaris used on a grammar that was not created withdeclare_grammar. The behavious is undefined if a grammar is set twice withset_grammar.
Parsing functions
val parse_buffer : 'a grammar -> blank -> Earley_core.Input.buffer -> 'aparse_buffer gr bl bufparses the bufferbufusing the grammargrand the blank functionbl. The exceptionParse_errormay be raised in case of error.
val parse_string : ?filename:string -> 'a grammar -> blank -> string -> 'aparse_string ~filename gr bl strparses the stringstrusing the grammargrand the blank functionbl. An optionalfilenamecan be provided for reference to the input in error messages. The exceptionParse_errormay be raised in case of error.
val parse_channel : ?filename:string -> 'a grammar -> blank -> Stdlib.in_channel -> 'aparse_channel ~filename gr bl chparses the contenst of the input channelchusing the grammargrand the blank functionbl. Afilenamecan be provided for reference to the input in case of an error.parse_channelmay raise theParse_errorexception.
val parse_file : 'a grammar -> blank -> string -> 'aparse_file gr bl fnparses the filefnusing the grammargrand the blank functionbl. The exceptionParse_errormay be raised in case of error.
val partial_parse_buffer : 'a grammar -> blank -> ?blank_after:bool -> Earley_core.Input.buffer -> int -> 'a * Earley_core.Input.buffer * intpartial_parse_buffer gr bl buf posparses input from the bufferbufstarting a positionpos, using the grammargrand the blank functionbl. A triple is returned containing the new buffer, the position that was reached during parsing, and the semantic result of the parsing. The optional argumentblank_after,trueby default, indicates if the returned position if after the final blank or not. Note that this function should not be used in the defi- nition of a grammar using theblack_boxfunction.
module WithPP : functor (PP : Earley_core.Input.Preprocessor) -> sig ... endA functor providing support for using and
Inputpreprocessor.
Debuging and flags
val debug_lvl : int Stdlib.refdebug_lvlis a flag that can be set forEarleyto display debug data onstderr. The default value is0, and bigger numbers acti- vate more and more debuging informations.
Greedy combinator
Sequencing combinators
val sequence : 'a grammar -> 'b grammar -> ('a -> 'b -> 'c) -> 'c grammarsequence g1 g2 fis a grammar that first parses usingg1, and then parses usingg2. The results of the sequence is then obtained by applyingfto the results ofg1andg2.
val sequence_position : 'a grammar -> 'b grammar -> ('a -> 'b -> 'c) fpos -> 'c grammarsequence_position g1 g2 fis a grammar that first parses usingg1, and then parses usingg2. The results of the sequence is then obtained by applyingfto the results ofg1andg2, and to the positions (i.e. buffer and index) of the corresponding parsed input.Remark:
sequence g1 g2 fis equivalent tosequence_position g1 g2 (fun _ _ _ _ -> f).
val fsequence : 'a grammar -> ('a -> 'b) grammar -> 'b grammarfsequence g1 g2is a grammar that first parses usingg1, and then parses usingg2. The results of the sequence is then obtained by applying the result ofg1to the result ofg2.Remark:
fsequence g1 g2is equivalent tosequence g1 g2 (fun x f -> f x).
val fsequence_position : 'a grammar -> ('a -> 'b) fpos grammar -> 'b grammarsame as fsequence, but the result of
g2also receive the position of the result ofg1.
val fsequence_ignore : 'a grammar -> 'b grammar -> 'b grammarsame as fsequence, but the result of
g2receives nothing, meaning we forget the result ofg1.
val sequence3 : 'a grammar -> 'b grammar -> 'c grammar -> ('a -> 'b -> 'c -> 'd) -> 'd grammarsequence3is similar tosequence, but it composes three grammars into a sequence.Remark:
sequence3 g1 g2 g3 fis equivalent tosequence (sequence g1 g2 f) g3 (fun f x -> f x).
val simple_dependent_sequence : 'a grammar -> ('a -> 'b grammar) -> 'b grammarsimple_dependent_sequence g1 g2is a grammar that first parses usingg1, which returns a valuea, and then continues to parse withg2 aand return its result.
val dependent_sequence : ('a * 'b) grammar -> ('a -> ('b -> 'c) grammar) -> 'c grammardependent_sequence g1 g2is a grammar that first parses usingg1, which returns a value(a,b), and then continues to parse withg2 aand return its result applied tob. compared to the above function, allow memoizing the second grammar
val option : 'a -> 'a grammar -> 'a grammaroption v gtries to parse the input asg, and returnsvin case of failure.
val fixpoint : 'a -> ('a -> 'a) grammar -> 'a grammarfixpoint v gparses a repetition of one or more times the input parsed byg. The valuevis used as the initial value (i.e. to finish the sequence).if parsing X with g returns a function gX, parsing X Y Z with fixpoint a g will return gX (gY (gZ a)).
This consumes stack proportinal to the input length ! use revfixpoint ...
val fixpoint' : 'a -> 'b grammar -> ('b -> 'a -> 'a) -> 'a grammarval fixpoint1 : 'a -> ('a -> 'a) grammar -> 'a grammaras
fixpointbut parses at leat once with the given grammar
val fixpoint1' : 'a -> 'b grammar -> ('b -> 'a -> 'a) -> 'a grammarval list0 : 'a grammar -> unit grammar -> 'a list grammarlistN g sepparses sequences ofgseparated bysepof length at leastN, forN=0,1or2.
val list1 : 'a grammar -> unit grammar -> 'a list grammarval list2 : 'a grammar -> unit grammar -> 'a list grammarval alternatives : 'a grammar list -> 'a grammaralternatives [g1;...;gn]tries to parse using all the grammars[g1;...;gn]and keeps only the first success.
val apply : ('a -> 'b) -> 'a grammar -> 'b grammarapply f gapplies functionfto the value returned by the grammarg.
val apply_position : ('a -> 'b) fpos -> 'a grammar -> 'b grammarapply_position f gapplies functionfto the value returned by the grammargand the positions at the beginning and at the end of the input parsed input.
val position : 'a grammar -> (string * int * int * int * int * 'a) grammarposition gtranforms the grammargto add information about the position of the parsed text.
val test : ?name:string -> Earley_core.Charset.t -> (Earley_core.Input.buffer -> int -> 'a * bool) -> 'a grammartest c fperform a testfon the input buffer. Do not parse anything (position are unchanged). The charsetcshould contains all character accepted as at the position given to f
val blank_test : ?name:string -> Earley_core.Charset.t -> ('a * bool) fpos -> 'a grammarblank_test c fsame as above except thatfis applied tobuf' pos' buf poswhere(buf', pos')is the position before the blank. The charset c should contains all character accepted as at the position (buf,pos). This allow to test the presence of blank or even to read the blank and return some information
val with_blank_test : 'a -> 'a grammara test that fails if there is no blank
val no_blank_test : 'a -> 'a grammara test that fails if there are some blank
val grammar_family : ?param_to_string:('a -> string) -> string -> ('a -> 'b grammar) * (('a -> 'b grammar) -> unit)grammar_family to_str namereturns a pair(gs, set_gs), wheregsis a finite family of grammars parametrized by a value of type'a. A namenameis to be provided for the family, and an optional functionto_strcan be provided to print the parameter and display better error messages.
val grammar_prio : ?param_to_string:('b -> string) -> string -> ('b -> 'c grammar) * (((('b -> bool) * 'c grammar) list * ('b -> 'c grammar list)) -> unit)Similar to the previous one, with an optimization.
grammar_prio to_str namereturns a pair(gs, set_gs), wheregsis a finite family of grammars parametrized by a value of type'a.set_gsrequires two lists of grammars to set the value of the grammar:- the first list are grammar that can only be activated by the parameter (if the given function return true)
- the second list is used as for grammar family
val grammar_prio_family : ?param_to_string:(('a * 'b) -> string) -> string -> ('a -> 'b -> 'c grammar) * (('a -> (('b -> bool) * 'c grammar) list * ('b -> 'c grammar list)) -> unit)A mixture of the two above
val accept_empty : 'a grammar -> boolaccept_empty greturnstrueif the grammargaccepts the empty input andfalseotherwise.
val grammar_info : 'a grammar -> bool * Earley_core.Charset.tval give_name : string -> 'a grammar -> 'a grammargive a name to the grammar. Usefull for debugging.