Module Earley_core.Internals
exceptionErrorException raised for parse error. Can be raise in the action of a grammar using
give_up
type blank= Input.buffer -> int -> Input.buffer * intA blank function is just a function progressing in a buffer
type position= Input.buffer * intPositions
type 'a fpos= Input.buffer -> int -> Input.buffer -> int -> 'atype of a function waiting for positions
type _ pos=|Idt : ('a -> 'a) pos|Simple : 'a -> 'a pos|WithPos : 'a fpos -> 'a pos|Error : 'a posType for action with or without position and its combinators
type 'a input= Input.buffer -> int -> 'a * Input.buffer * intFor terminals: get the start position and returns a value with the final position
type errpos= (Input.buffer * int) option Stdlib.refA reference to record the last error useful when a terminal calls another parsing
type 'a input2= errpos -> blank -> Input.buffer -> int -> 'a inputType for Ter2 terminals: get both the position before and after blank
type 'a test= ('a * bool) fposType for tests: get also both position and return a boolean and a value
type pos2={buf : Input.buffer;col : int;buf_ab : Input.buffer;col_ab : int;}Position record stored in the elements of the earley table. We store the position before and after the blank
val apply_pos_start : 'a pos -> pos2 -> pos2 -> 'aGet the position before and after the parsed text annd apply it to the combinator
val apply_start : (Input.buffer -> int -> Input.buffer -> int -> 'a) -> pos2 -> pos2 -> 'a
type info= (bool * Charset.t) Utils.Fixpoint.tType of the information computed on a rule: the boolean tells if the grammar can parse an empty string and the charset, the first accepted characteres when the rule is used to parse something.
type 'a cref= 'a Container.Ref.containerUse the container references, because they can be reset
type 'a tref= ('a * Input.buffer * int) option crefa reference to the store the result of reading a terminal
module rec Types : sig ... endA BNF grammar is a list of rules. The type parameter
'acorresponds to the type of the semantics of the grammar. For example, parsing using a grammar of typeint grammarwill produce a value of typeint.
and StackContainer : Container.Param with type ('b, 'a) elt = ('a, 'b) Types.pre_stackUse container to store a point to the rule in stack, we use recursive module for that
include Types
and _ symbol=|Term :{input : 'a input;info : info;memo : 'a tref;name : string;}-> 'a symbolterminal symbol just read the input buffer
|Ter2 :{input : 'a input2;info : info;memo : 'a tref;name : string;}-> 'a symbolTer2 correspond to a recursive call to the parser. We can change the blank function for instance, or parse input as much as possible. In fact it is only in the combinators in earley.ml that we see the use Ter2 to call the parser back.
|Test :{input : 'a test;info : info;memo : 'a tref;name : string;}-> 'a symbolTest on the input, can for instance read blanks, usefull for things like ocamldoc (but not yet used by earley-ocaml).
|NonTerm :{mutable rules : 'a rule list;mutable info : info;memo : 'a ntref;name : string;}-> 'a symbolNon terminals trough a mutable field to define recursive rule lists
The symbol, a more general concept that terminals
and _ prerule=|Empty : 'a pos -> 'a preruleEmpty rule.
|Next : info * 'a symbol * ('a, 'c) next -> 'c preruleSequence of a symbol and a rule, with a possible name for debugging, the information on the rule, the symbol to read, an action and the rest of the rule
|Dep : ('a -> 'b rule) -> ('a -> 'b) preruleDependant rule, gives a lot of power! but costly! use only when no other solution is possible
BNF rule.
and ('a, 'c) next=|Arg : ('a -> 'c) rule -> ('a, 'c) next|Pos : ('a -> 'c) fpos rule -> ('a, 'c) next|Ign : 'c rule -> ('a, 'c) nextand 'a rule={rule : 'a prerule;ptr : 'a StackContainer.container;adr : int;}Each rule holds a container to associate data to the rule in O(1). see below the description of the type ('a,'b) pre_stack
type _ final=|D :{start : pos2;position in buffer, before and after blank
stack : ('c, 'r) stack;tree of stack representing what should be done after reading the
restof the ruleacts : 'b -> 'c;action to produce the final 'c.
rest : 'b rule;remaining to parse, will produce 'b
full : 'c rule;full rule. rest is a suffix of full.
}-> 'r finalType of an active element of the earley table. In a description of earley, each table element is
(start, end, done * rest)meaning we parsed the string fromstarttoendwith the ruledoneand it remains to parserest. The*therefore denotes the current position. Earley is basically a dynamic algorithm producing all possible elements.We depart from this representation in two ways:
- we do not represent
done, we keep the whole whole rule:full = done rest
- we never keep
end. It is only used when we finish parsing of a rule and we have an element(start, end, done * Empty)then, we look for other elements of the form(start', end', done' * rest')whereend' = start,rest' = nonterm' rest''andnonterm'is a non terminal containing thedonerule. We represent this situation by a stack in the element(start, end, done * Empty), which is maintained to lists all the elements satisfying the above property (no more, no less, each element only once)
The type
'a finalrepresent an element of the earley table whereendis the current position in the string being parsed.- we do not represent
and (_, _) element=|B : ('a -> 'r) pos -> ('a, 'r) elementEnd of the stack, with an action to produce the final parser value
|C :{start : pos2;stack : ('c, 'r) stack;acts : ('a -> 'b -> 'c) pos;Here we wait
x:'afrom parentsrest : 'b rule;full : 'c rule;}-> ('a, 'r) elementCons cell of the stack with a record similar to D's
Type of the element that appears in stacks. Note: all other elements no reachable from stacks will be collected by the GC, which is what we want to release memory.
The type is similar to the previous:
('a, 'r) element, means that from a value of type 'a, comming from our parent in the stack, we could produce a value of type'rusingrest.The action needs to be prametrised by the future position which is unknown.
and ('a, 'b) stack= ('a, 'b) element list Stdlib.refStack themselves are an acyclic graph of elements (sharing is important to be preserved). We need a reference for the stack construction.
val eq_rule : a b. 'a rule -> 'b rule -> ('a, 'b) Container.eqRule equlity
val eq_C : a b. ('a, 'b) element -> ('a, 'b) element -> boolEquality for stack element: as we keep the invariant, we only need physical equality. You may uncomment the code below to check this.
val eq_D : 'a final -> 'a final -> boolEquality on a final needs to do a comparison as it is used to test if a new element is already present.
val idtEmpty : a. unit -> ('a -> 'a) ruleSome rules/grammar contruction that we need already here
val symbol_name : a. 'a symbol -> stringHandling of symbol, rule and grammar names
val rule_name : a. 'a rule -> stringval grammar_name : a. ?delim:bool -> 'a grammar -> stringval grammar_delim_name : a. 'a grammar -> stringval keep_all_names : bool Stdlib.refval grammar_to_rule : a. ?name:string -> 'a grammar -> 'a ruleConverting a grammar to a rule
val force : 'a Utils.Fixpoint.t -> 'aBasic constants/functions for rule information
val iempty : (bool * Charset.charset) Utils.Fixpoint.tval any : (bool * Charset.charset) Utils.Fixpoint.tval rule_info : a. 'a rule -> infomanagment of info = accept empty + charset accepted as first char
val symbol_info : a. 'a symbol -> infoval compose_info : 'a symbol -> 'b rule -> infoval grammar_info : a. 'a rule list -> infoval print_rule : a b. ?rest:'b rule -> Stdlib.out_channel -> 'a rule -> unitPrinting
val print_pos : Stdlib.out_channel -> pos2 -> unitval print_final : bool -> Stdlib.out_channel -> 'a final -> unitval print_final_no_pos : Stdlib.out_channel -> 'a final -> unitval print_final : Stdlib.out_channel -> 'a final -> unitval print_element : a b. Stdlib.out_channel -> ('a, 'b) element -> unitval print_rule : Stdlib.out_channel -> 'a rule -> unit
module K : sig ... endThis type is the state of the parsing table for the current position it only hold
'a finalelements whoseendare the current position, the keys are the buffer uid's, column position and uid of thefullandrestrules
module HK : sig ... endtype 'a cur= 'a final HK.ttype 'a reads= 'a final Input.OrdTbl.t Stdlib.refType of a table with pending reading, that is elements resulting from reading the string with some symbols. We need this table, because two terminal symbols could read different length of the input from the same point
type 'a sct= 'a StackContainer.tableStack construction. The type below, denotes table associate stack to rule. Recall we construct stack for elements whose
endare the current position
type tmemo= unit Container.Ref.tableTable to memoize the treatment of terminals and non terminals. This is important to garanty that results of actions are physicaly equals when they comes from the same parsing. To do this, terminals must read the input only once and the result must be memoized.
val add_stack_hook : a b. 'b sct -> 'a rule -> (('a, 'b) element -> unit) -> unitadd_stack_hook sct rule fnadds intablea hookfnfor the givenrule.fnwill be called each time an element is added to the stack pointed by thatrule. The hook is run on the existing elements if the stack. The stack is created if it did not exists yet
val add_stack : a b. 'b sct -> 'a rule -> ('a, 'b) element -> ('a, 'b) stackadd_stack sct rule elementadd the givenelementto the stack of the givenrulein the tablesct. Runs all hooks if any. Creates the stack if needed
val find_stack : a b. 'b sct -> 'a rule -> ('a, 'b) stackfind_stack sct rulefinds the stack to associate to the given rule, again the stack is created if needed
val size : 'a final -> (Stdlib.Obj.t, Stdlib.Obj.t) stack -> intComputes the size of an element stack, taking in account sharing
val size_tables : ('a, 'b final) Stdlib.Hashtbl.t -> 'c final Input.OrdTbl.t -> intComputes the size of the two tables, forward reading and the current elements. The other tables do not retain stack pointers
val size : 'a final -> intwrap up the size function
val tail_key : a. 'a rule -> intA fonction to fetch the key of the tail of a rule, needed to get the key of an element representing a complete parsing
val final : b c r. pos2 -> ('b -> 'c) -> ('c, 'r) stack -> 'b rule -> 'c rule -> 'r finalConstructor for the final type which includes its hash ley
val elt_key : 'a final -> int * int * intGet the key of an element
val final_key : pos2 -> 'a rule -> int * int * intGet the key of the final element when parsing is finished
val good : char -> 'a rule -> boolTest is a given char is accepted by the given rule
val max_stack : int Stdlib.refval add : string -> pos2 -> char -> 'a final -> 'a cur -> boolAdds an element in the current table of elements, return true if new
val cns_pos : a b c. ('a -> ('b -> 'c) -> ('a -> 'b) fpos -> 'c) fposval cns_ign : a b c. 'a -> ('b -> 'c) -> 'b -> 'cval combine : a c d. ('c -> 'd) -> ('a -> ('a -> 'c) -> 'd) posThis one for prediction
val combine_pos : a c d. ('c -> 'd) -> ('a -> ('a -> 'c) fpos -> 'd) posval combine_ign : a c d. ('c -> 'd) -> ('a -> 'c -> 'd) pos
exceptionParse_error of Input.buffer * intException for parse error, can also be raise by Ter2 terminals, but no other terminal handles it
val update_errpos : (Input.buffer * int) option Stdlib.ref -> Input.buffer -> int -> unitval pred_prod_lec : a. errpos -> 'a final -> 'a cur -> 'a reads -> 'a sct -> tmemo -> blank -> pos2 -> char -> unitThis is now the main function computing all the consequences of the element given at first argument. It needs
- the element
elt0being added - the
errposto record advaces in the parsing - the four tables:
elements forward sct tmemo - the
blank(to pass it to Ter2 terminals) - the position
cur_posand current charatercfor the action and the good test
It performs prediction/completion/lecture in a recursive way.
- the element
val count : int Stdlib.refval parse_buffer_aux : a. ?errpos:errpos -> bool -> blank -> 'a grammar -> Input.buffer -> int -> 'a * Input.buffer * intThe main parsing loop
val internal_parse_buffer : a. ?errpos:errpos -> ?blank_after:bool -> blank -> 'a grammar -> Input.buffer -> int -> 'a * Input.buffer * intFunction to call the parser in a terminal