Module Earley_core.Internals
exception
Error
Exception raised for parse error. Can be raise in the action of a grammar using
give_up
type blank
= Input.buffer -> int -> Input.buffer * int
A blank function is just a function progressing in a buffer
type position
= Input.buffer * int
Positions
type 'a fpos
= Input.buffer -> int -> Input.buffer -> int -> 'a
type of a function waiting for positions
type _ pos
=
|
Idt : ('a -> 'a) pos
|
Simple : 'a -> 'a pos
|
WithPos : 'a fpos -> 'a pos
|
Error : 'a pos
Type for action with or without position and its combinators
type 'a input
= Input.buffer -> int -> 'a * Input.buffer * int
For terminals: get the start position and returns a value with the final position
type errpos
= (Input.buffer * int) option Stdlib.ref
A reference to record the last error useful when a terminal calls another parsing
type 'a input2
= errpos -> blank -> Input.buffer -> int -> 'a input
Type for Ter2 terminals: get both the position before and after blank
type 'a test
= ('a * bool) fpos
Type for tests: get also both position and return a boolean and a value
type pos2
=
{
buf : Input.buffer;
col : int;
buf_ab : Input.buffer;
col_ab : int;
}
Position record stored in the elements of the earley table. We store the position before and after the blank
val apply_pos_start : 'a pos -> pos2 -> pos2 -> 'a
Get the position before and after the parsed text annd apply it to the combinator
val apply_start : (Input.buffer -> int -> Input.buffer -> int -> 'a) -> pos2 -> pos2 -> 'a
type info
= (bool * Charset.t) Utils.Fixpoint.t
Type of the information computed on a rule: the boolean tells if the grammar can parse an empty string and the charset, the first accepted characteres when the rule is used to parse something.
type 'a cref
= 'a Container.Ref.container
Use the container references, because they can be reset
type 'a tref
= ('a * Input.buffer * int) option cref
a reference to the store the result of reading a terminal
module rec Types : sig ... end
A BNF grammar is a list of rules. The type parameter
'a
corresponds to the type of the semantics of the grammar. For example, parsing using a grammar of typeint grammar
will produce a value of typeint
.
and StackContainer : Container.Param with type ('b, 'a) elt = ('a, 'b) Types.pre_stack
Use container to store a point to the rule in stack, we use recursive module for that
include Types
and _ symbol
=
|
Term :
{
input : 'a input;
info : info;
memo : 'a tref;
name : string;
}
-> 'a symbol
terminal symbol just read the input buffer
|
Ter2 :
{
input : 'a input2;
info : info;
memo : 'a tref;
name : string;
}
-> 'a symbol
Ter2 correspond to a recursive call to the parser. We can change the blank function for instance, or parse input as much as possible. In fact it is only in the combinators in earley.ml that we see the use Ter2 to call the parser back.
|
Test :
{
input : 'a test;
info : info;
memo : 'a tref;
name : string;
}
-> 'a symbol
Test on the input, can for instance read blanks, usefull for things like ocamldoc (but not yet used by earley-ocaml).
|
NonTerm :
{
mutable rules : 'a rule list;
mutable info : info;
memo : 'a ntref;
name : string;
}
-> 'a symbol
Non terminals trough a mutable field to define recursive rule lists
The symbol, a more general concept that terminals
and _ prerule
=
|
Empty : 'a pos -> 'a prerule
Empty rule.
|
Next : info * 'a symbol * ('a, 'c) next -> 'c prerule
Sequence of a symbol and a rule, with a possible name for debugging, the information on the rule, the symbol to read, an action and the rest of the rule
|
Dep : ('a -> 'b rule) -> ('a -> 'b) prerule
Dependant rule, gives a lot of power! but costly! use only when no other solution is possible
BNF rule.
and ('a, 'c) next
=
|
Arg : ('a -> 'c) rule -> ('a, 'c) next
|
Pos : ('a -> 'c) fpos rule -> ('a, 'c) next
|
Ign : 'c rule -> ('a, 'c) next
and 'a rule
=
{
rule : 'a prerule;
ptr : 'a StackContainer.container;
adr : int;
}
Each rule holds a container to associate data to the rule in O(1). see below the description of the type ('a,'b) pre_stack
type _ final
=
|
D :
{
start : pos2;
position in buffer, before and after blank
stack : ('c, 'r) stack;
tree of stack representing what should be done after reading the
rest
of the ruleacts : 'b -> 'c;
action to produce the final 'c.
rest : 'b rule;
remaining to parse, will produce 'b
full : 'c rule;
full rule. rest is a suffix of full.
}
-> 'r final
Type of an active element of the earley table. In a description of earley, each table element is
(start, end, done * rest)
meaning we parsed the string fromstart
toend
with the ruledone
and it remains to parserest
. The*
therefore denotes the current position. Earley is basically a dynamic algorithm producing all possible elements.We depart from this representation in two ways:
- we do not represent
done
, we keep the whole whole rule:full = done rest
- we never keep
end
. It is only used when we finish parsing of a rule and we have an element(start, end, done * Empty)
then, we look for other elements of the form(start', end', done' * rest')
whereend' = start
,rest' = nonterm' rest''
andnonterm'
is a non terminal containing thedone
rule. We represent this situation by a stack in the element(start, end, done * Empty)
, which is maintained to lists all the elements satisfying the above property (no more, no less, each element only once)
The type
'a final
represent an element of the earley table whereend
is the current position in the string being parsed.- we do not represent
and (_, _) element
=
|
B : ('a -> 'r) pos -> ('a, 'r) element
End of the stack, with an action to produce the final parser value
|
C :
{
start : pos2;
stack : ('c, 'r) stack;
acts : ('a -> 'b -> 'c) pos;
Here we wait
x:'a
from parentsrest : 'b rule;
full : 'c rule;
}
-> ('a, 'r) element
Cons cell of the stack with a record similar to D's
Type of the element that appears in stacks. Note: all other elements no reachable from stacks will be collected by the GC, which is what we want to release memory.
The type is similar to the previous:
('a, 'r) element
, means that from a value of type 'a, comming from our parent in the stack, we could produce a value of type'r
usingrest
.The action needs to be prametrised by the future position which is unknown.
and ('a, 'b) stack
= ('a, 'b) element list Stdlib.ref
Stack themselves are an acyclic graph of elements (sharing is important to be preserved). We need a reference for the stack construction.
val eq_rule : a b. 'a rule -> 'b rule -> ('a, 'b) Container.eq
Rule equlity
val eq_C : a b. ('a, 'b) element -> ('a, 'b) element -> bool
Equality for stack element: as we keep the invariant, we only need physical equality. You may uncomment the code below to check this.
val eq_D : 'a final -> 'a final -> bool
Equality on a final needs to do a comparison as it is used to test if a new element is already present.
val idtEmpty : a. unit -> ('a -> 'a) rule
Some rules/grammar contruction that we need already here
val symbol_name : a. 'a symbol -> string
Handling of symbol, rule and grammar names
val rule_name : a. 'a rule -> string
val grammar_name : a. ?delim:bool -> 'a grammar -> string
val grammar_delim_name : a. 'a grammar -> string
val keep_all_names : bool Stdlib.ref
val grammar_to_rule : a. ?name:string -> 'a grammar -> 'a rule
Converting a grammar to a rule
val force : 'a Utils.Fixpoint.t -> 'a
Basic constants/functions for rule information
val iempty : (bool * Charset.charset) Utils.Fixpoint.t
val any : (bool * Charset.charset) Utils.Fixpoint.t
val rule_info : a. 'a rule -> info
managment of info = accept empty + charset accepted as first char
val symbol_info : a. 'a symbol -> info
val compose_info : 'a symbol -> 'b rule -> info
val grammar_info : a. 'a rule list -> info
val print_rule : a b. ?rest:'b rule -> Stdlib.out_channel -> 'a rule -> unit
Printing
val print_pos : Stdlib.out_channel -> pos2 -> unit
val print_final : bool -> Stdlib.out_channel -> 'a final -> unit
val print_final_no_pos : Stdlib.out_channel -> 'a final -> unit
val print_final : Stdlib.out_channel -> 'a final -> unit
val print_element : a b. Stdlib.out_channel -> ('a, 'b) element -> unit
val print_rule : Stdlib.out_channel -> 'a rule -> unit
module K : sig ... end
This type is the state of the parsing table for the current position it only hold
'a final
elements whoseend
are the current position, the keys are the buffer uid's, column position and uid of thefull
andrest
rules
module HK : sig ... end
type 'a cur
= 'a final HK.t
type 'a reads
= 'a final Input.OrdTbl.t Stdlib.ref
Type of a table with pending reading, that is elements resulting from reading the string with some symbols. We need this table, because two terminal symbols could read different length of the input from the same point
type 'a sct
= 'a StackContainer.table
Stack construction. The type below, denotes table associate stack to rule. Recall we construct stack for elements whose
end
are the current position
type tmemo
= unit Container.Ref.table
Table to memoize the treatment of terminals and non terminals. This is important to garanty that results of actions are physicaly equals when they comes from the same parsing. To do this, terminals must read the input only once and the result must be memoized.
val add_stack_hook : a b. 'b sct -> 'a rule -> (('a, 'b) element -> unit) -> unit
add_stack_hook sct rule fn
adds intable
a hookfn
for the givenrule
.fn
will be called each time an element is added to the stack pointed by thatrule
. The hook is run on the existing elements if the stack. The stack is created if it did not exists yet
val add_stack : a b. 'b sct -> 'a rule -> ('a, 'b) element -> ('a, 'b) stack
add_stack sct rule element
add the givenelement
to the stack of the givenrule
in the tablesct
. Runs all hooks if any. Creates the stack if needed
val find_stack : a b. 'b sct -> 'a rule -> ('a, 'b) stack
find_stack sct rule
finds the stack to associate to the given rule, again the stack is created if needed
val size : 'a final -> (Stdlib.Obj.t, Stdlib.Obj.t) stack -> int
Computes the size of an element stack, taking in account sharing
val size_tables : ('a, 'b final) Stdlib.Hashtbl.t -> 'c final Input.OrdTbl.t -> int
Computes the size of the two tables, forward reading and the current elements. The other tables do not retain stack pointers
val size : 'a final -> int
wrap up the size function
val tail_key : a. 'a rule -> int
A fonction to fetch the key of the tail of a rule, needed to get the key of an element representing a complete parsing
val final : b c r. pos2 -> ('b -> 'c) -> ('c, 'r) stack -> 'b rule -> 'c rule -> 'r final
Constructor for the final type which includes its hash ley
val elt_key : 'a final -> int * int * int
Get the key of an element
val final_key : pos2 -> 'a rule -> int * int * int
Get the key of the final element when parsing is finished
val good : char -> 'a rule -> bool
Test is a given char is accepted by the given rule
val max_stack : int Stdlib.ref
val add : string -> pos2 -> char -> 'a final -> 'a cur -> bool
Adds an element in the current table of elements, return true if new
val cns_pos : a b c. ('a -> ('b -> 'c) -> ('a -> 'b) fpos -> 'c) fpos
val cns_ign : a b c. 'a -> ('b -> 'c) -> 'b -> 'c
val combine : a c d. ('c -> 'd) -> ('a -> ('a -> 'c) -> 'd) pos
This one for prediction
val combine_pos : a c d. ('c -> 'd) -> ('a -> ('a -> 'c) fpos -> 'd) pos
val combine_ign : a c d. ('c -> 'd) -> ('a -> 'c -> 'd) pos
exception
Parse_error of Input.buffer * int
Exception for parse error, can also be raise by Ter2 terminals, but no other terminal handles it
val update_errpos : (Input.buffer * int) option Stdlib.ref -> Input.buffer -> int -> unit
val pred_prod_lec : a. errpos -> 'a final -> 'a cur -> 'a reads -> 'a sct -> tmemo -> blank -> pos2 -> char -> unit
This is now the main function computing all the consequences of the element given at first argument. It needs
- the element
elt0
being added - the
errpos
to record advaces in the parsing - the four tables:
elements forward sct tmemo
- the
blank
(to pass it to Ter2 terminals) - the position
cur_pos
and current charaterc
for the action and the good test
It performs prediction/completion/lecture in a recursive way.
- the element
val count : int Stdlib.ref
val parse_buffer_aux : a. ?errpos:errpos -> bool -> blank -> 'a grammar -> Input.buffer -> int -> 'a * Input.buffer * int
The main parsing loop
val internal_parse_buffer : a. ?errpos:errpos -> ?blank_after:bool -> blank -> 'a grammar -> Input.buffer -> int -> 'a * Input.buffer * int
Function to call the parser in a terminal