TinyMLBy Forrest Briggs, Feb 20, 2008
TinyML - Lexer, Parser, Interpreter, and Type Checker in Under 700 Lines of SML
Overview TinyML is a polymorphic functional programming language with static type checking, implemented in less than 700 lines of SML code. This includes:
Download Source TinyML is open source. Click here to download the TinyML source files. Examples Below are some examples of code that TinyML can parse, type check and run: Basic int artimetic operators, function application, and order of operations: * 3 (+ 1 (- 5 7)) ⇒ type: int ⇒ ~3 if not (and (< 1 2) (or true false)) then 4 else + 5 6 ⇒ type: int ⇒ 11 (fn x => + x 2) 5 ⇒ type: int ⇒ 7 let x = 3 in + 1 x ⇒ type: int ⇒ 4 nil is ['tv97], meaning a list of elements of type 'tv97, which is like the type 'a in SML):
head [1, 2, 3, 4] ⇒ type: int ⇒ 1 tail [1, 2, 3, 4] ⇒ type: [int] ⇒ [2, 3, 4] isempty (tail [1, 2, 3]) ⇒ type: bool ⇒ false nil ⇒ type: ['tv97] ⇒ [] I is applied to itself several times, before being applied to 3. This illustrates the type checker's capability to handle
let-polymoprhism; each occurence of I in this expression has a different type:
I (I I) I 3 ⇒ type: int ⇒ 3 pair 1 false is {product int bool}, meaning a product of an int and a bool.
The last example constructs a list of pairs of ints, which has type [{product int int}].
pair 1 false
⇒ type: {product int bool}
⇒ (1,false)
fst (pair 1 false)
⇒ type: int
⇒ 1
snd (pair 1 false)
⇒ type: bool
⇒ false
[pair 1 2, pair 3 4]
⇒ type: [{product int int}]
⇒ [(1,2),(3,4)]
letrec expressions, which are used to define recursive functions. Notice that in the second example, we define the length function,
but do not apply it to a list. The type checker determines the general polymorphic type ['tv1833] → int for length, meaning it will return an int
given a list of elements of any type. The last example first defines the curried, higher-order function foldl, uses it to define the function reverse, then reverses the
list [1, 2, 3].
letrec factorial x = if (eqInt 0 x) then 1 else * x (factorial (- x 1)) in factorial 5 ⇒ type: int ⇒ 120 letrec length l = if isempty l then 0 else + 1 (length (tail l)) in length ⇒ type: (['tv1833] -> int) ⇒ closure letrec length l = if isempty l then 0 else + 1 (length (tail l)) in length [true, true, false, true] ⇒ type: int ⇒ 4 let foldl = fn f => letrec folder acc = fn l => if (isempty l) then acc else folder (f (head l) acc) (tail l) in folder in let reverse = foldl cons nil in reverse [1, 2, 3] ⇒ type: [int] ⇒ [3, 2, 1] val declarations. The matchPair and listCase
functions provide a syntactically unsugared form of pattern matching. The matchPair function
takes a pair and a function that takes two arguments- the first and second elements of the pair. matchPair calls
the function it is passed, and passes the elements of the pair to it as arguments. Similarly, listcase takes a function
that accepts the head and tail of the list as arguments, as well as a value to use if the list is nil.
val matchPair = fn p => fn f => f (fst p) (snd p) matchPair (pair 1 5) (fn fst => fn snd => + fst snd) ⇒ type: int ⇒ 6 val listcase = fn l => fn nilcase => fn f => if (isempty l) then nilcase else f (head l) (tail l) listcase [1, 2, 3] 0 (fn h => fn t => h) ⇒ type: int ⇒ 1 val declaration, TinyML infers the type of the declared value. However, if an annotation
is present, it type checks the declaration to make sure the value is well typed, then ignores the type it infers and extends the environment with the user
supplied type annotation. This is a hack, but it makes it possible to define arbitrary recursive algebraic data types (and it took very little
code to implement). As long as this functionality is used carefully, there are no problems at runtime, and the type checker can infer the types of
new expressions that don't have annotations, but which involve values that were declared with annotations. In the following example, several annotated
value declarations define a new type, 'a option (which is a built-in type in SML).
val some : 'a -> {option 'a} = fn x => [x]
val none : {option 'a} = []
val isSome : {option 'a} -> bool = fn x => not (isempty x)
val valOf : {option 'a} -> 'a = fn x => head x
val isNone = fn opt => not (isSome opt)
none
⇒ type: {option 'tv55217}
⇒ []
some 44
⇒ type: {option int}
⇒ [44]
isSome (some 44)
⇒ type: bool
⇒ true
valOf (some 44)
⇒ type: int
⇒ 44
'a tree, and a function sumTree,
which adds up the values of all of the leaves in a tree of ints:
val emptyTree : {tree 'a} = pair none none
val leaf : 'a -> {tree 'a} = fn x => pair (some x) none
val node : {tree 'a} -> {tree 'a} -> {tree 'a} = fn c1 => fn c2 => pair none (some (pair c1 c2))
val isEmptyTree : {tree 'a} -> bool = fn t => and (isNone (fst t)) (isNone (snd t))
val isLeaf : {tree 'a} -> bool = fn t => isSome (fst t)
val isNode : {tree 'a} -> 'a = fn t => isSome (snd t)
val leafVal : {tree 'a} -> 'a = fn t => valOf (fst t)
val leftChild : {tree 'a} -> {tree 'a} = fn t => fst (valOf (snd t))
val rightChild : {tree 'a} -> {tree 'a} = fn t => snd (valOf (snd t))
val treeMatch = fn t => fn emptyCase => fn leafCase => fn nodeCase =>
if isEmptyTree t then emptyCase else
if isLeaf t then leafCase (leafVal t) else
nodeCase (leftChild t) (rightChild t)
val sumTree = letrec sumTree t =
treeMatch t
0
(fn leaf => leaf)
(fn left => fn right => + (sumTree left) (sumTree right))
in sumTree
node (leaf 3) (leaf 6)
⇒ type: {tree int}
⇒ ([],[(([3],[]),([6],[]))])
sumTree (node (leaf 3) (leaf 6))
⇒ type: int
⇒ 9
Syntax This is an informal description of the syntax for expressions: e1 e2 -- the function e1 applied to e2 (e) -- order of application if e1 then e2 else e2 -- conditional let var_name = e1 in e2 -- let expression letrec func_name var_name = e1 in e2 -- letrec expression [e1, e2, .., eN] -- listFunction application is implicitly left associative, so a b c d is short for ((a b) c) d.
Most of the syntax of TinyML is syntactic sugar for a simpler lambda calculus representation. In particular, for the purposes of evaluation and type checking, if-then-else expressions are converted into applications of cond, like this:
if A then B else C ⇔ cond A B C let expressions are converted into an applied annonymous function, like this:
let v = M in N ⇔ (fn v => N) M letrec expressions are converted into an application of the Y combinator, like this:
letrec f x = M in N ⇔ (fn f => N) (Y (fn f => fn x => M))List expressions are converted into applications of cons:
[1, 2, 3, 4, 5] ⇔ (cons 1) ((cons 2) ((cons 3) ((cons 4) ((cons 5) nil))))Value declarations can have optional type annotations: val var_name = e -- value declaration without type annotation val varname : type = e -- value declaration with type annotationAfter processing a value declaration without a type annotation, the environment contains the new declaration with its infered type. If an annotation is included, it overrides the stored type of the value. Providing inconsistent annotations can lead to runtime type errors that will not be found by the type checker, so use them carefully! This is an informal description of the syntax of types:
int | bool | a | b | c | blah -- primitive type identifiers
'a | 'b | 'blah -- type variable (any type identifier which starts with ')
(t) -- precedence
t1 -> t2 -- function (right associative)
{tycons a b c d ...} -- typeconstuctor with argument types, for example {list int}, or {product int it}
[a] -- shorthand for {list a}
Unlike function application, type precedence is implicitly right associative,
so a → b &rarr c &rarr d is short for a &rarr (b &rarr (c &rarr d)),
not ((a &rarr b) &rarr c) &rarr d.
Built-in Functions This is a list of the functions and values that are built in to TinyML (without loading any additional declarations). Note that all multi-argument functions are curried, so for example, + can be seen as a function that takes two numbers and
returns their sum, or it can be seen as a function that takes one number, and returns a function that takes another number and adds to the first number.
Issues Here are a few of TinyML's biggest flaws:
Acknowledgements The type inference code is based on the monomorphic inference code at: www.cs.cornell.edu/Courses/cs611/2006fa/hw/typeCheck.sml The evaluator and environment data structure are based on code from Harvey Mudd College CS 131, by Profs. Chris Stone and Melissa O'Neill, amongst others. The parser is based on Andreas Rossberg's parser for F+rec, from: www.ps.uni-sb.de/courses/info1-ws99/uebung/sml/inter+rec.zip Contact Author If you would like to contact me about TinyML, send an email to frostytrees+tinyml@gmail.com. My main site is www.laserpirate.com, and I also have a portfolio. Source Code The full source code for TinyML is listed below. If you download the source, you get additional examples and documentation.
(*
tinyml.sml - By Forrest Briggs - Dec 8., 2007
This structure defines a complete implementation of a small lazy functional language, including parser, evaluator and type-inference. The language has implicit left-associative application, curried functions, if-then-else, let, letrec and lambda. The type inference system supports let-polymorphism, but the implementation is simplified by restricting type inference to combinator expressions. A transformation from the full syntax to combinator expressions is provided, so the type inference system can be used with any expression. The type system supports int, bool, product and list by default, and can be extended with arbitrary recursive algebraic types by annotating values in code.
Acknowledgements: The type inference code is based on the monomorphic inference code at: www.cs.cornell.edu/Courses/cs611/2006fa/hw/typeCheck.sml. The evaluator and Env data structure is based on code from Harvey Mudd College CS 131. The parser is based on Andreas Rossberg's parser for F+rec, from http://www.ps.uni-sb.de/courses/info1-ws99/uebung/sml/inter+rec.zip
*)
(****************************************************************************
The Env structure is a map from strings to values.
****************************************************************************)
structure Env =
let structure StringKey : ORD_KEY =
struct type ord_key = string; val compare = String.compare; end
structure Map = BinaryMapFn (StringKey)
in struct
type 'a env = 'a Map.map
exception KeyNotFound of string
val empty : 'a env = Map.empty
val fromList : (string * 'a) list -> 'a env = fn args => foldl Map.insert' empty args
val toList : 'a env -> (string * 'a) list = Map.listItemsi
fun extend env s v = Map.insert(env, s, v)
fun lookup e x = case Map.find(e,x) of SOME v => v | NONE => raise (KeyNotFound x)
fun exists e x = (lookup e x; true) handle exn => false
end end
structure TinyML = struct
local
open TextIO
open Env
in
(****************************************************************************
datastructures for types, expressions, and values
****************************************************************************)
(* this datatype defines the types of expressions as defined by both c_expr and ml_expr *)
datatype ty =
TYCons of string * ty list
| TYVar of string
| Subst of ty ref (* this constructor does not appear in the fully resolved
types of expressions. it is used to assist type inference *)
fun TYIdent s = TYCons (s, [])
val TYInt = TYIdent "int"
val TYBool = TYIdent "bool"
fun TYPair (x, y) = TYCons ("product", [x,y])
fun TYList t = TYCons ("list", [t])
fun TYArrow (x, y) = TYCons ("->", [x,y])
(* combinator expression *)
datatype c_expr =
CEInt of int
| CEBool of bool
| CEVar of string
| CEApp of c_expr * c_expr
(* tiny-ml expressions and lazy values *)
datatype ml_expr =
MLValue of ml_value
| MLVar of string
| MLApp of ml_expr * ml_expr
| MLLambda of string * ml_expr (* fn x = e *)
| MLLet of string * ml_expr * ml_expr (* let x = e1 in e2 *)
| MLLetRec of string * string * ml_expr * ml_expr (* letrec fun f x = e1 in e2 *)
and ml_value =
VInt of int
| VBool of bool
| VPair of ml_value * ml_value
| VList of ml_value list
| VClosure of string * ml_expr * lazyval_memo env
| VBuiltIn of ml_value -> ml_value
and ml_lazyval =
VSuspended of ml_expr * lazyval_memo env
| VEvaluated of ml_value
withtype lazyval_memo = ml_lazyval ref
(****************************************************************************
string conversion functions for types, cexprs, ml_values, and ml_exprs
****************************************************************************)
fun ty2string t = case t of
TYCons (s,l) =>
(case (s, l) of
("->", [t1,t2]) => "("^(ty2string t1)^" -> "^(ty2string t2)^")"
| ("list", [t1]) => "["^(ty2string t1)^"]"
| _ => (case l of [] => s | f::r => "{"^s^" "^(ty2string f)^(String.concat (map (fn t' => " "^(ty2string t')) r))^ "}"))
| TYVar s => s
| Subst r => "=>"^(ty2string (!r))
fun c_expr2string ce = case ce of
(CEInt x) => Int.toString(x)
| (CEBool b) => Bool.toString(b)
| (CEVar s) => s
| (CEApp (e1, e2)) => "(" ^ (c_expr2string e1) ^ " " ^ (c_expr2string e2) ^ ")"
fun ml_value2string v = case v of
VInt i => Int.toString i
| VBool b => Bool.toString b
| VPair (v1, v2) => "("^(ml_value2string v1)^","^(ml_value2string v2)^")"
| VList l => (case l of [] => "[]" | _ => "[" ^ (ml_value2string (List.hd l)) ^(String.concat (map (fn e => ","^(ml_value2string e)) (List.tl l)))^ "]")
| VClosure (s,e,env) => "closure "(* ^s^" "^(ml_expr2string e)*)
| VBuiltIn _ => "builtin"
and ml_expr2string e = case e of
MLValue v => ml_value2string v
| MLVar s => s
| MLApp (e1, e2) => "("^(ml_expr2string e1)^" "^(ml_expr2string e2)^")"
| MLLambda (s, e) => "fn "^s^" => "^(ml_expr2string e)
| MLLet (s, e1, e2) => "let "^s^" = "^(ml_expr2string e1)^" in "^(ml_expr2string e2)
| MLLetRec(f, x, e1, e2) => "letrec "^f^" "^x^" = "^(ml_expr2string e1)^" in "^(ml_expr2string e2)
(****************************************************************************
polymorphic type inference on combinator expressions
****************************************************************************)
(* return a new type variable *)
val newTypeVar : unit -> ty =
let val counter = ref 0
in fn() =>
let val _ = if !counter = valOf Int.maxInt then (print "RAN OUT OF TYPE VARIABLES!... STARTING OVER at 0"; counter := 0) else ()
in (counter := !counter + 1; Subst(ref(TYVar("'tv" ^ Int.toString(!counter)))))
end
end
(* resolve one substitution *)
fun resolveSubst t = case t of
Subst(ref(TYVar _)) => t
| Subst(r as ref t') => let val t'' = resolveSubst t' in (r := t''; t'') end
| _ => t
(* resolve all substitutions *)
fun collapseSubsts t = case resolveSubst t of
TYCons (s,l) => TYCons(s, map collapseSubsts l)
| Subst(ref t') => collapseSubsts t'
| TYVar a => t
(* returns true if the type variable identifier a occurs in t *)
fun tyVarOccurs a t = case resolveSubst t of
TYCons (_,l) => foldl (fn (t',result) => tyVarOccurs a t' orelse result) false l
| Subst(ref (TYVar b)) => a = b
| Subst(ref t1) => tyVarOccurs a t1
| _ => raise Fail "Error"
fun zip nil _ = nil
| zip _ nil = nil
| zip (b::bs) (c::cs) = (b, c)::(zip bs cs)
exception CannotUnify
exception CircularTypes
(* attempts to unify the types t1 and t2. as a side effect, type variables are substituted
to accomplish the unification, or an exception is raised if the types cannot be unified *)
fun unify t1 t2 = case(resolveSubst t1, resolveSubst t2) of
(TYCons (s1, l1), TYCons (s2, l2)) => if not (s1 = s2) then raise CannotUnify else
if not (List.length l1 = List.length l2) then raise CannotUnify else
foldl (fn ((t1, t2), _) => unify t1 t2) () (zip l1 l2)
| (Subst(r as ref (TYVar a)), Subst(s as ref (TYVar b))) => if a = b then () else r := Subst s
| (Subst(r as ref (TYVar a)), t2) => if tyVarOccurs a t2 then raise CircularTypes else r := t2
| (t1, Subst(r as ref (TYVar a))) => if tyVarOccurs a t1 then raise CircularTypes else r := t1
| _ => raise CannotUnify
(* replaces all type variables in t with fresh type variables (but in such a way
that if the same variable tyVarOccurs multiple times in t, it is always replaced with the same new variable.
the variable s maintains a list of variables that have already been replaced *)
fun generalizeTyVars t = generalizeTyVars' t []
and generalizeTyVars' t s = case (collapseSubsts t) of
TYCons (x, l) => let fun generalizeTyVarList [] s = ([], s)
| generalizeTyVarList (f::r) s =
let val (f', s') = generalizeTyVars' f s
val (r', s'') = generalizeTyVarList r s'
in (f'::r', s'')
end
val (l', s') = generalizeTyVarList l s
in (TYCons(x, l'), s')
end
| TYVar i =>
(case List.find (fn (name, _) => name = i) s of
SOME (n, tv') => (tv', s)
| NONE => let val newTv = newTypeVar()
val s' = (i, newTv) :: s
in (newTv, s')
end)
| Subst _ => raise Fail "collapseSubsts should prevent this"
(* checks the type the expression e, in the type environment env (but does not resolveSubst subsitutions) *)
fun inferTypeNoSubst env e = case e of
CEInt n => TYInt
| CEBool b => TYBool
| CEVar x => let val (t, _) = generalizeTyVars (Env.lookup env x) in t end
| CEApp(e1,e2) => let val t1 = inferTypeNoSubst env e1
val t2 = inferTypeNoSubst env e2
val t3 = newTypeVar()
in (unify t1 (TYArrow(t2, t3)); t3)
end
(* use inferType to get the type of an expression.
calls inferTypeNoSubst, followed by collapseSubsts, which resolves substitutions *)
fun inferType(e, env):ty = collapseSubsts(inferTypeNoSubst env e)
(****************************************************************************
tiny-ml to combinator expression transform
****************************************************************************)
(* returns a list of variables that are introduced by lambda abstractions in le *)
fun varsBoundInExpr le = case le of
(MLValue _ | MLVar _ ) => []
| MLApp (e1, e2) => (varsBoundInExpr e1) @ (varsBoundInExpr e2)
| MLLambda (s, e) => s :: (varsBoundInExpr e)
| MLLet (s, e1, e2) => s :: (varsBoundInExpr e2)
| MLLetRec (f, s, e1, e2) => f :: (varsBoundInExpr e2)
(*transform let and letrec to equivelent forms with lambda and Y *)
fun simplifyLet s e1 e2 = MLApp (MLLambda (s, e2), e1)
fun simplifyLetrec f s e1 e2 = MLApp (MLLambda (f, e2), MLApp(MLVar "Y", MLLambda (f, MLLambda (s, e1))))
(* transforms a ml_expr into another ml_expr, but one which never uses the
MLLambda constructor. to finish the transformation, call ml_expr2c_expr *)
fun transformMLtoSKIBC le = case le of
MLValue v => MLValue v
| MLVar s => MLVar s
| MLApp(e1, e2) => MLApp(transformMLtoSKIBC e1, transformMLtoSKIBC e2)
| MLLet(s, e1, e2) => transformMLtoSKIBC (simplifyLet s e1 e2)
| MLLetRec(f, s, e1, e2) => transformMLtoSKIBC (simplifyLetrec f s e1 e2)
| MLLambda(s, e) => eliminate s (transformMLtoSKIBC e)
and eliminate varname expr = case expr of
MLApp(e1, e2) =>
let fun varIsInList (varname:string, l) = List.exists (fn x => x = varname) l
fun varIsBound v e = (varIsInList (v, (varsBoundInExpr e)))
fun varIsFree v e = not (varIsBound v e)
in if (varIsBound varname expr) then
MLApp(MLVar "K", expr)
else if (varIsFree varname e1) andalso (varIsFree varname e2) then
MLApp(MLApp(MLVar "S", eliminate varname e1), eliminate varname e2)
else if (varIsFree varname e1) then
MLApp(MLApp(MLVar "C", eliminate varname e1),e2)
else
MLApp(MLApp(MLVar "B", e1), eliminate varname e2)
end
| MLVar x => if varname = x then MLVar "I" else MLApp(MLVar "K", expr)
| (MLLet _ | MLLetRec _) => transformMLtoSKIBC expr
| _ => MLApp(MLVar "K", expr)
(* directly converts a ml_expr into a c_expr. will raise an exception if given
a ml_expression that uses a type constructor without a direct analog *)
fun ml_expr2c_expr le = case le:ml_expr of
MLVar v => CEVar v
| MLApp (e1, e2) => CEApp(ml_expr2c_expr e1, ml_expr2c_expr e2)
| (MLLambda _ | MLLet _ | MLLetRec _ ) => raise Fail "incompatible constructor"
| MLValue v =>
case v : ml_value of
VInt i => CEInt i
| VBool b => CEBool b
| _ => raise Fail "unimplemented"
(* convert a ml_expression to a c_expr by applying the transform rules *)
fun ml2ce le = ml_expr2c_expr (transformMLtoSKIBC le)
(****************************************************************************
tiny-ml evaluator
****************************************************************************)
exception RuntimeError
fun suspend venv (MLVar x) = Env.lookup venv x
| suspend venv e = ref (VSuspended (e, venv : lazyval_memo env))
and force lv = case !lv of
VEvaluated v => v
| VSuspended (expr, venv) => let val v = eval (venv, expr)
in lv := VEvaluated v; v
end
and eval(venv, MLValue v) = v
| eval(venv, MLVar x) = force (Env.lookup venv x)
| eval(venv, MLLambda (s, e)) = VClosure(s, e, venv) (* this is ineffiecient; it should only capture free vars in e *)
| eval(venv, MLLet (s, e1, e2)) = eval(venv, simplifyLet s e1 e2)
| eval(venv, MLLetRec (f, s, e1, e2) )= eval(venv, simplifyLetrec f s e1 e2)
| eval(venv, MLApp(e1, e2)) =
(case eval(venv, e1) of
VClosure(x, e', venv') => eval(Env.extend venv' x (suspend venv e2), e' )
| VBuiltIn f => f (eval(venv, e2))
| _ =>
raise (Fail ("Attempt to apply non-function: "^(ml_expr2string e1))))
(****************************************************************************
tiny-ml tokenizer
****************************************************************************)
datatype token = LPAREN | RPAREN | BOOL of bool | IDENT of string | INT of int | IF | THEN | ELSE | FN | LET | LETREC | IN | EQUALS | VAL | FNARROW | TYARROW | LBRACE | RBRACE | LBRACKET | RBRACKET | COMMA | COLON
(* convert a list of strings to a list of tokens *)
fun strings2toks toks = map (fn s => case s of
"(" => LPAREN | ")" => RPAREN | "true" => BOOL true | "false" => BOOL false | "if" => IF | "then" => THEN | "else" => ELSE | "fn" => FN | "let" => LET | "letrec" => LETREC | "in" => IN | "=" => EQUALS | "val" => VAL | "->" => TYARROW | "`>" => FNARROW | "[" => LBRACE | "]" => RBRACE | "{" => LBRACKET | "}" => RBRACKET | "," => COMMA | ":" => COLON
| s => (case (Int.fromString s) of SOME i => INT i | NONE => (IDENT s))
) toks
(* these functions split a string and convert the parts to tokens. call getTokens *)
fun tagMultiCharTokenDelims s = let fun matchArrow inChars outChars = case inChars of [] => outChars | #"-" :: #">" :: r => matchArrow r (outChars@[#"$", #"-", #">", #"$"]) | #"=" :: #">" :: r => matchArrow r (outChars@[#"$", #"`", #">", #"$"]) | f :: r => matchArrow r (outChars@[f]) in String.implode (matchArrow (String.explode s) []) end
fun tagSpecialTokens s = String.concat (map (fn c => case c of #"(" => "$($" | #")" => "$)$" | #"]" => "$]$" | #"[" => "$[$" | #"}" => "$}$" | #"{" => "${$" | #"=" => "$=$" | #"," => "$,$" | #":" => "$:$" | _ => Char.toString c) (String.explode (tagMultiCharTokenDelims s)))
fun isTokenDelim t = case t of #" " => true | #"\t" => true | #"\n" => true | #"$" => true | _ => false
fun getTokens s = strings2toks (String.tokens isTokenDelim (tagSpecialTokens s))
fun tok2string t = case t of IDENT x => x | INT x => Int.toString x | BOOL b => Bool.toString b | LPAREN => "(" | RPAREN => ")" | IF => "if" | THEN => "then" | ELSE => "else" | FN => "fn" | LET => "let" | LETREC => "letrec" | IN => "in" | EQUALS => "=" | VAL => "val" | TYARROW => "->" | FNARROW => "=>" | LBRACE => "[" | RBRACE => "]" | LBRACKET => "{" | RBRACKET => "}" | COMMA => "," | COLON => ":"
fun toks2string toks = String.concat (map (fn t => (tok2string t) ^ " ") toks)
(****************************************************************************
tiny-ml recursive descent parser
****************************************************************************)
fun matchToken(t', t::ts) = if t' = t then ts else raise Fail ("expected "^(tok2string t')^" before: "^ (toks2string (t::ts)))
| matchToken(t', []) = raise Fail ("expected "^(tok2string t')^", got nothing")
fun parseIdent(IDENT x :: ts) = (x,ts)
| parseIdent ts = raise Fail ("expected identifier, got: "^(toks2string ts))
fun parseToks(INT c :: ts) = (MLValue(VInt c), ts)
| parseToks(BOOL b :: ts) = (MLValue(VBool b), ts)
| parseToks(IDENT x :: ts) = (MLVar x, ts)
| parseToks(LPAREN :: ts) = let val (e,ts) = parseExp ts
val ts = matchToken(RPAREN,ts)
in (e,ts)
end
| parseToks(LBRACE :: ts) = let val (eList,ts) = parseExprList ts
val listExpr = foldr (fn (e,currList) => MLApp(MLApp(MLVar "cons", e), currList) )
(MLVar "nil") eList
in (listExpr, ts)
end
| parseToks ts = raise Fail ("expected expression, got: "^(toks2string ts))
and parseApp ts = let val (e,ts) = parseToks ts
in parseApp' e ts
end
and parseApp' e1 (ts as INT _:: _) = parseApp'' e1 ts
| parseApp' e1 (ts as BOOL _:: _) = parseApp'' e1 ts
| parseApp' e1 (ts as IDENT _ :: _) = parseApp'' e1 ts
| parseApp' e1 (ts as LPAREN :: _) = parseApp'' e1 ts
| parseApp' e1 (ts as LBRACE :: _) = parseApp'' e1 ts
| parseApp' e1 ts = (e1,ts)
and parseApp'' e1 ts = let val (e2,ts) = parseToks ts
in parseApp' (MLApp(e1,e2)) ts
end
and parseExp(FN :: ts) = let val (x,ts) = parseIdent ts
val ts = matchToken(FNARROW,ts)
val (e,ts) = parseExp ts
in (MLLambda(x, e), ts)
end
| parseExp(LET :: ts) = let val (x,ts) = parseIdent ts
val ts = matchToken(EQUALS,ts)
val (e1,ts) = parseExp ts
val ts = matchToken(IN,ts)
val (e2,ts) = parseExp ts
in (MLLet(x, e1, e2), ts)
end
| parseExp(LETREC :: ts) = let val (f,ts) = parseIdent ts
val (x,ts) = parseIdent ts
val ts = matchToken(EQUALS,ts)
val (e1,ts) = parseExp ts
val ts = matchToken(IN,ts)
val (e2,ts) = parseExp ts
in (MLLetRec(f, x, e1, e2), ts)
end
| parseExp(IF :: ts) = let val (e1,ts) = parseExp ts
val ts = matchToken(THEN,ts)
val (e2,ts) = parseExp ts
val ts = matchToken(ELSE,ts)
val (e3,ts) = parseExp ts
in (MLApp(MLApp(MLApp(MLVar "cond", e1), e2), e3), ts)
end
| parseExp ts = parseApp ts
and startParse ts = case parseExp ts of (e,[]) => e
| (e,extraTs) => raise Fail ("got extra tokens: "^(toks2string extraTs))
and parseExprList toks =
case toks of
[] => raise (Fail "expected more tokens in expression list")
| RBRACE::extraTs => ([], extraTs)
| _ => let val (e, extraTs) = parseExp toks
in case extraTs of
RBRACE::extraTs => ([e], extraTs)
| COMMA::extraTs => let val (restOfEs, extraTs) = parseExprList extraTs
in (e :: restOfEs, extraTs)
end
| _ => raise (Fail "expected , or ]")
end
(* call this function to parse a string to an ml_expr *)
fun parseMLExpr s = startParse (getTokens s)
handle exn =>
case exn of Fail e => raise Fail ("Error while parsing: "^s^"\n"^e)
| _ => raise exn
(****************************************************************************
predefined functions and their types
****************************************************************************)
(* make some type vars to use in our initial tenv types *)
val (tv1, tv2, tv3) = (TYVar "a", TYVar "b", TYVar "c")
val mainTEnv = foldl (fn ((s, v), env) => Env.extend env s v) Env.empty
[
("+", TYArrow(TYInt, TYArrow(TYInt, TYInt)) ),
("-", TYArrow(TYInt, TYArrow(TYInt, TYInt)) ),
("*", TYArrow(TYInt, TYArrow(TYInt, TYInt)) ),
("/", TYArrow(TYInt, TYArrow(TYInt, TYInt)) ),
("<", TYArrow(TYInt, TYArrow(TYInt, TYBool)) ),
(">", TYArrow(TYInt, TYArrow(TYInt, TYBool)) ),
("lte", TYArrow(TYInt, TYArrow(TYInt, TYBool)) ),
("gte", TYArrow(TYInt, TYArrow(TYInt, TYBool)) ),
("eqInt", TYArrow(TYInt, TYArrow(TYInt, TYBool)) ),
("and", TYArrow(TYBool, TYArrow(TYBool, TYBool)) ),
("or", TYArrow(TYBool, TYArrow(TYBool, TYBool)) ),
("eqBool", TYArrow(TYBool, TYArrow(TYBool, TYBool)) ),
("not", TYArrow(TYBool, TYBool) ),
("I", TYArrow(tv1, tv1) ),
("K", TYArrow(tv1, TYArrow(tv2, tv1)) ),
("S", TYArrow(TYArrow(tv1,TYArrow(tv2,tv3)),TYArrow(TYArrow(tv1,tv2),TYArrow(tv1,tv3))) ),
("B", TYArrow(TYArrow(tv1, tv2), TYArrow(TYArrow(tv3,tv1), TYArrow(tv3, tv2))) ),
("C", TYArrow(TYArrow(tv1, TYArrow(tv2, tv3)), TYArrow(tv2, TYArrow(tv1, tv3))) ),
("Y", TYArrow(TYArrow(TYArrow(tv1,tv2),TYArrow(tv1,tv2)),TYArrow(tv1,tv2)) ),
("nil", TYList(tv1) ),
("cons", TYArrow(tv1, TYArrow(TYList(tv1), TYList(tv1))) ),
("head", TYArrow(TYList(tv1), tv1) ),
("tail", TYArrow(TYList(tv1), TYList(tv1)) ),
("isempty", TYArrow(TYList(tv1), TYBool) ),
("pair", TYArrow(tv1, TYArrow(tv2, TYPair(tv1, tv2))) ),
("fst", TYArrow(TYPair(tv1, tv2), tv1) ),
("snd", TYArrow(TYPair(tv1, tv2), tv2) ),
("cond", TYArrow(TYBool, TYArrow(tv1, TYArrow(tv1, tv1))) )
]
fun builtin_iii f = VBuiltIn (fn v1 => VBuiltIn(fn v2 => case (v1,v2) of (VInt i1, VInt i2) => VInt (f i1 i2) | _ => raise RuntimeError))
fun builtin_iib f = VBuiltIn (fn v1 => VBuiltIn(fn v2 => case (v1,v2) of (VInt i1, VInt i2) => VBool (f i1 i2) | _ => raise RuntimeError))
fun builtin_bbb f = VBuiltIn (fn v1 => VBuiltIn(fn v2 => case (v1,v2) of (VBool b1, VBool b2) => VBool (f b1 b2) | _ => raise RuntimeError))
val mainVEnv = foldl (fn ((s, v), env) => Env.extend env s (ref (VEvaluated v)) ) Env.empty
[
("+", builtin_iii (fn x => fn y => x + y) ),
("-", builtin_iii (fn x => fn y => x - y) ),
("*", builtin_iii (fn x => fn y => x * y) ),
("/", builtin_iii (fn x => fn y => x div y) ),
("<", builtin_iib (fn x => fn y => x < y) ),
(">", builtin_iib (fn x => fn y => x > y) ),
("lte", builtin_iib (fn x => fn y => x <= y) ),
("gte", builtin_iib (fn x => fn y => x >= y) ),
("eqInt", builtin_iib (fn x => fn y => x = y) ),
("and", builtin_bbb (fn x => fn y => x andalso y) ),
("or", builtin_bbb (fn x => fn y => x orelse y) ),
("eqBool", builtin_bbb (fn x => fn y => x = y) ),
("not", VBuiltIn (fn VBool b => VBool (not b) | _ => raise RuntimeError) ),
("I", eval(Env.empty, parseMLExpr "fn x => x") ),
("K", eval(Env.empty, parseMLExpr "fn x => fn y => x") ),
("S", eval(Env.empty, parseMLExpr "fn f => fn g => fn x => f x (g x)") ),
("B", eval(Env.empty, parseMLExpr "fn f => fn g => fn x => f (g x)") ),
("C", eval(Env.empty, parseMLExpr "fn f => fn g => fn x => f x g") ),
("Y", VClosure ("f", parseMLExpr "let g = (fn x => f (x x)) in g g", Env.empty) ),
("nil", VList [] ),
("cons", VBuiltIn (fn v1 => VBuiltIn(fn v2 =>
case v2 of VList l => VList (v1::l) | _ => raise RuntimeError)) ),
("head", VBuiltIn (fn l => case l of VList (f::r) => f | _ => raise RuntimeError) ),
("tail", VBuiltIn (fn l => case l of VList (f::r) => VList r | _ => raise RuntimeError) ),
("isempty", VBuiltIn (fn l => case l of VList l => VBool (List.null l)
| _ => raise RuntimeError) ),
("pair", VBuiltIn (fn v1 => VBuiltIn(fn v2 => VPair(v1, v2))) ),
("fst", VBuiltIn (fn v1 => case v1 of VPair(p1, p2) => p1 | _ => (print ("fst called with non-pair "^(ml_value2string v1)); raise RuntimeError)) ),
("snd", VBuiltIn (fn v1 => case v1 of VPair(p1, p2) => p2 | _ => (print "snd called with non-pair"; raise RuntimeError)) ),
("cond", VBuiltIn (fn VBool b =>
VClosure ("e1",MLLambda ("e2",MLVar (if b then "e1" else "e2")), Env.empty)
| _ => raise RuntimeError) )
]
(****************************************************************************
parser for types with the following syntax:
int | bool | a | b | c | blah -- primitive type identifiers
'a | 'b | 'blah -- type variable (any type identifier which starts with ')
( t ) -- precedence
t1 -> t2 -- function (right associative)
{tycons a b c d ...} -- typeconstuctor with argument types, for example {list int}, or {product int it}
[a] -- shorthand for {list a}
****************************************************************************)
fun parseTypeBody(IDENT s :: ts) = if (List.hd (String.explode s) = #"'") then (TYVar s,ts)
else (TYIdent s,ts)
| parseTypeBody(LPAREN :: ts) = let val (t,ts) = parseType ts
val ts = matchToken(RPAREN,ts)
in (t,ts)
end
| parseTypeBody(LBRACE :: ts) = let val (t,ts) = parseType ts
val ts = matchToken(RBRACE,ts)
in (TYList t,ts)
end
| parseTypeBody(LBRACKET :: ts) = let val (x, ts) = parseIdent ts
val (tyl, ts) = parseTYList ts
in (TYCons(x, tyl), ts)
end
| parseTypeBody ts = raise Fail ("expected type, got: "^(toks2string ts))
and parseType ts = let val (t,ts) = parseTypeBody ts
in parseType' t ts
end
and parseType' t1 (TYARROW :: ts) = let val (t2,ts) = parseType ts
in (TYArrow(t1,t2), ts)
end
| parseType' t1 ts = (t1,ts)
and parseTYList toks =
case toks of
[] => raise (Fail "expected more tokens in type list")
| RBRACKET::r => ([], r)
| _ => let val (t,extraTs) = parseType toks
val (rest, extraTs) = parseTYList extraTs
in (t :: rest, extraTs)
end
fun parseTY s =
let val (t,extraTs) = parseType (getTokens s)
in case extraTs of [] => t | _ => raise Fail ("got extra tokens:"^(toks2string extraTs))
end
handle exn =>
case exn of Fail e => raise Fail ("Error while parsing ty: "^s^"\n"^e)
| _ => raise exn
fun testParseType s = ty2string (parseTY s)
(****************************************************************************
parser and evaluator for declarations 'val x = e or val x : ty = e'
****************************************************************************)
(* parse an expression of the form val x = e, or val x : ty = e. returns (x, e, ty option) *)
fun parseDecl(VAL :: ts) = let val (x,ts) = parseIdent ts
in case ts of
[] => raise Fail ("expected token VAL IDENT")
| EQUALS::r => let val (e, ts) = parseExp r
in ((x, e, NONE), ts)
end
| COLON:: r => let val (ty, ts) = parseType r
val ts = matchToken(EQUALS,ts)
val (e,ts) = parseExp ts
in ((x, e, SOME ty), ts)
end
| _ => raise Fail ("expected : or = after VAL IDENT")
end
| parseDecl ts = raise Fail ("expected val, got: "^(toks2string ts))
(* parse a list of tokens that contains a series of val declarations and return a list of them *)
fun parseDeclList ts = case parseDecl ts of
(decl,[]) => [decl]
| (decl, extraTs) => decl::parseDeclList extraTs
(* evaluate and typecheck a decl, and extend venv and tenv with the name *)
fun extendEnvsWithDecl venv tenv (x, e, annotation) =
let val result = eval (venv, e)
val venv' = Env.extend venv x (ref (VEvaluated result))
val ty = inferType (ml2ce e, tenv)
val ty = if isSome annotation then (valOf annotation) else ty
val tenv' = Env.extend tenv x ty
val _ = print ("declared "^x^"\t\t: "^(ty2string ty)^" --> "^(ml_value2string result)^"\n")
in (venv', tenv')
end
(* extend venv and tenv with all of the val declarations in the list decls *)
fun extendEnvsWithDeclList venv tenv decls =
foldl (fn (decl, (ve, te)) => extendEnvsWithDecl ve te decl) (venv, tenv) decls
(* call this function to parse a string to an ml_expr *)
fun parseDeclsAndExtendEnvs venv tenv s =
extendEnvsWithDeclList venv tenv (parseDeclList (getTokens s))
(****************************************************************************
read declarations from a file to a tenv and venv
****************************************************************************)
fun readFileToString filename =
let val file = openIn filename
fun readLinesFromFile inputFile =
if endOfStream inputFile then [] before closeIn inputFile
else inputLine inputFile :: readLinesFromFile inputFile
val lines = map (fn x => valOf x) (readLinesFromFile file)
val s = String.concat lines
(* convert newlines and tabs to spaces *)
val s' = implode (map (fn c => if c = #"\n" orelse c = #"\r" orelse c = #"\t" then #" " else c) (explode s))
in s'
end
fun getEnvsFromFile filename = parseDeclsAndExtendEnvs mainVEnv mainTEnv (readFileToString filename)
(****************************************************************************
tests
****************************************************************************)
fun run venv tenv s =
let val _ = print (s^"\n")
val mle = parseMLExpr s
val _ = print ("--> "^(ml_expr2string mle) ^ "\n")
val ce = ml2ce mle
(*val _ = print (c_expr2string ce)
val _ = print "\n"*)
val ty = inferType (ce, tenv)
val _ = print "--> type: "
val _ = print ((ty2string ty)^"\n")
val result = eval(venv, mle)
val _ = print ("--> "^(ml_value2string result))
val _ = print "\n\n"
in ()
end
fun runTest s = run mainVEnv mainTEnv s
val tests =
[
"+ 1 (* 3 4)",
"if not (and (< 1 2) (or true false)) then 4 else (+ 5 6)",
"(fn x => + x 2) 5",
"let x=3 in + 1 x",
"nil",
"cons 1 nil",
"(cons 3 (cons 2 (cons 1 nil)))",
"let L = (cons 3 (cons 2 (cons 1 nil))) in head L",
"let L = (cons 3 (cons 2 (cons 1 nil))) in tail L",
"let L = (cons 3 (cons 2 (cons 1 nil))) in isempty (tail L)",
"(((I I I I I 3)))",
"pair 1 2",
"fst (pair 1 2)",
"snd (pair 1 2)",
"[pair 1 2, pair 3 4]",
"letrec fact x = (if (eqInt 0 x) then 1 else fact (- x 1)) in fact",
"letrec fact x = (if (eqInt 0 x) then 1 else (* x (fact (- x 1)))) in fact 5",
"letrec length l = (if isempty l then 0 else + 1 (length (tail l))) in length",
"letrec y2 f = f (y2 f) in y2",
("let L = (cons 3 (cons 2 (cons 1 nil))) in "^
"let foldl = fn f => "^
" letrec folder acc = fn l => "^
" if (isempty l) then acc else folder (f (head l) acc) (tail l) "^
" in folder "^
"in foldl cons nil L ")
]
fun runTestN n = runTest (List.nth (tests, n))
fun runAllTests () = map (fn t => runTest t) tests
(*** test declarations ***)
fun testDecls() =
let val (venv, tenv) = (mainVEnv, mainTEnv)
val (venv, tenv) = parseDeclsAndExtendEnvs venv tenv "val x = 3 val y = + 1 2 val z = true val id = fn x => x"
val (venv, tenv) = parseDeclsAndExtendEnvs venv tenv "val inc = + 1"
val (venv, tenv) = parseDeclsAndExtendEnvs venv tenv "val length = letrec length l = (if isempty l then 0 else + 1 (length (tail l))) in length"
val (venv, tenv) = parseDeclsAndExtendEnvs venv tenv "val foldl = fn f => letrec folder acc = fn l => if (isempty l) then acc else folder (f (head l) acc) (tail l) in folder"
in run venv tenv (*"let L = (cons 3 (cons 2 (cons 1 nil))) in length L"*) "id id"
end
end end
|